Bayesian Inference for Differential Equations: A Google Summer of Code Project


Do Stuff that Scares You

At the beginning of March 2019, Thomas Wiecki tweeted about non-linear ODEs and Bayesian inference in PyMC3. I replied, basically saying “hey, I like your stuff, and I want to help. Here is the catch, I’m a bit shit at being a dev. How do I move forward?”

I received some helpful hints from Colin Carroll and Dan Simpson about making small commits to libraries and checking out unit tests. I also received a reply from the PyMC developers account asking if I had considered applying to GSoC 2019.

I hadn’t considered applying, because anything with Google in the name is an auto no for me (mostly because I know I am going to be competing with very qualified candidates and I hate competition. It scares me a little). I took a look at the proposed projects and there it was – ODE capabilities in PyMC3 – the perfect project for someone like me. Sometimes, you just feel compelled to do stuff, you know? So I emailed the listed mentors, made a submission, learned I was accepted, recovered from the shock, tweeted out my excitement, and the rest is history.

NumFOCUS asks politely that students make blog submissions about their work biweekly. This is intended to be the first post in a long sequence of posts detailing my struggles, successes, and what I have learned. I don’t have comments enabled on the blog (mostly because I didn’t think anyone would ever read it), but feel free to reach out on Twitter (because that is where I spend most of my time) and ask about anything that grabs your attention.

Now, a little about the work…

Bayesianism and Differential Equations?

I am a once and future applied mathematician. The bread and butter of applied math is the ODE. I’ve taken more courses on ODE/PDE than I have linear algebra, calculus, or probability. I’ve also published a couple of papers where they were the main tool of investigation.

One thing I’ve noticed in retrospect is that applied mathematicians rarely care about parameter estimation and uncertainty in those estimates. We’re usually characterizing bifurcations, or extending models, or seeing how interventions of a particular type change the dynamics of the system. Only later do we care about getting data and fitting our models.

When we do fit our models, it is usually via a least squares procedure. That works great for 99% of the problems applied mathematicians work on.

But my PhD thesis is part of that 1%.

I work with drug concentrations in patients. We often model drug metabolism as an ODE. You can imagine observing 10 patients, each with 5 concentration measurements after ingesting a drug. My goal is to understand how each patient metabolizes that drug so that I can make personalized doses.
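To make that concrete, here is a minimal sketch of what such a model might look like. It assumes the simplest possible case, first-order elimination (dC/dt = -k C), which is only one of many models used for drug metabolism. The function names, the elimination rate, the initial concentration, and the sampling times are all made up for illustration, and the hand-rolled Runge-Kutta loop stands in for whatever library solver you would normally reach for.

```python
import math


def elimination_rate(t, conc, k):
    """Right-hand side of dC/dt = -k * C (assumed first-order elimination)."""
    return -k * conc


def rk4_solve(rhs, c0, times, k, substeps=20):
    """Integrate with classical fourth-order Runge-Kutta.

    Returns the concentration at each entry of `times`; times[0] is the
    time at which the initial concentration c0 applies.
    """
    concs = [c0]
    c = c0
    for t_start, t_end in zip(times[:-1], times[1:]):
        h = (t_end - t_start) / substeps
        t = t_start
        for _ in range(substeps):
            k1 = rhs(t, c, k)
            k2 = rhs(t + h / 2, c + h * k1 / 2, k)
            k3 = rhs(t + h / 2, c + h * k2 / 2, k)
            k4 = rhs(t + h, c + h * k3, k)
            c += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
            t += h
        concs.append(c)
    return concs


# Hypothetical patient: dose at t = 0, then five blood draws (hours).
sample_times = [0.0, 1.0, 2.0, 4.0, 8.0, 12.0]
predicted = rk4_solve(elimination_rate, c0=10.0, times=sample_times, k=0.3)

for t, c in zip(sample_times, predicted):
    print(f"t = {t:5.1f} h   concentration = {c:.3f}")
```

Fitting the model then means finding the value of k (one per patient, or one for everybody) that makes these predicted concentrations match the measured ones.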

Sounds easy, right? With that dataset of 50 observations, do a least squares fit. Well…it isn’t that easy. There is a lot of heterogeneity at the patient level, and if I were to use all the data to fit a single model, I would understand more about the population than I would about any given patient. In essence I see the forest for the trees when I really need to see the trees for the forest.

So why not go the other way – do a least squares fit for each patient. This might work if sampling were cheap and easy, but it isn’t. No one wants to sit in a hospital bed and get their blood drawn 5 times in a day. So obtaining data for new patients would be incredibly difficult if I were to take this approach.

So I’m a bit stuck as it stands. I can’t lump the data together because I lose the details of the individual, but focusing on the individual is expensive and impractical. So how do I learn how an individual patient metabolizes a drug?

Clever readers will recognize the approaches above as complete pooling and no pooling. What I really need to do in order to personalize a dose is to use partial pooling, and that means going Bayesian (ok, not necessarily, but you can tweet at me to ask why).
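For a rough feel of what partial pooling does, here is a small sketch. It is not an ODE model and not how PyMC3 or Stan would do it: it assumes each patient already has a noisy estimate of some parameter (say, an elimination rate) with a known standard error, and it fixes the between-patient standard deviation `tau` by hand. A full Bayesian model would put priors on the population mean and on `tau` and infer them from the data. The function name and all the numbers are invented for illustration.

```python
def partial_pool(estimates, std_errors, tau):
    """Shrink per-patient estimates toward a precision-weighted population mean.

    tau is an assumed between-patient standard deviation. As tau goes to 0
    every patient collapses onto the population mean (complete pooling);
    as tau grows each patient keeps their own estimate (no pooling).
    """
    weights = [1.0 / (se**2 + tau**2) for se in std_errors]
    population_mean = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)

    shrunk = []
    for e, se in zip(estimates, std_errors):
        # Noisier patients (large se) are pulled harder toward the population.
        shrinkage = se**2 / (se**2 + tau**2)
        shrunk.append(shrinkage * population_mean + (1.0 - shrinkage) * e)
    return population_mean, shrunk


# Made-up per-patient estimates (per hour) and their standard errors.
no_pooling = [0.21, 0.35, 0.28, 0.40, 0.25]
std_errors = [0.05, 0.10, 0.04, 0.12, 0.06]

population_mean, partial = partial_pool(no_pooling, std_errors, tau=0.05)

print(f"population mean: {population_mean:.3f}")
for raw, pooled in zip(no_pooling, partial):
    print(f"no pooling: {raw:.3f}   partial pooling: {pooled:.3f}")
```

Each partially pooled estimate lands somewhere between the patient's own estimate and the population mean, and the patients with the least reliable data borrow the most from everyone else.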

So there is a clear need to do Bayesian inference with ODEs. Currently PyMC3 doesn’t have that capability (Stan has it), and it really needs it.

I’m really excited to start working on this project. I feel like I’m going to struggle a lot, but that just means I am going to learn a lot. Struggle is nothing but the act of learning.

Keep on the lookout for another blog post coming out in around two weeks’ time. I’ll probably write about my meeting with my mentor and how I am integrating into the community.