GSoC 2019: It’s Over!

6 minute read

Published:

It’s the end of August, and Google Summer of Code 2019 is over. This blog post is meant to outline what I’ve accomplished, what I’ve failed to accomplish, what I’ve learned, and how I’ve felt over these last 4 months.

My Project

We’ve (I say we because you, dear reader, have been part of this journey just as much as I have) been working to add ODE capabilities to PyMC3. In principle, this doesn’t sound so hard. I mean, all we need to do is solve a different ODE related to our first ODE and then plug the result into a larger ODE so that NUTS can sample from the posterior (yo dawg, we heard you like ODEs). Using Theano to compute gradients isn’t so bad. We used Theano to compute the gradients of an ODE and then used gradient descent to fit that ODE to data. From there, it was easy to plop our code into an existing ODE inference notebook, wrap it in a class, and then sample. All of that happened May/June of this year.

When July came, we spent most of our time writing the API and writing tests. Personally, I learned a lot more in July than I learned in the previous months, mostly because OOP and writing unit tests were completely new to me. I also got way better at git **.

Then August came, we wrapped everything in a pretty bow and made a PR!

A lot of the work leading up to the PR was done in this repo. This really wasn’t meant for people to go through, but you’re free to poke around and see what I was up to during the summer. In that repo are the very beginnings of DifferentialEquation, the meat of my contribution.

About DifferentialEquation

DifferentialEquation is a Theano Op which computes the gradients of ODEs using the forward sensitivity analysis and computes solutions to the ODE via numerical integration. You use it in the same way that you use other PyMC3 functions to build models. Here is a notebook I wrote on how to use DifferentialEquation for both linear/non-linear scalar/vector ODEs.

All you have to do is:

  • Write your differential equation as a function in Python
  • Pass that function to DifferentialEquation
  • Tell DifferentialEquation at what times your data was observed, and
  • Tell DifferentialEquation how many parameters and states your system has.

Then, go build your model! The notebook I’ve linked to above is just the first step in a series of notebooks I have planned. Soon, I’d like to add a notebook on how DifferentialEquation can be used to estimate population pharamcokinetics.

Biggest Challenges

I think I was pretty clear about this in the first blog post; I was not certain I was going to succeed (but that is kind of the reason I stuck around). I was incredibly shocked to hear I got the gsoc position, and then was incredibly intimidated by the material I had to learn. The math was not the hard part (it was actually the most fun. Surprise, I’m a math nerd), but translating math to code is not always the easiest for me. I remember sitting in a Starbucks fiddling around with autograd (after reading Colin Carroll’s great blog posts on MCMC) trying to compute the sensitivities (i.e. gradients) of a tiny little ODE. I just didn’t think I understood it well enough to do it, but when I did compute those sensitivities, I was over the moon. I felt like I had actually learned something; like I had achieved a higher understanding. It was an amazing feeling.

After that, things got harder. Python is a tool for me, and so I don’t usually write classes, let alone design APIs. When I was writing the API for DifferentialEquation, I ran into a bug so infuriating that I decided to scrap the whole thing and start again. To anyone reading this, this is a legitimate way of debugging, do not @ me.

I thought once August hit I was in the home stretch, but I could not have been more wrong. Besides the various changes to the code from the PR review (which I expected), Michael (my mentor) and I noticed a problem with DifferentialEquation returning a tensor of the wrong shape. This lead to a week of poking and prodding at the API, trying to find exactly what was wrong with it. This lead to a lot of learning on my part (especially about Theano, as well as python setup.py develop why did no one tell me about that?). We eventually decided the change was not worth it at the moment, but will likely return to it at a later date.

I faced a lot of stuff I was not sure I would overcome, but luckily, I had great support. My mentor, Michael Osthege, never got frustrated with me. He always always understanding when I was having a rough time, super flexible to meet me even in the dead of the German night, and was all around a great person to show me the ropes. Thanks to Colin Carroll and Junpeng Lao for the kind words of encouragement through twitter, and thank you to the remainder of the PyMC3 devs for welcoming me into the community.

Would I Do GSoC Again?

I won’t lie, gsoc was capital H Hard. I think my project was pretty ambitious, and it might have been better to propose we create the beginnings of the API instead of the API itself. I faced a lot of stress, but I think it was worth it. I set out a goal to contribute to open source when I started my PhD. That initially meant typos, then documentation, but I didn’t anticipate contributing functionality on this level. It’s a cool feeling, and I think I would want to do it again.