<p>Ph.Demetri — Demetri Pananos (apananos@uwo.ca). Applied Mathematician. Statistician. Data Scientist. Feed: https://dpananos.github.io/feed.xml</p>
<h1 id="bayes-workshop">Bayes Workshop (2019-07-09)</h1>
<p>Link is <a href="https://drive.google.com/drive/folders/1XygO5U21VHObuuZty7aWJWuLD4vSTkKa?usp=sharing">here</a>.</p>
<h1 id="gsoc-2019-designing-an-api">GSoC 2019: Designing an API (2019-07-08)</h1>
<p>Let’s take stock of where we are on our journey to add ODE capabilities to PyMC3.</p>
<ul>
<li>
<p>We know that HMC requires derivatives of the log likelihood, which means taking derivatives of the differential equation’s solution with respect to its parameters. That sounds hard, but we’ve discovered that it is actually as easy as solving more differential equations. The method by which we compute derivatives of the differential equation’s solution is called the “Forward Sensitivity Method”.</p>
</li>
<li>
<p>We’ve done a little proof of concept for the forward sensitivity method. We wrote a computation graph in theano to compute these derivatives via automatic differentiation, and then used those derivatives to do gradient descent. Since we successfully fit our differential equation to data, we became more confident in our approach.</p>
</li>
<li>
<p>We’ve dropped our computation graph into an existing notebook and were very surprised that the model sampled. We estimated the correct parameter value, the Gelman-Rubin statistic indicated our chains converged, and the number of effective samples was quite large. All in all, a great success.</p>
</li>
</ul>
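<p>To make the “as easy as solving more differential equations” point concrete, here is a minimal sketch (not code from these posts) for the scalar decay equation $y' = -\theta y$, $y(0) = 1$. Its sensitivity $s = \partial y / \partial \theta$ obeys its own ODE, $s' = (\partial f/\partial y)\, s + \partial f / \partial \theta = -\theta s - y$, so we just integrate the two equations together:</p>

```python
import numpy as np
from scipy.integrate import odeint

def augmented(z, t, theta):
    # z = [y, s], where s = dy/dtheta is the sensitivity
    y, s = z
    dydt = -theta * y           # the original ODE: y' = -theta * y
    dsdt = -theta * s - y       # sensitivity ODE: s' = (df/dy) s + df/dtheta
    return [dydt, dsdt]

theta = 2.0
t = np.linspace(0.0, 1.0, 50)
z0 = [1.0, 0.0]                 # s(0) = 0 because y(0) does not depend on theta
sol = odeint(augmented, z0, t, args=(theta,))

# The analytic sensitivity of y = exp(-theta * t) is -t * exp(-theta * t)
analytic = -t * np.exp(-theta * t)
err = np.max(np.abs(sol[:, 1] - analytic))  # agrees to integration tolerance
```

<p>The integrated sensitivity matches the hand-computed derivative, which is exactly the check the proof of concept relied on.</p>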
<p>Now, we are at the point where we have to think about generalizing, and honestly, I’ve been dreading this point.</p>
<p>It is not news that I am a <em>bad</em> software engineer (in fact, that is precisely the reason I did GSoC). So, I am going to lean very heavily on you, dear reader, to help me through this. I’m going to be very open with what I am thinking, and I hope that you will chime in when necessary to tell me that things can be done better, faster, safer.</p>
<p>So, here we go.</p>
<h1 id="thinking-about-an-api">Thinking About an API</h1>
<p>I don’t think an entire blog post outlining how the API works is interesting. Instead, I’m going to highlight some relevant bits, and if you have questions, you can ask me directly.</p>
<p>I’m keeping all my code in <a href="https://github.com/Dpananos/ODEGSoC/blob/master/Scripts/ode_api.py">this</a> repo. The API should take as input a function that computes the appropriate derivatives for the differential equation given the system’s current state, time, and parameters. Then, the <a href="https://github.com/Dpananos/ODEGSoC/blob/master/Scripts/ode_api.py#L151">computation graph</a> I wrote can produce a compiled theano function that returns an augmented system (that is, the derivatives for the system and the derivatives to be used in the forward sensitivity method).</p>
<p>Before we begin integrating the augmented system, we need to construct appropriate initial conditions. In order to do that, we need to understand how the initial conditions look for the forward sensitivity method. For a differential equation</p>
<script type="math/tex; mode=display">y' = f(y,t,p)</script>
<p>with sensitivities defined by</p>
<script type="math/tex; mode=display">\dfrac{d}{dt} \left( \dfrac{\partial y}{\partial p} \right) = \dfrac{\partial f}{\partial y} \dfrac{\partial y}{\partial p} + \dfrac{\partial f}{\partial p}</script>
<p>the sensitivities (the derivative of the ODE’s solution with respect to the parameters) form a matrix, namely</p>
<script type="math/tex; mode=display">\left( \dfrac{\partial y}{\partial p} \right)_{i,j} = \dfrac{\partial y_i}{\partial p_j}</script>
<p>I tried to get my website to show the actual matrix version of the above, but no dice. Note that at the initial time $t_0$, if $p_j$ is a model parameter, then $\partial y_i / \partial p_j = 0$. If $p_j$ is an initial condition for one of the equations in our system (i.e. $y_i(t_0) = p_j$), then $\partial y_i / \partial p_j = 1$ (can you see why?). If we arrange the columns of $\partial y / \partial p$ so that all the model parameters are adjacent and all the initial conditions are adjacent, the identity matrix will appear somewhere in our initial condition for $\partial y / \partial p$. I’ve written a little <a href="https://github.com/Dpananos/ODEGSoC/blob/master/Scripts/ode_api.py#L30">helper method</a> to construct this initial condition and then flatten it (it needs to be flattened in order to be passed to the integrator).</p>
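<p>To illustrate what such a helper does, here is a sketch (the function name and column layout here are hypothetical, not the actual API): for $n$ states and $m$ “pure” model parameters, the initial sensitivity matrix is $n \times (m + n)$, zero in the parameter columns and the identity in the initial-condition columns, flattened before being handed to the integrator.</p>

```python
import numpy as np

def make_sens_ic(n_states, n_theta):
    """Hypothetical helper (not the actual API): flattened initial
    condition for the sensitivity matrix dy/dp.

    Columns 0 .. n_theta-1 correspond to model parameters, whose
    sensitivities start at 0; the remaining n_states columns correspond
    to initial conditions, giving an identity block, since
    d y_i(t0) / d y_j(t0) is 1 when i == j and 0 otherwise.
    """
    sens_ic = np.zeros((n_states, n_theta + n_states))
    sens_ic[:, n_theta:] = np.eye(n_states)
    return sens_ic.ravel()   # flattened before being passed to the integrator

# For 2 states and 1 model parameter, columns are [theta, y1(t0), y2(t0)]
ic = make_sens_ic(2, 1).reshape(2, 3)
```

<p>Reshaping the flat vector back to $n \times (m+n)$ recovers the zero block and the identity block described above.</p>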
<p>The notebook I posted 2 weeks ago used a “cached simulator” to avoid having to integrate the system twice for the same parameters in the forward and backward pass, thereby increasing the speed of the computation. I’ve made that <a href="https://github.com/Dpananos/ODEGSoC/blob/master/Scripts/ode_api.py#L87">a part of the API</a> so it is, more or less, automatic. Small digression: for some reason, this was the HARDEST thing to implement. I struggled for longer than I care to admit. To fix whatever bug I was experiencing, I rewrote the entire API in a fit of rage and frustration.</p>
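<p>For the curious, one simple way to implement such a cache (a sketch under my own naming, not the code in the repo) is to memoize the most recent parameter vector and its solution, so the backward pass re-uses the forward pass’s integration:</p>

```python
import numpy as np
from scipy.integrate import odeint

class CachedSimulator:
    """Sketch of a 'cached simulator': re-uses the last solve when the
    forward and backward passes request the same parameters."""

    def __init__(self, func, y0, t):
        self.func, self.y0, self.t = func, y0, t
        self._last_params = None
        self._last_solution = None

    def __call__(self, params):
        params = np.asarray(params, dtype=float)
        if self._last_params is None or not np.array_equal(params, self._last_params):
            # Only integrate when the parameters actually changed
            self._last_solution = odeint(self.func, self.y0, self.t, args=tuple(params))
            self._last_params = params.copy()
        return self._last_solution

sim = CachedSimulator(lambda y, t, a: -a * y, y0=[1.0], t=np.linspace(0.0, 1.0, 11))
first = sim([2.0])
second = sim([2.0])    # same parameters: the cached array is returned
assert second is first
```

<p>Returning the identical array object when the parameters match is what saves the second integration.</p>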
<p><a href="https://github.com/Dpananos/ODEGSoC/blob/master/Scripts/test_scalar_ode_1_param.ipynb">Here</a> is a notebook with a small example showing the API working. There are a few changes I plan to make that I hope will result in some speedups, but as it stands, all you need to do is: pass your ODE function, the initial time your system evolves forward from, an array of times at which you would like to evaluate your ODE, the number of states your system has, and the number of parameters your model has (pure parameters, not the number of parameters plus the number of initial conditions you plan on estimating).</p>
<h1 id="gsoc-2019-a-sampling-notebook">GSoC 2019: A Sampling Notebook (2019-06-24)</h1>
<p>OK, I will keep this one short and sweet, no math. We have a sampling notebook.</p>
<p>This isn’t completely new. The notebook I wrote is based, almost entirely, on <a href="https://github.com/pymc-devs/pymc3/blob/master/docs/source/notebooks/ODE_parameter_estimation.ipynb">this</a> notebook by Sanmitra Ghosh. Sanmitra computes the requisite gradient and Jacobian analytically. If you remember, my last blog post showed how to construct a Theano computation graph to compute those quantities automatically. This + That = Sampling Notebook (did I say no math? Wow, how long did I last? Not even a paragraph? I’m awful).</p>
<p><a href="https://github.com/Dpananos/ODEGSoC/blob/master/Notebooks/Sampling%20SIR%20model.ipynb">Here is my notebook</a>. But don’t snoop around the rest of the repo! You’re going to ruin all the fun surprises I have in store (and you will definitely see my buggy code).</p>
<p>It really was a matter of plug and play, and if you are really in need of ODE capabilities in PyMC3 (like…you need them yesterday), I’m fairly certain you can fork this notebook and run with it.</p>
<p>I’m currently working on putting this all in an API. I’m thinking you would do something like</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="n">ode_model</span> <span class="o">=</span> <span class="n">ODEModel</span><span class="p">(</span><span class="n">odefunc</span><span class="p">)</span>
<span class="n">ode_op</span> <span class="o">=</span> <span class="n">ODEop</span><span class="p">(</span><span class="n">ode_model</span><span class="p">)</span>
<span class="k">with</span> <span class="n">pm</span><span class="o">.</span><span class="n">Model</span><span class="p">()</span> <span class="k">as</span> <span class="n">model</span><span class="p">:</span>
<span class="c">#Write priors here</span>
<span class="n">solution</span> <span class="o">=</span> <span class="n">ode_op</span><span class="p">(</span><span class="n">parameters</span><span class="p">)</span>
<span class="c">#Write likelihood</span>
</code></pre></div></div>
<p>There are still some bugs to work out in my API (most notably working with asynchronous data), but I will save that for a later date. For now, feel free to fork and play with the notebook.</p>
<h1 id="gsoc-2019-gradient-descent-for-odes">GSoC 2019: Gradient Descent for ODEs (But This Time, In Theano) (2019-06-10)</h1>
<p>A little while ago, I wrote a post on doing gradient descent for ODEs. In that post, I used <code class="highlighter-rouge">autograd</code> to do the automatic differentiation. While neat, it was really a way for me to get familiar with some math that I was to use for GSoC. After taking some time to learn more about <code class="highlighter-rouge">theano</code>, I’ve reimplemented the blog post, this time using <code class="highlighter-rouge">theano</code> to perform the automatic differentiation. If you’ve read the previous post, then skip right to the code.</p>
<p>Gradient descent usually isn’t used to fit Ordinary Differential Equations (ODEs) to data (at least, that isn’t how the Applied Mathematics departments I have been a part of have done it). Nevertheless, that doesn’t mean it can’t be done. For some of my recent GSoC work, I’ve been investigating how to compute gradients of solutions to ODEs without access to the solution’s analytical form. In this blog post, I describe how these gradients can be computed and how they can be used to fit ODEs to synchronous data with gradient descent.</p>
<h2 id="up-to-speed-with-odes">Up To Speed With ODEs</h2>
<p>I realize not everyone has studied ODEs. Here is everything you need to know:</p>
<p>A differential equation relates an unknown function $y \in \mathbb{R}^n$ to its own derivative through a function $f: \mathbb{R}^n \times \mathbb{R} \times \mathbb{R}^m \rightarrow \mathbb{R}^n$, which also depends on time $t \in \mathbb{R}$ and possibly a set of parameters $\theta \in \mathbb{R}^m$. We usually write ODEs as</p>
<script type="math/tex; mode=display">y' = f(y,t,\theta) \quad y(t_0) = y_0</script>
<p>Here, we refer to the vector $y$ as “the system”, since the ODE above really defines a system of equations. The problem is usually equipped with an initial state of the system $y(t_0) = y_0$ from which the system evolves forward in $t$. Solutions to ODEs in analytic form are often <em>very hard</em> to obtain, if not impossible, so most of the time we just numerically approximate the solution. It doesn’t matter how this is done because numerical integration is not the point of this post. If you’re interested, look up the class of <em>Runge-Kutta</em> methods.</p>
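<p>If you want a feel for what those integrators actually do, here is a single classical fourth-order Runge-Kutta step (a generic textbook sketch, not code from this post): four slope evaluations, averaged with weights 1, 2, 2, 1.</p>

```python
import math

def rk4_step(f, y, t, h, theta):
    """One classical Runge-Kutta (RK4) step for y' = f(y, t, theta)."""
    k1 = f(y, t, theta)
    k2 = f(y + 0.5 * h * k1, t + 0.5 * h, theta)
    k3 = f(y + 0.5 * h * k2, t + 0.5 * h, theta)
    k4 = f(y + h * k3, t + h, theta)
    return y + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

# Integrate y' = -theta * y, y(0) = 1, over [0, 1] in 100 steps
f = lambda y, t, theta: -theta * y
y, h, theta = 1.0, 0.01, 2.0
for i in range(100):
    y = rk4_step(f, y, i * h, h, theta)

error = abs(y - math.exp(-theta))  # tiny: RK4 is fourth-order accurate
```

<p>Production integrators (like the ones <code class="highlighter-rouge">scipy</code> wraps) add adaptive step sizes and error control on top of steps like this one.</p>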
<h2 id="computing-gradients-for-odes">Computing Gradients for ODEs</h2>
<p>In this section, I’m going to be using derivative notation rather than $\nabla$ for gradients. I think it is less ambiguous.</p>
<p>If we want to fit an ODE model to data by minimizing some loss function $\mathcal{L}$, then gradient descent looks like</p>
<script type="math/tex; mode=display">\theta_{n+1} = \theta_n - \alpha \dfrac{\partial \mathcal{L}}{\partial \theta}</script>
<p>In order to compute the gradient of the loss, we need the gradient of the solution, $y$, with respect to $\theta$. The gradient of the solution is the hard part here because it cannot be computed (a) analytically (because analytic solutions are hard AF), or (b) through automatic differentiation without differentiating through the numerical integration of our ODE (which seems computationally wasteful).</p>
<p>Thankfully, years of research into ODEs yields a way to do this (that is not the adjoint method. Surprise! You thought I was going to say the adjoint method didn’t you?). Forward mode sensitivity analysis calculates gradients by extending the ODE system to include the following equations:</p>
<script type="math/tex; mode=display">\dfrac{d}{dt}\left( \dfrac{\partial y}{\partial \theta} \right) = \mathcal{J}_f \dfrac{\partial y}{\partial \theta} +\dfrac{\partial f}{\partial \theta}</script>
<p>Here, $\mathcal{J}_f$ is the Jacobian of $f$ with respect to $y$. The forward sensitivity equations are <em>just another differential equation</em> (see how they relate the unknown $\partial y / \partial \theta$ to its own time derivative?)! In order to compute the gradient of $y$ with respect to $\theta$ at time $t_i$, we compute</p>
<script type="math/tex; mode=display">\dfrac{\partial y}{\partial \theta} = \int_{t_0}^{t_i} \mathcal{J}_f \dfrac{\partial y}{\partial \theta} + \dfrac{\partial f}{\partial \theta} \, dt</script>
<p>I know this looks scary, but since forward mode sensitivities are just ODEs, we actually just get this from what we can consider to be a black box</p>
<script type="math/tex; mode=display">\dfrac{\partial y}{\partial \theta} = \operatorname{BlackBox}(f(y,t,\theta), t_0, y_0, \theta)</script>
<p>So now that we have our gradient in hand, we can use the chain rule to write</p>
<script type="math/tex; mode=display">\dfrac{\partial \mathcal{L}}{\partial \theta} =\dfrac{\partial \mathcal{L}}{\partial y} \dfrac{\partial y}{\partial \theta}</script>
<p>We can use automatic differentiation to compute $\dfrac{\partial \mathcal{L}}{\partial y}$.</p>
<p>OK, so that is some math (interesting to me, maybe not so much to you). Let’s actually implement this in python.</p>
<h2 id="gradient-descent-for-the-sir-model">Gradient Descent for the SIR Model</h2>
<p>The SIR model is a set of differential equations which govern how a disease spreads through a homogeneously mixed, closed population. I could write an entire thesis on this model and its various extensions (in fact, I have), so I’ll let you read about those in your free time.</p>
<p>The system, shown below, is parameterized by a single parameter:</p>
<script type="math/tex; mode=display">\dfrac{dS}{dt} = -\theta SI \quad S(0) = 0.99</script>
<script type="math/tex; mode=display">\dfrac{dI}{dt} = \theta SI - I \quad I(0) = 0.01</script>
<p>Let’s define the system and the appropriate derivatives, generate some observations, and fit $\theta$ using gradient descent. Here is what you’ll need to get started:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">theano</span>
<span class="kn">import</span> <span class="nn">theano.tensor</span> <span class="k">as</span> <span class="n">tt</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="nn">scipy.integrate</span>
</code></pre></div></div>
<p>Let’s then define the ODE system</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">f</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">t</span><span class="p">,</span> <span class="n">theta</span><span class="p">):</span>
<span class="s">"""This is the ODE system.
The function can act on either numpy arrays or theano TensorVariables
Args:
y (vector): system state
t (float): current time (optional)
theta (vector): parameters of the ODEs
Returns:
dydt (list): result of the ODEs
"""</span>
<span class="k">return</span> <span class="p">[</span>
<span class="o">-</span><span class="n">theta</span><span class="o">*</span><span class="n">y</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">*</span><span class="n">y</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="c">#= dS/dt</span>
<span class="n">theta</span><span class="o">*</span><span class="n">y</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">*</span><span class="n">y</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">-</span> <span class="n">y</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="c">#= dI/dt</span>
<span class="p">]</span>
</code></pre></div></div>
<p>and create a computation graph with <code class="highlighter-rouge">theano</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#Define the differential Equation</span>
<span class="c">#Present state of the system</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">tt</span><span class="o">.</span><span class="n">dvector</span><span class="p">(</span><span class="s">'y'</span><span class="p">)</span>
<span class="c">#Parameter: Basic reproductive ratio</span>
<span class="n">p</span> <span class="o">=</span> <span class="n">tt</span><span class="o">.</span><span class="n">dscalar</span><span class="p">(</span><span class="s">'p'</span><span class="p">)</span>
<span class="c">#Present state of the gradients: will always start at 0 unless the parameter is the initial condition</span>
<span class="n">dydp</span> <span class="o">=</span> <span class="n">tt</span><span class="o">.</span><span class="n">dvector</span><span class="p">(</span><span class="s">'dydp'</span><span class="p">)</span>
<span class="n">f_tensor</span> <span class="o">=</span> <span class="n">tt</span><span class="o">.</span><span class="n">stack</span><span class="p">(</span><span class="n">f</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="bp">None</span><span class="p">,</span> <span class="n">p</span><span class="p">))</span>
<span class="c">#Now compute gradients</span>
<span class="c">#Use Rop to compute the Jacobian vector product</span>
<span class="n">Jdfdy</span> <span class="o">=</span> <span class="n">tt</span><span class="o">.</span><span class="n">Rop</span><span class="p">(</span><span class="n">f_tensor</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">dydp</span><span class="p">)</span>
<span class="n">grad_f</span> <span class="o">=</span> <span class="n">tt</span><span class="o">.</span><span class="n">jacobian</span><span class="p">(</span><span class="n">f_tensor</span><span class="p">,</span> <span class="n">p</span><span class="p">)</span>
<span class="c">#This is the time derivative of dydp</span>
<span class="n">ddt_dydp</span> <span class="o">=</span> <span class="n">Jdfdy</span> <span class="o">+</span> <span class="n">grad_f</span>
<span class="c">#Compile the system as a theano function</span>
<span class="c">#Args:</span>
<span class="c">#y - array of length 2 representing current state of the system (S and I)</span>
<span class="c">#dydp - array of length 2 representing current state of the gradient (dS/dp and dI/dp)</span>
<span class="n">system</span> <span class="o">=</span> <span class="n">theano</span><span class="o">.</span><span class="n">function</span><span class="p">(</span>
<span class="n">inputs</span><span class="o">=</span><span class="p">[</span><span class="n">y</span><span class="p">,</span> <span class="n">dydp</span><span class="p">,</span> <span class="n">p</span><span class="p">],</span>
<span class="n">outputs</span><span class="o">=</span><span class="p">[</span><span class="n">f_tensor</span><span class="p">,</span> <span class="n">ddt_dydp</span><span class="p">]</span>
<span class="p">)</span>
</code></pre></div></div>
<p>Next, we’ll define the augmented system (that is, the ODE plus the sensitivities).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#create a function to spit out derivatives</span>
<span class="k">def</span> <span class="nf">ODESYS</span><span class="p">(</span><span class="n">Y</span><span class="p">,</span><span class="n">t</span><span class="p">,</span><span class="n">p</span><span class="p">):</span>
<span class="s">"""
Args:
Y (vector): current state and gradient
t (scalar): current time
p (vector): parameters
Returns:
derivatives (vector): derivatives of state and gradient
"""</span>
<span class="n">dydt</span><span class="p">,</span> <span class="n">dydp</span> <span class="o">=</span> <span class="n">system</span><span class="p">(</span><span class="n">Y</span><span class="p">[:</span><span class="mi">2</span><span class="p">],</span> <span class="n">Y</span><span class="p">[</span><span class="mi">2</span><span class="p">:],</span> <span class="n">p</span><span class="p">)</span>
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">concatenate</span><span class="p">([</span><span class="n">dydt</span><span class="p">,</span> <span class="n">dydp</span><span class="p">])</span>
</code></pre></div></div>
<p>We’ll optimize the $L_2$ norm of the error. This is done in <code class="highlighter-rouge">theano</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#Tensor for observed data</span>
<span class="n">t_y_obs</span> <span class="o">=</span> <span class="n">tt</span><span class="o">.</span><span class="n">dmatrix</span><span class="p">(</span><span class="s">'y_obs'</span><span class="p">)</span>
<span class="c">#Tensor for predictions</span>
<span class="n">t_y_pred</span> <span class="o">=</span> <span class="n">tt</span><span class="o">.</span><span class="n">dmatrix</span><span class="p">(</span><span class="s">'y_pred'</span><span class="p">)</span>
<span class="c">#Define error and cost</span>
<span class="n">err</span> <span class="o">=</span> <span class="n">t_y_obs</span> <span class="o">-</span> <span class="n">t_y_pred</span>
<span class="n">Cost</span> <span class="o">=</span> <span class="n">err</span><span class="o">.</span><span class="n">norm</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="n">axis</span> <span class="o">=</span> <span class="mi">1</span><span class="p">)</span><span class="o">.</span><span class="nb">sum</span><span class="p">()</span>
<span class="n">Cost_gradient</span> <span class="o">=</span> <span class="n">theano</span><span class="o">.</span><span class="n">tensor</span><span class="o">.</span><span class="n">grad</span><span class="p">(</span><span class="n">Cost</span><span class="p">,</span><span class="n">t_y_pred</span><span class="p">)</span>
<span class="n">C</span> <span class="o">=</span> <span class="n">theano</span><span class="o">.</span><span class="n">function</span><span class="p">(</span><span class="n">inputs</span> <span class="o">=</span> <span class="p">[</span><span class="n">t_y_obs</span><span class="p">,</span> <span class="n">t_y_pred</span><span class="p">],</span> <span class="n">outputs</span><span class="o">=</span><span class="n">Cost</span><span class="p">)</span>
<span class="n">del_C</span> <span class="o">=</span> <span class="n">theano</span><span class="o">.</span><span class="n">function</span><span class="p">(</span><span class="n">inputs</span> <span class="o">=</span> <span class="p">[</span><span class="n">t_y_obs</span><span class="p">,</span> <span class="n">t_y_pred</span><span class="p">],</span> <span class="n">outputs</span><span class="o">=</span><span class="n">Cost_gradient</span><span class="p">)</span>
</code></pre></div></div>
<p>Create some observations to which we can fit the model</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#Initial Condition</span>
<span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">19920908</span><span class="p">)</span>
<span class="n">Y0</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mf">0.99</span><span class="p">,</span><span class="mf">0.01</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">])</span>
<span class="c">#Space to compute solutions</span>
<span class="n">t_dense</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">5</span><span class="p">,</span><span class="mi">101</span><span class="p">)</span>
<span class="c">#True param value</span>
<span class="n">p_true</span> <span class="o">=</span> <span class="mi">8</span>
<span class="n">y_hat_theano</span> <span class="o">=</span> <span class="n">scipy</span><span class="o">.</span><span class="n">integrate</span><span class="o">.</span><span class="n">odeint</span><span class="p">(</span><span class="n">ODESYS</span><span class="p">,</span> <span class="n">y0</span><span class="o">=</span><span class="n">Y0</span><span class="p">,</span> <span class="n">t</span><span class="o">=</span><span class="n">t_dense</span><span class="p">,</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="n">p_true</span><span class="p">,))</span>
<span class="n">S_obs</span><span class="p">,</span><span class="n">I_obs</span><span class="p">,</span> <span class="o">*</span><span class="n">_</span> <span class="o">=</span> <span class="n">y_hat_theano</span><span class="o">.</span><span class="n">T</span> <span class="o">+</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">normal</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mf">0.1</span><span class="p">,</span><span class="n">size</span> <span class="o">=</span> <span class="n">y_hat_theano</span><span class="o">.</span><span class="n">T</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span>
<span class="n">y_obs</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">c_</span><span class="p">[</span><span class="n">S_obs</span><span class="p">,</span><span class="n">I_obs</span><span class="p">]</span>
</code></pre></div></div>
<p>Perform Gradient Descent</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="n">p_gd</span> <span class="o">=</span> <span class="mf">1.1</span>
<span class="n">learning_rate</span> <span class="o">=</span> <span class="mf">0.01</span>
<span class="n">num_steps</span> <span class="o">=</span> <span class="mi">1000</span>
<span class="n">prev_cost</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="s">'inf'</span><span class="p">)</span>
<span class="n">tol</span> <span class="o">=</span> <span class="mf">1e-5</span>
<span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
<span class="c">#Evaluate solution</span>
<span class="n">y_hat_theano</span> <span class="o">=</span> <span class="n">scipy</span><span class="o">.</span><span class="n">integrate</span><span class="o">.</span><span class="n">odeint</span><span class="p">(</span><span class="n">ODESYS</span><span class="p">,</span> <span class="n">y0</span><span class="o">=</span><span class="n">Y0</span><span class="p">,</span> <span class="n">t</span><span class="o">=</span><span class="n">t_dense</span><span class="p">,</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="n">p_gd</span><span class="p">,))</span>
<span class="c">#Splice out the numerical solution and numerical gradients</span>
<span class="n">y_pred</span> <span class="o">=</span> <span class="n">y_hat_theano</span><span class="p">[:,:</span><span class="mi">2</span><span class="p">]</span>
<span class="n">gradients</span> <span class="o">=</span> <span class="n">y_hat_theano</span><span class="p">[:,</span><span class="mi">2</span><span class="p">:]</span>
<span class="c">#Perform the gradient step</span>
<span class="n">p_gd</span><span class="o">-=</span> <span class="n">learning_rate</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">del_C</span><span class="p">(</span><span class="n">y_obs</span><span class="p">,</span><span class="n">y_pred</span><span class="p">)</span><span class="o">*</span><span class="n">gradients</span><span class="p">)</span>
<span class="c">#Has the loss changed a large amount?</span>
<span class="n">cost</span> <span class="o">=</span> <span class="n">C</span><span class="p">(</span><span class="n">y_obs</span><span class="p">,</span><span class="n">y_pred</span><span class="p">)</span>
<span class="c">#If so, keep going. Stop when the loss stops shrinking</span>
<span class="k">if</span> <span class="nb">abs</span><span class="p">(</span><span class="n">cost</span> <span class="o">-</span> <span class="n">prev_cost</span><span class="p">)</span><span class="o"><</span><span class="n">tol</span><span class="p">:</span>
<span class="k">break</span>
<span class="n">prev_cost</span> <span class="o">=</span> <span class="n">cost</span>
</code></pre></div></div>
<p>And lastly, compare our fitted curves to the true curves</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#Integrate once more with the true parameter for comparison</span>
<span class="n">y_hat_analytical</span> <span class="o">=</span> <span class="n">scipy</span><span class="o">.</span><span class="n">integrate</span><span class="o">.</span><span class="n">odeint</span><span class="p">(</span><span class="n">ODESYS</span><span class="p">,</span> <span class="n">y0</span><span class="o">=</span><span class="n">Y0</span><span class="p">,</span> <span class="n">t</span><span class="o">=</span><span class="n">t_dense</span><span class="p">,</span> <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="n">p_true</span><span class="p">,))</span>
<span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">dpi</span> <span class="o">=</span> <span class="mi">120</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">t_dense</span><span class="p">,</span><span class="n">y_obs</span><span class="p">[:,</span><span class="mi">0</span><span class="p">],</span> <span class="n">alpha</span> <span class="o">=</span> <span class="mf">0.5</span><span class="p">,</span> <span class="n">marker</span> <span class="o">=</span> <span class="s">'.'</span><span class="p">,</span> <span class="n">label</span> <span class="o">=</span> <span class="s">'S observed'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">t_dense</span><span class="p">,</span><span class="n">y_obs</span><span class="p">[:,</span><span class="mi">1</span><span class="p">],</span><span class="n">alpha</span> <span class="o">=</span> <span class="mf">0.5</span><span class="p">,</span> <span class="n">marker</span> <span class="o">=</span> <span class="s">'.'</span><span class="p">,</span> <span class="n">label</span> <span class="o">=</span> <span class="s">'I observed'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">t_dense</span><span class="p">,</span><span class="n">y_hat_theano</span><span class="p">[:,</span><span class="mi">0</span><span class="p">],</span> <span class="n">color</span> <span class="o">=</span> <span class="s">'k'</span><span class="p">,</span> <span class="n">zorder</span> <span class="o">=</span> <span class="mi">100</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">t_dense</span><span class="p">,</span><span class="n">y_hat_theano</span><span class="p">[:,</span><span class="mi">1</span><span class="p">],</span> <span class="n">color</span> <span class="o">=</span> <span class="s">'k'</span><span class="p">,</span> <span class="n">zorder</span> <span class="o">=</span> <span class="mi">100</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">t_dense</span><span class="p">,</span><span class="n">y_hat_analytical</span><span class="p">[:,</span><span class="mi">0</span><span class="p">],</span> <span class="n">linewidth</span> <span class="o">=</span> <span class="mi">3</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">t_dense</span><span class="p">,</span><span class="n">y_hat_analytical</span><span class="p">[:,</span><span class="mi">1</span><span class="p">],</span> <span class="n">linewidth</span> <span class="o">=</span> <span class="mi">3</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">legend</span><span class="p">()</span>
</code></pre></div></div>
<p>Here is our result!</p>
<div style="text-align:center"><img src="/images/blog/final.png" /></div>
<h1 id="gsoc-2019-ode-inception"><a href="https://dpananos.github.io/posts/2019/05/blog-post-13">GSoC 2019: ODE Inception</a> (2019-05-21)</h1>
<p>Let’s take stock of exactly where we are in this journey to implement HMC for differential equations.</p>
<p>To do HMC, we concatenate all the parameters we wish to estimate into a single vector (call it $\theta$). Then, we create a momentum variable (call it $p$ for some weird reason. Why do physicists denote MMMMMMomentum with $p$?) of the same dimension as $\theta$ and consider the joint distribution</p>
<script type="math/tex; mode=display">\pi(\theta,p) = \pi(p \vert \theta)\pi(\theta)</script>
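<p>A common choice (my gloss, not a requirement) is an identity-covariance Gaussian for the momentum, in which case the negative log of the joint density is the Hamiltonian that generates the dynamics:</p>
<script type="math/tex; mode=display">H(\theta, p) = -\log \pi(\theta, p) = \tfrac{1}{2} p^T p - \log \pi(\theta)</script>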
<p>We choose a momentum distribution for $p$ and then use the Hamiltonian to define a set of differential equations governing the dynamics of an imaginary particle on some high dimensional surface. The differential equations are</p>
<script type="math/tex; mode=display">\dfrac{d\theta}{dt} = p</script>
<script type="math/tex; mode=display">\dfrac{dp}{dt} = -\dfrac{\partial V}{\partial \theta}</script>
<p>Here, $V = -\log(\pi(\theta))$ is the negative log density of our parameters (the potential energy). Given some initial random momentum, the ODEs above say that the particle’s position changes at a rate equal to its momentum, while the momentum’s rate of change depends on the shape of the surface. <em>So the momentum governs the trajectory in time over the surface, and the surface governs how the momentum will change in time</em>. Yes, that certainly sounds like a differential equation problem.</p>
<p>I won’t get into how we integrate this system. I think <a href="https://colindcarroll.com/">Colin Carroll</a> has done a very good job boiling down the essence of HMC into something more digestible, but if you want the <em>whole shebang</em> on HMC, please read Michael Betancourt’s “<em>A Conceptual Introduction to Hamiltonian Monte Carlo</em>”.</p>
<h2 id="we-solve-odes-to-compute-gradients-of-odes-to-use-in-other-odes">We Solve ODEs to Compute Gradients of ODEs to Use in Other ODEs</h2>
<p>So HMC is just (ok not JUST, but for the purposes of this blog post) a set of differential equations which can be cleverly integrated to get samples from a posterior. There is a bit of a hiccup when we do this with ODEs. The vector $dp/dt$ has components $\partial V / \partial \phi$, where $\phi$ is a particular parameter of our model and hence an element of $\theta$. If $\phi$ is used to parameterize a differential equation for $y$, then computing gradients is especially difficult, because we usually solve differential equations numerically, not analytically. This prevents us from using the chain rule to compute $\partial V / \partial \phi = \partial V/ \partial y \times \partial y / \partial \phi$.</p>
<p>Woe is me, I can’t compute the gradient of my second differential equation to plug into my first differential equation. But, if I solve a <em>third</em> differential equation, then I can compute the gradient of my second differential equation to plug into my first differential equation without having to solve and differentiate the second differential equation! Here is where Xzibit would come out and whisper “Yo Dawg, we heard you like differential equations”.</p>
<p>I understand that last part might be hard to parse so let me explain. In late 2018, a team at The University of Toronto published a new neural network architecture known as the “Neural ODE”. The details of that are not important, but they essentially faced the same problems as we do: they needed gradients of an ODE’s solution with respect to the parameters. In that paper, they use <em>The Adjoint Method</em> to do so. Again, not important how this method works exactly (maybe the topic of a future post). The adjoint method allows for the computation of a functional’s gradient directly, which means instead of computing $\partial y / \partial \phi$ and then $\partial V/ \partial y$, we can compute $\partial V / \partial \phi$ directly. This computation is performed by solving the differential equation for $y$ as well as some other differential equations at the same time.</p>
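<p>For the curious, the punchline (my paraphrase of the paper’s Appendix B — do check it against the original) is that the “adjoint” $a(t) = \partial L / \partial y(t)$ obeys its own ODE, and the gradient we are after is an integral along the solution:</p>
<script type="math/tex; mode=display">\dfrac{da}{dt} = -a(t)^T \dfrac{\partial f}{\partial y}</script>
<script type="math/tex; mode=display">\dfrac{dL}{d\theta} = -\int_{t_1}^{t_0} a(t)^T \dfrac{\partial f}{\partial \theta} \, dt</script>
<p>Both are integrated backwards in time alongside $y$ itself, so everything comes out of a single extra call to an ODE solver.</p>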
<p>Go ahead and look at the paper. Appendix B.2 basically outlines a set of differential equations, one of which is the equation for which we want to do inference, and another is a differential equation for $d L / d \theta$ (which I called $d V / d \phi$).</p>
<p>So, to recap, if we solve the adjoint differential equation, we can plug the result into the Hamiltonian differential equation without having to compute derivatives for the differential equation on which we intend to perform Bayesian analysis. <em>We solve ODEs to compute gradients of ODEs to use in other ODEs</em>.</p>
<h2 id="implementation">Implementation</h2>
<p>Here is where the wheels fall off. I actually tried implementing this for a very simple differential equation, but I don’t think I’ve really mastered the material. I’m going to lean on my mentor to help me understand how Theano does the AD while I bone up on some AD theory and make some toy models in python.</p>
<h2 id="conclusion">Conclusion</h2>
<p>I think the path is a little clearer now. I can’t say the adjoint method is intuitive, and HMC isn’t either, but they are becoming less opaque black boxes than they previously were. I have a lot of doubts about whether this can be done by the end of summer (most of them have to do with my own ability), but damnit if I don’t give it my best shot.</p>
<h1 id="gradient-descent-for-odes"><a href="https://dpananos.github.io/posts/2019/05/blog-post-14">Gradient Descent for ODEs</a> (2019-05-21)</h1>
<p>Gradient descent usually isn’t used to fit Ordinary Differential Equations (ODEs) to data (at least, that isn’t how the Applied Mathematics departments I have been a part of have done it). Nevertheless, that doesn’t mean it can’t be done. For some of my recent GSoC work, I’ve been investigating how to compute gradients of solutions to ODEs without access to the solution’s analytical form. In this blog post, I describe how these gradients can be computed and how they can be used to fit ODEs to synchronous data with gradient descent.</p>
<h2 id="up-to-speed-with-odes">Up To Speed With ODEs</h2>
<p>I realize not everyone might have studied ODEs. Here is everything you need to know:</p>
<p>A differential equation relates an unknown function $y \in \mathbb{R}^n$ to its own derivative through a function $f: \mathbb{R}^n \times \mathbb{R} \times \mathbb{R}^m \rightarrow \mathbb{R}^n$, which also depends on time $t \in \mathbb{R}$ and possibly a set of parameters $\theta \in \mathbb{R}^m$. We usually write ODEs as</p>
<script type="math/tex; mode=display">y' = f(y,t,\theta) \quad y(t_0) = y_0</script>
<p>Here, we refer to the vector $y$ as “the system”, since the ODE above really defines a system of equations. The problem is usually equipped with an initial state of the system, $y(t_0) = y_0$, from which the system evolves forward in $t$. Solutions to ODEs in analytic form are often <em>very hard</em> to obtain, if not impossible, so most of the time we just numerically approximate the solution. It doesn’t matter how this is done because numerical integration is not the point of this post. If you’re interested, look up the class of <em>Runge-Kutta</em> methods.</p>
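<p>If you have never numerically integrated an ODE, here is a minimal sketch using SciPy (a toy example of mine, not part of the model below): the logistic equation $y' = y(1-y)$, started near zero, should flow towards the stable equilibrium at one.</p>

```python
import numpy as np
from scipy.integrate import odeint

def f(y, t):
    """Logistic growth: y' = y(1 - y)."""
    return y * (1.0 - y)

t = np.linspace(0, 10, 101)
y = odeint(f, 0.01, t)  # numerically approximate y(t); shape (101, 1)
# y[-1, 0] ends up close to 1.0, the stable equilibrium
```

<p>Under the hood <code class="highlighter-rouge">odeint</code> uses an adaptive solver (LSODA), but as far as this post is concerned it is a black box that maps $(f, y_0, t)$ to a solution.</p>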
<h2 id="computing-gradients-for-odes">Computing Gradients for ODEs</h2>
<p>In this section, I’m going to be using derivative notation rather than $\nabla$ for gradients. I think it is less ambiguous.</p>
<p>If we want to fit an ODE model to data by minimizing some loss function $\mathcal{L}$, then gradient descent looks like</p>
<script type="math/tex; mode=display">\theta_{n+1} = \theta_n - \alpha \dfrac{\partial \mathcal{L}}{\partial \theta}</script>
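<p>As a sanity check of what this update rule does when the gradient is available in closed form, here is a throwaway example (mine, unrelated to the ODE machinery below) minimizing $\mathcal{L}(\theta) = (\theta - 3)^2$:</p>

```python
# Gradient descent on L(theta) = (theta - 3)^2, whose gradient is 2*(theta - 3).
theta = 0.0
alpha = 0.1  # learning rate
for _ in range(100):
    theta -= alpha * 2.0 * (theta - 3.0)
# theta has converged to the minimizer, 3.0
```

<p>The whole difficulty in the ODE setting is that $\partial \mathcal{L} / \partial \theta$ has no closed form like the one above.</p>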
<p>In order to compute the gradient of the loss, we need the gradient of the solution, $y$, with respect to $\theta$. The gradient of the solution is the hard part here because it cannot be computed (a) analytically (because analytic solutions are hard AF), or (b) through automatic differentiation without differentiating through the numerical integration of our ODE (which seems computationally wasteful).</p>
<p>Thankfully, years of research into ODEs yield a way to do this (that is not the adjoint method. Surprise! You thought I was going to say the adjoint method, didn’t you?). Forward mode sensitivity analysis calculates gradients by extending the ODE system to include the following equations:</p>
<script type="math/tex; mode=display">\dfrac{d}{dt}\left( \dfrac{\partial y}{\partial \theta} \right) = \mathcal{J}_f \dfrac{\partial y}{\partial \theta} +\dfrac{\partial f}{\partial \theta}</script>
<p>Here, $\mathcal{J}_f$ is the Jacobian of $f$ with respect to $y$. The forward sensitivity analysis is <em>just another differential equation</em> (see how it relates the derivative of the unknown $\partial y / \partial \theta$ to itself?)! In order to compute the gradient of $y$ with respect to $\theta$ at time $t_i$, we compute</p>
<script type="math/tex; mode=display">\dfrac{\partial y}{\partial \theta} = \int_{t_0}^{t_i} \mathcal{J}_f \dfrac{\partial y}{\partial \theta} + \dfrac{\partial f}{\partial \theta} \, dt</script>
<p>I know this looks scary, but since forward mode sensitivities are just ODEs, we actually just get this from what we can consider to be a black box</p>
<script type="math/tex; mode=display">\dfrac{\partial y}{\partial \theta} = \operatorname{BlackBox}(f(y,t,\theta), t_0, y_0, \theta)</script>
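<p>To convince yourself the black box does what it claims, here is a scalar toy problem of my own (not the SIR model below) where the sensitivity is known analytically. For $y' = -\theta y$ with $y(0) = y_0$, the sensitivity equation is $s' = \mathcal{J}_f s + \partial f / \partial \theta = -\theta s - y$, and the exact answer is $\partial y / \partial \theta = -t y_0 e^{-\theta t}$:</p>

```python
import numpy as np
from scipy.integrate import odeint

def augmented(Y, t, theta):
    """Toy model y' = -theta*y, stacked with its sensitivity ODE s' = -theta*s - y."""
    y, s = Y
    return [-theta * y, -theta * s - y]

theta, y0 = 2.0, 1.0
t = np.linspace(0.0, 2.0, 50)
# s(0) = 0 because the initial condition does not depend on theta
sol = odeint(augmented, [y0, 0.0], t, args=(theta,))
exact = -t * y0 * np.exp(-theta * t)
# sol[:, 1] matches `exact` to solver tolerance
```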
<p>So now that we have our gradient in hand, we can use the chain rule to write</p>
<script type="math/tex; mode=display">\dfrac{\partial \mathcal{L}}{\partial \theta} =\dfrac{\partial \mathcal{L}}{\partial y} \dfrac{\partial y}{\partial \theta}</script>
<p>We can use automatic differentiation to compute $\dfrac{\partial \mathcal{L}}{\partial y}$.</p>
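<p>The chain rule above is easy to sanity-check on a function whose pieces we can write down by hand (again a toy of mine: $y(\theta) = e^{\theta t}$ with a squared-error loss):</p>

```python
import numpy as np

t = np.linspace(0.0, 1.0, 5)
y_obs = np.exp(0.5 * t)                    # fake "data" generated at theta = 0.5
theta = 0.3

dL_dy = 2.0 * (np.exp(theta * t) - y_obs)  # gradient of sum((y - y_obs)**2) wrt y
dy_dtheta = t * np.exp(theta * t)          # sensitivity, known analytically here
chain = np.sum(dL_dy * dy_dtheta)          # chain rule: dL/dtheta

# compare against a centered finite difference of the loss
L = lambda th: np.sum((np.exp(th * t) - y_obs) ** 2)
eps = 1e-6
fd = (L(theta + eps) - L(theta - eps)) / (2 * eps)
# `chain` and `fd` agree to several decimal places
```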
<p>OK, so that is some math (interesting to me, maybe not so much to you). Let’s actually implement this in python.</p>
<h2 id="gradient-descent-for-the-sir-model">Gradient Descent for the SIR Model</h2>
<p>The SIR model is a set of differential equations which govern how a disease spreads through a homogeneously mixed, closed population. I could write an entire thesis on this model and its various extensions (in fact, I have), so I’ll let you read about those in your free time.</p>
<p>The system, shown below, is parameterized by a single parameter:</p>
<script type="math/tex; mode=display">\dfrac{dS}{dt} = -\theta SI \quad S(0) = 0.99</script>
<script type="math/tex; mode=display">\dfrac{dI}{dt} = \theta SI - I \quad I(0) = 0.01</script>
<p>Let’s define the system, take the appropriate derivatives, generate some observations, and fit $\theta$ using gradient descent. Here is what you’ll need to get started:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">autograd</span>
<span class="kn">from</span> <span class="nn">autograd.builtins</span> <span class="kn">import</span> <span class="nb">tuple</span>
<span class="kn">import</span> <span class="nn">autograd.numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="c">#Import ode solver and rename as BlackBox for consistency with blog</span>
<span class="kn">from</span> <span class="nn">scipy.integrate</span> <span class="kn">import</span> <span class="n">odeint</span> <span class="k">as</span> <span class="n">BlackBox</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
</code></pre></div></div>
<p>Let’s then define the ODE system</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">f</span><span class="p">(</span><span class="n">y</span><span class="p">,</span><span class="n">t</span><span class="p">,</span><span class="n">theta</span><span class="p">):</span>
    <span class="s">'''Function describing dynamics of the system'''</span>
    <span class="n">S</span><span class="p">,</span><span class="n">I</span> <span class="o">=</span> <span class="n">y</span>
    <span class="n">ds</span> <span class="o">=</span> <span class="o">-</span><span class="n">theta</span><span class="o">*</span><span class="n">S</span><span class="o">*</span><span class="n">I</span>
    <span class="n">di</span> <span class="o">=</span> <span class="n">theta</span><span class="o">*</span><span class="n">S</span><span class="o">*</span><span class="n">I</span> <span class="o">-</span> <span class="n">I</span>
    <span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="n">ds</span><span class="p">,</span><span class="n">di</span><span class="p">])</span>
</code></pre></div></div>
<p>and take appropriate derivatives</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#Jacobian wrt y</span>
<span class="n">J</span> <span class="o">=</span> <span class="n">autograd</span><span class="o">.</span><span class="n">jacobian</span><span class="p">(</span><span class="n">f</span><span class="p">,</span><span class="n">argnum</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="c">#Gradient wrt theta</span>
<span class="n">grad_f_theta</span> <span class="o">=</span> <span class="n">autograd</span><span class="o">.</span><span class="n">jacobian</span><span class="p">(</span><span class="n">f</span><span class="p">,</span><span class="n">argnum</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
</code></pre></div></div>
<p>Next, we’ll define the augmented system (that is, the ODE plus the sensitivities).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">ODESYS</span><span class="p">(</span><span class="n">Y</span><span class="p">,</span><span class="n">t</span><span class="p">,</span><span class="n">theta</span><span class="p">):</span>
    <span class="c">#Y will be length 4.</span>
    <span class="c">#Y[0], Y[1] are the ODEs</span>
    <span class="c">#Y[2], Y[3] are the sensitivities</span>
    <span class="c">#ODE</span>
    <span class="n">dy_dt</span> <span class="o">=</span> <span class="n">f</span><span class="p">(</span><span class="n">Y</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="mi">2</span><span class="p">],</span><span class="n">t</span><span class="p">,</span><span class="n">theta</span><span class="p">)</span>
    <span class="c">#Sensitivities</span>
    <span class="n">grad_y_theta</span> <span class="o">=</span> <span class="n">J</span><span class="p">(</span><span class="n">Y</span><span class="p">[:</span><span class="mi">2</span><span class="p">],</span><span class="n">t</span><span class="p">,</span><span class="n">theta</span><span class="p">)</span><span class="nd">@Y</span><span class="p">[</span><span class="o">-</span><span class="mi">2</span><span class="p">::]</span> <span class="o">+</span> <span class="n">grad_f_theta</span><span class="p">(</span><span class="n">Y</span><span class="p">[:</span><span class="mi">2</span><span class="p">],</span><span class="n">t</span><span class="p">,</span><span class="n">theta</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">concatenate</span><span class="p">([</span><span class="n">dy_dt</span><span class="p">,</span><span class="n">grad_y_theta</span><span class="p">])</span>
</code></pre></div></div>
<p>We’ll optimize the $L_2$ norm of the error</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">Cost</span><span class="p">(</span><span class="n">y_obs</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">cost</span><span class="p">(</span><span class="n">Y</span><span class="p">):</span>
        <span class="s">'''Squared Error Loss'''</span>
        <span class="n">n</span> <span class="o">=</span> <span class="n">y_obs</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
        <span class="n">err</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">linalg</span><span class="o">.</span><span class="n">norm</span><span class="p">(</span><span class="n">y_obs</span> <span class="o">-</span> <span class="n">Y</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">axis</span> <span class="o">=</span> <span class="mi">1</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">err</span><span class="p">)</span><span class="o">/</span><span class="n">n</span>
    <span class="k">return</span> <span class="n">cost</span>
</code></pre></div></div>
<p>Create some observations from which to fit</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">19920908</span><span class="p">)</span>
<span class="c">## Generate Data</span>
<span class="c">#Initial Condition</span>
<span class="n">Y0</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mf">0.99</span><span class="p">,</span><span class="mf">0.01</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">])</span>
<span class="c">#Space to compute solutions</span>
<span class="n">t</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">5</span><span class="p">,</span><span class="mi">101</span><span class="p">)</span>
<span class="c">#True param value</span>
<span class="n">theta</span> <span class="o">=</span> <span class="mf">5.5</span>
<span class="n">sol</span> <span class="o">=</span> <span class="n">BlackBox</span><span class="p">(</span><span class="n">ODESYS</span><span class="p">,</span> <span class="n">y0</span> <span class="o">=</span> <span class="n">Y0</span><span class="p">,</span> <span class="n">t</span> <span class="o">=</span> <span class="n">t</span><span class="p">,</span> <span class="n">args</span> <span class="o">=</span> <span class="nb">tuple</span><span class="p">([</span><span class="n">theta</span><span class="p">]))</span>
<span class="c">#Corupt the observations with noise</span>
<span class="n">y_obs</span> <span class="o">=</span> <span class="n">sol</span><span class="p">[:,:</span><span class="mi">2</span><span class="p">]</span> <span class="o">+</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">normal</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mf">0.05</span><span class="p">,</span><span class="n">size</span> <span class="o">=</span> <span class="n">sol</span><span class="p">[:,:</span><span class="mi">2</span><span class="p">]</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">t</span><span class="p">,</span><span class="n">y_obs</span><span class="p">[:,</span><span class="mi">0</span><span class="p">],</span> <span class="n">marker</span> <span class="o">=</span> <span class="s">'.'</span><span class="p">,</span> <span class="n">alpha</span> <span class="o">=</span> <span class="mf">0.5</span><span class="p">,</span> <span class="n">label</span> <span class="o">=</span> <span class="s">'S'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">t</span><span class="p">,</span><span class="n">y_obs</span><span class="p">[:,</span><span class="mi">1</span><span class="p">],</span> <span class="n">marker</span> <span class="o">=</span> <span class="s">'.'</span><span class="p">,</span> <span class="n">alpha</span> <span class="o">=</span> <span class="mf">0.5</span><span class="p">,</span> <span class="n">label</span> <span class="o">=</span> <span class="s">'I'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">legend</span><span class="p">()</span>
</code></pre></div></div>
<p>Perform Gradient Descent</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">theta_iter</span> <span class="o">=</span> <span class="mf">1.5</span>
<span class="n">cost</span> <span class="o">=</span> <span class="n">Cost</span><span class="p">(</span><span class="n">y_obs</span><span class="p">[:,:</span><span class="mi">2</span><span class="p">])</span>
<span class="n">grad_C</span> <span class="o">=</span> <span class="n">autograd</span><span class="o">.</span><span class="n">grad</span><span class="p">(</span><span class="n">cost</span><span class="p">)</span>
<span class="n">maxiter</span> <span class="o">=</span> <span class="mi">100</span>
<span class="n">learning_rate</span> <span class="o">=</span> <span class="mi">1</span> <span class="c">#Big steps</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">maxiter</span><span class="p">):</span>
    <span class="n">sol</span> <span class="o">=</span> <span class="n">BlackBox</span><span class="p">(</span><span class="n">ODESYS</span><span class="p">,</span><span class="n">y0</span> <span class="o">=</span> <span class="n">Y0</span><span class="p">,</span> <span class="n">t</span> <span class="o">=</span> <span class="n">t</span><span class="p">,</span> <span class="n">args</span> <span class="o">=</span> <span class="nb">tuple</span><span class="p">([</span><span class="n">theta_iter</span><span class="p">]))</span>
    <span class="n">Y</span> <span class="o">=</span> <span class="n">sol</span><span class="p">[:,:</span><span class="mi">2</span><span class="p">]</span>
    <span class="n">theta_iter</span> <span class="o">-=</span><span class="n">learning_rate</span><span class="o">*</span><span class="p">(</span><span class="n">grad_C</span><span class="p">(</span><span class="n">Y</span><span class="p">)</span><span class="o">*</span><span class="n">sol</span><span class="p">[:,</span><span class="o">-</span><span class="mi">2</span><span class="p">:])</span><span class="o">.</span><span class="nb">sum</span><span class="p">()</span>
    <span class="k">if</span> <span class="n">i</span><span class="o">%</span><span class="mi">10</span><span class="o">==</span><span class="mi">0</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="n">theta_iter</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">'YYYEAAAAHHH!'</span><span class="p">)</span>
</code></pre></div></div>
<p>And lastly, compare our fitted curves to the true curves</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sol</span> <span class="o">=</span> <span class="n">BlackBox</span><span class="p">(</span><span class="n">ODESYS</span><span class="p">,</span> <span class="n">y0</span> <span class="o">=</span> <span class="n">Y0</span><span class="p">,</span> <span class="n">t</span> <span class="o">=</span> <span class="n">t</span><span class="p">,</span> <span class="n">args</span> <span class="o">=</span> <span class="nb">tuple</span><span class="p">([</span><span class="n">theta_iter</span><span class="p">]))</span>
<span class="n">true_sol</span> <span class="o">=</span> <span class="n">BlackBox</span><span class="p">(</span><span class="n">ODESYS</span><span class="p">,</span> <span class="n">y0</span> <span class="o">=</span> <span class="n">Y0</span><span class="p">,</span> <span class="n">t</span> <span class="o">=</span> <span class="n">t</span><span class="p">,</span> <span class="n">args</span> <span class="o">=</span> <span class="nb">tuple</span><span class="p">([</span><span class="n">theta</span><span class="p">]))</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">t</span><span class="p">,</span><span class="n">sol</span><span class="p">[:,</span><span class="mi">0</span><span class="p">],</span> <span class="n">label</span> <span class="o">=</span> <span class="s">'S'</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="s">'C0'</span><span class="p">,</span> <span class="n">linewidth</span> <span class="o">=</span> <span class="mi">5</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">t</span><span class="p">,</span><span class="n">sol</span><span class="p">[:,</span><span class="mi">1</span><span class="p">],</span> <span class="n">label</span> <span class="o">=</span> <span class="s">'I'</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="s">'C1'</span><span class="p">,</span> <span class="n">linewidth</span> <span class="o">=</span> <span class="mi">5</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">t</span><span class="p">,</span><span class="n">y_obs</span><span class="p">[:,</span><span class="mi">0</span><span class="p">],</span> <span class="n">marker</span> <span class="o">=</span> <span class="s">'.'</span><span class="p">,</span> <span class="n">alpha</span> <span class="o">=</span> <span class="mf">0.5</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">t</span><span class="p">,</span><span class="n">y_obs</span><span class="p">[:,</span><span class="mi">1</span><span class="p">],</span> <span class="n">marker</span> <span class="o">=</span> <span class="s">'.'</span><span class="p">,</span> <span class="n">alpha</span> <span class="o">=</span> <span class="mf">0.5</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">t</span><span class="p">,</span><span class="n">true_sol</span><span class="p">[:,</span><span class="mi">0</span><span class="p">],</span> <span class="n">label</span> <span class="o">=</span> <span class="s">'True'</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="s">'k'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">t</span><span class="p">,</span><span class="n">true_sol</span><span class="p">[:,</span><span class="mi">1</span><span class="p">],</span> <span class="n">color</span> <span class="o">=</span> <span class="s">'k'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">legend</span><span class="p">()</span>
</code></pre></div></div>
<p>Here is our result!</p>
<div style="text-align:center"><img src="/images/blog/final.png" /></div>
<h2 id="conclusions">Conclusions</h2>
<p>Fitting ODEs via gradient descent is possible, and not as complicated as I had initially thought. There are still some relaxations to be explored. Namely: what happens if we have observations at time $t_i$ for one part of the system but not the other? How does this scale as we add more parameters to the model? Can we speed up gradient descent somehow (because it takes too long to converge as it is, hence the <code class="highlighter-rouge">maxiter</code> variable)? In any case, this was an interesting, yet related, divergence from my GSoC work. I hope you learned something.</p>
<h1 id="bayesian-inference-for-differential-equations"><a href="https://dpananos.github.io/posts/2019/05/blog-post-12">Bayesian Inference for Differential Equations: A Google Summer of Code Project</a> (2019-05-07)</h1>
<h2 id="do-stuff-that-scares-you">Do Stuff that Scares You</h2>
<p>In the beginning of March 2019, Thomas Wiecki made a tweet about non-linear
ODEs and Bayesian inference in PyMC3. I <a href="https://twitter.com/PhDemetri/status/1103335959033233410">replied</a> basically saying “hey, I like your stuff, and I want to help. Here is the catch, I’m a bit shit at being a dev. How do I move forward?”.</p>
<p>I received some helpful hints from Colin Carroll and Dan Simpson about making small commits to libraries and checking out unit tests. I also received a reply from the PyMC developers account asking if I had considered applying to GSoC 2019.</p>
<p>I hadn’t considered applying, because anything with Google in the name is an auto no for me (mostly because I know I am going to be competing with very qualified candidates and I hate competition. It scares me a little). I took a look at the proposed projects and there it was – <em>ODE capabilities in PyMC3</em> – the perfect project for someone like me. Sometimes, you just feel compelled to do stuff, you know? So I emailed the listed mentors, made a submission, learned I was accepted, recovered from the shock, tweeted out my excitement, and the rest is history.</p>
<p>NumFOCUS asks politely that students make blog submissions about their work biweekly. This is intended to be the first post in a long sequence of posts detailing my struggles, successes, and what I have learned. I don’t have comments enabled on the blog (mostly because I didn’t think anyone would ever read it), but feel free to reach out on Twitter (because that is where I spend most of my time) and ask about anything that grabs your attention.</p>
<p>Now, a little about the work…</p>
<h2 id="bayesianism-and-differential-equations">Bayesianism and Differential Equations?</h2>
<p>I am a once and future applied mathematician. The bread and butter of applied math is the ODE. I’ve taken more courses on ODEs/PDEs than on linear algebra, calculus, or probability. I’ve also published a couple of papers where they were the main tool of investigation.</p>
<p>One thing I’ve noticed in retrospect is that applied mathematicians rarely care about parameter estimation and uncertainty in those estimates. We’re usually characterizing bifurcations, or extending models, or seeing how interventions of a particular type change the dynamics of the system. Only later do we care about getting data and fitting our models.</p>
<p>When we do fit our models, it is usually via a least squares procedure. That works great for 99% of the problems applied mathematicians work on.</p>
<p>But my PhD thesis is part of that 1%.</p>
<p>I work with drug concentrations in patients. We often model drug metabolism as an ODE. You can imagine observing 10 patients, each with 5 concentration measurements after ingesting a drug. My goal is to understand how each patient metabolizes that drug so that I can make <em>personalized</em> doses.</p>
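<p>Concretely, the kind of model I mean can be sketched with a toy one-compartment model with first-order absorption. Everything below (the rate constants, the volume of distribution, the dose) is invented for illustration rather than estimated from any data, and a hand-rolled Euler step stands in for a real ODE solver.</p>

```python
# Hypothetical one-compartment pharmacokinetic model:
#   gut:    dA/dt = -ka * A            (drug absorbed out of the gut)
#   plasma: dC/dt = ka * A / V - ke * C (absorption in, elimination out)
# ka, ke, V, and dose are made-up illustrative values.
ka, ke, V, dose = 1.0, 0.2, 10.0, 100.0

def concentration(t_end, dt=0.001):
    """Euler-integrate the two-state system; return plasma concentration at t_end."""
    A, C = dose, 0.0  # A: drug amount in the gut, C: plasma concentration
    for _ in range(int(t_end / dt)):
        dA = -ka * A
        dC = ka * A / V - ke * C
        A += dA * dt
        C += dC * dt
    return C
```

The concentration rises after the dose, peaks, and is then eliminated; "fitting" means inferring parameters like $k_a$, $k_e$, and $V$ for a patient from a handful of noisy measurements of $C$.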
<p>Sounds easy, right? With that dataset of 50 observations, do a least squares fit. Well…it isn’t that easy. There is a lot of heterogeneity at the patient level, and if I were to use all the data to fit a single model, I would learn more about the population than about any given patient. In essence <em>I see the forest for the trees</em> when I really need to <em>see the trees for the forest</em>.</p>
<p>So why not go the other way – do a least squares fit for each patient? This might work if sampling were cheap and easy, but it isn’t. No one wants to sit in a hospital bed and get their blood drawn 5 times in a day. So obtaining data for new patients would be incredibly difficult if I were to take this approach.</p>
<p>So I’m a bit stuck as it stands. I can’t lump the data together because I lose the details of the individual, but focusing on the individual is expensive and impractical. So how do I learn how an individual patient metabolizes a drug?</p>
<p>Clever readers will recognize the approaches above as complete-pooling and no-pooling. What I really need to do in order to personalize a dose is to use <em>partial-pooling</em>, and that means going Bayesian (ok, not necessarily, but you can tweet at me to ask why).</p>
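<p>The shrinkage that partial pooling buys can be sketched with the textbook normal-normal estimator: each patient’s own mean is pulled toward the population mean, with weights given by the two precisions. The patient means and variance components below are made-up numbers, and in a real analysis the variances would be estimated jointly (which is where PyMC3 comes in) rather than fixed.</p>

```python
# Toy comparison of complete, no, and partial pooling under a normal model.
# All numbers below are invented for illustration.
patient_means = [4.2, 5.1, 9.8, 6.0]  # ybar_i: each patient's no-pooling estimate
n_obs = 5                             # observations per patient
sigma2 = 4.0                          # assumed within-patient variance
tau2 = 1.0                            # assumed between-patient variance

grand_mean = sum(patient_means) / len(patient_means)  # complete-pooling estimate

def partial_pool(ybar):
    """Shrink a patient's own mean toward the grand mean,
    weighting by the precision of each source of information."""
    w_data = n_obs / sigma2   # precision of the patient's own data
    w_prior = 1.0 / tau2      # precision of the population distribution
    return (w_data * ybar + w_prior * grand_mean) / (w_data + w_prior)

shrunk = [partial_pool(y) for y in patient_means]
```

Patients with extreme means (like 9.8 here) get pulled toward the population; as the number of observations per patient grows, the weight on the patient’s own data grows and the estimate trusts the individual more.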
<p>So there is a clear need to do Bayesian inference with ODEs. Currently PyMC3 doesn’t have that capability (Stan has it), and it really needs it.</p>
<p>I’m really excited to start working on this project. I feel like I’m going to struggle a lot, but that just means I am going to learn a lot. Struggle is nothing but the act of learning.</p>
<p>Keep on the lookout for another blog post coming out in around 2 weeks’ time. I’ll probably write about my meeting with my mentor and how I am integrating into the community.</p>Demetri Pananosapananos@uwo.caDo Stuff that Scares YouKing Street Pilot Project2018-11-01T00:00:00-07:002018-11-01T00:00:00-07:00https://dpananos.github.io/posts/2018/11/blog-post-11<p>Toronto started a pilot project to shut down King Street to private vehicles in an attempt to ease congestion and increase TTC ridership. I’ve obtained some data from the city and have begun analyzing it. Shown <a href="http://dpananos.github.io/files/TorontoKSP.html">here</a> is an initial plot of the change in travel times in certain sections of the city. Cooler colors mean travel times have decreased.</p>
<p>I plan on modelling this data very soon. I’ll write a blog post about it once completed.</p>Demetri Pananosapananos@uwo.caToronto started a pilot project to shut down King Street to private vehicles in an attempt to ease congestion and increase TTC ridership. I’ve obtained some data from the city and have begun analyzing it. Shown here is an initial plot of the change in travel times in certain sections of the city. Cooler colors mean travel times have decreased.Neat Little Combinatorics Problem2018-08-31T00:00:00-07:002018-08-31T00:00:00-07:00https://dpananos.github.io/posts/2018/08/blog-post-10<p>I’ll cut right to it. Consider the set $S = (49, 8, 48, 15, 47, 4, 16, 23, 43, 44, 42, 45, 46 )$. What is the expected value for the minimum of 6 samples from this set?</p>
<p>We could always just sample from the set to estimate the expected value. Here is a Python script to do just that.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">49</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">48</span><span class="p">,</span> <span class="mi">15</span><span class="p">,</span> <span class="mi">47</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">23</span><span class="p">,</span> <span class="mi">43</span><span class="p">,</span> <span class="mi">44</span><span class="p">,</span> <span class="mi">42</span><span class="p">,</span> <span class="mi">45</span><span class="p">,</span> <span class="mi">46</span><span class="p">])</span>
<span class="n">mins</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1000</span><span class="p">):</span>
    <span class="n">mins</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="n">size</span> <span class="o">=</span> <span class="mi">6</span><span class="p">,</span> <span class="n">replace</span> <span class="o">=</span> <span class="bp">False</span><span class="p">)</span><span class="o">.</span><span class="nb">min</span><span class="p">())</span>
<span class="k">print</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">mins</span><span class="p">))</span>
</code></pre></div></div>
<p>But that is only estimating the mean. We can do better and compute it directly. Here is some Python code to create all subsets of $S$ of size 6. Then, we simply take the minimum of each subset and compute the mean.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="nn">itertools</span> <span class="kn">import</span> <span class="n">combinations</span><span class="p">,</span> <span class="n">groupby</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">49</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">48</span><span class="p">,</span> <span class="mi">15</span><span class="p">,</span> <span class="mi">47</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">23</span><span class="p">,</span> <span class="mi">43</span><span class="p">,</span> <span class="mi">44</span><span class="p">,</span> <span class="mi">42</span><span class="p">,</span> <span class="mi">45</span><span class="p">,</span> <span class="mi">46</span><span class="p">])</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">sort</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="n">c</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">combinations</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="mi">6</span><span class="p">))</span>
<span class="n">mins</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">c</span><span class="p">))</span>
<span class="n">s</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">g</span> <span class="ow">in</span> <span class="n">groupby</span><span class="p">(</span><span class="nb">sorted</span><span class="p">(</span><span class="n">mins</span><span class="p">)):</span>
    <span class="n">s</span><span class="o">+=</span><span class="n">k</span><span class="o">*</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">g</span><span class="p">))</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">mins</span><span class="p">))</span>
<span class="k">print</span><span class="p">(</span> <span class="n">s</span> <span class="p">)</span>
</code></pre></div></div>
<p>The script returns 8.8181… (exactly $97/11$, with the digits 81 repeating). Great, but we can do even better! If we can compute the probability mass function, we can compute the mean analytically. Let’s consider a smaller problem to outline the solution.</p>
<p>Let our set in question be $(1,2,3,4,5)$. Let the minimum of a sample of 3 numbers from this set be the random variable $z$. Now, note there are $\binom{5}{3} = 10$ ways to choose 3 elements from a set of 5.</p>
<p>How many subsets exist where the minimum is 1? Well, if I sampled 1, then I would still have to pick 2 numbers from a possible 4 numbers larger than 1. There are $\binom{4}{2}$ ways to do this. So $p(z=1) = \binom{4}{2} / \binom{5}{3}$.</p>
<p>In a similar fashion, there are $\binom{3}{2}$ subsets where 2 is the minimum, and $\binom{2}{2}$ subsets where 3 is the minimum. There are no subsets where 4 or 5 are the minimum (why?). So that means the expected minimum value for this set would be</p>
<script type="math/tex; mode=display">\operatorname{E}(z) = \dfrac{ \sum_{k = 1}^{3} k\binom{5-k}{2} }{\binom{5}{3}}</script>
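<p>This small case is cheap to verify: brute-force enumeration of all 10 subsets and the formula above should agree exactly (using <code class="highlighter-rouge">Fraction</code> to avoid floating point).</p>

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Brute force: enumerate every 3-element subset of {1,...,5} and average the minima.
subsets = list(combinations(range(1, 6), 3))
brute = Fraction(sum(min(s) for s in subsets), len(subsets))

# The formula: sum over k of k * C(5 - k, 2), divided by C(5, 3).
formula = Fraction(sum(k * comb(5 - k, 2) for k in range(1, 4)), comb(5, 3))

assert brute == formula == Fraction(3, 2)
```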
<p>That sum works out to $15/10 = 1.5$. Here is how you could code up the analytic solution to our problem.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="nn">scipy.special</span> <span class="kn">import</span> <span class="n">binom</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">15</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">23</span><span class="p">,</span> <span class="mi">42</span><span class="p">,</span> <span class="mi">43</span><span class="p">,</span> <span class="mi">44</span><span class="p">,</span> <span class="mi">45</span><span class="p">,</span> <span class="mi">46</span><span class="p">,</span> <span class="mi">47</span><span class="p">,</span> <span class="mi">48</span><span class="p">,</span> <span class="mi">49</span><span class="p">])</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">sort</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="n">sample_size</span> <span class="o">=</span><span class="mi">6</span>
<span class="n">sample_space</span> <span class="o">=</span> <span class="n">x</span><span class="p">[:</span><span class="o">-</span><span class="p">(</span><span class="n">sample_size</span><span class="o">-</span><span class="mi">1</span><span class="p">)]</span>
<span class="n">E</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span><span class="n">s</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">sample_space</span><span class="p">,</span><span class="n">start</span> <span class="o">=</span> <span class="mi">1</span><span class="p">):</span>
    <span class="n">E</span><span class="o">+=</span> <span class="n">s</span><span class="o">*</span><span class="n">binom</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">size</span><span class="o">-</span><span class="n">i</span><span class="p">,</span><span class="n">sample_size</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">E</span><span class="o">/</span><span class="n">binom</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">size</span><span class="p">,</span> <span class="n">sample_size</span><span class="p">))</span>
</code></pre></div></div>
<p>Full disclosure, this was on a job application (literally, on the job application), so sorry KiK for putting the answer out there, but the question was too fun not to write up!</p>Demetri Pananosapananos@uwo.caI’ll cut right to it. Consider the set $S = (49, 8, 48, 15, 47, 4, 16, 23, 43, 44, 42, 45, 46 )$. What is the expected value for the minimum of 6 samples from this set?Rat Tumors and PyMC32018-06-24T00:00:00-07:002018-06-24T00:00:00-07:00https://dpananos.github.io/posts/2018/04/blog-post-9<p>I’m very proud to say I have contributed <a href="http://docs.pymc.io/notebooks/GLM-hierarchical-binominal-model.html">this</a> example to PyMC3’s documentation. It details how to compute posterior means for Gelman’s rat tumour example in BDA3.</p>
<p>Hoping to contribute more material to the library in the future.</p>Demetri Pananosapananos@uwo.caI’m very proud to say I have contributed this example to PyMC3’s documentation. It details how to compute posterior means for Gelman’s rat tumour example in BDA3.