# Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

## CV

This is a page not in the main menu.

## Three Months Into Industry

Published:

I “left” academia (I say left in quotations because I’m still working on my PhD part time) approximately 3 months ago. That is a bit of a milestone. It is a quarter of a year working at a senior-ish level as a data scientist at a national bank. I know a lot of PhDs, especially in quantitative disciplines, are thinking of making the jump from academia to industry. This sequence of blog posts is not meant as advice on how to make that jump, but rather as documentation of one perspective on what that change entails.

## Riddler Solutions

Published:

I like Riddler from 538 mostly because you can solve the riddles with some fun math. If the riddle is interesting enough, I will post solutions on my blog. This is one such riddle.

Published:

The 95% in 95% confidence interval refers not to the probability that any one interval contains the estimand, but rather to the long term relative frequency of the estimator containing the estimand in an infinite sequence of replicated experiments under ideal conditions.
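The long-run frequency reading can be checked with a small simulation. What follows is only a sketch under assumptions of my own choosing (normal data, a known standard deviation, and made-up values for `mu`, `sigma`, and `n`), not code from the post: it repeats the same experiment many times and counts how often the interval covers the true mean.

```python
import random
from statistics import NormalDist, fmean


def coverage(n_experiments=5000, n=30, mu=10.0, sigma=2.0, level=0.95, seed=0):
    """Fraction of replicated experiments whose z-interval contains mu."""
    rng = random.Random(seed)
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5  # sigma is treated as known (assumption)
    hits = 0
    for _ in range(n_experiments):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / n_experiments


print(coverage())
```

Any single interval either contains `mu` or it does not; the 95% only shows up as the fraction of hits across the replications.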

## Don’t Select Features, Engineer Them

Published:

Students in the class I TA love to do feature selection prior to modelling. Examining pairwise correlation and dropping seemingly uncorrelated features is one way they do this, but they also love to fit a LASSO model to their data and refit a model with the selected variables, or they might do stepwise selection if they are in the mood to code it up in Python.

## Intuitive Formulae Are Not Always Right

Published:

In the data science class I help TA, we’re going over confidence intervals. Thanks to the central limit theorem, we can report confidence intervals for the out-of-sample generalization error. Let’s assume our loss function is mean squared error. A confidence interval would then be

## 3 Rules For Giving a Sh!t

Published:

I spend a lot of time on Cross Validated (CV). CV is my statistical escape from statistics, and it is also a place where I like to prove to myself that I am good at what I do.

Published:

You wanna see a little gotcha in statistics? Take the following data

## What Do Monty Hall and A Steak Have In Common? I Marinate Both

Published:

It took me half a PhD to finally understand the Monty Hall Problem, but I think I get it now. I’ve been tutoring a very bright student in probability (which is so ironic given I failed it way back when) and I’ve surprised myself with how effective I’ve been at solving little homework problems I would previously not have been able to solve. It isn’t like my PhD has been full of these sorts of problems and by virtue of exposure to them I’ve gotten better at them. I’ve just been thinking about stats longer than I was thinking about stats in undergrad, which sort of leads to this unconscious thinking about homework problems. More on that in the end.

## Nothing is Normal So Don’t Worry About The T Test

Published:

I hate the objection “I can’t use the t-test, my data aren’t normal”. I see it all the time on Cross Validated when a data analyst is tasked with analyzing an experiment after it has been performed. They have piles of data, thousands of observations, and they have no idea what to do with it. They know of the t-test, but they erroneously believe (through no fault of their own) that the t-test is only valid if their data are normal.
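One way to probe that belief is to simulate it. The sketch below is not from the post; the exponential distribution, the sample sizes, and the use of 1.96 as a large-sample stand-in for the t critical value are all assumptions of mine. It runs a Welch t-test on two heavily skewed groups that share the same mean and records how often the test rejects at the 5% level.

```python
import random


def mean_and_var(xs):
    """Sample mean and unbiased sample variance."""
    n = len(xs)
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / (n - 1)
    return m, v


def false_positive_rate(n_sims=1000, n=500, seed=1):
    """Share of simulated null experiments where |t| exceeds 1.96."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Both groups are exponential (skewed, clearly not normal) with mean 1.
        a = [rng.expovariate(1.0) for _ in range(n)]
        b = [rng.expovariate(1.0) for _ in range(n)]
        (ma, va), (mb, vb) = mean_and_var(a), mean_and_var(b)
        t = (ma - mb) / (va / n + vb / n) ** 0.5  # Welch t statistic
        if abs(t) > 1.96:  # normal approximation to the critical value
            rejections += 1
    return rejections / n_sims


print(false_positive_rate())
```

If the test were badly broken by non-normality, the rejection rate under the null would drift well away from the nominal 5%.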

## GSoC 2019: It’s Over!

Published:

It’s the end of August, and Google Summer of Code 2019 is over. This blog post is meant to outline what I’ve accomplished, what I’ve failed to accomplish, what I’ve learned, and how I’ve felt over these last 4 months.

## GSoC 2019: A PR is Made!

Published:

Another short update: I’ve made a PR to merge pymc3.ode into PyMC3!

## GSoC 2019: Testing an API

Published:

This is a really short update. Here are a couple things I have been working on since the last blog post.

## GSoC 2019: Designing an API

Published:

Let’s take stock of where we are on our journey to add ODE capabilities on PyMC3.

## GSoC 2019: A Sampling Notebook

Published:

OK, I will keep this one short and sweet, no math. We have a sampling notebook.

## GSoC 2019: Gradient Descent for ODEs (But This Time, In Theano)

Published:

A little while ago, I wrote a post on doing gradient descent for ODEs. In that post, I used autograd to do the automatic differentiation. While neat, it was really a way for me to get familiar with some math that I was to use for GSoC. After taking some time to learn more about theano, I’ve reimplemented the blog post, this time using theano to perform the automatic differentiation. If you’ve read the previous post, then skip right to the code.

Published:

Gradient descent usually isn’t used to fit Ordinary Differential Equations (ODEs) to data (at least, that isn’t how the Applied Mathematics departments of which I have been a part have done it). Nevertheless, that doesn’t mean that it can’t be done. For some of my recent GSoC work, I’ve been investigating how to compute gradients of solutions to ODEs without access to the solution’s analytical form. In this blog post, I describe how these gradients can be computed and how they can be used to fit ODEs to synchronous data with gradient descent.
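As a rough illustration of the idea, here is a minimal sketch using forward sensitivity equations, which is one common way to get such gradients and not necessarily the approach taken in the post. Everything specific in it is hypothetical: the toy ODE $y' = -\theta y$, the noise-free synthetic data, the RK4 integrator, and the learning rate. Differentiating the ODE with respect to $\theta$ gives a second ODE for $s = \partial y / \partial \theta$, namely $s' = -y - \theta s$ with $s(0) = 0$, and the two are integrated together.

```python
import math


def solve_with_sensitivity(theta, times, y0=1.0, substeps=10):
    """RK4 for y' = -theta*y alongside its sensitivity s = dy/dtheta."""
    def rhs(y, s):
        # s' comes from differentiating the right-hand side w.r.t. theta.
        return -theta * y, -y - theta * s

    y, s, t = y0, 0.0, 0.0
    ys, sens = [], []
    for t_obs in times:
        h = (t_obs - t) / substeps
        for _ in range(substeps):
            k1y, k1s = rhs(y, s)
            k2y, k2s = rhs(y + 0.5 * h * k1y, s + 0.5 * h * k1s)
            k3y, k3s = rhs(y + 0.5 * h * k2y, s + 0.5 * h * k2s)
            k4y, k4s = rhs(y + h * k3y, s + h * k3s)
            y += h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6
            s += h * (k1s + 2 * k2s + 2 * k3s + k4s) / 6
        t = t_obs
        ys.append(y)
        sens.append(s)
    return ys, sens


def fit_theta(times, data, theta=0.5, lr=1.0, n_iter=500):
    """Plain gradient descent on the mean squared error."""
    for _ in range(n_iter):
        ys, sens = solve_with_sensitivity(theta, times)
        # Chain rule: dL/dtheta = mean(2 * (y - d) * dy/dtheta).
        grad = sum(2 * (y - d) * s for y, d, s in zip(ys, data, sens)) / len(data)
        theta -= lr * grad
    return theta


# Synthetic observations generated with theta = 1.5 (hypothetical values).
times = [0.25 * k for k in range(1, 9)]
data = [math.exp(-1.5 * t) for t in times]
theta_hat = fit_theta(times, data)
print(theta_hat)
```

The analytical solution is only used to manufacture the data; the fitting loop itself never touches it, which is the point of computing the gradient through the sensitivities.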

## GSoC 2019: ODE Inception

Published:

Let’s take stock of exactly where we are in this journey to implement HMC for differential equations.

## King Street Pilot Project

Published:

Toronto started a pilot project to shut down King Street to private vehicles in an attempt to ease congestion and increase TTC ridership. I’ve obtained some data from the city and have begun analyzing it. Shown here is an initial plotting of the change in travel times in certain sections of the city. Cooler colors mean travel times have decreased.

## Neat Little Combinatorics Problem

Published:

I’ll cut right to it. Consider the set $S = (49, 8, 48, 15, 47, 4, 16, 23, 43, 44, 42, 45, 46 )$. What is the expected value for the minimum of 6 samples from this set?
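For anyone who wants to check an answer by computer, here is a small sketch. It assumes the 6 samples are drawn without replacement and that every 6-element subset is equally likely, which is my reading of the question rather than something stated above. The first function enumerates all subsets; the second uses the counting argument that the $i$-th smallest element is the minimum exactly when the other 5 picks come from the elements larger than it.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

S = (49, 8, 48, 15, 47, 4, 16, 23, 43, 44, 42, 45, 46)


def expected_min_bruteforce(values, k):
    """Average the minimum over every k-element subset."""
    subsets = list(combinations(values, k))
    return Fraction(sum(min(s) for s in subsets), len(subsets))


def expected_min_counting(values, k):
    """Weight each value by the number of subsets in which it is the minimum."""
    ordered = sorted(values)
    n = len(ordered)
    # Element at index i has n - i - 1 larger elements to fill the other k - 1 slots.
    total = sum(v * comb(n - i - 1, k - 1) for i, v in enumerate(ordered))
    return Fraction(total, comb(n, k))


print(expected_min_bruteforce(S, 6), expected_min_counting(S, 6))
```

Using `Fraction` keeps the answer exact, so the two approaches can be compared with `==` rather than a floating-point tolerance.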

## Rat Tumors and PyMC3

Published:

I’m very proud to say I have contributed this example to PyMC3’s documentation. It details how to compute posterior means for Gelman’s rat tumour example in BDA3.

## Making Plots with Purrr

Published:

I was recently asked to make 4 plots for a collaborator. The plots are all the same, just a scatter plot and a non-linear trend line. Every time I have to do something repetitive, I wince, especially with respect to plots. I thought I would take this opportunity to write a short blog post on how to use functional programming in R to make the same plot for similar yet different data.

## I C What You Did There, Sklearn

Published:

Let me ask you a question: Considering logistic regression can be performed without the use of a penalty parameter, why does sklearn include a penalty in their implementation of logistic regression? I think most people would reply with something about overfitting, which I suppose is a reasonable answer, but isn’t very satisfactory, especially since the documentation for sklearn.linear_model.LogisticRegression() is awash with optimization terminology and never mentions overfitting.

## Advent of Code: Question 2

Published:

Today was a fairly easy challenge. Part one provides us with a 2D array of integers and asks us to find the sum of the differences between the largest and smallest numbers in each row. Super easy to do without loops if you know how to use numpy.
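A loop-free version might look like the sketch below. The function name and the example grid are made up for illustration, and it assumes a rectangular grid so that it fits in a regular numpy array.

```python
import numpy as np


def row_range_checksum(grid):
    """Sum over rows of (largest value - smallest value)."""
    grid = np.asarray(grid)
    # axis=1 reduces across the columns, giving one max and one min per row.
    return int((grid.max(axis=1) - grid.min(axis=1)).sum())


# Hypothetical input, not the actual puzzle data.
example = [[5, 1, 9, 5],
           [7, 5, 3, 1],
           [2, 4, 6, 8]]
print(row_range_checksum(example))  # row ranges are 8, 6, 6, so this prints 20
```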

Published:

# A Gauntlet Has Been Thrown Down

## A Purrrfect Method for Simulating Data

Published:

When I was doing my Masters, I had to generate a lot of plots, which means I had to generate a lot of data. Usually, the data I would be generating would depend on a parameter (maybe something like the rolling window length, or maybe the bandwidth for some smoothing function) and I would have to try a whirlwind of combinations. To do this, I would end up writing code to generate the data once, then just looping over that code for different values of the parameters.

## GyMBo: A Gym Monitoring Bot

Published:

Back in September 2017, I was really tired and learning about Maximum Likelihood and yearned to do some more machine learning. For the longest time I wanted to scrape my gym’s Twitter account to get information about when the gym was busiest.

## Coins and Factors

Published:

I love FiveThirtyEight’s Riddler column. Usually, I can solve the problem with computation, but on some rare occasions I can do some interesting math to get the solution without having to code. Here is the first puzzle I ever solved. It is a simple puzzle, yet it has an elegant computational and analytic solution. Let’s take a look.

## The Very Start

Published:

This is my humble website where I post math/stats/data science related stuff. Keep on the lookout as I keep updating it with cool projects, questions, and thoughts.

## GyMBo: A Gym Monitoring Robot

A bot capable of predicting future gym usage.

## Retail Churn

Churn is not the same in retail as it is in subscription services. How can we then estimate when a customer is likely to churn?

## The kinetics of regeneration of rhodopsin under enzyme-limited availability of 11-cis retinoid

Published in Vision Research, 2015

## A toolbox for rapid quantitative assessment of chronological lifespan and survival in Saccharomyces cerevisiae

Published in Traffic, 2016

## Critical dynamics in population vaccinating behavior

Published in Proceedings of the National Academy of Sciences, 2017

## Drug interactions and pharmacogenetic factors contribute to variation in apixaban concentration in atrial fibrillation patients in routine care

Published in Journal of Thrombosis and Thrombolysis, 2019

## HLADQA1*05 genotype predicts anti‐drug antibody formation and loss of response during infliximab therapy for inflammatory bowel disease

Published in Alimentary pharmacology & therapeutics, 2019

## Comparisons Between Hamiltonian Monte Carlo and Maximum A Posteriori For A Bayesian Model For Apixaban Induction Dose & Dose Personalization

Published in Proceedings of Machine Learning Research, 2020

Published:

View Talk Here

## University of Waterloo

Teaching Assistant, Department of Applied Mathematics, 2014

I performed teaching assistant duties for various courses in the applied mathematics department at the University of Waterloo. Shown below is a complete list of courses I have TA’d at The University of Waterloo.

## Western University

Teaching Assistant, Department of Epidemiology & Biostatistics, 2019

I have TA’d the following graduate level courses in Biostatistics