Introduction
Let \(Y(a)\) be a potential outcome for a continuous measure under treatment status \(A=a\), where for the purposes of this blog post \(A\) can be considered a binary treatment. When is
\[ \delta = \dfrac{E[Y(1)]}{E[Y(0)]} \] well approximated by \(\exp(\delta^\star)\) where
\[ \delta^\star = E \left[ \ln \left( \dfrac{Y(1)}{Y(0)} \right) \right] = E[\ln(Y(1))] - E[\ln(Y(0))] \>?\]
It seems to be a fairly well accepted proposition that the difference in means on the log scale can be interpreted as a relative change in means on the natural scale, but upon closer inspection the two aren’t equivalent. First, Jensen’s inequality prevents us from interchanging the expectation operator and the logarithm, and second, \(E[X/Y] \neq E[X]/E[Y]\) in general: expectation is linear, but it does not distribute over ratios. I think a more nuanced discussion is needed as to whether and when we can interpret \(\exp(\delta^\star)\) as \(\delta\).
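To make the gap concrete, here is a minimal simulation sketch. The lognormal potential outcomes and their parameters are made up purely for illustration; the point is only that \(\delta\) and \(\exp(\delta^\star)\) need not agree.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

# Hypothetical potential outcomes: lognormal with different spreads on the log scale.
y0 = rng.lognormal(mean=0.0, sigma=1.0, size=n)
y1 = rng.lognormal(mean=0.1, sigma=1.5, size=n)

delta = y1.mean() / y0.mean()          # ratio of means on the natural scale
delta_star = np.mean(np.log(y1 / y0))  # mean of the log ratio

print(f"delta           = {delta:.3f}")
print(f"exp(delta_star) = {np.exp(delta_star):.3f}")
```

In this made-up scenario the log-scale spread differs between arms, and the two quantities land in very different places, which is exactly why I want to pin down when the approximation is safe.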
To be clear, I’m sure this holds fairly readily. I don’t want to overstate the importance of this blog post, but I don’t want to understate it either. This question came up when considering how AB tests should measure changes in continuous metrics, like revenue. Log-transforming revenue to rein in tails is common advice – and I think that’s fine, especially when it makes the sampling distribution of the sample mean more normal looking. Additionally, talking about changes in a relative sense (i.e., “lift” in the metric) is something that comes naturally to a lot of companies. But if we’re going to use \(\delta^\star\) as the metric for our experiment, then I personally would like to understand under what conditions this is a good approximation. What assumptions am I implicitly making? I don’t think curiosity in that sense is a bad thing, or a waste of time.
Taylor Series for Random Variables
Before getting into the meat of the blog post, it might be worthwhile to revisit Taylor series for random variables, which we will make heavy use of throughout. Recall that the Taylor series of a sufficiently smooth function \(f\) about a point \(x_0\) is
\[ f(x) = \sum_{k=0}^{\infty} \dfrac{f^{(k)}(x_0)}{k!} (x - x_0)^k \>, \quad \mid x - x_0 \mid \lt d\]
and that the error made in approximating \(f\) with the first \(n+1\) terms of this sum, \(R_n(x)\), can be bounded by
\[ \mid R_n(x) \mid \leq \dfrac{M}{(n+1)!} \mid x - x_0 \mid^{n+1} \>, \quad \mid x - x_0 \mid \lt d \>,\] where \(M\) is a bound on \(\mid f^{(n+1)} \mid\) over the interval.
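As a concrete example, one we will lean on later, expanding \(f(x) = \ln(x)\) about \(x_0 = 1\) gives
\[ \ln(x) = (x-1) - \dfrac{(x-1)^2}{2} + \dfrac{(x-1)^3}{3} - \cdots \>, \quad 0 \lt x \leq 2 \>,\]
so the second order truncation \((x-1) - (x-1)^2/2\) makes an error on the order of \((x-1)^3/3\) when \(x\) is close to one.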
We can also expand a function of a random variable \(X\) in a Taylor series, this time about the mean \(\mu\), by writing \(X = (X-\mu) + \mu\)
\[\begin{align} f(X) &= f((X-\mu) + \mu) \>, \\ &= f(\mu) + f^{\prime}(\mu)(X-\mu) + \dfrac{f^{\prime\prime}(\mu)}{2}(X-\mu)^2 + \mathcal{O}\left( (X-\mu)^3 \right) \>. \end{align}\]From here, we can take expectations and leverage the linearity of \(E\) to get a nice second order approximation of \(E[f(X)]\)
\[\begin{align} E[f(X)] &\approx f(\mu) + f^{\prime}(\mu)E[X-\mu] + \dfrac{f^{\prime\prime}(\mu)}{2}E[(X-\mu)^2] \>, \\ &= f(\mu) + \dfrac{f^{\prime\prime}(\mu)}{2}\sigma^2 \>, \end{align}\]since \(E[X-\mu] = 0\) and \(E[(X-\mu)^2] = \sigma^2\).
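As a quick sanity check of this second order approximation, here is a minimal sketch with \(f = \ln\); the gamma distribution for \(X\) is a made-up choice, used only because its mean and variance are easy to control.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up example: X ~ Gamma(shape=20, scale=0.5), so mu = 10 and sigma^2 = 5.
x = rng.gamma(shape=20.0, scale=0.5, size=1_000_000)
mu, var = x.mean(), x.var()

monte_carlo = np.mean(np.log(x))               # direct estimate of E[ln X]
second_order = np.log(mu) - var / (2 * mu**2)  # f(mu) + f''(mu)/2 * sigma^2 with f = ln

print(f"Monte Carlo E[ln X]        = {monte_carlo:.5f}")
print(f"Second order approximation = {second_order:.5f}")
```

The two numbers should be very close here because \(X\) is tightly concentrated around its mean; the approximation degrades as \(\sigma^2/\mu^2\) grows.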
Applying Taylor Series To Our Problem
The quantity \(\exp(\delta^\star)\) could, under the right circumstances, be approximately \(\delta\). Let’s expand \(\ln(Y(1)/Y(0))\) in a Taylor series centered at \(Y(1)/Y(0) = 1\). We’re going to be doing quite a few Taylor expansions, so I’m going to color code some of them in order to keep track of which terms belong to which expansion.
\[\begin{align} E\left[\ln \left( \dfrac{Y(1)}{Y(0)} \right)\right] &\approx E \left [ \textcolor{#1f77b4}{\left( \dfrac{Y(1)}{Y(0)} -1 \right) - \dfrac{1}{2} \left( \dfrac{Y(1)}{Y(0)} -1 \right)^2} \right] \>,\\ &= \textcolor{#ff7f0e}{E \left[ \dfrac{Y(1)}{Y(0)} \right]} \textcolor{#1f77b4}{- 1 - \dfrac{1}{2} E \left[\left( \dfrac{Y(1)}{Y(0)} -1 \right)^2 \right]} \>. \end{align}\]The underlying series only converges when \(0 \lt Y(1)/Y(0) \leq 2\), and the error in this second order truncation is on the order of \(E\left[\left(\frac{Y(1)}{Y(0)} -1\right)^3\right]\). Note that we almost have \(\delta\) in our Taylor series expansion, but not quite. We can apply yet another Taylor series expansion to the part in orange to yield
\[\begin{align} \textcolor{#ff7f0e}{E\left[ \dfrac{Y(1)}{Y(0)} \right] \approx \dfrac{E[Y(1)]}{E[Y(0)]} -\frac{\operatorname{cov}[Y(1), Y(0)]}{E[Y(0)]^2}+\frac{E[Y(1)]}{E[Y(0)]^3} \operatorname{var}[Y(0)]} \>. \end{align}\]
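Here is a rough numerical check of this expansion. The correlated lognormal model for \((Y(1), Y(0))\) and its covariance are entirely made up; the sketch just compares a direct Monte Carlo estimate of \(E[Y(1)/Y(0)]\) with the three-term approximation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Made-up, positively correlated potential outcomes on the log scale.
mean = [0.1, 0.0]
cov = [[0.04, 0.02],
       [0.02, 0.04]]
z = rng.multivariate_normal(mean, cov, size=n)
y1, y0 = np.exp(z[:, 0]), np.exp(z[:, 1])

monte_carlo = np.mean(y1 / y0)
approx = (y1.mean() / y0.mean()
          - np.cov(y1, y0)[0, 1] / y0.mean() ** 2
          + y1.mean() * y0.var() / y0.mean() ** 3)

print(f"Monte Carlo E[Y(1)/Y(0)]   = {monte_carlo:.4f}")
print(f"Second order approximation = {approx:.4f}")
```

With the small log-scale variances used here the two agree closely; inflate the variances and the ignored higher order terms start to matter.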
Let’s assemble this all together now. Our approximation is
\[\begin{align} E\left[\ln \left( \dfrac{Y(1)}{Y(0)} \right)\right] &\approx \textcolor{#ff7f0e}{ \dfrac{E[Y(1)]}{E[Y(0)]} -\frac{\operatorname{cov}[Y(1), Y(0)]}{E[Y(0)]^2}+\frac{E[Y(1)]}{E[Y(0)]^3} \operatorname{var}[Y(0)]} \textcolor{#1f77b4}{- 1 - \dfrac{1}{2} E \left[\left( \dfrac{Y(1)}{Y(0)} -1 \right)^2 \right]} \>, \\ &\approx \dfrac{E[Y(1)]}{E[Y(0)]} - 1 \textcolor{#ff7f0e}{-\frac{\operatorname{cov}[Y(1), Y(0)]}{E[Y(0)]^2}+\frac{E[Y(1)]}{E[Y(0)]^3} \operatorname{var}[Y(0)]} \textcolor{#1f77b4}{- \dfrac{1}{2} E \left[\left( \dfrac{Y(1)}{Y(0)} -1 \right)^2 \right]} \>. \end{align}\]Finally, \(\delta\) appears in our long and tedious approximation of \(\delta^\star\). Let’s ignore the terms colored in blue and orange for a moment and come back to them later.
Dropping those terms, our approximation becomes
\[E\left[\ln \left( \dfrac{Y(1)}{Y(0)} \right)\right] \approx \dfrac{E[Y(1)]}{E[Y(0)]} - 1\]
\[ \delta^\star \approx \delta -1 \]
Exponentiating both sides
\[\begin{align} \exp(\delta^\star) &\approx \exp(\delta-1) \>, \\ &= 1 + (\delta - 1) + \mathcal{O}\left((\delta-1)^2\right) \>, \\ &\approx \delta \>. \end{align}\]
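To see the whole chain in action, here is a minimal end-to-end sketch under a deliberately friendly, made-up scenario: lognormal outcomes with a small log-scale variance that is the same in both arms.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Made-up "well behaved" scenario: small, equal spread on the log scale in both arms.
y0 = rng.lognormal(mean=0.00, sigma=0.2, size=n)
y1 = rng.lognormal(mean=0.05, sigma=0.2, size=n)

delta = y1.mean() / y0.mean()                           # E[Y(1)] / E[Y(0)]
delta_star = np.mean(np.log(y1)) - np.mean(np.log(y0))  # E[ln Y(1)] - E[ln Y(0)]

print(f"delta           = {delta:.4f}")
print(f"exp(delta_star) = {np.exp(delta_star):.4f}")
```

In this deliberately well behaved setup the two quantities essentially coincide.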
We Did It…Now What?
Now that we’ve written down all the necessary approximations and assumptions, let’s go back and determine under what circumstances this is a valid approximation. Can we break this approximation? Can we break it really badly?
Let’s leave that for the next blog post. I’m tired from doing all this algebra.