```
set.seed(0)
N = 1000
y = rlnorm(N, 0.5, 0.5)
```

You wanna see a little gotcha in statistics? Take the following data

and explain why `glm(y ~ 1, family = gaussian(link=log))`

and `lm(log(y)~1)`

produce different estimates of the coefficients. In case you don’t have an R terminal, here are the outputs

```
log_lm = lm(log(y) ~ 1)
summary(log_lm)
```

```
Call:
lm(formula = log(y) ~ 1)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.61028 -0.34631 -0.02152  0.35173  1.64112 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.49209    0.01578   31.18   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.499 on 999 degrees of freedom
```

```
glm_mod = glm(y ~ 1, family = gaussian(link=log))
summary(glm_mod)
```

```
Call:
glm(formula = y ~ 1, family = gaussian(link = log))

Deviance Residuals:
    Min       1Q   Median       3Q      Max  
-1.5282  -0.6981  -0.2541   0.4702   6.5869  

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.61791    0.01698    36.4   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.9918425)

    Null deviance: 990.85  on 999  degrees of freedom
Residual deviance: 990.85  on 999  degrees of freedom
AIC: 2832.7

Number of Fisher Scoring iterations: 5
```

The answer is the difference between \(E(g(X))\) and \(g(E(X))\), which are not in general the same. Let me explain.
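For the lognormal, this gap is completely explicit. Taking \(g = \exp\) and using the standard lognormal moment formulas,

\[
g\big(E(\log y)\big) = e^{\mu} = \operatorname{median}(y),
\qquad
E\big(g(\log y)\big) = E(y) = e^{\mu + \sigma^2/2},
\]

so the two quantities differ by exactly a factor of \(\exp(\sigma^2/2)\). Keep that factor in mind; it shows up in every back-transformation below.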

First, let’s start with the lognormal random variable. \(y \sim \operatorname{Lognormal}(\mu, \sigma)\) means \(\log(y) \sim \operatorname{Normal}(\mu, \sigma)\), so \(\mu\) and \(\sigma\) are the parameters of the underlying normal distribution. When we fit `lm(log(y) ~ 1)`, we are modelling \(E(\log(y)) = \beta_0\). So \(\beta_0\) is an estimate of \(\mu\), and \(\exp(\mu)\) is an estimate of the median of the lognormal. That is an easy check

`median(y)`

`[1] 1.600898`

```
#Meh, close enough
exp(coef(log_lm))
```

```
(Intercept)
1.635723
```

If I wanted an estimate of the mean of the lognormal, I would need to add \(\sigma^2/2\) to my estimate of \(\mu\) before exponentiating, since \(E(y) = \exp(\mu + \sigma^2/2)\).

`mean(y)`

`[1] 1.855038`

```
#Meh, close enough
sigma = var(log_lm$residuals) # note: this is an estimate of sigma^2
exp(coef(log_lm) + sigma/2)
```

```
(Intercept)
1.852594
```

Ok, onto the glm now. With the glm, we model \(\log(E(y)) = \beta_0\), so we model the mean of the lognormal directly: the intercept estimates \(\mu + \sigma^2/2\), which here is \(0.5 + 0.5^2/2 = 0.625\), close to the fitted 0.61791. Case in point

`mean(y)`

`[1] 1.855038`

`exp(coef(glm_mod))`

```
(Intercept)
1.855038
```

and if I wanted the median, I would need to divide out the extra factor of \(\exp(\sigma^2/2)\)

`median(y)`

`[1] 1.600898`

`exp(coef(glm_mod) - sigma/2)`

```
(Intercept)
1.637881
```

Log link vs. log outcome can be tricky. Just be sure to know what you’re modelling when you use either.
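The whole comparison fits in a dozen lines. Here is a self-contained script (same seed and sample size as above) showing which estimand each model targets and which correction recovers the other:

```r
set.seed(0)
N = 1000
y = rlnorm(N, 0.5, 0.5)          # log(y) ~ Normal(mu = 0.5, sigma = 0.5)

log_lm  = lm(log(y) ~ 1)                             # models E(log y) = mu
glm_mod = glm(y ~ 1, family = gaussian(link = log))  # models log E(y)

sigma2 = var(residuals(log_lm))  # estimate of sigma^2

exp(coef(log_lm))                # estimates the median of y
exp(coef(log_lm) + sigma2 / 2)   # estimates the mean of y
exp(coef(glm_mod))               # equals mean(y) for an intercept-only gaussian glm
```

The last line is exact rather than approximate: with only an intercept and an identity-variance gaussian family, the least-squares solution forces the fitted mean to equal the sample mean.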