I hate the objection “I can’t use the t-test, my data aren’t normal”. I see it all the time on Cross Validated when a data analyst is tasked with analyzing an experiment after it has been performed. They have piles of data, thousands of observations, and they have no idea what to do with it. They know of the t-test, but they erroneously believe (through no fault of their own) that the t-test is only valid if their data are normal.
Can I just say that nothing is normally distirbuted? Distributions themselves are convienient fictions – they do not exist. Coin flips are not binomial, heights are not normal, and failure times are not exponential. We idealize our data as coming from those distributions, but that does not mean that our data are de facto generated from that distribution. So to that end, objections like “well my data aren’t normally distributed” are already moot but I’m going to put an end to this because it has been bugging me for a bit. I’m not going to have any math here, I don’t have anything new to add. I’m just going to be brash in hopes the next time your run an AB test you’ll think “Oh yea, that bearded guy got really upset when I said my data aren’t normal”. GAH!
Here is the important part: In the case you have lots of data, you can make use of the Central Limit Theorem (CLT) to justify your use of the t-test. Roughly, the CLT says that with enough data the sampling distribution of the sample mean is normal with mean equal to the population mean and standard deviation equal to the standard error. So the sample means can be idealied as two draws from two normal distributions, and the difference between two normals is also normal.
Boom, there is your need for normality satisfied. Binomial data? Yep, you can still use the t-test (that’s right, I said it). Highly skewed data? T-test that sucker. Categorical data even? Go simulate it. You would be surprised. The only reason I am being so brash is because in the case of AB tests, we have oodles and oodles of data. If we can’t claim we are in an asymptotic regime with that much data, then when can we? I would not be so brash if we were working with smaller sample sizes (in which case we are required to be a little more careful), but for Pete’s sake just simulate some of these problems and look at the false positive rate/power. The t-test is way more robust than you have been lead to believe.
No go forth and use the t-test. I never want to hear of this objection again. And if you need something concrete to reference, check out this answer I have to a question regarding the t-test and non-normal data.