The Arbitrariness of Statistics

I always held a grudge against statistics because it seemed so random to me (no pun intended). All of the equations seemed sporadic, computers do most of the work for you, and the conditions for tests (the sample must be less than 10% of the population for independence, samples must be larger than 30 for normality) seemed meaningless. Math should be logical and repeatable, and statistics just didn’t seem like that. After a year of statistics, though, it is starting to grow on me. And all of the “arbitrary” parts somehow work?

The first thing I encountered in stats was Bessel’s Correction. Basically, when calculating the variance or standard deviation from a sample of data, you subtract 1 from the sample size in the divisor:
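Written out, with x̄ as the sample mean and n as the sample size, the corrected sample variance looks like this:

$$
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
$$

The sample standard deviation is just the square root of that, and without Bessel’s correction the divisor would simply be n.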

In theory, this brings the standard deviation from the sample closer to the true population parameter. By making the denominator smaller, the standard deviation increases, which compensates for the fact that a small sample tends to understate the spread. For larger samples, subtracting one has such a small effect that it hardly matters. But why subtract 1 regardless of sample size? Wouldn’t it make more sense if the correction varied depending on the sample?

Well, let’s try to model some data ourselves. First, we’ll use a population made up of the integers 1 through 9, which has a mean of 5 and an average squared distance from the mean (its variability) of about 6.66. This is shown below:
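If you want to reproduce that setup, here is a quick Python sketch (assuming the population really is just the integers 1 through 9, which matches the stated mean of 5 and variability of 6.66):

```python
# Population: the integers 1 through 9
population = list(range(1, 10))

# Population mean
mu = sum(population) / len(population)  # 5.0

# "Variability": the average squared distance from the mean
# (divide by the full population size here, since this is not a sample)
variability = sum((x - mu) ** 2 for x in population) / len(population)  # 6.666...

print(mu, variability)
```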

Next, let’s take samples of n = 4 from this population. Using a random method, the values chosen for 10 different samples are shown below, with each sample mean at the top. The numbers in the grid are the squared distances of each sample data point from its sample mean. For example, the value 27.5625 (highlighted green) comes from a data point of 1 in a sample whose mean is 6.25: (1 − 6.25)^2 = 27.5625. This has been done for every number in the randomly chosen samples.
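To make that highlighted calculation concrete, here is one hypothetical n = 4 sample whose mean happens to be 6.25 (the actual samples in the grid were random, so this particular one is just for illustration):

```python
# A hypothetical sample of n = 4 with mean 6.25 (not necessarily one from the grid)
sample = [1, 7, 8, 9]
mean = sum(sample) / len(sample)  # 6.25

# Squared distance of each data point from the sample mean
squared_distances = [(x - mean) ** 2 for x in sample]
print(squared_distances)  # [27.5625, 0.5625, 3.0625, 7.5625]
```

The data point of 1 is the one that produces the 27.5625 seen in the grid.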

Now we can compare the average squared distance from the mean for each sample. (Note: the distances are squared to remove any negative signs and to give extra weight to “weird” points.)

Notice that when the average squared distance of each value from the mean is found using n, only 2/10 of the samples overestimate the variability. This suggests that small samples from a population tend to underestimate variability, which is exactly why Bessel’s correction is needed. When that average squared distance is instead found by dividing by (n − 1), 4/10 of the samples now overestimate the variability. And when the variability is calculated across all 10 samples combined, the value found with (n − 1) is much closer to the true population variability of 6.66. In fact, it only underestimates it by about 6%. In the statistical world, this is extremely good. With a much more robust sampling distribution and many more samples, this value approaches the true variability. This visually demonstrates Bessel’s correction: dividing by (n − 1) brings the sample variability much closer to the true population parameter.
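To push the same experiment well past 10 samples, here is a rough simulation sketch of my own (it draws samples with replacement, which may or may not match the exact random method used for the grid above):

```python
import random

# Population: the integers 1 through 9, true variability = 60/9 ≈ 6.67
population = list(range(1, 10))
mu = sum(population) / len(population)
true_var = sum((x - mu) ** 2 for x in population) / len(population)

n = 4             # sample size
trials = 100_000  # many more than 10 samples

total_div_n = 0.0
total_div_n1 = 0.0

for _ in range(trials):
    sample = [random.choice(population) for _ in range(n)]  # sample with replacement
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)  # sum of squared distances from the sample mean
    total_div_n += ss / n          # uncorrected: divide by n
    total_div_n1 += ss / (n - 1)   # Bessel's correction: divide by n - 1

print(f"true variability:      {true_var:.3f}")
print(f"average of ss/n:       {total_div_n / trials:.3f}")   # consistently undershoots (around 5)
print(f"average of ss/(n-1):   {total_div_n1 / trials:.3f}")  # lands very close to 6.67
```

Dividing by n keeps undershooting no matter how many samples you average, while dividing by (n − 1) settles right on the true value.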

Ok, that is fine and dandy, but why does the same (n − 1) also appear when finding degrees of freedom for t-values? I guess it’s the same idea. Basically, degrees of freedom takes the number of data points you have and again subtracts one, because once the sample mean is fixed, only n − 1 of the deviations from it are free to vary (the last one is forced by the others). This count is then used to calculate critical values for confidence intervals and other statistical tests.
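As one small example of how that count gets used, here is a quick sketch (assuming scipy is available, and using a hypothetical sample size of n = 10) of looking up the t critical value for a 95% confidence interval:

```python
from scipy import stats

n = 10      # hypothetical sample size
df = n - 1  # degrees of freedom: one is "used up" by estimating the sample mean

# Two-sided 95% confidence interval -> 2.5% in each tail
t_crit = stats.t.ppf(0.975, df)
print(round(t_crit, 3))  # roughly 2.262
```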

In the future, I would definitely love to check that degrees-of-freedom value myself. I do think it is pretty interesting how close a crude sampling distribution of just 10 samples got to the true population variability using Bessel’s correction. And just maybe, statistics isn’t as arbitrary as I thought.
