Demystifying Stats for non-statisticians: P-Values

By Ben We.

If you’ve ever taken a statistics course, you’ve experienced the strange, slightly opaque world of statistical jargon, where colloquial language has highly specific meanings that are easily abused. One of the most famous, most abused statistical terms is the “P-value.” In almost every field of science there’s an ongoing discussion over P-values, and whether the common P-value threshold of 0.05 is even reasonable. So, what is a P-value, and why is 0.05 such a contentious number?

What is a P-Value?

Before I give you the book definition of a P-value, let me recap how the statistical court of significance works. All data is initially assumed to be ordinary, terribly boring, and totally within expected boundaries unless proven otherwise. To decide that some observations are worthy of note, statisticians need a quantitative method.

That’s where the P-value comes in. Academically, the P-value is the probability of obtaining results at least as extreme as the observed data, assuming that the null hypothesis (the “nothing unusual is going on” assumption described above) is correct [1]. While that definition is rather stuffy and opaque, I hope you’ll understand it with this more concrete example. Let’s say you have a friend named Alvin who loves to play tricks on you. One day, he comes up to you with a coin and tells you to guess heads or tails.

You guess, and he flips heads.

Then, he flips heads again.

And, heads again,

And, heads again,

And, once more, to be sure, he flips the coin and it lands heads.

How many flips would it take before you suspect that he is using a trick coin? For me, it would be between four and five heads in a row: four is suspicious, and five is far too many. This gut feeling is your intuition’s P-value threshold. Unfortunately, while intuition is acceptable with your friend Alvin, it doesn’t fly for statistics. So, let’s actually see what the probability of flipping 4 or 5 heads in a row is.

4 Heads: ½ x ½ x ½ x ½ = 0.0625

5 Heads: ½ x ½ x ½ x ½ x ½ = 0.03125

To translate those numbers into English: assuming the coin is an ordinary coin, there’s a 6.25% chance that you could flip 4 heads in a row, and a 3.125% chance that you could flip 5 heads in a row. A 6.25% chance is pretty low, so I would want to examine Alvin’s coin closely. However, your threshold may vary depending on how rare you feel a 6% event is.

That 6.25% chance is the P(robability)-value, and the point at which you become uncomfortable with the odds is your P-value threshold.
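If you would like to check the arithmetic yourself, here is a minimal Python sketch of the coin example. The function names `prob_all_heads` and `flips_until_suspicious` are made up for this illustration, and the code assumes a fair coin whose flips are independent.

```python
from fractions import Fraction


def prob_all_heads(n_flips, p_heads=Fraction(1, 2)):
    """Probability that every one of n_flips tosses lands heads."""
    return p_heads ** n_flips


def flips_until_suspicious(threshold, p_heads=Fraction(1, 2)):
    """Smallest run of heads whose probability falls below the threshold."""
    n = 1
    while prob_all_heads(n, p_heads) >= threshold:
        n += 1
    return n


# Print the probability of 1 through 6 heads in a row with a fair coin.
for n in range(1, 7):
    print(n, float(prob_all_heads(n)))

# With a 5% threshold, the run length at which a fair coin starts to look fishy.
print(flips_until_suspicious(0.05))
```

With a threshold of 0.05, four heads in a row (0.0625) does not cross the line but five (0.03125) does, which matches the gut feeling above. Pass a looser threshold such as 0.10 and four heads is already enough.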

So, how did we get 0.05, and why all the fuss?

0.05, or 5%, is a common threshold that most statisticians use to separate “statistically significant” from “statistically insignificant” results. Unfortunately, this threshold was not carefully calculated, but rather was picked more or less arbitrarily. It is usually traced to the statistician Ronald Fisher, who suggested it in the 1920s as a convenient cutoff.

The problem is that so many people are taught that standard that they just look for the magic number, throwing out results with a P-value of 0.051 while confidently wielding results with a P-value of 0.049 like the Ten Commandments. The truth is that the “insignificant” 0.051 is almost exactly as rare as the “significant” 0.049. So, why the hang-up on 0.05, and why don’t we use other numbers?

For one, it is a good baseline metric – 1 in 20 is a pretty rare event, but not so rare that it is an insurmountable bar. For example, during early March, there is approximately a 4-5% chance of experiencing a mixed rain/snow storm in Boston. As any local can attest, a mixed storm in March isn’t extremely rare, but it isn’t expected (or wanted for that matter). As a result, a 5% threshold is strict enough to keep the most unlikely events out, but not so strict as to disallow relatively unlikely events.

More importantly, since 5% is the accepted standard, using that value indicates that you play by the rules and that you’re not trying to pull some trickery. So, many statisticians won’t bat an eye at a P-value threshold of 0.05.
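To get a feel for how thin the line around 0.05 can be, here is a small Python sketch. It uses a hypothetical experiment that is not part of Alvin's story: 100 flips of a coin, where we ask how likely a fair coin is to produce at least a given number of heads. The function name `upper_tail_p_value` is invented for this example, and it computes a one-sided P-value only.

```python
from math import comb


def upper_tail_p_value(heads, flips):
    """P(at least `heads` heads in `flips` tosses), assuming a fair coin."""
    # Count every outcome with `heads` or more heads, then divide by the
    # total number of equally likely outcomes (2 ** flips).
    favourable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favourable / 2 ** flips


p58 = upper_tail_p_value(58, 100)  # about 0.067, "insignificant" at 0.05
p59 = upper_tail_p_value(59, 100)  # about 0.044, "significant" at 0.05

print(p58, p59)
```

A single extra head out of 100 flips is enough to move the result from one side of the 0.05 line to the other, even though the two outcomes are nearly equally surprising. Note that "at least this many heads" is the "at least as extreme as the observed data" part of the textbook definition.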

That said, there are actual quantifiable issues with the 0.05 threshold, and times when even the most rule-following statistician would gladly move it. These are outside the scope of this post and will be covered another time. Until then, good luck, and I hope the data gods reward you with some good P-values!
