How to choose the right statistical test using four simple questions

By Varun

Undergraduate students writing a scientific paper, lab report, or senior thesis frequently have to decide how to formalize statistical analyses. How do you know if values in an experimental group are different from those in a control group? While many students have been taught about general statistical concepts—such as p-values and confidence intervals—choosing the right test for statistical significance can be daunting without a framework. As someone who has worked on dozens of translational and clinical research manuscripts, I make this sort of choice in almost every table or figure. I’ve found that the decision can often be boiled down to the following four questions: 

  1. Is my data parametric or non-parametric? 
  2. Is my data numeric or categorical? 
  3. How many groups do I have? 
  4. Are my groups paired or unpaired? 

Let’s go through these questions to better understand how they can be helpful.  

Is my data parametric or non-parametric? 

It is important to decide whether your data follows an approximately normal distribution, in which case parametric statistical tests are most appropriate. When plotted in a histogram, values for normally distributed variables fall roughly symmetrically around the mean. Parametric tests are usually reasonable for continuous variables that are not measured on a unique, experimenter-derived scale. Examples of such variables include age, height, weight, blood pressure, and temperature.

If the underlying distribution of the data is not normal, or you are not sure about the distribution, then it is safer to use non-parametric statistical tests. These tests are often well suited to scores on a known scale. For example, the Glasgow Coma Scale (GCS) is a clinical score used to measure a person’s level of consciousness after a brain injury. The GCS is quantitative (with scores ranging from 3 to 15), but it is neither continuous nor normally distributed.
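If you are unsure whether a variable is close enough to normal, one rough aid is a formal normality check alongside the histogram. The sketch below uses the Shapiro-Wilk test from SciPy on simulated data; the function name `looks_normal`, the 0.05 cutoff, and the sample values are assumptions made for illustration. Keep in mind that a non-significant result does not prove normality, especially with small samples.

```python
import numpy as np
from scipy import stats


def looks_normal(values, alpha=0.05):
    """Return True if the Shapiro-Wilk test does NOT reject normality at level alpha."""
    statistic, p_value = stats.shapiro(values)
    return bool(p_value > alpha)


# Simulated example data (hypothetical, for illustration only)
rng = np.random.default_rng(seed=0)
heights_cm = rng.normal(loc=170, scale=10, size=200)  # drawn from a normal distribution
wait_times = rng.exponential(scale=2.0, size=200)     # strongly right-skewed

print("heights look normal:", looks_normal(heights_cm))
print("wait times look normal:", looks_normal(wait_times))
```

If the check rejects normality, or the sample is too small to tell, the non-parametric route described above is the safer default.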

Is my data numeric or categorical? 

Parametric and non-parametric variables differ in their distributions, but both are numeric: their values can be ordered from smaller to greater. In contrast, a categorical variable can take on only a fixed number of possible values, and those values differ from each other qualitatively rather than quantitatively. Examples of categorical variables include political party, blood type, and gender. Determining whether two or more distributions of categorical variables are different requires a distinct set of statistical tests.
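For categorical data, one widely used option is the chi-square test of independence, which compares observed category counts across groups. Here is a minimal sketch with SciPy's `chi2_contingency`; the counts are made up for illustration, and the test is offered as a common choice rather than the only one.

```python
import numpy as np
from scipy import stats

# Rows: groups (control, treatment). Columns: blood type A, B, AB, O.
# These counts are hypothetical.
observed = np.array([
    [30, 10, 5, 35],
    [28, 12, 6, 34],
])

chi2, p_value, dof, expected = stats.chi2_contingency(observed)

print(f"chi-square = {chi2:.3f}, degrees of freedom = {dof}, p = {p_value:.3f}")
```

The chi-square approximation becomes unreliable when expected counts are very small; Fisher's exact test is a common alternative for small 2x2 tables.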

How many groups do I have? 

Different statistical tests are also applied when comparing more than two groups. The tests in this family are often referred to as ANOVA (analysis of variance). When comparing more than two sets of numerical data, start with an ANOVA-like test across all of the groups. If this test returns a significant result, you can then apply a “post hoc” test to determine which specific groups differ from each other. The choice of both tests depends on whether the data is parametric or non-parametric.
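The two-step workflow described above can be sketched as follows for parametric data: an overall one-way ANOVA, followed by pairwise t-tests with a Bonferroni correction as the post hoc step. The group names, measurements, and the helper `pairwise_bonferroni` are hypothetical, and Bonferroni is only one of several post hoc approaches.

```python
from itertools import combinations

from scipy import stats

# Hypothetical measurements for three groups
groups = {
    "placebo": [5.1, 4.9, 5.3, 5.0, 5.2, 4.8],
    "low_dose": [5.6, 5.8, 5.5, 5.9, 5.7, 5.6],
    "high_dose": [6.9, 7.1, 6.8, 7.2, 7.0, 6.7],
}


def pairwise_bonferroni(groups):
    """Unpaired t-test for every pair of groups, with Bonferroni-adjusted p-values."""
    pairs = list(combinations(groups, 2))
    adjusted = {}
    for a, b in pairs:
        _, p = stats.ttest_ind(groups[a], groups[b])
        # Multiply by the number of comparisons, capping the result at 1.0
        adjusted[(a, b)] = min(p * len(pairs), 1.0)
    return adjusted


# Step 1: overall test across all groups
f_stat, p_overall = stats.f_oneway(*groups.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p_overall:.2g}")

# Step 2: post hoc comparisons, only if the overall test is significant
adjusted = pairwise_bonferroni(groups) if p_overall < 0.05 else {}
for pair, p in adjusted.items():
    print(pair, f"adjusted p = {p:.2g}")
```

For non-parametric data, `stats.kruskal` (the Kruskal-Wallis test) is a common replacement for `stats.f_oneway` in step 1.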

Are my groups paired or unpaired? 

Pairing means that your data includes repeated measurements (e.g. multiple measurements across time) on the same set of subjects. Pairing can also mean that data points for one group of subjects are somehow linked or related to values in another group (e.g. studies on pairs of twins). The relationship between sets of measurements with paired data necessitates the use of different statistical tests.  
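To see why pairing matters, here is a small sketch that runs both a paired and an unpaired t-test on the same numbers. The blood pressure values are invented for illustration and represent the same eight hypothetical subjects measured before and after a treatment.

```python
from scipy import stats

# Hypothetical systolic blood pressure for the same 8 subjects
before = [142, 138, 150, 145, 160, 155, 148, 152]
after = [135, 136, 141, 140, 150, 149, 143, 144]

# Paired test: analyzes each subject's before-to-after change
paired = stats.ttest_rel(before, after)

# Unpaired test: treats the two lists as independent groups
unpaired = stats.ttest_ind(before, after)

print(f"paired t-test:   p = {paired.pvalue:.4f}")
print(f"unpaired t-test: p = {unpaired.pvalue:.4f}")
```

Because every subject's value drops, the paired test picks up a consistent change that the unpaired test partly loses in the between-subject variation. For non-parametric data, `stats.wilcoxon` (paired) and `stats.mannwhitneyu` (unpaired) are the usual counterparts.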

An overall algorithm for picking an appropriate test is summarized in the graphic below: 

[Figure: flowchart for choosing a statistical test based on the four questions above]
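As a rough companion to the graphic, the four questions can also be written as a small lookup function. The mapping below is an assumption based on commonly taught conventions and may not match the graphic exactly; the function name `choose_test` and its parameters are hypothetical.

```python
def choose_test(parametric, categorical, n_groups, paired):
    """Suggest a test for a DIFFERENCE between groups from the four questions.

    The mapping follows common textbook conventions (an assumption, not a rule).
    """
    if n_groups < 2:
        raise ValueError("need at least two groups to test for a difference")
    more_than_two = n_groups > 2

    if categorical:
        if paired:
            # Both paired options apply to binary outcomes
            return "Cochran's Q test" if more_than_two else "McNemar's test"
        return "chi-square test"

    # Key: (parametric, more than two groups, paired)
    numeric_tests = {
        (True, False, False): "unpaired t-test",
        (True, False, True): "paired t-test",
        (True, True, False): "one-way ANOVA",
        (True, True, True): "repeated-measures ANOVA",
        (False, False, False): "Mann-Whitney U test",
        (False, False, True): "Wilcoxon signed-rank test",
        (False, True, False): "Kruskal-Wallis test",
        (False, True, True): "Friedman test",
    }
    return numeric_tests[(bool(parametric), more_than_two, bool(paired))]


# Example: GCS scores (non-parametric, numeric) in two unrelated patient groups
print(choose_test(parametric=False, categorical=False, n_groups=2, paired=False))
```

A lookup like this is only a starting point: each suggested test carries its own assumptions, which are worth checking before you report a result.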

One caveat to this graphic is that it assumes you are interested in testing for a difference rather than an association. As with the tests listed above, the best approach for investigating associations between variables depends on the kinds of variables (e.g., parametric, non-parametric) you are dealing with.
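For associations between two numeric variables, a common pairing is Pearson's correlation for parametric data and Spearman's rank correlation for non-parametric data. The sketch below computes both with SciPy; the age and blood pressure values are invented for illustration.

```python
from scipy import stats

# Hypothetical paired observations for seven subjects
age = [23, 35, 41, 52, 60, 68, 74]
systolic = [112, 118, 121, 130, 134, 141, 150]

# Pearson: measures linear association, assumes roughly normal variables
r, p_pearson = stats.pearsonr(age, systolic)

# Spearman: rank-based, measures monotonic association
rho, p_spearman = stats.spearmanr(age, systolic)

print(f"Pearson r = {r:.3f} (p = {p_pearson:.2g})")
print(f"Spearman rho = {rho:.3f} (p = {p_spearman:.2g})")
```

Spearman's version works on ranks, so it can also be applied to ordinal scores such as the GCS.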

To sum it up, the key to analyzing your data is understanding your variables. Choosing the right statistical test can be challenging, but hopefully the algorithm above makes this decision a little bit easier!  
