MCAT statistics crash course: what do I need to know? (part 1)

By Emily Leven

As most of you know, the AAMC has published an extensive outline of topics and skills assessed by the new MCAT (the complete guide can be found HERE). It is important to look at this list before you start studying a subject, and to go back to it periodically to make sure you are on the right track. This list is not exhaustive and only covers reasonably broad topics, but it is still quite useful in keeping your study plan focused.

Today, we’re going to be thinking about how to approach questions that ask you to understand and work with data. These questions fall under the umbrella of “reasoning about the design and execution of research” (more to come on “data-based and statistical reasoning”!). Graphs, figures, and tables have always been on the MCAT, but with the sociology and psychology content on the new MCAT, there is a greater emphasis on understanding statistics per se and being able to “work” with the data.

Below is the exact list from AAMC of what they want you to know (taken from the AAMC website: Design and Execution), and a crash course review to help you learn it! Use this alongside your statistics review books to decide what to really home in on.

Reasoning about the design and execution of research

(the heading of each section below is taken directly from the AAMC website!)

Identifying the role of theory, past findings, and observations in scientific questioning

It’s important to understand that research doesn’t occur in a vacuum. Often other researchers (or the same research group) will repeat studies to confirm results or to test them in a different population. Sometimes, research is discredited when the results can’t be replicated. You can also have incidental findings in research that become their own research projects. And the circle continues!

Identifying testable research questions and hypotheses

You’ll have to identify what the study is actually testing.
A good hint here is to look at what the passage SAYS the research is testing, though sometimes the passage doesn’t explicitly give you this information.
Even trickier: sometimes a passage will present a study and provide a research question and methods. But you’ll read the methods and realize that they don’t ACTUALLY test the hypothesis! (See an example of this when you get to “appropriateness” below.)

Distinguishing between samples and populations and between results that do and do not support generalizations about populations

First, what is a population? A population is the ENTIRE group of people or subjects you are interested in for your study. A sample is the smaller subset of that group that you actually collect data from. Generalization is the process of applying results from your sample to the larger population.
- Example: Maybe you are testing an intervention that will result in all school-age children being up to date on all recommended vaccines before starting kindergarten in the United States. In this case, your population is all children entering kindergarten in the US. But you can’t reasonably get data about ALL of these children, so you look at a smaller subset (a sample) of 500 primary schools across the country and then generalize these results to your larger population (that is, you assume that the results in your sample apply to the population).
- Example: If you pick 500 schools in New York City and Los Angeles, do you think that your findings here will be the same as what you would find in all schools across the country? Probably not. These results probably do not support a generalizable conclusion. Vaccination practices (like many things that involve decisions by individuals) are not the same in every part of the country. NYC and LA are two large, urban samples with a lot of diversity within them, but are likely to be quite different from a small, rural town in Montana, or a suburb in North Carolina. If you picked 500 schools from all different parts of the country, maybe then you could make a stronger case for generalizability.
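If it helps to see this with numbers, here is a minimal Python sketch of the two sampling approaches. Everything in it is an assumption made up for illustration: the three region types, how many schools each has, and their average vaccination rates are invented, not real data. It simply compares a sample drawn only from big cities with a sample drawn at random from the whole (hypothetical) country.

```python
import random

def build_population(seed=0):
    """Build a made-up population of schools as (region, vaccination_rate) pairs.

    The regions, school counts, and average rates are invented for illustration.
    """
    rng = random.Random(seed)
    # Hypothetical: (region name, number of schools, average vaccination rate)
    regions = [("big_city", 2000, 0.90), ("suburb", 4000, 0.80), ("rural", 4000, 0.70)]
    schools = []
    for name, count, avg_rate in regions:
        for _ in range(count):
            # Each school's rate varies a little around its region's average,
            # clipped so it stays between 0 and 1
            rate = min(1.0, max(0.0, rng.gauss(avg_rate, 0.05)))
            schools.append((name, rate))
    return schools

def mean_rate(schools):
    """Average vaccination rate across a list of (region, rate) pairs."""
    return sum(rate for _, rate in schools) / len(schools)

population = build_population()
rng = random.Random(1)  # fixed seed so the example is repeatable

# Sample A: 500 schools drawn only from big cities (like picking only NYC and LA)
city_only = [s for s in population if s[0] == "big_city"]
sample_a = rng.sample(city_only, 500)

# Sample B: 500 schools drawn at random from the whole country
sample_b = rng.sample(population, 500)

print(f"Whole population:  {mean_rate(population):.3f}")
print(f"City-only sample:  {mean_rate(sample_a):.3f}")
print(f"Random sample:     {mean_rate(sample_b):.3f}")
```

With these made-up numbers, the random sample should land close to the population average, while the city-only sample overshoots it. That gap is the reason the city-only results would not support a generalization about the whole country.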

Identifying the relationships among the variables in a study (e.g., independent versus dependent variables; control and confounding variables)

Independent: WHAT the researcher changes
Dependent: OUTCOME of the change
- Example: Let’s go back to our vaccinations in kindergarten. Our study looks at 500 schools across the country and tracks how many kindergarteners began school in 2015 with all their vaccinations. Now, for 2016, the researchers enlist half the schools to institute a policy that sends all parents a reminder letter that vaccines are highly recommended for children and a list of pediatricians’ offices in the community that will take “walk-ins” for vaccinations. The number of kindergarteners who start school with all their vaccines is tracked at the same schools for the 2016 school year.
  - Independent variable: information campaign / convenient process for vaccination
  - Dependent variable: vaccination rate.
- Control: This is the variable or group that STAYS THE SAME throughout the experiment.
  - Example: The schools that did not institute the information campaign in 2016 are the control group. We are comparing their vaccination rates to the group that DID receive the information campaign, and also comparing them to themselves in 2015, to try to minimize the effects of confounders.
- Confounders: these are things that could have an effect on your dependent variable that are NOT your independent variable.
  - For example: Maybe in summer of 2016, there’s a nationwide campaign for childhood vaccination. Vaccination rates EVERYwhere go up. If our group of 250 schools has increased vaccination rates, it will be hard for us to know whether this was because of OUR independent variable (the letters and the pediatricians with walk-in hours) or the effect of the national information campaign. This is why it’s helpful that we have our control group and our 2015 data. If BOTH groups saw a similar increase in vaccination rates, we can guess that this is an effect that has been confounded by a third variable (in this case, the national campaign).
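Here is a small worked example of that comparison in Python. The rates are hypothetical numbers chosen for illustration, not results from any real study. The idea (sometimes called a difference-in-differences comparison) is to subtract the control group's change from the intervention group's change, which only works if we assume the confounder affected both groups about equally.

```python
# Hypothetical vaccination rates (fraction of kindergarteners fully vaccinated).
# These numbers are invented for illustration; they are not from any real study.
rates = {
    "control":      {2015: 0.80, 2016: 0.85},  # no letters sent
    "intervention": {2015: 0.80, 2016: 0.92},  # letters + list of walk-in offices
}

def change(group):
    """Change in a group's vaccination rate from 2015 to 2016."""
    return rates[group][2016] - rates[group][2015]

control_change = change("control")            # rise seen even without our letters
intervention_change = change("intervention")  # rise seen with our letters

# Subtracting the control group's change removes whatever affected BOTH groups
# (for example, a national campaign), assuming it affected them equally.
estimated_effect = intervention_change - control_change

print(f"Control change:      {control_change:+.2f}")
print(f"Intervention change: {intervention_change:+.2f}")
print(f"Estimated effect of our campaign: {estimated_effect:+.2f}")
```

In this made-up case, both groups went up, but the intervention schools went up by about 7 percentage points more than the control schools. That extra rise is the part we can more reasonably attribute to the letters.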

Reasoning about the appropriateness, precision, and accuracy of tools used to conduct research in the natural sciences

The key word here is “tools.” These are terms applied to your measurement tools.
Accuracy: how close are your results to the true results?
Precision: how close are your results to each other?
Appropriateness: this is kind of a broad term. Basically, is your measurement tool a good way of getting the data?
- Example: Maybe you aren’t calling school nurses, but mailing parents surveys and asking if their children received the HibMenCY, MenACWY-D, or MenACWY-CRM vaccine. All of these are meningitis vaccinations; the difference between them is the earliest age at which each is safe and effective to administer to a child. But most parents (and people) will have no idea what you’re asking them if you ask them this way. Parents are more likely to know that their child received a meningitis vaccine than they are to know the specific scientific names, and really, all you care about is that the child is protected against meningitis. It would be more appropriate to simply ask parents about “meningitis” than to use the specific names.
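To make the difference between accuracy and precision concrete, here is a short Python sketch. It assumes a hypothetical school whose true vaccination rate is 85%, and two imaginary measurement tools that are each used five times; all of the readings are invented. Accuracy is summarized as how far the average reading is from the truth, and precision as how spread out the readings are (their standard deviation).

```python
from statistics import mean, stdev

# Hypothetical: a school's true vaccination rate is 85%, and two measurement
# tools are each used five times. All numbers are invented for illustration.
TRUE_RATE = 85.0
tool_a = [84.0, 86.0, 85.5, 84.5, 85.0]   # averages out to the true rate
tool_b = [78.0, 78.5, 77.5, 78.2, 77.8]   # tightly clustered, but far from the truth

def describe(readings):
    """Return (error, spread) for a list of readings.

    error  = distance of the average reading from the true rate
    spread = standard deviation of the readings
    """
    error = abs(mean(readings) - TRUE_RATE)   # small error = accurate
    spread = stdev(readings)                  # small spread = precise
    return error, spread

for name, readings in [("Tool A", tool_a), ("Tool B", tool_b)]:
    error, spread = describe(readings)
    print(f"{name}: error = {error:.2f}, spread = {spread:.2f}")
```

With these invented readings, Tool A is accurate (its average matches the true rate), while Tool B is actually the more precise of the two (its readings are closer to each other) but is not accurate. A tool can be one without being the other.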

Reasoning about the appropriateness, reliability, and validity of tools used to conduct research in the behavioral and social sciences

These terms are very similar to precision and accuracy, and again, are talking about the tools you are using to measure your data.
Validity: similar to accuracy. Are you measuring what you want to measure?
- For example, maybe you are calling school nurses (whose job includes tracking student vaccinations) and asking “what is the current vaccination rate at your school?” and the nurse says “80% of kindergarteners got vaccinated this year.” Well, does that mean that 80% of kindergarteners are fully vaccinated, or that 80% got vaccinated this year and another 15% had already received their vaccinations in prior years? A more appropriate tool would ask the same clear, specific question of every nurse.
Reliability: similar to precision. Is your measurement tool good enough that you could reproduce your results if you used the same tool again?
- In this example, instead of reaching out to school nurses, maybe you are just cold-calling the schools and asking whoever picks up the phone. You can imagine that if you called the same school several times and different people picked up, they may or may not know the information you are looking for and could provide different answers. The data you are collecting here is therefore not very reliable (and it is probably not going to be very accurate either).

Reasoning about the features of research studies that suggest associations between variables or causal relationships between them (e.g., temporality, random assignment)

Causal relationship: did A cause B?

Temporality: does the timing make sense? If we do our vaccination information campaign in summer 2016 and vaccination rates don’t rise until 2019, we can’t confidently say that the rates changed because of our campaign.

Random assignment: this is a technique in which we randomly assign participants (here, schools) to either our control group (no information campaign) or our intervention group (schools with the campaign). Note that this is different from random sampling, which is about how participants are chosen for the study in the first place. The idea is that if you split participants up randomly, then on the whole their differences will balance out and the groups will be comparable.
- Example: If you split up the schools by east coast and west coast, instead of by random assignment, then any differences between the groups may be due to regional differences in opinions about vaccinations or availability of vaccinations (two potentially confounding variables) rather than the causal impact of our information campaign.
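As a rough sketch of what random assignment looks like in practice, the Python below shuffles a hypothetical list of 500 schools and splits it in half. The school IDs and the east/west labels are made up for illustration. The point is to check that, after random assignment, each group ends up with a similar mix of east coast and west coast schools.

```python
import random

# Hypothetical list of 500 schools, each tagged with a coast.
# The IDs and labels are invented for illustration only.
schools = [{"id": i, "coast": "east" if i < 250 else "west"} for i in range(500)]

rng = random.Random(42)  # fixed seed so the example is repeatable
shuffled = schools[:]    # copy so the original list is left untouched
rng.shuffle(shuffled)

# Random assignment: the first half of the shuffled list gets the campaign,
# and the second half is the control group.
intervention = shuffled[:250]
control = shuffled[250:]

def share_east(group):
    """Fraction of schools in the group that are on the east coast."""
    return sum(s["coast"] == "east" for s in group) / len(group)

print(f"East coast share, intervention: {share_east(intervention):.2f}")
print(f"East coast share, control:      {share_east(control):.2f}")
```

Both shares should come out near one half, so region is roughly balanced between the groups. Compare that with splitting by coast, where one group would be 100% east coast and the other 0%.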

Reasoning about ethical issues in scientific research

In this example, some ethical questions to consider might be:
- What if our campaign encourages many people to get their children vaccinated, but their communities aren’t prepared for this increase in demand for vaccines and there is a shortage? Are we creating undue stress for parents by encouraging them to protect their children from serious illness without ensuring that they can actually take this action?
- What sort of language should be used on the information letters so that there is no coercion to vaccinate children?

If you’ve never worked with statistics, don’t be overwhelmed. AAMC is very clear about where you should focus your studying, and you can always come back to this outline for some “non-scientific” explanations of concepts that you find tricky.

Emily L.

Emily holds a BA from the University of Pennsylvania and an MD from the Icahn School of Medicine at Mount Sinai. She's currently completing training in gastroenterology at Mount Sinai Hospital.
