MRI scans have allowed researchers to peer inside the human brain. And the technology is great at revealing damage from a stroke, or areas that light up when we see a face.

But brain scan studies have yet to offer much insight into the underpinnings of traits like intelligence, or mental health conditions like anxiety and depression.

A key reason is that these studies need to include scans of thousands of brains, instead of the dozens typically used, a team reported in the March 16 issue of the journal Nature.

"You need a very large sample, and bigger samples are better," says Dr. Nico Dosenbach, an author of the study and an associate professor of neurology at Washington University in St. Louis.

That's a lesson the field of genetics has already learned, says Paul Thompson, a neuroscientist at the University of Southern California who was not involved in the research.

"Twenty years ago you'd hear someone had discovered a gene for criminality or for psychosis or a gene for autism," Thompson says, "and then another group wouldn't find the same thing, or they'd find another gene, and they'd be scratching their head."

Geneticists eventually fixed the problem by expanding their studies from dozens or hundreds of people to millions, Thompson says. Now, neuroscientists appear to be in a similar position, one that will require them to reconsider the results of many small studies.

An illuminating search for intelligence

The new paper on brain scan studies has its roots in a 2018 effort to understand how children develop cognitive abilities, also known as intelligence.

A team including Scott Marek, a researcher in Dosenbach's lab at Washington University, planned to use data from a federal study that was scanning the brains of thousands of adolescents.

"What we wanted to do is just ask the question with this huge sample: How is cognitive ability represented in the brain?" Marek says.

Previous research had found that intelligence is strongly linked to the thickness of the brain's outermost layer and to the strength of connections between certain brain regions.

So Marek's team analyzed nearly 1,000 brain scans from the federal study. Then they checked their work, using 1,000 different scans.

"What we noticed was that we couldn't replicate everything," Marek says. "It didn't look great."

An area or connection that seemed important in one set of scans might appear insignificant in the other. It was only when they increased the sample size to thousands of brains that the results became more reliable.

The team wondered whether this was also the case with other studies that searched the entire brain for differences associated with complex problems like anxiety, depression and ADHD.

So they got brain scan data from about 50,000 people, then used a computer to conduct lots of simulated studies, both small and large. Once again, the team found that it took thousands of scans to get reliable results.
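
The logic of those simulated studies can be illustrated with a toy example. The sketch below is not the team's actual analysis: it assumes a made-up population of 20,000 people in which a single brain measure and a trait share a weak true correlation of 0.1, then runs 200 simulated "studies" at two sample sizes and compares how much the estimated correlation swings. The population size, the correlation and the function names are all assumptions chosen for illustration.

```python
import random

def simulate_population(n, true_r, rng):
    """Build n (brain_measure, trait) pairs whose correlation is roughly true_r."""
    noise_scale = (1 - true_r ** 2) ** 0.5
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = true_r * x + noise_scale * rng.gauss(0, 1)
        pairs.append((x, y))
    return pairs

def pearson_r(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / (var_x * var_y) ** 0.5

def spread_of_estimates(population, sample_size, n_studies, rng):
    """Run n_studies simulated studies; return the lowest and highest estimate."""
    estimates = [
        pearson_r(rng.sample(population, sample_size))
        for _ in range(n_studies)
    ]
    return min(estimates), max(estimates)

rng = random.Random(0)  # fixed seed so the run is repeatable
population = simulate_population(20_000, true_r=0.1, rng=rng)  # hypothetical values

small_lo, small_hi = spread_of_estimates(population, 25, 200, rng)
large_lo, large_hi = spread_of_estimates(population, 2_000, 200, rng)

print(f"25 people per study:    r ranges from {small_lo:+.2f} to {small_hi:+.2f}")
print(f"2,000 people per study: r ranges from {large_lo:+.2f} to {large_hi:+.2f}")
```

In a setup like this, the small studies disagree with one another far more than the large ones do, and some even point in the wrong direction, which is the pattern the researchers describe.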

That was troubling because for years, much smaller samples have been used to produce a stream of scientific papers on mental illness and behavioral disorders.

So far, that research "hasn't really translated to tangibles for patients," Dosenbach says, "and I think these results give us a clue as to why."

The perils of small samples

One problem with small studies is they can only find brain features that produce relatively large effects on mood, behavior, or mental abilities. In Alzheimer's disease, for example, it's easy to show that atrophy of the hippocampus is accompanied by a dramatic loss of memory.

Differences in the brain that are associated with mental illness tend to be far less obvious, and far more controversial. For example, some studies have found that people with major depressive disorder have less activity in the brain's frontal lobe. But the strength of that correlation varies widely from study to study. And there's no way to look at the activity in any one person's frontal lobe and know how that person is feeling.

Another problem with small studies is something called publication bias.

"If multiple groups are doing similar research using small samples, just by chance one of the groups, or several, will have a significant result," Dosenbach says. "And that's what's going to get reported."

When enough of these studies get published, a misleading finding can become the conventional wisdom. But this doesn't mean small studies are necessarily wrong.

"Even a tiny study could hold true," Dosenbach says. "It's just the chances of that happening are much, much, much, much smaller than for an extremely large study."

So the public should be wary of headlines that extrapolate the findings from a small MRI study to the general population.
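
The chance effect behind publication bias can also be sketched in a few lines. The example below is an illustration under stated assumptions, not anything from the Nature paper: it imagines 1,000 research groups, each scanning 25 people, in a world where the brain measure and the trait are completely unrelated. The cutoff of 0.396 is the approximate correlation needed for a two-sided p < 0.05 result with 25 participants; the group count and sample size are hypothetical.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def run_null_study(sample_size, rng):
    """One small study in which the two measures are truly unrelated."""
    xs = [rng.gauss(0, 1) for _ in range(sample_size)]
    ys = [rng.gauss(0, 1) for _ in range(sample_size)]
    return pearson_r(xs, ys)

# Approximate |r| needed for two-sided p < 0.05 with n = 25 (an assumption
# taken from standard critical-value tables, not from the study).
CRITICAL_R = 0.396

rng = random.Random(1)  # fixed seed so the run is repeatable
n_groups = 1000         # hypothetical number of research groups
results = [run_null_study(25, rng) for _ in range(n_groups)]
significant = [r for r in results if abs(r) > CRITICAL_R]

print(f"{len(significant)} of {n_groups} groups found a 'significant' link by chance")
```

Roughly 1 in 20 of these studies should clear the significance bar even though no real link exists. If only those get written up, the published record shows an effect that is not there.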

A study with "aftershocks"

Many brain scientists are still trying to digest the news that human behavior studies may require thousands of scans.

"It's a little like an earthquake in Los Angeles," Thompson says. "It sent a few aftershocks through the neuroscience community."

But Thompson says the solution is obvious, and achievable: Combine the scans from many small studies into one or more large databases — then check the results.

The ENIGMA Consortium, which Thompson helped create, is one effort to make this easy. The group maintains a database with more than 50,000 MRI scans. And scientists have already used it to identify brain differences associated with schizophrenia.

"There's huge differences all over the brain in schizophrenia," Thompson says. "The auditory centers that are involved in hallucinations are abnormal. There's alterations in memory systems, in vision systems."

But it may take even larger studies to find the brain areas and connections associated with mental illnesses like depression and bipolar disorder because the differences are far more subtle.

Some of those studies are already underway.

The National Institutes of Health study on adolescent brain development, for example, has enrolled more than 11,000 young people, and it is scanning their brains periodically to track changes.

The study's large size is, in part, an effort to address the problems found in smaller studies, says Terry Jernigan, a brain scientist at the University of California, San Diego and one of the study's principal investigators.

But it's not enough for brain scan studies to include thousands of people — the studies must also be more diverse than they typically have been, Jernigan says.

"You want to know to what extent your observations are generalizable to all the groups in our society," she says.

Copyright 2022 NPR. To see more, visit https://www.npr.org.