My cup ran over with criticisms of a very important study of the effects of social media on teen girls’ mental health, without my getting beyond the abstract. Readers will have to wait for the next article to see more criticisms, but these flaws revealed in the abstract alone are rich and worth discussing.
This research paper is a very confusing read, even for someone who is quite familiar with this kind of research. Yet what is said in the paper is crucial to the case being made by Jean Twenge (and Jon Haidt) that government intervention is urgently needed to curb the harms of social media to the mental health of teens. I’ll use the abstract of the paper to discuss how to find flaws in a research study that is intended to influence public health policy.
Parents and school teachers and administrators cannot be expected to interpret original research studies on their own. But they might learn from discussions like this one to be more skeptical of experts who claim their advice is based on social science, but who make emotional appeals and rely on anecdotes to rouse their readership into action.
The ratio of emotional story-telling to actual scientific evidence is very high in stories in the popular press expressing alarm about the damaging effects of teen girls’ use of social media on their mental health.
There is an excess of hype and drama about this topic, even in op-eds in the New York Times. When in doubt, be skeptical of social scientists who try too hard to convince you that they are correct and that other experts have just not noticed something that is obvious to them.
In a widely discussed article, Jean Twenge says that she has been studying generational trends in mental health for over 25 years and that she never before found such a dramatic change in mental health as she saw around 2012.
Have Smartphones Destroyed a Generation? More comfortable online than out partying, post-Millennials are safer, physically, than adolescents have ever been.
Around 2012, I noticed abrupt shifts in teen behaviors and emotional states. The gentle slopes of the line graphs became steep mountains and sheer cliffs, and many of the distinctive characteristics of the Millennial generation began to disappear. In all my analyses of generational data — some reaching back to the 1930s — I had never seen anything like it.
Twenge has advice for parents and teachers:
If you were going to give advice for a happy adolescence based on this survey, it would be straightforward: Put down the phone, turn off the laptop, and do something — anything — that does not involve a screen.
Twenge commands special authority because her views are said to be derived from the best available evidence.
However, most of the key research that Twenge and her fellow advocate Jonathan Haidt cite was not conducted by either of them. I suspect that many of the authors of the studies they cite would disagree with Twenge and Haidt’s interpretation of their work, some vigorously so. That situation makes one centerpiece study that was led by Twenge particularly important.
The key research article by Twenge and her colleagues is here.
Increases in Depressive Symptoms, Suicide-Related Outcomes, and Suicide Rates Among U.S…
The article is unfortunately paywalled, but here is its abstract. We can do a lot with it.
In two nationally representative surveys of U.S. adolescents in grades 8 through 12 (N = 506,820) and national statistics on suicide deaths for those ages 13 to 18, adolescents’ depressive symptoms, suicide-related outcomes, and suicide rates increased between 2010 and 2015, especially among females. Adolescents who spent more time on new media (including social media and electronic devices such as smartphones) were more likely to report mental health issues, and adolescents who spent more time on nonscreen activities (in-person social interaction, sports/exercise, homework, print media, and attending religious services) were less likely. Since 2010, iGen adolescents have spent more time on new media screen activities and less time on nonscreen activities, which may account for the increases in depression and suicide. In contrast, cyclical economic factors such as unemployment and the Dow Jones Index were not linked to depressive symptoms or suicide rates when matched by year.
The editors at a top psychology journal, Clinical Psychological Science, and the reviewers the editors picked were obviously impressed enough to recommend the article and its abstract be published in the form that we now see.
I noticed lots of things that made me suspicious because I have higher standards for talking about risks to health than most psychologists do.
I received excellent training in my Ph.D. studies as a research-oriented clinical psychologist. I received my doctorate in 1975 but then began working in situations where medical scientists and public health officials demanded stricter standards than what was required of psychologists trying to get published in a respectable psychology journal. Lives depended on what a different kind of expert decided about risks from the often limited and flawed data that were available to them.
The COVID pandemic and the quick decisions that had to be made about what advice could be given concerning vaccination, social distancing and lockdowns put this kind of expertise on display. The world-class experts giving briefings on the best of cable news were good at policing each other to avoid exaggerating what was known and to admit what they did not know. “We don’t know yet” was often the best answer, as frustrating as it was.
For a start, I expect more information from an abstract than this one provided. The authors did not follow standard advice on what to include in an abstract. I’ll have a future story documenting how abstracts attached to paywalled articles like the one we are discussing here can actually kill people, aside from spreading misconceptions.
Rather than doing their own research to collect new data, these authors relied on existing survey data sets collected for other purposes. This leaves lots of questions about how they did this, which the authors do not address in a transparent way.
How did the authors integrate data from different sources in one study? Relying on someone else’s data is attractive and may at first seem expedient, but doing so effectively and validly requires a lot of difficult decision-making.
Inevitably, the original researchers did not ask the right survey questions for new research. What questions in the surveys best fit the new issues researchers wanted to address? How could the new researchers verify that their selection from already collected data was most valid and relevant to their issues?
Twenge and her co-authors imply in the abstract that they had been able somehow to integrate the survey questions with information from the national statistics on deaths by suicide. I knew that was bunk. Ethics committees overseeing the protection of human subjects insist the data be anonymized so that identification and matching of people across data sets becomes virtually impossible.
Then, there is the problem of the small number of suicides in this relatively low-risk group. Let’s stop here and apply some numbers I revealed last time.
Any potential risk factors the authors can find in these pre-existing survey questions must pass the test of predicting relatively infrequent events with some precision. The abstract suggests the authors may have succeeded (“which may account for the increases in depression and suicide”), but that would be statistically improbable, given the base rate of death by suicide and any conceivable fluctuation in the study period of this article.
For 2017, we have about 420 suicides to explain among 20.5 million girls. I wish the authors luck in using whatever fancy statistics they can muster to predict which girls will die by suicide with the risk factors they can pull from other people’s data. Chances are that none of the participants in the survey data they acquired died by suicide, or only a chance handful did. Neither Twenge and her co-authors nor readers can tell.
Not being able to identify which of the teens completing the survey died by suicide means the authors will be left making speculative statements beyond what their data allow.
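The arithmetic behind that point can be sketched in a few lines. This is a rough illustration, not anything the authors did: it takes the two figures quoted above (about 420 suicides among 20.5 million girls) and asks how many deaths one would expect in a survey sample over one year. The sample sizes in the loop are hypothetical, chosen only for illustration, and the sketch assumes that respondents resemble the national population and that rare deaths can be approximated with a Poisson distribution.

```python
import math

# Figures quoted in the article for 2017.
SUICIDES = 420
GIRLS = 20_500_000


def rate_per_100k(deaths, population):
    """Annual deaths per 100,000 people."""
    return deaths / population * 100_000


def expected_deaths(sample_size, deaths=SUICIDES, population=GIRLS):
    """Expected one-year deaths in a sample drawn at the national rate."""
    return sample_size * deaths / population


def prob_no_deaths(sample_size, deaths=SUICIDES, population=GIRLS):
    """Poisson approximation to the chance that nobody in the sample dies."""
    return math.exp(-expected_deaths(sample_size, deaths, population))


if __name__ == "__main__":
    print(f"Rate: {rate_per_100k(SUICIDES, GIRLS):.2f} per 100,000 girls")
    # Hypothetical sample sizes, NOT the actual per-year survey counts.
    for n in (5_000, 50_000, 250_000):
        print(
            f"n={n:>7,}: expected deaths = {expected_deaths(n):.2f}, "
            f"P(no deaths) = {prob_no_deaths(n):.2f}"
        )
```

The rate works out to roughly 2 per 100,000 per year. Even a quarter of a million respondents would be expected to include only about five deaths, and because the surveys are anonymous, nobody could say which respondents those were.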
The authors used the term “iGen adolescents” in the abstract to describe the teens they studied. That fits with Jean Twenge’s best-selling books, but I was skeptical that such a sweeping term could capture much of the similarities and differences, across an increasingly diverse and divided America, in how use of social media relates to mental health.
Was any similarity of teens falling in this age range more important than the vast range of differences? Consider one white teen having alcoholics or Trump supporters for parents versus another teen having teetotaler Hindu parents who insist that their teen study hard enough to go to medical school and become a physician. White teens with two Ivy League faculty as parents versus a Black teen raised by a single grandmother who dropped out of high school and does not have internet? Versus a Black teen raised by a single grandmother who dropped out of high school, but the rest of the story is that the teen’s mother was an innocent victim of random gun violence and the grandmother insists the teen fulfill the mother’s dream and go to college, no excuses accepted?
I could generate thousands of these kinds of contrasts, and some would be quite absurd.
The final conclusion where I seem to be headed is that a generational label like iGen or Generation Z cannot capture much of the variation among teens, or across an individual teen’s transition into adulthood and afterward.
“iGen” [Don’t you like the cool choice of label, so that you automatically think of having “iPhones” as what 25 million American teen girls have in common?] might serve to highlight some things teens have in common that might otherwise be missed. Surely it misses a lot of things teens don’t have in common, whether they come from radically different backgrounds or have nearly identical demographics but differ in the place of social media in their lives.
The authors end their abstract with straight-faced reassurance that they controlled for “cyclical economic factors such as unemployment and the Dow Jones Index,” matched by year. I can just imagine some badass experts at conferences I have attended who would lie in wait for a speaker to say such a silly thing.
Academics who think their research saves lives can be real a*holes when dealing with other academics whose research they think will never save any lives.
Imagine the response of experts accustomed to identifying health risks from correlations found in survey or surveillance data. Unprepared for what they would hear, some would have spilled coffee on their fancy suits and choked on the stale Danish from the free conference breakfast as they scrambled to correct the speaker, not allowing anyone to discuss what else the presenter had to say.
I can imagine the string of cliched criticism that could be unleashed.
“Of course, you know that correlation does not equal causality.”
“You can’t do magic with statistical controls of correlations when all you have is somebody else’s survey data they collected for some other purposes.”
“What a dumb choice! Are you a psychologist who does not understand regression analysis or do you have books to sell at the conference? Will your next slide tell us where to find your TED Talk?”
Maybe the badass expert would be in an uncharacteristically charitable mood and simply explain:
“I appreciate your effort to find support for a hypothesis that excites you. You should realize that you are relying on statistical controls to settle some issues of causality that are not readily solved. If you rely on such controls, you are first assuming that you have isolated all the variables that could possibly explain away your findings. I don’t think these crude economic indicators begin to do that. Secondly, you are assuming that these variables are measured without error. I don’t think an economist would say these two variables perfectly measure year-to-year differences in the economy affecting either teens’ use of social media or dying by suicide.”
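The expert’s second point, that controls measured with error do not fully control, can be shown with a small simulation. This is a toy sketch with made-up variables, not a reanalysis of the paper’s data. A hidden factor drives both an “exposure” and an “outcome”, and the exposure has no effect at all on the outcome. The code compares the slope of outcome on exposure with no adjustment, with adjustment for the true hidden factor, and with adjustment for a noisy measurement of it. All names, effect sizes, and noise levels are assumptions chosen for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(y, x):
    """What is left of y after removing its linear fit on x."""
    b = slope(x, y)
    mx, my = mean(x), mean(y)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]


def adjusted_slope(x, y, control):
    """Slope of y on x after regressing the control out of both."""
    return slope(residuals(x, control), residuals(y, control))


def simulate(n=20_000, noise_sd=1.0, seed=0):
    rng = random.Random(seed)
    hidden = [rng.gauss(0, 1) for _ in range(n)]
    # Exposure and outcome both depend on the hidden factor only.
    exposure = [h + rng.gauss(0, 1) for h in hidden]
    outcome = [h + rng.gauss(0, 1) for h in hidden]
    # The analyst only sees an error-laden version of the hidden factor.
    noisy_control = [h + rng.gauss(0, noise_sd) for h in hidden]
    return {
        "unadjusted": slope(exposure, outcome),
        "adjusted_true": adjusted_slope(exposure, outcome, hidden),
        "adjusted_noisy": adjusted_slope(exposure, outcome, noisy_control),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:>15}: {value:+.3f}")
```

In this setup, adjusting for the true hidden factor drives the slope to about zero, as it should. Adjusting for the noisy version leaves a clearly positive slope, so an analyst who trusted the control would report an association that is entirely an artifact of measurement error.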
One final cynical a*hole comment before we move on —
“If we had used your approach to statistical analysis, we would have concluded in the early, mysterious days of the HIV/AIDS epidemic that using poppers to enhance orgasm during casual sex, or simply having too many Judy Garland LPs in a vinyl collection, was a modifiable risk factor.”
Frightened and humiliated, the psychologist trying to finish their talk would miss a very serious and useful message that was being disguised here.
Not knowing what you are doing with bad data and a computer program can lead to all kinds of compelling but spurious correlations to get worked up about, some more plausible for a while than the modifiable risk factor you are listening for in very noisy data.
So, just what did Twenge and colleagues do with “two nationally representative surveys of U.S. adolescents in grades 8 through 12 (N = 506,820) and national statistics on suicide deaths for those ages 13 to 18, adolescents’ depressive symptoms, suicide-related outcomes, and suicide rates increased between 2010 and 2015?”
That is a true mystery that is never clarified in this abstract. I was stumped at first. I gave the authors the benefit of the doubt and thought maybe they did some kind of prospective analysis, looking ahead and predicting later things that happened to individuals from their earlier responses on surveys.
I had to get a copy of the paywalled article. The overall design of the study was still difficult to decipher from the methods section, where it should have been laid out in detail and given a name, like case-control or cohort study.
I eventually figured out that the authors did not have two “nationally representative surveys of U.S. adolescents.” They had over two dozen cross-sectional retrospective studies (a one-time survey asking about the past year) with nonoverlapping samples and important differences in the questions that were asked. No questions at all about social media in the survey for some years (!).
This dog’s breakfast of a design for this study will be the topic of my next article about this study, as we dig deeper into what can reasonably be claimed from this study and what cannot — if we stick to principles of best science, not just good story-telling.