I argued in a recent post that the conceptual replication of the marshmallow test was largely successful. A single data point - whether someone can wait for a larger reward - predicts future achievement.

That replication has generated a lot of commentary. Most concerns the extension to the original study, an examination of whether the marshmallow test retained its predictive power if they accounted for factors such as the parent and child’s background (including socioeconomic status), home environment, and measures of the child’s behavioural and cognitive development.

The result was that these “controls” eliminated the predictive power of the marshmallow test. If you know those other variables, the marshmallow test does not give you any further information.

As I said before, this is hardly surprising. They used around 30 controls - 14 for child and parent background, 9 for the quality of the home environment, 5 for childhood achievement and 2 for behavioural characteristics. It is likely that many of them capture the features that give the marshmallow test its predictive power.

So can we draw any conclusions from the inclusion of those particular controls? One of the most circulated interpretations is by Jessica Calarco in the Atlantic, titled Why Rich Kids Are So Good at the Marshmallow Test. The subtitle is “Affluence—not willpower—seems to be what’s behind some kids' capacity to delay gratification”. Calarco writes:

Ultimately, the new study finds limited support for the idea that being able to delay gratification leads to better outcomes. Instead, it suggests that the capacity to hold out for a second marshmallow is shaped in large part by a child’s social and economic background—and, in turn, that that background, not the ability to delay gratification, is what’s behind kids’ long-term success.

This conclusion is a step too far. For a start, controlling for child background and home environment (slightly more than) halved the predictive power of the marshmallow test. It did not eliminate it. It was only on including additional behavioural and cognitive controls - characteristics of the child themselves - that the predictive power of the marshmallow test was eliminated

But the more interesting question in one of causation. Are the social and economic characteristics themselves the cause of later achievement?

One story we could tell is that the social and economic characteristics are simply proxies for parental characteristics, which are genetically transmitted to the children. Heritability of traits such as IQ tend to increase with age, so parental characteristics would likely have predictive power in addition to that of the four-year old’s cognitive and behavioural skills.

On the flipside, maybe the behavioural and cognitive characteristics of the child are simply reflections of the development environment that the child has been exposed to date. This is effectively Calarco’s interpretation.

Which is the right interpretation? This study doesn’t help answer this question. It was never designed to. As lead study author Tyler Watts tweeted in response to the Atlantic article:


If you want to know whether social and economic background causes future success, you should look elsewhere. (I’d start with twin and adoption studies.)

That said, there were a couple of interesting elements to this new study. While the marshmallow test was predictive of future achievement at age 15, there was no association between the marshmallow test and two composite measure of behaviours at 15. The composite behaviour measures were for internalising behaviours (such as depression) and externalising behaviours (such as anti-social behaviours). This inability to predict future behavioural problems hints that the marshmallow test may obtain its predictive power through the cognitive rather than the behavioural channel.

This possibility is also suggested by the correlation between the marshmallow test and the Applied Problems test, which requires the children to count and solve simple addition problems.

[T]he marshmallow test had the strongest correlation with the Applied Problems subtest of the WJ-R, r(916) = .37, p < .001; and correlations with measures of attention, impulsivity, and self-control were lower in magnitude (rs = .22–.30, p < .001). Although these correlational results were far from conclusive, they suggest that the marshmallow test should not be thought of as a mere behavioral proxy for self-control, as the measure clearly relates strongly to basic measures of cognitive capacity.

Not conclusive, but it points to some areas worth further exploring.

PS: After writing this post (I usually post on delay of between a week and three months), Robert VerBruggen posted a piece at the Institute for Family Studies, making many of the same points. I would have skipped writing the new content - and simply quoted VerBruggen - if I’d seen it earlier. Inside Higher Ed also has a good write-up by Greg Toppo, including this quote from Walter Mischel:

[A] child’s ability to wait in the ‘marshmallow test’ situation reflects that child’s ability to engage various cognitive and emotion-regulation strategies and skills that make the waiting situation less frustrating. Therefore, it is expected and predictable, as the Watts paper shows, that once these cognitive and emotion-regulation skills, which are the skills that are essential for waiting, are statistically ‘controlled out,’ the correlation is indeed diminished.

Also from Mischel:

Unfortunately, our 1990 paper’s own cautions to resist sweeping over-generalizations, and the volume of research exploring the conditions and skills underlying the ability to wait, have been put aside for more exciting but very misleading headline stories over many years.

PPS: In another thread to her article, Calarco draws on the concept of scarcity:

There’s plenty of other research that sheds further light on the class dimension of the marshmallow test. The Harvard economist Sendhil Mullainathan and the Princeton behavioral scientist Eldar Shafir wrote a book in 2013, Scarcity: Why Having Too Little Means So Much, that detailed how poverty can lead people to opt for short-term rather than long-term rewards; the state of not having enough can change the way people think about what’s available now. In other words, a second marshmallow seems irrelevant when a child has reason to believe that the first one might vanish.

I’ve written about scarcity previously in my review of Mullainathan and Shafir’s book. I’m not sure the work on scarcity sheds light on the marshmallow test results. The concept behind scarcity is that poverty-related concerns consume mental bandwidth that isn’t then available for other tasks. A typical experiment to demonstrate scarcity involves priming the experimental subjects with a problem before testing their IQ. When the problem has a large financial cost (e.g. expensive car repairs), the performance of low-income people plunges. Focusing their attention on their lack of resources consumes mental bandwidth. On applying this to the marshmallow test, I haven’t seen much evidence four-year olds are struggling with this problem.

(As an aside, scarcity seems to be the catchall response to discussions of IQ and achievement, a bit like epigenetics is the response to any discussion of genetics.)

Given Calarco’s willingness to bundle the marshmallow test replication into the replication crisis (calling it a “failed replication”), its worth also thinking about scarcity in that light. If I had to predict which results would not survive a pre-registered replication, the experiments in the original scarcity paper are right up there. They involve priming, the poster-child for failed replications. The size of the effect, 13 IQ points from a simple prime, fails the “effect is too large” heuristic.

Then there is a study that looked at low-income households before and after payday, which found no change in cognitive function either side of that day (you could consider this a “conceptual replication”). In addition, for a while now I have been hearing rumours of file drawers containing failed attempts to elicit the scarcity mindset. I was able to find one pre-registered direct replication, but it doesn’t seem the result has been published. (Sitting in a file drawer somewhere?)

There was even debate around whether the original scarcity paper (pdf) showed the claimed result. Reanalysis of the data without dichotomising income (splitting it into two bands rather than treating it as a continuous variable) eliminated the effect. The original authors managed to then resurrect the effect (pdf) by combining the data from three experiments, but once you are at this point, you have well and truly entered the garden of forking paths.