Is the marshmallow test just a measure of affluence?

I argued in a recent post that the conceptual replication of the marshmallow test was largely successful. A single data point – whether someone can wait for a larger reward – predicts future achievement.

That replication has generated a lot of commentary. Most of it concerns the extension to the original study: an examination of whether the marshmallow test retained its predictive power once the researchers accounted for factors such as the parent’s and child’s background (including socioeconomic status), home environment, and measures of the child’s behavioural and cognitive development.

The result was that these “controls” eliminated the predictive power of the marshmallow test. If you know those other variables, the marshmallow test does not give you any further information.

As I said before, this is hardly surprising. They used around 30 controls – 14 for child and parent background, 9 for the quality of the home environment, 5 for childhood achievement and 2 for behavioural characteristics. It is likely that many of them capture the features that give the marshmallow test its predictive power.

So can we draw any conclusions from the inclusion of those particular controls? One of the most circulated interpretations is by Jessica Calarco in the Atlantic, titled Why Rich Kids Are So Good at the Marshmallow Test. The subtitle is “Affluence—not willpower—seems to be what’s behind some kids’ capacity to delay gratification”. Calarco writes:

Ultimately, the new study finds limited support for the idea that being able to delay gratification leads to better outcomes. Instead, it suggests that the capacity to hold out for a second marshmallow is shaped in large part by a child’s social and economic background—and, in turn, that that background, not the ability to delay gratification, is what’s behind kids’ long-term success.

This conclusion is a step too far. For a start, controlling for child background and home environment (slightly more than) halved the predictive power of the marshmallow test. It did not eliminate it. It was only on including the additional behavioural and cognitive controls – characteristics of the children themselves – that the predictive power of the marshmallow test was eliminated.

But the more interesting question is one of causation. Are the social and economic characteristics themselves the cause of later achievement?

One story we could tell is that the social and economic characteristics are simply proxies for parental characteristics, which are genetically transmitted to the children. Heritability of traits such as IQ tends to increase with age, so parental characteristics would likely have predictive power in addition to that of the four-year-old’s cognitive and behavioural skills.

On the flipside, maybe the behavioural and cognitive characteristics of the child simply reflect the developmental environment that the child has been exposed to so far. This is effectively Calarco’s interpretation.

Which is the right interpretation? This study doesn’t help answer this question. It was never designed to. As lead study author Tyler Watts tweeted in response to the Atlantic article:

If you want to know whether social and economic background causes future success, you should look elsewhere. (I’d start with twin and adoption studies.)

That said, there were a couple of interesting elements to this new study. While the marshmallow test was predictive of future achievement at age 15, there was no association between the marshmallow test and two composite measures of behaviour at 15. The composite behaviour measures were for internalising behaviours (such as depression) and externalising behaviours (such as anti-social behaviours). This inability to predict future behavioural problems hints that the marshmallow test may obtain its predictive power through a cognitive rather than a behavioural channel.

This possibility is also suggested by the correlation between the marshmallow test and the Applied Problems test, which requires the children to count and solve simple addition problems.

[T]he marshmallow test had the strongest correlation with the Applied Problems subtest of the WJ-R, r(916) = .37, p < .001; and correlations with measures of attention, impulsivity, and self-control were lower in magnitude (rs = .22–.30, p < .001). Although these correlational results were far from conclusive, they suggest that the marshmallow test should not be thought of as a mere behavioral proxy for self-control, as the measure clearly relates strongly to basic measures of cognitive capacity.

Not conclusive, but it points to some areas worth exploring further.

PS: After writing this post (I usually post with a delay of between a week and three months), Robert VerBruggen posted a piece at the Institute for Family Studies making many of the same points. I would have skipped writing the new content – and simply quoted VerBruggen – if I’d seen it earlier. Inside Higher Ed also has a good write-up by Greg Toppo, including this quote from Walter Mischel:

[A] child’s ability to wait in the ‘marshmallow test’ situation reflects that child’s ability to engage various cognitive and emotion-regulation strategies and skills that make the waiting situation less frustrating. Therefore, it is expected and predictable, as the Watts paper shows, that once these cognitive and emotion-regulation skills, which are the skills that are essential for waiting, are statistically ‘controlled out,’ the correlation is indeed diminished.

Also from Mischel:

Unfortunately, our 1990 paper’s own cautions to resist sweeping over-generalizations, and the volume of research exploring the conditions and skills underlying the ability to wait, have been put aside for more exciting but very misleading headline stories over many years.

PPS: In another thread to her article, Calarco draws on the concept of scarcity:

There’s plenty of other research that sheds further light on the class dimension of the marshmallow test. The Harvard economist Sendhil Mullainathan and the Princeton behavioral scientist Eldar Shafir wrote a book in 2013, Scarcity: Why Having Too Little Means So Much, that detailed how poverty can lead people to opt for short-term rather than long-term rewards; the state of not having enough can change the way people think about what’s available now. In other words, a second marshmallow seems irrelevant when a child has reason to believe that the first one might vanish.

I’ve written about scarcity previously in my review of Mullainathan and Shafir’s book. I’m not sure the work on scarcity sheds light on the marshmallow test results. The concept behind scarcity is that poverty-related concerns consume mental bandwidth that isn’t then available for other tasks. A typical experiment to demonstrate scarcity involves priming the experimental subjects with a problem before testing their IQ. When the problem has a large financial cost (e.g. expensive car repairs), the performance of low-income people plunges: focusing their attention on their lack of resources consumes mental bandwidth. Applying this to the marshmallow test, I haven’t seen much evidence that four-year-olds are struggling with problems of this kind.

(As an aside, scarcity seems to be the catchall response to discussions of IQ and achievement, a bit like epigenetics is the response to any discussion of genetics.)

Given Calarco’s willingness to bundle the marshmallow test replication into the replication crisis (calling it a “failed replication”), it’s worth also thinking about scarcity in that light. If I had to predict which results would not survive a pre-registered replication, the experiments in the original scarcity paper would be right up there. They involve priming, the poster child for failed replications. The size of the effect – 13 IQ points from a simple prime – fails the “effect is too large” heuristic.

Then there is a study that looked at low-income households before and after payday, which found no change in cognitive function either side of that day (you could consider this a “conceptual replication”). In addition, for a while now I have been hearing rumours of file drawers containing failed attempts to elicit the scarcity mindset. I was able to find one pre-registered direct replication, but it doesn’t seem the result has been published. (Sitting in a file drawer somewhere?)

There was even debate around whether the original scarcity paper (pdf) showed the claimed result. Reanalysis of the data without dichotomising income (that is, treating it as a continuous variable rather than splitting it into two bands) eliminated the effect. The original authors managed to then resurrect the effect (pdf) by combining the data from three experiments, but once you are at this point, you have well and truly entered the garden of forking paths.
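To see why the continuous-versus-dichotomised choice matters, here is a quick simulation of my own (an illustration, not a reanalysis of either paper). Median-splitting a continuous variable throws away information, so the correlation measured on the split version is systematically smaller than the correlation on the raw variable – which is one reason an effect that appears only under a particular split deserves suspicion:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
rho = 0.3  # assumed "true" correlation for the illustration

# Bivariate normal data with correlation rho.
income = rng.standard_normal(n)
outcome = rho * income + np.sqrt(1 - rho**2) * rng.standard_normal(n)

r_continuous = np.corrcoef(income, outcome)[0, 1]

# Median-split income into "low" vs "high" bands, then correlate again.
high = (income > np.median(income)).astype(float)
r_split = np.corrcoef(high, outcome)[0, 1]

print(round(r_continuous, 3), round(r_split, 3))
```

For a median split of a normal variable, the correlation shrinks by a factor of roughly 0.8 (the square root of 2/π), so the dichotomised analysis is strictly the noisier one.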

Does a moral reminder decrease cheating?

In The (Honest) Truth About Dishonesty, Dan Ariely describes an experiment to determine how much people cheat:

[P]articipants entered a room where they sat in chairs with small desks attached (the typical exam-style setup). Next, each participant received a sheet of paper containing a series of twenty different matrices … and were told that their task was to find in each of these matrices two numbers that added up to 10 …

We also told them that they had five minutes to solve as many of the twenty matrices as possible and that they would get paid 50 cents per correct answer (an amount that varied depending on the experiment). Once the experimenter said, “Begin!” the participants turned the page over and started solving these simple math problems as quickly as they could. …

Here’s an example matrix:

[Example matrix]

This was how the experiment started for all the participants, but what happened at the end of the five minutes was different depending on the particular condition.

Imagine that you are in the control condition… You walk up to the experimenter’s desk and hand her your solutions. After checking your answers, the experimenter smiles approvingly. “Four solved,” she says and then counts out your earnings. … (The scores in this control condition gave us the actual level of performance on this task.)

Now imagine you are in another setup, called the shredder condition, in which you have the opportunity to cheat. This condition is similar to the control condition, except that after the five minutes are up the experimenter tells you, “Now that you’ve finished, count the number of correct answers, put your worksheet through the shredder at the back of the room, and then come to the front of the room and tell me how many matrices you solved correctly.” …

If you were a participant in the shredder condition, what would you do? Would you cheat? And if so, by how much?

With the results for both of these conditions, we could compare the performance in the control condition, in which cheating was impossible, to the reported performance in the shredder condition, in which cheating was possible. If the scores were the same, we would conclude that no cheating had occurred. But if we saw that, statistically speaking, people performed “better” in the shredder condition, then we could conclude that our participants overreported their performance (cheated) when they had the opportunity to shred the evidence. …

Perhaps somewhat unsurprisingly, we found that given the opportunity, many people did fudge their score. In the control condition, participants solved on average four out of the twenty matrices. Participants in the shredder condition claimed to have solved an average of six—two more than in the control condition. And this overall increase did not result from a few individuals who claimed to solve a lot more matrices, but from lots of people who cheated by just a little bit.

The question then becomes how to reduce cheating. Ariely describes one idea:

[O]ur memory and awareness of moral codes (such as the Ten Commandments) might have an effect on how we view our own behavior.

… We took a group of 450 participants and split them into two groups. We asked half of them to try to recall the Ten Commandments and then tempted them to cheat on our matrix task. We asked the other half to try to recall ten books they had read in high school before setting them loose on the matrices and the opportunity to cheat. Among the group who recalled the ten books, we saw the typical widespread but moderate cheating. On the other hand, in the group that was asked to recall the Ten Commandments, we observed no cheating whatsoever. And that was despite the fact that no one in the group was able to recall all ten.

This result was very intriguing. It seemed that merely trying to recall moral standards was enough to improve moral behavior.

This experiment comes from a paper co-authored by Nina Mazar, On Amir and Ariely (pdf). (I’m not sure where the 450 students in the book comes from – the paper reports 229 students for this experiment. A later experiment in the paper uses 450. There were also a few differences in this experiment to the general cheating story above. People took their answers home for “recycling”, rather than shredding them, and payment was $10 per correct matrix to two randomly selected students.)

This experiment has now been subject to a multi-lab replication by Verschuere and friends. The abstract of the paper:

The self-concept maintenance theory holds that many people will cheat in order to maximize self-profit, but only to the extent that they can do so while maintaining a positive self-concept. Mazar, Amir, and Ariely (2008; Experiment 1) gave participants an opportunity and incentive to cheat on a problem-solving task. Prior to that task, participants either recalled the 10 Commandments (a moral reminder) or recalled 10 books they had read in high school (a neutral task). Consistent with the self-concept maintenance theory, when given the opportunity to cheat, participants given the moral reminder priming task reported solving 1.45 fewer matrices than those given a neutral prime (Cohen’s d = 0.48); moral reminders reduced cheating. The Mazar et al. (2008) paper is among the most cited papers in deception research, but it has not been replicated directly. This Registered Replication Report describes the aggregated result of 25 direct replications (total n = 5786), all of which followed the same pre-registered protocol. In the primary meta-analysis (19 replications, total n = 4674), participants who were given an opportunity to cheat reported solving 0.11 more matrices if they were given a moral reminder than if they were given a neutral reminder (95% CI: -0.09; 0.31). This small effect was numerically in the opposite direction of the original study (Cohen’s d = -0.04).

And here’s a chart demonstrating the result (Figure 2):

[Figure 2]

Multi-lab experiments like this are fantastic. There’s little ambiguity about the result.

That said, there is a response by Amir, Mazar and Ariely. Lots of fluff about context. No suggestion of “maybe there’s nothing here”.

The marshmallow test held up OK

A common theme I see on my weekly visits to Twitter is the hordes piling onto the latest psychological study or effect that hasn’t survived a replication or meta-analysis. More often than not, the study deserves the criticism. But recently, the hordes have occasionally swung into action too quickly.

One series of tweets suggested that loss aversion had entered the replication crisis. A better description of the two papers that triggered the tweets is that they were the latest salvos in a decade-old debate about the interpretation of many loss aversion experiments. They have nothing to do with replication. (If you’re interested, the papers are here (ungated) and here. I have sympathy with parts of the arguments, and some other critiques of the concept of loss aversion. I’ll discuss these papers in a later post.)

Another set of tweets concerned a conceptual replication of the marshmallow test. Many of the comments suggested that the replication was a failure, and that the original study was rubbish. My view is that the original work has actually held up OK, although the interpretation of the result and some of the story-telling that followed the study is challenged.

First, to the original paper by Shoda, Mischel, and Peake, published in 1990 (pdf). In that study, four-year-old children were placed at a table with a bell and a pair of “reward objects”. The pair of reward objects might be one marshmallow and two marshmallows, or one pretzel and two pretzels, and so on.

The children were told that the experimenter was going to leave the room, and that if they waited until the experimenter came back, they could have their preferred reward (the two marshmallows). Otherwise, they could call the experimenter back earlier by ringing the bell, but in that case they could only have their less preferred reward (one marshmallow). (Could a truly impatient child just not ring the bell and eat all three marshmallows?) The time until the children rang the bell, up to a maximum of 15 to 20 minutes, was recorded.

The headline result was that the time to ring the bell was predictive of future achievement in the SAT. Those who delayed their gratification had higher achievement. The time waited correlated 0.57 with SAT math scores and 0.42 with SAT verbal scores.

The new paper discusses a “conceptual replication”. It doesn’t copy the experimental design and replicate it precisely, but relies on a similar experimental design and a measure of academic achievement based on a composite of age-15 reading and math scores.

The main point to emerge from this replication is that there is an association between the delay in gratification and academic achievement, but the correlation (0.28) is only half to two-thirds of that found in the original study.

Anyone familiar with the replication literature will find this reduction in correlation unsurprising. One of the headline findings from the Reproducibility Project was that effect sizes in replications were around half of those in the original studies. Small sample sizes (low experimental power) also tend to result in Type M errors, whereby the effect size is exaggerated. (The original study only had 35 children in the baseline condition for which they were able to get the later academic results.)
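To make the Type M point concrete, here is a small simulation of my own (the numbers loosely match the studies: assume the true correlation really is the replication’s 0.28, and run many hypothetical studies of 35 children). Looking only at the studies that happen to reach statistical significance, the average “significant” correlation is inflated by half or more:

```python
import numpy as np

rng = np.random.default_rng(42)
true_r, n, sims = 0.28, 35, 20_000

sig_rs = []
for _ in range(sims):
    # Draw a sample of 35 with true correlation 0.28.
    x = rng.standard_normal(n)
    y = true_r * x + np.sqrt(1 - true_r**2) * rng.standard_normal(n)
    r = np.corrcoef(x, y)[0, 1]
    # Keep only "statistically significant" results
    # (2.03 is roughly the two-tailed 5% critical t for df = 33).
    t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
    if abs(t) > 2.03:
        sig_rs.append(r)

# Average correlation among the significant studies: well above 0.28.
print(round(float(np.mean(sig_rs)), 2))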

Shoda and friends recognised this possibility (although perhaps not the reasons for it). As they wrote in the original paper:

[G]iven the smallness of the sample, the obtained coefficients could very well exaggerate the magnitude of the true association. For example, in the diagnostic condition, the 95% confidence interval for the correlation of preschool delay time with SAT verbal score ranges from .10 to .66, and with SAT quantitative score, the confidence interval ranges from .29 to .76. The value and importance given to SAT scores in our culture make caution essential before generalizing from the present study; at the very least, further replications with other populations, cohorts, and testing conditions seem necessary next steps.

The differences between the experiments could also be behind the difference in size of correlation. Each study used different measures of achievement. The marshmallow test in the replication had a maximum wait of only 7 minutes, compared to 15 to 20 minutes in the original (although most of the predictive power in the new study was found to be in the first 20 seconds). The replication created categories for time waited (e.g. 0 to 20 seconds, 20 seconds to 2 minutes, and so on), rather than using time as a continuous variable. It also focused on children with parents who did not have a college education – too many of the children with college-educated parents waited the full seven minutes. The original study drew its sample from the Stanford community.

Given the original authors’ notes about effect size, and the differences in study design, the original findings have held up rather well. For a simple diagnostic, the marshmallow test still has a surprising amount of predictive power. Delay of gratification at age 4 predicts later achievement. Some of the write-ups of this new work have stated that the marshmallow test may not be as strong a predictor of future outcomes as previously believed, but how strong did you actually believe it to be in the first place?

The other headline from the replication is that the predictive ability of the marshmallow test disappears with controls. That is, if you account for the children’s socioeconomic status, parental characteristics and a set of measures of cognitive and behavioural development, the marshmallow test does not provide any further information about that future achievement. It’s no surprise that controls of this nature do this. It simply suggests that the controls are better predictors. The original claim was not that the marshmallow test was the best or only predictor.

What is called into question are the implications that have been drawn from the marshmallow test studies. Shoda and friends suggested that the predictive power of the test might be related to the meta-cognitive strategies that the children employed. For instance, successful children might divert themselves so that they don’t just sit and stare at the marshmallows. If that is the case, we could teach children these strategies, and they might then be better able to delay gratification and have higher achievement in life. This has been a common theme of discussion of the marshmallow test for the last 30 years.

In the replication data, most of the predictive power of the marshmallow test was found to lie in the first 20 seconds. There was not a lot of difference between the kids who waited more than 20 seconds and those that waited the full seven minutes. It is questionable whether meta-cognitive strategies come into play in those first few seconds. If not, there may be little benefit in teaching children strategies to enable them to delay gratification. It seems less a problem of developing strategies for gratification, and more one of basic impulse control. To increase future achievement, broader behaviour and cognitive change might be required.

Teacher expectations and self-fulfilling prophecies

I first came across the idea of teacher expectations turning into self-fulfilling prophecies more than a decade ago, in Steven Covey’s The 7 Habits of Highly Effective People:

One of the classic stories in the field of self-fulfilling prophecies is of a computer in England that was accidentally programmed incorrectly. In academic terms, it labeled a class of “bright” kids “dumb” kids and a class of supposedly “dumb” kids “bright.” And that computer report was the primary criterion that created the teachers’ paradigms about their students at the beginning of the year.

When the administration finally discovered the mistake five and a half months later, they decided to test the kids again without telling anyone what had happened. And the results were amazing. The “bright” kids had gone down significantly in IQ test points. They had been seen and treated as mentally limited, uncooperative, and difficult to teach. The teachers’ paradigms had become a self-fulfilling prophecy.

But scores in the supposedly “dumb” group had gone up. The teachers had treated them as though they were bright, and their energy, their hope, their optimism, their excitement had reflected high individual expectations and worth for those kids.

These teachers were asked what it was like during the first few weeks of the term. “For some reason, our methods weren’t working,” they replied. “So we had to change our methods.” The information showed that the kids were bright. If things weren’t working well, they figured it had to be the teaching methods. So they worked on methods. They were proactive; they worked in their Circle of Influence. Apparent learner disability was nothing more or less than teacher inflexibility.

I tried to find the source for this story, and failed. But what I did find was a similar concept called the Pygmalion effect, and assumed that Covey’s story was a mangled or somewhat made-up telling of that research.

What is the Pygmalion effect? It has appeared in my blog feed twice in the past two weeks. Here’s a slice from the first, by Shane Parrish at Farnam Street, describing the effect and the most famous study in the area:

The Pygmalion effect is a psychological phenomenon wherein high expectations lead to improved performance in a given area. Its name comes from the story of Pygmalion, a mythical Greek sculptor. Pygmalion carved a statue of a woman and then became enamored with it. Unable to love a human, Pygmalion appealed to Aphrodite, the goddess of love. She took pity and brought the statue to life. The couple married and went on to have a daughter, Paphos.

Research by Robert Rosenthal and Lenore Jacobson examined the influence of teachers’ expectations on students’ performance. Their subsequent paper is one of the most cited and discussed psychological studies ever conducted.

Rosenthal and Jacobson began by testing the IQ of elementary school students. Teachers were told that the IQ test showed around one-fifth of their students to be unusually intelligent. For ethical reasons, they did not label an alternate group as unintelligent and instead used unlabeled classmates as the control group. It will doubtless come as no surprise that the “gifted” students were chosen at random. They should not have had a significant statistical advantage over their peers. As the study period ended, all students had their IQs retested. Both groups showed an improvement. Yet those who were described as intelligent experienced much greater gains in their IQ points. Rosenthal and Jacobson attributed this result to the Pygmalion effect. Teachers paid more attention to “gifted” students, offering more support and encouragement than they would otherwise. Picked at random, those children ended up excelling. Sadly, no follow-up studies were ever conducted, so we do not know the long-term impact on the children involved.

The increases in IQ were 8 IQ points for the control group, and 12 points for those who were “growth spurters”. (The papers describing the study – from 1966 (pdf) and 1968 (pdf) – are somewhat thin on the experimental methodology, but it seems the description used in the study was “growth spurters” or high scorers in a “test for intellectual blooming”).

I always took the Pygmalion effect with a grain of salt. Most educational interventions have little to zero effect – particularly over the long-run – even when they involve far more than giving a label.

As it turns out, the story is not as clean as Parrish and others typically tell it. There have been battles over the Pygmalion effect since the original paper, with failed replications, duelling meta-analyses and debates about what the Pygmalion effect actually is.

Bob C-J discusses this at The Introduction to the New Statistics (HT: Slate Star Codex – the second appearance of the Pygmalion effect in my feed). Here is a cut of Bob C-J’s summary of these battles:

The original study was shrewdly popularized and had an enormous impact on policy well before sufficient data had been collected to demonstrate it is a reliable and robust result.

Critics raged about poor measurement, flexible statistical analysis, and cherry-picking of data.

That criticism was shrugged off.

Replications were conducted.

The point of replication studies was disputed.

Direct replications that showed no effect were discounted for a variety of post-hoc reasons.

Any shred of remotely supportive evidence was claimed as a supportive replication.  This stretched the Pygmalion effect from something specific (an impact on actual IQ) to basically any type of expectancy effect in any situation…. which makes it trivially true but not really what was originally claimed.  Rosenthal didn’t seem to notice or mind as he elided the details with constant promotion of the effect. …

Multiple rounds of meta-analysis were conducted to try to ferret out the real effect; though these were always contested by those on opposing sides of this issue.  …

Even though the best evidence suggests that expectation effects are small and cannot impact IQ directly, the Pygmalion Effect continues to be taught and cited uncritically.  The criticisms and failed replications are largely forgotten.

The truth seems to be that there *are* expectancy effects–but:

  • that there are important boundary conditions (like not producing real effects on IQ)
  • they are often small
  • and there are important moderators (Jussim & Harber, 2005).

The Jussim and Harber paper (pdf) Bob C-J references provides a great discussion of the controversy. (Bob C-J also recommends a book by Jussim). Here’s a section of the abstract:

This article shows that 35 years of empirical research on teacher expectations justifies the following conclusions: (a) Self-fulfilling prophecies in the classroom do occur, but these effects are typically small, they do not accumulate greatly across perceivers or over time, and they may be more likely to dissipate than accumulate; (b) powerful self-fulfilling prophecies may selectively occur among students from stigmatized social groups; (c) whether self-fulfilling prophecies affect intelligence, and whether they in general do more harm than good, remains unclear, and (d) teacher expectations may predict student outcomes more because these expectations are accurate than because they are self-fulfilling.

That paper contains some amusing facts about the original Rosenthal and Jacobson study. Some students had pre-test IQ scores near zero, others near 200, yet “the children were neither vegetables nor geniuses.” Exclude scores outside of the range 60 to 160, and the effect disappears. Five of the “bloomers” had increases of over 90 IQ points. Again, exclude these five and the effect disappears. The original study is basically worthless. While there is something to the effect of teacher expectations on students, the gap between the story telling and reality is rather large.
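The outlier point is worth seeing in numbers. Here is a toy reconstruction (hypothetical data of my own, not Rosenthal and Jacobson’s): give a control group and a “bloomer” group identical average gains, then let five wild scores into the bloomer group, like the five 90-plus-point increases in the original data. The group difference appears and disappears with those five children:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical retest gains: both groups truly improve by ~8 IQ points.
control = rng.normal(8, 8, 255)    # roughly four-fifths of the students
bloomers = rng.normal(8, 8, 65)    # roughly one-fifth labelled "spurters"

# Inject five wild scores, like the 90-plus-point "gains" in the original data.
bloomers[:5] = 95.0

gap_with = bloomers.mean() - control.mean()       # apparent Pygmalion effect
gap_without = bloomers[5:].mean() - control.mean()  # effect after excluding them

print(round(gap_with, 1), round(gap_without, 1))
```

With the five implausible scores included, the "bloomers" appear to gain several IQ points more than the controls; exclude them and the gap is back to noise around zero.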

Bankers are more honest than the rest of us

Well, probably not. But that’s one interpretation you could take from the oft-quoted and oft-cited Nature paper by Cohn and colleagues, Business culture and dishonesty in the banking industry. That bankers are more honest is as plausible as the interpretation of the experiment provided by the authors.

As background to this paper, here’s an extract from the abstract:

[W]e show that employees of a large, international bank behave, on average, honestly in a control condition. However, when their professional identity as bank employees is rendered salient, a significant proportion of them become dishonest. … Our results thus suggest that the prevailing business culture in the banking industry weakens and undermines the honesty norm, implying that measures to re-establish an honest culture are very important.

I’ve known of this paper since it was first published (plenty of media and tweets), but have always placed it in the basket of likely not true and unlikely to be replicated. Show me some pre-registered replications and I would pay attention. As a result, I didn’t investigate any further.

But recently Koen Smets pointed me toward a working paper from Jean-Michel Hupé that critiqued the statistical analysis. That paper in turn pointed to a critique by Vranka and Houdek, Many faces of bankers’ identity: how (not) to study dishonesty.

These critiques caused me to go back to the Nature paper – and importantly, to the supplementary materials – and read it in detail. It has a host of problems besides being unlikely to replicate. The most interesting of these could lead us to ask whether bankers are actually more honest.

The experiment

Cohn and friends recruited 128 bank employees and randomly split them into two groups, the treatment and control. Before undertaking the experimental task, the treatment group was “primed” with a series of questions that reminded them that they were a bank employee (e.g. At which bank are you presently employed?). The control group were asked questions unrelated to their professional identity.

The experimenters then asked each member of these two groups to flip a coin 10 times, reporting the result via a computer. No-one else could see what they had flipped. For each flip that came up the right way, the experimenters paid them (approximately) $20 (or more precisely, they would be paid $20 per flip if they equalled or outperformed a randomly selected colleague). Ten correct flips and you could have $200 coming your way.

So how can we know if any particular person is telling the truth? We can’t. But across a decent-sized group, we know the distribution of results to expect: a binomial distribution with a success probability of 0.5, giving on average 50% heads and 50% tails. Someone reporting 10 successes out of 10 is roughly a 1 in a thousand event. By comparing the distribution of the results to what you would expect, you can infer the level of cheating.
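The inference can be sketched in a few lines of Python. This is a hedged illustration only: I assume the treatment group was half of the 128 participants (64 people, 640 flips); the paper’s exact group sizes may differ.

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k successes in n fair coin flips."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# An honest flipper reporting 10 successes out of 10: about 1 in a thousand
p_ten = binom_pmf(10, 10)  # 1/1024

# Group-level inference: assuming roughly 64 bankers (half of the 128
# participants) each flip 10 times, how likely is an honest group to report
# at least 58.2% successes across all 640 flips?
n_flips = 64 * 10
threshold = round(0.582 * n_flips)  # 372 successful flips
p_tail = sum(binom_pmf(k, n_flips) for k in range(threshold, n_flips + 1))
# p_tail is a small fraction of a percent: an honest group almost never
# reports a success rate this high by chance
```

The same tail calculation against the control group’s 51.6% gives a probability well within the realms of chance, which is the comparison driving the paper’s headline result.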

So, how did the bankers go? In the control group, 51.6% of coin flips were successful. It’s slightly more than 50%, but within the realms of chance for a group of honest coin flippers. The bankers primed with their professional identity reported 58.2% successful flips, 6.6 percentage points more than the control group. The dishonest bandits.

But how do we know that this result is particular to bankers? What if we primed other professionals with their profession? What if we took a group with no connection to the banking industry and primed them with banking concepts?

Cohn and friends answered these questions directly. When they primed a group of non-banking professionals with their professional identity, they reported 3 percentage points fewer successful coin flips than those in a control condition. Students primed with banking concepts also reported fewer successes, around 1.5 percentage points. These differences weren’t statistically significant and could have happened by chance, with no detectable effect from the primes.

These experimental outcomes are the centrepiece behind the conclusion that the prevailing culture in banking weakens and undermines the honesty norm.

But now let’s go to the supplementary materials and learn a bit more about these non-banking professionals and students.

An alternative interpretation

I have only reported the differences in successful coin flips above – as did the authors in the main paper (in a chart, Figure 3a). So how many successes did these non-banking professionals and students have?

In the control condition, the non-banking professionals reported 59.8% successful flips. This dropped to 55.8% when primed with their professional identity. The students were also dishonest bandits, reporting 57.9% successful flips in the control condition, and 56.4% in the banking prime condition.

So looking across the three groups (bankers, non-banking professionals and students), the only honest group we have come across are the bankers in the control condition.

This raises the question of what the appropriate reference point for this analysis is. Should we be asking if banking primes induce banker dishonesty? Or should we be asking whether the control primes – which were designed to be innocuous – can induce honesty? To accept that the banking prime induces bankers to cheat more, we also need to have a starting point that bankers, on the whole, cheat less.

I don’t see a great deal of value in trying to determine which of these frames is correct, as the result is just noise. It is unlikely to replicate. But once you look at these numbers, the interpretation by Cohn and friends appears little more than an overly keen attempt to get the results to fit their “theoretical framework”.

Other problems

I’ve just picked my favourite problem, but the two critiques I linked above argue that there are others. Vranka and Houdek suggest that there are many other ways to interpret the results. I agree with that overarching premise, but am less convinced by some of their suggested alternatives, such as the presence of stereotype or money primes. Those primes seem as robust as this banking prime is likely to be.

Hupé critiques the statistical approach, with which I also have some sympathy, but I haven’t spent enough time thinking about it to agree with his suggested alternative approach.

A quick afterthought

That this experimental result is bunk is not a reason to dismiss the idea that banking culture is poor or that exposure to that culture increases dishonesty. The general problem with the priming literature is that it attempts to elicit differences through primes that are insignificant relative to the actual environments people face.

For example, there is a large difference between answering a few questions about banking and working in a bank. In the latter, you are surrounded by other people, interacting with them daily, seeing what they do. Just because a few questions do not produce an effect doesn’t mean that months of exposure to your work environment won’t change behaviour. Unfortunately, experiments such as this add approximately zero useful information as to whether this is actually the case.

Noise

Daniel Kahneman has a new book in the pipeline called Noise. It is to be co-authored with Cass Sunstein and Olivier Sibony, and will focus on the “chance variability in human judgment”, the “noise” of the book’s title.

I hope the book is more Kahneman than Sunstein. For all Thinking, Fast and Slow’s faults, it is a great book. You can see the thought that went into constructing it.

Sunstein’s recent books feel like research papers pulled together by a university student – which might not be too far from the truth given the fleet of research assistants at Sunstein’s command. Part of the flatness of Sunstein’s books might also come from his writing pace – he writes more than a book a year. (I count over 30 on his Wikipedia page since 2000, and 10 in the last five years.) Hopefully Kahneman will slow things down, although with a planned publication date of 2020, Noise will be a shorter project than Thinking, Fast and Slow.

What is noise?

Kahneman has already written about noise, most prominently with three colleagues in Harvard Business Review. In that article they set out the case for examining noise in decision-making and how to address it.

Part of that article is spent distinguishing noise from bias. Your bathroom scale is biased if it always reads four kilograms too heavy. If it gives you a different reading each time you get on the scale, it is noisy. Decisions can be noisy, biased, or both. A biased but low noise decision will always be wrong. A biased but high noise decision will be all over the shop but might occasionally get lucky.

One piece of evidence for noise in decision-making is the degree to which people will contradict their own prior judgments. Pathologists assessing biopsy results had a correlation of 0.63 with their own judgment of severity when shown the same case twice (the HBR article states 0.61, but I read the referenced article as stating 0.63). Software programmers differed by a median of 71% in the estimates for the same project, with a correlation of 0.7 between their first and second effort. The lack of consistency in decision-making only grows once you start looking across people.

I find the concept of noise a useful way of thinking about decision-making. One of the main reasons why simple algorithms are typically superior to human decision makers is not because of bias or systematic errors by the humans, but rather the inconsistency of human judgment. We are often all over the place.

Noise is also a good way of identifying those domains where arguments about the power of human intuition and decision-making (which I often make) fall down. Simple heuristics can make us smart. Developed in the right circumstances, naturalistic decision-making can lead to good decisions. But where human decisions are inconsistent, or noisy, it is often unchallenging to identify better alternatives.

Measuring noise

One useful feature of noise is that you can measure it without knowing the correct or best decision. If you don’t know your weight, it is hard to tell if the scale is biased. But the fact it differs in measurement as you get on, off, and on again points to the noise. If you have a decision for which there is a large lag before you know if it was the right one, this lag is an obstacle to measuring bias, but not for noise.

This ability to measure noise without knowing the right answer also avoids many of the debates about whether the human decisions are actually biased. Two inconsistent decisions cannot both be right.

You can measure noise in an organisation’s decision-making processes by examining pairs of decision makers and calculating the relative deviation of their judgments from each other. If one decision maker recommends, say, a price of $200, and the other of $400, the noise is 67%: they were $200 apart, and the average of the two was $300, so 200/300 ≈ 0.67. You average this noise score across all possible pairs to give you the noise score for that decision.

The noise score has an intuitive meaning. It is the expected relative difference if you picked any two decision makers at random.
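A minimal sketch of this pairwise calculation (my own illustration, not the HBR authors’ method verbatim):

```python
from itertools import combinations

def noise_score(judgments):
    """Average relative difference across all pairs of judgments:
    |a - b| divided by the pair's mean, averaged over every pair."""
    pairs = list(combinations(judgments, 2))
    return sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)

# The two-person example from the text: prices of $200 and $400
two_person = noise_score([200, 400])     # 200/300, about 0.67

# Perfectly consistent decision makers produce zero noise
no_noise = noise_score([300, 300, 300])  # 0.0
```

With more than two decision makers, the function simply averages the relative difference over every possible pair, matching the description above.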

In the HBR article, Kahneman and colleagues report on the noise measurements for ten decisions in two financial services organisations. The noise was between 34% to 62% for the six decisions in organisation A, with an average noise of 48%. Noise was between 46% and 70% for the four decisions in organisation B, with an average noise of 60%. This was substantially above the organisations’ expectations. Experience of the decision makers did not appear to reduce noise.

Reducing noise

The main solution proposed by Kahneman and friends to reduce noise is replacing human judgement with algorithms. By returning the same decision every time, the algorithms are noise free.

Rather than suggesting a complex algorithm, Kahneman and friends propose what they call a “reasoned rule”. Here are the five steps in developing a reasoned rule, with loan application assessment as an example:

  1. Select six to eight variables that are distinct and obviously related to the predicted outcome. Assets and revenues (weighted positively) and liabilities (weighted negatively) would surely be included, along with a few other features of loan applications.
  2. Take the data from your set of cases (all the loan applications from the past year) and compute the mean and standard deviation of each variable in that set.
  3. For every case in the set, compute a “standard score” for each variable: the difference between the value in the case and the mean of the whole set, divided by the standard deviation. With standard scores, all variables are expressed on the same scale and can be compared and averaged.
  4. Compute a “summary score” for each case: the average of its variables’ standard scores. This is the output of the reasoned rule. The same formula will be used for new cases, using the mean and standard deviation of the original set and updating periodically.
  5. Order the cases in the set from high to low summary scores, and determine the appropriate actions for different ranges of scores. With loan applications, for instance, the actions might be “the top 10% of applicants will receive a discount” and “the bottom 30% will be turned down.”
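The five steps translate almost directly into code. A sketch under stated assumptions: the variable names, weights and numbers below are hypothetical, and I use simple ±1 weights with liabilities negative, as the steps describe.

```python
import statistics

def fit_scale(cases, variables):
    """Step 2: compute each variable's mean and standard deviation
    from the set of past cases."""
    return {v: (statistics.mean(c[v] for c in cases),
                statistics.stdev(c[v] for c in cases))
            for v in variables}

def summary_score(case, scale, signs):
    """Steps 3-4: standardise each variable, apply its +/-1 sign
    (liabilities weighted negatively), then average."""
    z = [signs[v] * (case[v] - mean) / sd
         for v, (mean, sd) in scale.items()]
    return sum(z) / len(z)

# Hypothetical past loan applications (illustrative numbers only)
past = [
    {"assets": 120, "revenues": 40, "liabilities": 60},
    {"assets": 80, "revenues": 55, "liabilities": 90},
    {"assets": 150, "revenues": 70, "liabilities": 30},
]
signs = {"assets": 1, "revenues": 1, "liabilities": -1}
scale = fit_scale(past, signs)

# Step 5: score new applicants on the same scale and rank them
applicant = {"assets": 100, "revenues": 60, "liabilities": 50}
score = summary_score(applicant, scale, signs)
```

Because the rule is a fixed formula, the same applicant always receives the same score: the noise is zero by construction.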

The reliability of this reasoned rule – it returns the same outcome every time – gives it a large advantage over the human.

I suspect that most lenders are already using more sophisticated models than this, but the strength of a simple approach was shown in Robyn Dawes’s classic article The Robust Beauty of Improper Linear Models in Decision Making (ungated pdf). You typically don’t need a “proper” linear model, such as that produced by regression, to outperform human judgement.

As a bonus, improper linear models, as they are less prone to overfitting, often perform well compared to proper models (as per Simple Heuristics That Make Us Smart). Fear of the expense of developing a complex algorithm is not an excuse to leave the human decisions alone.

Ultimately the development of the reasoned rule cannot avoid the question of what the right answer to the problem is. It will take time to determine definitively whether it outperforms. But if the human decision is noisy, there is an excellent chance that the rule will hit closer to the mark, on average, than the scattered human decisions.

Behavioural economics: underrated or overrated?

Tyler Cowen’s Conversations with Tyler feature a section in which Cowen throws a series of ideas at the guest, and the guest responds with whether each idea is overrated or underrated. In a few of the conversations, Cowen asks about behavioural economics. Here are three responses:

Atul Gawande

COWEN: The idea of nudge.

GAWANDE: I think overrated.

COWEN: Why?

GAWANDE: I think that there are important insights in nudge units and in that research capacity, but when you step back and say, “What are the biggest problems in clinical behavior and delivery of healthcare?” the nudges are focused on small solutions that have not demonstrated capacity for major scale.

The kind of nudge capability is something we’ve built into the stuff we’ve done, whether it’s checklists or coaching, but it’s been only one. We’ve had to add other tools. You could not get to massive reductions in deaths in surgery or childbirth or massive improvements in end-of-life outcomes based on just those behavioral science insights alone. We’ve had to move to organizational insights and to piece together multiple kinds of layers of understanding in order to drive high-volume change in healthcare delivery.

—–

Steven Pinker

COWEN: Behavioral economics. Economists playing at psychology. Obviously you have a stronger background in psychology than the economists. What do you think of behavioral econ?

PINKER: I’m for it.

COWEN: What’s it missing?

PINKER: I’m completely out of my depth here, but I do think it is too quick to dismiss classical economics. Is this maybe another false dichotomy?

The idea that the rational actor and models derived from it are obsolete because humans make certain irrational choices, have certain rules of thumb that can’t be normatively defended — those aren’t necessarily incompatible, because even though every individual human brain might have its quirks and be irrational, it is possible for a collective enterprise that works by certain rules to have a kind of rationality that none of the individual minds has.

Also it’s possible because we’re corrigible, because the mind is many parts. We can override some of our biases and instincts either though confrontations with reality, through education, through debate.

We do know even that people who are experienced in market transactions, for example, don’t fall for the kinds of fallacies that behavioral economists are so fond of pointing out. You really can’t turn a person into a money pump, even though in the lab I can set up a demo that shows people can be intransitive in their preferences.

You actually put a person in a situation where there’s real money at stake, and all of a sudden they’re not so irrational.

COWEN: They walk away.

—–

Jonathan Haidt

COWEN: You’re a trained psychologist, in addition to your most famous work, you have a lot of other papers which are very well cited, but less famous for other public intellectuals doing what you’d call traditional psychological research. Here we have these economists, they do what they call behavioral economics, and they tread into the field of psychology, do they know what they’re doing? Behavioral economics, underrated or overrated?

HAIDT: Properly rated right now, with one caveat. We psychologists have long felt, “Oh those economists they’re the only ones that are ever consulted in Congress, and they have all these high‑prestige jobs, they have a Nobel Prize, nobody listens to us.”

Some economists beginning with Robert Frank, and Dan Kahneman, Dick Thaler, the fact that economists have been listening to psychologists, and making our work more well‑known, of course Kahneman did a lot of that work, and he is a psychologist.

That’s all good, I’m thrilled with the way that’s going. The only caveat that I would put, which I would say if they don’t do this soon, then they would be overrated, is the behavioral economics work is an example of this wonderful dictum from Robert Zajonc, the famous social psychologist, which is that cognitive psychology is social psychology with all the interesting variables set to zero.

To the extent that behavioral economists are saying, “Look at a person shopping, what influences their decision? If the apple is at eye‑level — .” They’re looking at lone consumers who are trying to make choices to optimize their outcomes. That’s great work, but that’s setting all the interesting variables to zero. The interesting stuff is all social. It’s what does this say about me? Will I be ostracized from my group?

If behavioral economics becomes more social, which I think will be the next phase, then I would say it would deserve ever‑rising market value.

COWEN: Thorstein Veblen, that was his initial vision for it actually, was that it be quite social and that the idea of a social reference class was central to people’s behavioral biases.

HAIDT: Interesting. Again, this is a critique from outside, but what a lot of people say which sounds right to me is that the early economists were great social theorists. My God, you read Adam Smith, what a brilliant world philosopher, historian, they thought so broadly and you tell me, but it seems there was a weird turn in the mid‑20th century towards mathematics.

COWEN: Yes.

HAIDT: I think it made economists set all the interesting variables to zero.

——

These three conversations are worth reading or listening to in full. The episodes with Malcolm Gladwell and Joe Henrich are also excellent.

My blogroll

After my recent post on how I focus, I received a couple of requests for the blogs I follow. Here are my current subscriptions in Feedly, with occasional comments.

Some of these blogs have been in my reader for years, others I am trialling. I am usually trialling a few at any time, and tend to have a “one in, one out” pattern of subscription. It normally takes me about 10 minutes once every day or two to scan the new entries and decide which are worth reading. This set of blogs generates more posts for my read later pile than I can get through.

Askblog (Arnold Kling has been one of my main influences in thinking about causation in social science and economics)

Behavioral Public Policy Blog

Behavioral Scientist (For which I am a founding columnist. You can find my contributions here.)

Behavioural Insights Team

The BE Hub

Bryan Caplan at Econlog (too much politics in the other Econlog bloggers for my taste)

Cal Newport (Author of Deep Work, for which I will post a review at some point. My review of So Good They Can’t Ignore You is here.)

Centre for Advanced Hindsight

Decision Science News

Dominic Cummings’s blog

The Enlightened Economist (For the book recommendations)

Evonomics

Ergodicity Economics (Started subscribing after seeing the video posted at the bottom of this post)

Farnam Street

Fresh Economic Thinking

Gene Expression (I’m subscribed to the full Razib Khan firehose, but am there for the gnxp material)

ideas42

Information Processing (Keeps me on top of the latest on genomic prediction)

Jason Collins blog (As a check that my feed is working)

John Kay (Most of my day job is in financial services and markets)

Marginal Revolution

Matt Ridley

Megan McArdle

Offsetting Behaviour

O’Reilly Media

Slate Star Codex

Statistical Modelling, Causal Inference, and Social Science (Andrew Gelman’s blog. In terms of what I have learnt, the most valuable blog on the list)

Tim Harford

Thorstein Veblen’s The Theory of the Leisure Class

In 2011, Thorstein Veblen was ranked seventh in a poll of economists on their favourite dead 20th-century economist. He ranked behind Keynes, Friedman, Samuelson, Hayek, Schumpeter and Galbraith. His supporters were among the least liberal (in the classical sense of the word) of the survey participants. Given his approach to consumerism, as detailed in The Theory of the Leisure Class, this is no surprise.

The Theory of the Leisure Class, published in 1899, was one of the earliest books to explore the economic assumption that people wish to consume. Veblen noted this was not purely a desire to consume in itself. People also care about status, reputation and honour. They care about their relative position to others, such as their relative wealth. And consumption provides a means of establishing this relative position.

Conspicuous leisure and consumption

To turn wealth into status and reputation, you need to signal your wealth. Veblen explored two possible signals, conspicuous leisure and conspicuous consumption, with his coining of the latter term his best-known claim to fame. Veblen has a relatively modern take on these two concepts, recognising the need for waste: signalling theory tells us that waste is required for a signal to be reliable.

When there are few goods for conspicuous consumption, as would be the case in primitive societies, conspicuous leisure is a more accessible way to signal wealth. Conspicuous leisure might involve reaching a level of manners and etiquette that could only be achieved through an excessive use of time, or becoming proficient at sports. Veblen also considers what he calls “vicarious conspicuous leisure”, whereby the head of the house employs servants (or even the housewife) in exercises that waste time.

As society advances, people move from conspicuous leisure to conspicuous consumption. They have an increasingly large circle of people with whom they associate and wish to signal status to. In a small village, everyone is familiar with each other and will note the habits of the servants and other householders carrying out the conspicuous leisure. In a larger city, the conspicuous waste needs to be visible, and conspicuous consumption in the nature of watches, clothing and carriages is immediately obvious. Conspicuous consumption can also be vicarious, with servants dressed up in excessive livery.

Veblen considered that conspicuous consumption will consume all future growth in production and efficiency. He states:

The need of conspicuous waste, therefore, stands ready to absorb any increase in the community’s industrial efficiency or output of goods, after the most elementary physical wants have been provided for.

Veblen also suggests that the use of additional production for conspicuous consumption acts as a Malthusian check on fertility. If signals are wasteful, then some of these resources will not be available for increasing the number of offspring. However, to be evolutionary stable, any reduction in conspicuous consumption by an individual would have them suffer a cost in the form of reputation and status, and in turn, mating opportunities.

One of Veblen’s interesting perspectives is that costliness masquerades under the name of beauty. Veblen states that “beauty, in the naive sense of the word, is the occasion rather than the ground of their monopolization or of their commercial value.” The marks of expensiveness, rarity and exclusivity become known as beauty.

This leads to imperfections in goods (evidence of being handmade rather than machine-made) becoming signs of beauty. Counterfeits lose their beauty on being identified as such. Each year the fashion changes, which is wasteful, and yet people prefer the more recent fashions to the older ones.

Veblen applies this concept to beauty in women, with tastes shifting from “women of physical presence” to a “lady”, as conspicuous consumption and leisure grew. The less suited a woman is for work, the more waste, and the more beautiful she would be perceived.

The evolution of the leisure class

Veblen follows his discussion of beauty with a series of evolutionary arguments on the nature of the leisure class. The “leisure class” is an unproductive upper class, and contrasts with the “industrial class”, a subordinated but productive working class. Veblen’s line of argument is often difficult to follow, with the boundary between social and genetic selection unclear. His underlying agenda, a critique of the leisure class, also clouds his arguments.

Veblen argues that the selection of institutions affects the selection of people within society. Institutions change fast, so although only the fittest habits of thought will normally survive, the selection of people cannot keep up. Further, changes which may be good for society as a whole may be bad for certain people. Veblen’s discussion provides a nice picture of a dynamic environment and selection pressures that vary with it.

Despite this dynamism, society is slow to change and conservative. Veblen argues that the leisure class is able to keep society conservative through withdrawing the means of sustenance to the industrial class. As a result, the industrial class does not have the resources to invest in new ideas and habits. Even if they did gain some surplus, that would be wasted on the conspicuous consumption that the leisure class has established as the societal norm.

On an individual level, Veblen considers there are two basic types of people – predatory and peaceful. Predatory types are violent (in certain stages of society), selfish and dishonest, and are not diligent. Peaceful types are the opposite. Which traits are expressed will depend on the state of society. For Veblen, the spectrum of predatory to peaceful roughly coincides with the spectrum of blonde through brunette to Mediterranean ethnicities.

Veblen suggests that society progressed from a peaceful, native state, to a barbarian state, before shifting back towards the more peaceful modern society. Peaceful traits were selected for in the native state, and predatory traits selected for in the barbarian states. Veblen states, however, that selection did not eliminate all the peaceful traits in the barbarian era, allowing peaceful traits to be present in modern society.

As to how these traits are distributed at his time of writing, Veblen sees the leisure class as the predatory type and the industrial class of the peaceful type. The leisure class is not able to be violent in modern society, so they use more “peaceful predatory” methods, such as fraud. The industrial class is not in need of predatory habits, with Veblen suggesting that “economic man” in the sense of the selfish person (an indirect slight on Adam Smith) is useless for modern society. It is by being diligent and honest that the industrial man thrives.

Veblen’s shot at “economic man” is not particularly effective, and does not recognise that selfishness is required, in an evolutionary sense, for all people. The reason industrial man is diligent is because that is how he benefits. If he did not benefit, he would be selected against and disappear. That society benefits is the operation of Smith’s invisible hand.

Despite his categorisation of types between classes, Veblen later suggests that there are no broad character differences between the leisure class and the rest. Some predatory behaviour persists in the industrial class due to the behaviour of the leisure class. He also notes that people in the leisure class, by virtue of their resources, are not subject to harsh selection pressure, so peaceful characteristics can persist. What is most determinative of the traits in the leisure class are those traits which lead to admission to the class. While these have changed over time (say, from raw violence to fraud), they are generally of a predatory nature. It is not easy to gel this position of no difference with his earlier statements, and I am not sure they can be reconciled. My one suggestion is that the differences will grow if the current institutional framework continues to exist.

Put together, Veblen’s use of evolutionary theory is a strange mix of group selection and broad statements on inherent traits. There is little detailed consideration of the selection process that might have occurred. If nothing else, it appears that Veblen simply wanted to critique the leisure class and would use whatever tools were at his disposal. Through his evolutionary discussion, Veblen also manages to avoid addressing the basis for the desire for reputation and status.

Sport, religion and education

The rest of the book largely involves Veblen applying his framework to sport, religion and education.

Sports reflect the predatory skills of the leisure class and delinquents. Veblen disagreed with the common view that sports build temperament, arguing instead that they involve chicanery, falsehood and browbeating. That is why we need umpires. For the industrial classes, Veblen felt that sport is more a diversion than a habit, although the role of sport for the industrial class seems somewhat different today.

Veblen considered that the temperament that inclines one to sport inclines one to religion (and vice versa). Religion, and the conspicuous leisure and consumption associated with it, change the patterns of consumption in the community and lowers its vitality. As an example, Veblen referred to the religious Southern United States. He considered that their industry was more handicraft than industrial. Their range of habits, such as duels, cock-fighting and male sexual incontinence (shown by the presence of mulattoes) were evidence of barbarian traits.

On education, Veblen saw the alignment of educational institutions with sport and religion as evidence of education’s status as a leisure class activity. Higher education has many rituals and ceremonies and encourages proper speech and spelling (conspicuous leisure), while lower schools tend to be more practical. The teaching of the classics and dead languages was, in particular, conspicuous consumption.

One interesting sideline is Veblen’s view on how industrialisation has affected the status of women. Industrialisation allows women to revert to a more primitive type (Veblen’s primitive type being peaceful and industrial). The leisure class, however, needs to keep women in their place to engage in vicarious conspicuous leisure (they are, after all, a signal for the man). As a result, when educational institutions finally began to admit women, they were primarily enrolled in courses with a quasi-artistic quality, which help women in performing vicarious conspicuous leisure.

[This post is a combined and edited version of three previous posts exploring the book. Those old posts are here, here and here.]

How I focus (and live)

This post is a record of some strategies that I use to focus and be mildly productive. It also records a few other features of my lifestyle.

Why develop these strategies? On top of delivering in my day job, I have always tried to invest heavily in my human capital, and that takes a degree of focus.

The need to adopt many of the below also reflects how easily distracted I am. I have horrible habits when I get in front of a device. The advent of the web has been a mixed blessing for me.

My approaches can shift markedly over time, so it will be interesting to see which of the below are still reflected in my behaviour in a couple of years (and which continue to be supported by the evidence as effective).

If there is a common theme to the below, it is that creating the right environment, not reliance on willpower, is the path to success.

Periods of focus: Most of my productive output occurs in two places. One is on the train, with an hour commute at the beginning and end of each day that I travel to work. The only activities I do on the train are reading (books or articles) and writing. Internet is turned off. This is now an ingrained habit. The train is largely empty for most of the journey, with half through a national park, so it’s a pleasant way to work.

The rest of my output occurs in productive blocks (pomodoros) during the day. At the beginning of each day I schedule a set of half-hour blocks in my diary around my other commitments. In these blocks, I will turn off or close everything I don’t need for the task. I am typically less successful at putting up barriers to human (as opposed to digital) interruptions, except for occasionally closing my office door.

Ideally I will have several blocks in a row (in the morning), with a couple of minutes to stretch in between. I aim for at least 20 half-hour sessions each week. I average maybe 30. I block out the occasional morning in my diary to make sure each week is not completely filled with meetings (with eight direct reports and working in a bureaucracy, that is a real risk).

I also read whenever I can, and that fills a lot of the other space in my life. I read around 100 books per year (about 70-80 non-fiction).

Phone: My iPhone is used for four main purposes: as a phone; as a train timetable; as a listening device (podcasts, audiobooks and music); and for my meditation apps (more on meditation below). It also has a few utilities such as Uber that I rarely use. I don’t use my phone for social media, as a diary, or for email. Most of the day it stays in my pocket or on my desk. All notifications, except calls and text messages, are turned off. I rarely have any reason to look at it.

Even when I do look at my phone, the view is sparse. These are the two screens I see.

One thing you can’t see in these screenshots (for some strange technical reason) is that my phone is in greyscale. There is little colour to get me excited (although I am colour blind….). Unless I am making a phone call, messaging someone, or (loosely) locking the phone with Forest, I use search to find the app I need – they are all hidden in the Dump folder. When I go to my phone, there is little to divert me from my original intention.

iPad: I have an iPad, and it is similarly constrained. All notifications are turned off. It has email, but the account is turned off in settings, with account changes restricted. It takes me about a minute to disable restrictions to turn email on, which slows me down enough to make sure I am checking it for a reason. More on email below.

I also use the iPad for reading and writing (including these posts) on the train. When reading, I use my Kindle in preference to my iPad when I can, as the Kindle has far fewer rabbit holes.

Internet: I subscribe to Freedom, which cuts off the internet for certain apps and at certain times. Among other things, I use it to block the internet from 8pm through to 7am (I don’t want to be checking email or browsing when I first get up) and on Sundays (generally a screen-free day). I also use Freedom to shut off the internet or certain apps at ad hoc times when I want to focus.

I try not to randomly browse at other times. I have little interest in news (see below), so that reduces the probability of messing around. I have previously used RescueTime to track my time online, but don’t currently as I can’t install it on my work computer, phone or iPad. The tracking had a subtle but limited effect on my behaviour on my home computer when I tried it.

Email: Currently my biggest failure, particularly when I am in the office. I aim to batch my email to a few times per day, but I check and am distracted by new emails more often than I would like. That is partly because much of my workflow occurs through email, so it is hard not to look.

Social media: I have a Facebook account, but zero friends, so it provides little distraction. (I also like that when I run into people who I haven’t seen for a while, I don’t already know what they have been up to.) I only have the account because this blog has a Facebook page. I try to limit my visits to Twitter and LinkedIn to once a week (normally successful with Twitter, less so with LinkedIn as direct messages sometimes draw me in). Freedom helps constrain this.

Paper diary: My paper diary is an attempt to keep myself away from distracting devices. I also find it faster than the electronic alternative. I have an electronic calendar for work, but it is replicated in the paper diary.

News: I consume little news. I don’t have a television, don’t purchase newspapers and don’t visit internet news sites unless I follow a link based on a recommendation. I rarely miss anything important. If something big happens, someone will normally tell me.

I used to apply a filter to political news of “if this was happening in Canada, would I care?” That eliminated most political news, but I have found that after a few years, I have become so disconnected from Australian politics that most of it flows around me. I don’t recognise most politicians, and I feel unconnected to any of the personalities. Voting is compulsory in Australia, so to avoid being fined or voting for people I know nothing about, I get my name ticked off the electoral roll at a polling place, take the voting slip, but don’t bother filling it out. (And I have almost no idea what Trump is up to.)

I am in a similar place for sports news. Now that I have been disconnected for a while, I have no interest. Any names I overhear mean nothing to me. I couldn’t tell you who won any of the tennis grand slams last year or who the World Series champion is. I don’t think I could recognise a current Australian cricketer on sight.

Blogs: As a substitute for news sources, I subscribe to around 25 blogs using a feed reader (Feedly). I scan them around once a day. They provide more reading material than I can get through (via the posts themselves or their links), so I have a backlog of reading material in Instapaper (I used to use Pocket, but dumped it when the ads appeared).

Sleep and rest: The evidence on the effects of sleep deprivation is strong. I need eight hours a night and generally get it (children permitting). I don’t use screens (except for the Kindle) after 8pm. I also subscribe to the broader need for rest, and to the idea that overwork brings declining productivity.

Meditation: Meditation is new for me (around four months), and I am still in the experimental phase. I meditate for around 15 to 20 minutes every day. I find it puts me on the right track at the start of the day (which is when I meditate, children permitting). It also acts as a daily reminder of what I am trying to do.

The evidence for increased concentration and emotional control seems strong enough to give it a go. I suspect I would have dismissed the idea a few years ago (maybe even a year ago), and depending on how the evidence and my own experience develop, I am prepared to dismiss it again in the future.

A benchmark I’d like to compare meditation against is focused reading. If I shifted the meditation time to reading – 15 to 20 minutes a day is roughly 100 hours a year – that’s 15 to 20 additional books a year at my pace. What is the balance of costs and benefits?

I use three apps to meditate: Insight Timer, Headspace and 10% Happier. I find 10% Happier most useful as a teacher. Headspace is convenient and easy to use, but I don’t like the gamification element to it, and the packages seem relatively shallow and repetitive (although the repetitive nature is not necessarily a bad thing). At the end of the year when it is time to re-subscribe, I suspect I will drop Headspace and stick with 10% Happier if I am still learning something from it. Insight Timer will otherwise give me what I need.

I will post more on my thoughts on meditation in the near future – likely through a review of Sam Harris’s Waking Up in the first instance, as that was the book that pushed me across the line.

I give myself a 60% chance of still meditating when I write my next post on how I focus (I plan to do this roughly annually). A lapse could be due to either changing my mind or failing to sustain the habit.

Diet: I see diet as closely linked to the ability to focus and be productive. I eat well. My diet might best be described as three parts Paleo, one part early agriculturalist, and 5% rubbish. It is mainly fruit (lots), vegetables, tubers, nuts, eggs (a dozen a week), meat, legumes and dairy (a lot of yogurt). I eat grains occasionally, largely in the form of rice (a few times a week) and porridge (once or twice a week). I’ll eat bread maybe once or twice a month (I love hamburgers and eggs on toast). A heuristic I often fall back on is no processed grains, industrial seed oils or added sugar. There’s some arbitrariness to it, but it works. Stephan Guyenet is my most trusted source on diet.

It’s easy to stick to this diet because it is what is in my house. There are no cookies, ice cream or sugar-based snacks. I don’t have to go down the aisles of the supermarket when shopping (although my groceries are normally home delivered). If I want to binge, rice crackers and toast are the most exciting things I can find in the cupboard.

Exercise: Like diet, this is part of the productivity package. My major filter for choosing exercise is the desire to still be able to surf and get off the toilet when I’m 80. I surf a couple of times a week. Living within a five-minute walk of a beach with good surf is a basic lifestyle criterion.

I did Crossfit for a few years, but don’t live near a Crossfit gym at the moment. In any case, I don’t think Crossfit is a sustainable long-term approach – at least not if I trained as regularly as is expected in the gyms I have been to. The intensity would have me falling apart in old age.

That said, I still keep Crossfit elements in my exercise – heavy compound lifts once or twice a week, and a short high-intensity burst around once a week (so I’m in the gym once or twice a week). I also walk a lot, including trying to get out of the office for a decent walk at lunch each day. While walking, I consume a lot of audiobooks and podcasts. I stretch for 10 to 15 minutes most days.