Author: Jason Collins

Economics. Behavioural and data science. PhD economics and evolutionary biology. Blog at

Teacher expectations and self-fulfilling prophecies

I first came across the idea of teacher expectations turning into self-fulfilling prophecies more than a decade ago, in Steven Covey’s The 7 Habits of Highly Effective People:

One of the classic stories in the field of self-fulfilling prophecies is of a computer in England that was accidentally programmed incorrectly. In academic terms, it labeled a class of “bright” kids “dumb” kids and a class of supposedly “dumb” kids “bright.” And that computer report was the primary criterion that created the teachers’ paradigms about their students at the beginning of the year.

When the administration finally discovered the mistake five and a half months later, they decided to test the kids again without telling anyone what had happened. And the results were amazing. The “bright” kids had gone down significantly in IQ test points. They had been seen and treated as mentally limited, uncooperative, and difficult to teach. The teachers’ paradigms had become a self-fulfilling prophecy.

But scores in the supposedly “dumb” group had gone up. The teachers had treated them as though they were bright, and their energy, their hope, their optimism, their excitement had reflected high individual expectations and worth for those kids.

These teachers were asked what it was like during the first few weeks of the term. “For some reason, our methods weren’t working,” they replied. “So we had to change our methods.” The information showed that the kids were bright. If things weren’t working well, they figured it had to be the teaching methods. So they worked on methods. They were proactive; they worked in their Circle of Influence. Apparent learner disability was nothing more or less than teacher inflexibility.

I tried to find the source for this story, and failed. But what I did find was a similar concept called the Pygmalion effect, and assumed that Covey’s story was a mangled or somewhat made-up telling of that research.

What is the Pygmalion effect? It has appeared in my blog feed twice in the past two weeks. Here’s a slice from the first, by Shane Parrish at Farnam Street, describing the effect and the most famous study in the area:

The Pygmalion effect is a psychological phenomenon wherein high expectations lead to improved performance in a given area. Its name comes from the story of Pygmalion, a mythical Greek sculptor. Pygmalion carved a statue of a woman and then became enamored with it. Unable to love a human, Pygmalion appealed to Aphrodite, the goddess of love. She took pity and brought the statue to life. The couple married and went on to have a daughter, Paphos.

Research by Robert Rosenthal and Lenore Jacobson examined the influence of teachers’ expectations on students’ performance. Their subsequent paper is one of the most cited and discussed psychological studies ever conducted.

Rosenthal and Jacobson began by testing the IQ of elementary school students. Teachers were told that the IQ test showed around one-fifth of their students to be unusually intelligent. For ethical reasons, they did not label an alternate group as unintelligent and instead used unlabeled classmates as the control group. It will doubtless come as no surprise that the “gifted” students were chosen at random. They should not have had a significant statistical advantage over their peers. As the study period ended, all students had their IQs retested. Both groups showed an improvement. Yet those who were described as intelligent experienced much greater gains in their IQ points. Rosenthal and Jacobson attributed this result to the Pygmalion effect. Teachers paid more attention to “gifted” students, offering more support and encouragement than they would otherwise. Picked at random, those children ended up excelling. Sadly, no follow-up studies were ever conducted, so we do not know the long-term impact on the children involved.

The increases in IQ were 8 IQ points for the control group, and 12 points for those who were “growth spurters”. (The papers describing the study – from 1966 (pdf) and 1968 (pdf) – are somewhat thin on the experimental methodology, but it seems the description used in the study was “growth spurters” or high scorers in a “test for intellectual blooming”).

I always took the Pygmalion effect with a grain of salt. Most educational interventions have little to no effect – particularly over the long run – even when they involve far more than giving a label.

As it turns out, the story is not as clean as Parrish and others typically tell it. There have been battles over the Pygmalion effect since the original paper, with failed replications, duelling meta-analyses and debates about what the Pygmalion effect actually is.

Bob C-J discusses this at The Introduction to the New Statistics (HT: Slate Star Codex – the second appearance of the Pygmalion effect in my feed). Here is a cut of Bob C-J’s summary of these battles:

The original study was shrewdly popularized and had an enormous impact on policy well before sufficient data had been collected to demonstrate it is a reliable and robust result.

Critics raged about poor measurement, flexible statistical analysis, and cherry-picking of data.

That criticism was shrugged off.

Replications were conducted.

The point of replication studies was disputed.

Direct replications that showed no effect were discounted for a variety of post-hoc reasons.

Any shred of remotely supportive evidence was claimed as a supportive replication.  This stretched the Pygmalion effect from something specific (an impact on actual IQ) to basically any type of expectancy effect in any situation…. which makes it trivially true but not really what was originally claimed.  Rosenthal didn’t seem to notice or mind as he elided the details with constant promotion of the effect. …

Multiple rounds of meta-analysis were conducted to try to ferret out the real effect; though these were always contested by those on opposing sides of this issue.  …

Even though the best evidence suggests that expectation effects are small and cannot impact IQ directly, the Pygmalion Effect continues to be taught and cited uncritically.  The criticisms and failed replications are largely forgotten.

The truth seems to be that there *are* expectancy effects–but:

  • that there are important boundary conditions (like not producing real effects on IQ)
  • they are often small
  • and there are important moderators (Jussim & Harber, 2005).

The Jussim and Harber paper (pdf) Bob C-J references provides a great discussion of the controversy. (Bob C-J also recommends a book by Jussim). Here’s a section of the abstract:

This article shows that 35 years of empirical research on teacher expectations justifies the following conclusions: (a) Self-fulfilling prophecies in the classroom do occur, but these effects are typically small, they do not accumulate greatly across perceivers or over time, and they may be more likely to dissipate than accumulate; (b) powerful self-fulfilling prophecies may selectively occur among students from stigmatized social groups; (c) whether self-fulfilling prophecies affect intelligence, and whether they in general do more harm than good, remains unclear, and (d) teacher expectations may predict student outcomes more because these expectations are accurate than because they are self-fulfilling.

That paper contains some amusing facts about the original Rosenthal and Jacobson study. Some students had pre-test IQ scores near zero, others near 200, yet “the children were neither vegetables nor geniuses.” Exclude scores outside of the range 60 to 160, and the effect disappears. Five of the “bloomers” had increases of over 90 IQ points. Again, exclude these five and the effect disappears. The original study is basically worthless. While there is something to the effect of teacher expectations on students, the gap between the story telling and reality is rather large.
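The sensitivity to those outliers is easy to demonstrate. Here is a minimal sketch – with entirely invented numbers, not Rosenthal and Jacobson’s actual data – of how a handful of implausible 90-plus point “gains” can manufacture a group difference that vanishes once they are excluded:

```python
import random

random.seed(1)

# Hypothetical data: both groups gain ~8 IQ points on average, but the
# "bloomer" group contains five implausible gains of more than 90 points.
control = [random.gauss(8, 5) for _ in range(255)]
bloomers = [random.gauss(8, 5) for _ in range(60)] + [95, 98, 102, 91, 110]

mean = lambda xs: sum(xs) / len(xs)

print(mean(bloomers) - mean(control))  # a sizeable apparent "effect"

# Exclude the five extreme gains and the effect largely disappears
trimmed = [x for x in bloomers if x < 90]
print(mean(trimmed) - mean(control))
```

Five scores out of sixty-five are enough to drive the entire group-level result, which is the Jussim and Harber point in miniature.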

Bankers are more honest than the rest of us

Well, probably not. But that’s one interpretation you could take from the oft-quoted and much-cited Nature paper by Cohn and colleagues, Business culture and dishonesty in the banking industry. That bankers are more honest is as plausible as the interpretation of the experiment provided by the authors.

As background to this paper, here’s an extract from the abstract:

[W]e show that employees of a large, international bank behave, on average, honestly in a control condition. However, when their professional identity as bank employees is rendered salient, a significant proportion of them become dishonest. … Our results thus suggest that the prevailing business culture in the banking industry weakens and undermines the honesty norm, implying that measures to re-establish an honest culture are very important.

I’ve known of this paper since it was first published (plenty of media and tweets), but have always placed it in the basket of likely not true and unlikely to be replicated. Show me some pre-registered replications and I would pay attention. As a result, I didn’t investigate any further.

But recently Koen Smets pointed me toward a working paper from Jean-Michel Hupé that critiqued the statistical analysis. That paper in turn pointed to a critique by Vranka and Houdek, Many faces of bankers’ identity: how (not) to study dishonesty.

These critiques caused me to go back to the Nature paper – and importantly, to the supplementary materials – and read it in detail. It has a host of problems besides being unlikely to replicate. The most interesting of these could lead us to ask whether bankers are actually more honest.

The experiment

Cohn and friends recruited 128 bank employees and randomly split them into two groups, the treatment and control. Before undertaking the experimental task, the treatment group was “primed” with a series of questions that reminded them that they were a bank employee (e.g. At which bank are you presently employed?). The control group were asked questions unrelated to their professional identity.

The experimenters then asked each member of these two groups to flip a coin 10 times, reporting the result via a computer. No-one else could see what they had flipped. For each flip that came up the right way, the experimenters paid them (approximately) $20 (or more precisely, they would be paid $20 per flip if they equalled or outperformed a randomly selected colleague). Ten correct flips and you could have $200 coming your way.

So how can we know if any particular person is telling the truth? We can’t. But across a decent-sized group, we know the distribution of results to expect from honest coin flippers (a binomial distribution with a mean of 0.5): on average, 50% heads and 50% tails. Someone reporting 10 heads from 10 flips is a roughly one-in-a-thousand event. By comparing the distribution of reported results to this expectation, we can infer the level of cheating.

So, how did the bankers go? In the control group, 51.6% of coin flips were successful. It’s slightly more than 50%, but within the realms of chance for a group of honest coin flippers. The bankers primed with their professional identity reported 58.2% successful flips, 6.6 percentage points more than the control group. The dishonest bandits.
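The scale of that deviation can be checked against the honest-coin benchmark. A minimal sketch – assuming, for illustration, roughly 64 subjects per group and hence about 640 flips; the paper’s exact group sizes may differ:

```python
from math import comb

def binom_tail(n: int, k: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance an honest group
    reports at least k successes in n flips."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# A single person reporting 10 heads from 10 fair flips:
print(binom_tail(10, 10))  # 0.0009765625 - about 1 in a thousand

# A hypothetical group of 64 subjects (640 flips) reporting 58.2% successes:
n = 640
k = round(0.582 * n)  # 372 reported successes
print(binom_tail(n, k))  # far below conventional significance thresholds
```

Under these assumptions, 51.6% successes is comfortably consistent with honesty, while 58.2% is not – which is the basis of the paper’s headline claim.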

But how do we know that this result is particular to bankers? What if we primed other professionals with their profession? What if we took a group with no connection to the banking industry and primed them with banking concepts?

Cohn and friends answered these questions directly. When they primed a group of non-banking professionals with their professional identity, those professionals reported 3 percentage points fewer successful coin flips than those in a control condition. Students primed with banking concepts also reported fewer successes, by around 1.5 percentage points. These differences weren’t statistically significant and could have happened by chance, with no detectable effect from the primes.

These experimental outcomes are the centrepiece behind the conclusion that the prevailing culture in banking weakens and undermines the honesty norm.

But now let’s go to the supplementary materials and learn a bit more about these non-banking professionals and students.

An alternative interpretation

I have only reported the differences in successful coin flips above – as did the authors in the main paper (in a chart, Figure 3a). So how many successes did these non-banking professionals and students have?

In the control condition, the non-banking professionals reported 59.8% successful flips. This dropped to 55.8% when primed with their professional identity. The students were also dishonest bandits, reporting 57.9% successful flips in the control condition, and 56.4% in the banking prime condition.

So looking across the three groups (bankers, non-banking professionals and students), the only honest group we have come across are the bankers in the control condition.

This raises the question of what the appropriate reference point for this analysis is. Should we be asking if banking primes induce banker dishonesty? Or should we be asking whether the control primes – which were designed to be innocuous – can induce honesty? To accept that the banking prime induces bankers to cheat more, we also need to have a starting point that bankers, on the whole, cheat less.

I don’t see a great deal of value in trying to interpret this result and determine which of these frames is correct, as the result is just noise. It is unlikely to replicate. But once you look at these numbers, the interpretation by Cohn and friends appears little more than an overly keen attempt to get the results to fit their “theoretical framework”.

Other problems

I’ve just picked my favourite problem, but the two critiques I linked above argue that there are others. Vranka and Houdek suggest that there are many other ways to interpret the results. I agree with that overarching premise, but am less convinced by some of their suggested alternatives, such as the presence of stereotype or money primes. Those primes seem as robust as this banking prime is likely to be.

Hupé critiques the statistical approach, with which I also have some sympathy, but I haven’t spent enough time thinking about it to agree with his suggested alternative approach.

A quick afterthought

That this experimental result is bunk is not a reason to dismiss the idea that banking culture is poor or that exposure to that culture increases dishonesty. The general problem with the priming literature is that it attempts to elicit differences through primes that are insignificant relative to the actual environments people face.

For example, there is a large difference between answering a few questions about banking and working in a bank. In the latter, you are surrounded by other people, interacting with them daily, seeing what they do. Just because a few questions do not produce an effect doesn’t mean that months of exposure to your work environment won’t change behaviour. Unfortunately, experiments such as this add approximately zero useful information as to whether this is actually the case.


Daniel Kahneman has a new book in the pipeline called Noise. It is to be co-authored with Cass Sunstein and Olivier Sibony, and will focus on the “chance variability in human judgment”, the “noise” of the book’s title.

I hope the book is more Kahneman than Sunstein. For all Thinking, Fast and Slow’s faults, it is a great book. You can see the thought that went into constructing it.

Sunstein’s recent books feel like research papers pulled together by a university student – which might not be too far from the truth given the fleet of research assistants at Sunstein’s command. Part of the flatness of Sunstein’s books might also come from his writing pace – he writes more than a book a year. (I count over 30 on his Wikipedia page since 2000, and 10 in the last five years.) Hopefully Kahneman will slow things down, although with a planned publication date of 2020, Noise will be a shorter project than Thinking, Fast and Slow.

What is noise?

Kahneman has already written about noise, most prominently with three colleagues in Harvard Business Review. In that article they set out the case for examining noise in decision-making and how to address it.

Part of that article is spent distinguishing noise from bias. Your bathroom scale is biased if it always reads four kilograms too heavy. If it gives you a different reading each time you get on the scale, it is noisy. Decisions can be noisy, biased, or both. A biased but low noise decision will always be wrong. A biased but high noise decision will be all over the shop but might occasionally get lucky.
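The scale analogy translates directly into a simulation. A minimal sketch, with invented numbers, of the two failure modes:

```python
import random

random.seed(42)
TRUE_WEIGHT = 80.0  # kg

# A biased scale: always 4 kg heavy, but perfectly repeatable.
biased = [TRUE_WEIGHT + 4.0 for _ in range(1000)]

# A noisy scale: right on average, but scattered from reading to reading.
noisy = [TRUE_WEIGHT + random.gauss(0, 3.0) for _ in range(1000)]

mean = lambda xs: sum(xs) / len(xs)
spread = lambda xs: max(xs) - min(xs)

print(mean(biased) - TRUE_WEIGHT)  # 4.0: a systematic error you could correct for
print(spread(biased))              # 0.0: perfectly consistent
print(mean(noisy) - TRUE_WEIGHT)   # near 0: errors cancel on average
print(spread(noisy))               # large: any single reading is unreliable
```

The practical difference matters: a known bias can be subtracted away, whereas noise makes every individual decision suspect even when the average is right.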

One piece of evidence for noise in decision-making is the degree to which people will contradict their own prior judgments. Pathologists assessing biopsy results had a correlation of 0.63 with their own judgment of severity when shown the same case twice (the HBR article states 0.61, but I read the referenced article as stating 0.63). Software programmers differed by a median of 71% in the estimates for the same project, with a correlation of 0.7 between their first and second effort. The lack of consistency in decision-making only grows once you start looking across people.

I find the concept of noise a useful way of thinking about decision-making. One of the main reasons why simple algorithms are typically superior to human decision makers is not because of bias or systematic errors by the humans, but rather the inconsistency of human judgment. We are often all over the place.

Noise is also a good way of identifying those domains where arguments about the power of human intuition and decision-making (which I often make) fall down. Simple heuristics can make us smart. Developed in the right circumstances, naturalistic decision-making can lead to good decisions. But where human decisions are inconsistent, or noisy, it is often unchallenging to identify better alternatives.

Measuring noise

One useful feature of noise is that you can measure it without knowing the correct or best decision. If you don’t know your weight, it is hard to tell if the scale is biased. But the fact it differs in measurement as you get on, off, and on again points to the noise. If you have a decision for which there is a large lag before you know if it was the right one, this lag is an obstacle to measuring bias, but not to measuring noise.

This ability to measure noise without knowing the right answer also avoids many of the debates about whether the human decisions are actually biased. Two inconsistent decisions cannot both be right.

You can measure noise in an organisation’s decision-making processes by examining pairs of decision makers and calculating the relative deviation of their judgments from each other. If one decision maker recommends, say, a price of $200, and the other of $400, the noise is 67%. (They were $200 apart, with the average of the two being $300; 200/300 ≈ 0.67.) You average this noise score across all possible pairs to give you the noise score for that decision.

The noise score has an intuitive meaning. It is the expected relative difference if you picked any two decision makers at random.
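The calculation is straightforward to sketch. A minimal implementation of the pairwise measure as described, with hypothetical judgments:

```python
from itertools import combinations

def noise_score(judgments):
    """Average relative difference across all pairs of decision makers:
    |a - b| divided by the pair's mean, averaged over every pair."""
    pairs = list(combinations(judgments, 2))
    return sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)

# The two-price example from the text:
print(round(noise_score([200, 400]), 2))  # 0.67

# A hypothetical panel of five underwriters pricing the same case:
print(noise_score([1000, 1200, 850, 1400, 700]))
```

Note the measure needs no ground truth: identical judgments score zero regardless of whether they are right.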

In the HBR article, Kahneman and colleagues report on the noise measurements for ten decisions in two financial services organisations. The noise was between 34% and 62% for the six decisions in organisation A, with an average noise of 48%. Noise was between 46% and 70% for the four decisions in organisation B, with an average noise of 60%. This was substantially above the organisations’ expectations. Experience of the decision makers did not appear to reduce noise.

Reducing noise

The main solution proposed by Kahneman and friends to reduce noise is replacing human judgement with algorithms. By returning the same decision every time, the algorithms are noise free.

Rather than suggesting a complex algorithm, Kahneman and friends propose what they call a “reasoned rule”. Here are the five steps in developing a reasoned rule, with loan application assessment as an example:

  1. Select six to eight variables that are distinct and obviously related to the predicted outcome. Assets and revenues (weighted positively) and liabilities (weighted negatively) would surely be included, along with a few other features of loan applications.
  2. Take the data from your set of cases (all the loan applications from the past year) and compute the mean and standard deviation of each variable in that set.
  3. For every case in the set, compute a “standard score” for each variable: the difference between the value in the case and the mean of the whole set, divided by the standard deviation. With standard scores, all variables are expressed on the same scale and can be compared and averaged.
  4. Compute a “summary score” for each case―the average of its variables’ standard scores. This is the output of the reasoned rule. The same formula will be used for new cases, using the mean and standard deviation of the original set and updating periodically.
  5. Order the cases in the set from high to low summary scores, and determine the appropriate actions for different ranges of scores. With loan applications, for instance, the actions might be “the top 10% of applicants will receive a discount” and “the bottom 30% will be turned down.”
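The five steps above map onto a few lines of code. A minimal sketch – the variables, weights and application data here are invented for illustration, not taken from the HBR article:

```python
def reasoned_rule(cases, weights):
    """cases: list of dicts mapping variable name -> value.
    weights: variable name -> +1 (higher is better) or -1 (lower is better).
    Returns one summary score per case (steps 2-4 of the recipe)."""
    stats = {}
    for v in weights:  # step 2: mean and standard deviation per variable
        values = [c[v] for c in cases]
        m = sum(values) / len(values)
        sd = (sum((x - m) ** 2 for x in values) / len(values)) ** 0.5
        stats[v] = (m, sd)
    scores = []
    for c in cases:  # steps 3-4: signed standard scores, then their average
        z = [w * (c[v] - stats[v][0]) / stats[v][1] for v, w in weights.items()]
        scores.append(sum(z) / len(z))
    return scores

# Hypothetical loan applications:
applications = [
    {"assets": 120, "liabilities": 10},
    {"assets": 80, "liabilities": 40},
    {"assets": 100, "liabilities": 25},
]
scores = reasoned_rule(applications, {"assets": +1, "liabilities": -1})
print(scores)  # step 5 then sorts these scores and sets action thresholds
```

Given the same inputs, this rule returns the same scores every time – the noise-free property that gives it its edge over human judgement.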

The reliability of this reasoned rule – it returns the same outcome every time – gives it a large advantage over the human.

I suspect that most lenders are already using more sophisticated models than this, but the strength of a simple approach was shown in Robyn Dawes’s classic article The Robust Beauty of Improper Linear Models in Decision Making (ungated pdf). You typically don’t need a “proper” linear model, such as that produced by regression, to outperform human judgement.

As a bonus, improper linear models, as they are less prone to overfitting, often perform well compared to proper models (as per Simple Heuristics That Make Us Smart). Fear of the expense of developing a complex algorithm is not an excuse to leave the human decisions alone.

Ultimately the development of the reasoned rule cannot avoid the question of what the right answer to the problem is. It will take time to determine definitively whether it outperforms. But if the human decision is noisy, there is an excellent chance that it will hit closer to the mark, on average, than the scattered human decisions.

Behavioural economics: underrated or overrated?

Tyler Cowen’s Conversations with Tyler feature a section in which Cowen throws a series of ideas at the guest, and the guest responds with whether each idea is overrated or underrated. In a few of the conversations, Cowen asks about behavioural economics. Here are three responses:

Atul Gawande

COWEN: The idea of nudge.

GAWANDE: I think overrated.


GAWANDE: I think that there are important insights in nudge units and in that research capacity, but when you step back and say, “What are the biggest problems in clinical behavior and delivery of healthcare?” the nudges are focused on small solutions that have not demonstrated capacity for major scale.

The kind of nudge capability is something we’ve built into the stuff we’ve done, whether it’s checklists or coaching, but it’s been only one. We’ve had to add other tools. You could not get to massive reductions in deaths in surgery or childbirth or massive improvements in end-of-life outcomes based on just those behavioral science insights alone. We’ve had to move to organizational insights and to piece together multiple kinds of layers of understanding in order to drive high-volume change in healthcare delivery.


Steven Pinker

COWEN: Behavioral economics. Economists playing at psychology. Obviously you have a stronger background in psychology than the economists. What do you think of behavioral econ?

PINKER: I’m for it.

COWEN: What’s it missing?

PINKER: I’m completely out of my depth here, but I do think it is too quick to dismiss classical economics. Is this maybe another false dichotomy?

The idea that the rational actor and models derived from it are obsolete because humans make certain irrational choices, have certain rules of thumb that can’t be normatively defended — those aren’t necessarily incompatible, because even though every individual human brain might have its quirks and be irrational, it is possible for a collective enterprise that works by certain rules to have a kind of rationality that none of the individual minds has.

Also it’s possible because we’re corrigible, because the mind is many parts. We can override some of our biases and instincts either through confrontations with reality, through education, through debate.

We do know even that people who are experienced in market transactions, for example, don’t fall for the kinds of fallacies that behavioral economists are so fond of pointing out. You really can’t turn a person into a money pump, even though in the lab I can set up a demo that shows people can be intransitive in their preferences.

You actually put a person in a situation where there’s real money at stake, and all of a sudden they’re not so irrational.

COWEN: They walk away.


Jonathan Haidt

COWEN: You’re a trained psychologist, in addition to your most famous work, you have a lot of other papers which are very well cited, but less famous for other public intellectuals doing what you’d call traditional psychological research. Here we have these economists, they do what they call behavioral economics, and they tread into the field of psychology, do they know what they’re doing? Behavioral economics, underrated or overrated?

HAIDT: Properly rated right now, with one caveat. We psychologists have long felt, “Oh those economists they’re the only ones that are ever consulted in Congress, and they have all these high‑prestige jobs, they have a Nobel Prize, nobody listens to us.”

Some economists beginning with Robert Frank, and Dan Kahneman, Dick Thaler, the fact that economists have been listening to psychologists, and making our work more well‑known, of course Kahneman did a lot of that work, and he is a psychologist.

That’s all good, I’m thrilled with the way that’s going. The only caveat that I would put which I would say if they don’t do this soon, then they would be overrated, is the behavioral economics work is an example of this wonderful dictum from Robert Zajonc, the famous social psychologist, which is that cognitive psychology is social psychology with all the interesting variables set to zero.

To the extent that behavioral economists are saying, “Look at a person shopping, what influences their decision? If the apple is at eye‑level — .” They’re looking at lone consumers who are trying to make choices to optimize their outcomes. That’s great work, but that’s setting all the interesting variables to zero. The interesting stuff is all social. It’s what does this say about me? Will I be ostracized from my group?

If behavioral economics becomes more social, which I think will be the next phase, then I would say it would deserve ever‑rising market value.

COWEN: Thorstein Veblen, that was his initial vision for it actually, was that it be quite social and that the idea of a social reference class was central to people’s behavioral biases.

HAIDT: Interesting. Again, this is a critique from outside, but what a lot of people say which sounds right to me is that the early economists were great social theorists. My God, you read Adam Smith, what a brilliant world philosopher, historian, they thought so broadly and you tell me, but it seems there was a weird turn in the mid‑20th century towards mathematics.


HAIDT: I think it made economists set all the interesting variables to zero.


These three conversations are worth reading or listening to in full. The episodes with Malcolm Gladwell and Joe Henrich are also excellent.

My blogroll

After my recent post on how I focus, I received a couple of requests for the blogs I follow. Here are my current subscriptions in Feedly, with occasional comments.

Some of these blogs have been in my reader for years, others I am trialling. I am usually trialling a few at any time, and tend to have a “one in, one out” pattern of subscription. It normally takes me about 10 minutes once every day or two to scan the new entries and decide which are worth reading. This set of blogs generates more posts for my read later pile than I can get through.

Askblog (Arnold Kling has been one of my main influences in thinking about causation in social science and economics)

Behavioral Public Policy Blog

Behavioral Scientist (For which I am a founding columnist. You can find my contributions here.)

Behavioural Insights Team

The BE Hub

Bryan Caplan at Econlog (too much politics in the other Econlog bloggers for my taste)

Cal Newport (Author of Deep Work, for which I will post a review at some point. My review of So Good They Can’t Ignore You is here.)

Centre for Advanced Hindsight

Decision Science News

Dominic Cummings’s blog

The Enlightened Economist (For the book recommendations)


Ergodicity Economics (Started subscribing after seeing the video posted at the bottom of this post)

Farnam Street

Fresh Economic Thinking

Gene Expression (I’m subscribed to the full Razib Khan firehose, but am there for the gnxp material)


Information Processing (Keeps me on top of the latest on genomic prediction)

Jason Collins blog (As a check that my feed is working)

John Kay (Most of my day job is in financial services and markets)

Marginal Revolution

Matt Ridley

Megan McArdle

Offsetting Behaviour

O’Reilly Media

Slate Star Codex

Statistical Modelling, Causal Inference, and Social Science (Andrew Gelman’s blog. In terms of what I have learnt, the most valuable blog on the list)

Tim Harford

Thorstein Veblen’s The Theory of the Leisure Class

In 2011, Thorstein Veblen was ranked seventh in a poll asking economists to name their favourite dead 20th-century economist. He ranked behind Keynes, Friedman, Samuelson, Hayek, Schumpeter and Galbraith. His supporters were among the least liberal (in the classical sense of the word) of the survey participants. Given his approach to consumerism, as detailed in The Theory of the Leisure Class, this is no surprise.

The Theory of the Leisure Class, published in 1899, was one of the earliest books to explore the economic assumption that people wish to consume. Veblen noted this was not purely a desire to consume in itself. People also care about status, reputation and honour. They care about their relative position to others, such as their relative wealth. And consumption provides a means of establishing this relative position.

Conspicuous leisure and consumption

To turn wealth into status and reputation, you need to signal your wealth. Veblen explored two possible signals – conspicuous leisure and conspicuous consumption – with his coining of the latter term his best-known claim to fame. Veblen takes a relatively modern view of these two concepts, recognising the need for waste: signalling theory tells us that waste is required for a signal to be reliable.

When there are few goods for conspicuous consumption, as would be the case in primitive societies, conspicuous leisure is a more accessible way to signal wealth. Conspicuous leisure might involve reaching a level of manners and etiquette that could only be achieved through an excessive use of time, or becoming proficient at sports. Veblen also considers what he calls “vicarious conspicuous leisure”, whereby the head of the house employs servants (or even the housewife) in exercises that waste time.

As society advances, people move from conspicuous leisure to conspicuous consumption. They have an increasingly large circle of people with whom they associate and wish to signal status to. In a small village, everyone is familiar with each other and will note the habits of the servants and other householders carrying out the conspicuous leisure. In a larger city, the conspicuous waste needs to be visible, and conspicuous consumption in the nature of watches, clothing and carriages is immediately obvious. Conspicuous consumption can also be vicarious, with servants dressed up in excessive livery.

Veblen considered that conspicuous consumption will consume all future growth in production and efficiency. He states:

The need of conspicuous waste, therefore, stands ready to absorb any increase in the community’s industrial efficiency or output of goods, after the most elementary physical wants have been provided for.

Veblen also suggests that the use of additional production for conspicuous consumption acts as a Malthusian check on fertility. If signals are wasteful, then some of these resources will not be available for increasing the number of offspring. However, to be evolutionarily stable, any reduction in conspicuous consumption by an individual would have them suffer a cost in the form of reputation and status, and in turn, mating opportunities.

One of Veblen’s interesting perspectives is that costliness masquerades under the name of beauty. Veblen states that “beauty, in the naive sense of the word, is the occasion rather than the ground of their monopolization or of their commercial value.” The marks of expensiveness, rarity and exclusivity become known as beauty.

This leads to imperfections in goods (evidence that they are handmade rather than machine-made) becoming signs of beauty. Counterfeits lose their beauty on being identified as such. And each year the fashion changes, which is wasteful – yet people prefer the more recent fashions to the older ones.

Veblen applies this concept to beauty in women, with tastes shifting from “women of physical presence” to a “lady” as conspicuous consumption and leisure grew. The less suited a woman is for work, the greater the waste, and the more beautiful she is perceived to be.

The evolution of the leisure class

Veblen follows his discussion of beauty with a series of evolutionary arguments on the nature of the leisure class. The “leisure class” is an unproductive upper class, and contrasts with the “industrial class”, a subordinated but productive working class. Veblen’s line of argument is often difficult to follow, with the boundary between social and genetic selection unclear. His underlying agenda, a critique of the leisure class, also clouds his arguments.

Veblen argues that the selection of institutions affects the selection of people within society. Institutions change fast, so although only the fittest habits of thought will normally survive, the selection of people cannot keep up. Further, changes which may be good for society as a whole may be bad for certain people. Veblen’s discussion provides a nice picture of a dynamic environment and selection pressures that vary with it.

Despite this dynamism, society is slow to change and conservative. Veblen argues that the leisure class is able to keep society conservative by withdrawing the means of sustenance from the industrial class. As a result, the industrial class does not have the resources to invest in new ideas and habits. Even if they did gain some surplus, it would be wasted on the conspicuous consumption that the leisure class has established as the societal norm.

On an individual level, Veblen considers there are two basic types of people – predatory and peaceful. Predatory types are violent (in certain stages of society), selfish and dishonest, and are not diligent. Peaceful types are the opposite. Which traits are expressed will depend on the state of society. For Veblen, the spectrum of predatory to peaceful roughly coincides with the spectrum of blonde through brunette to Mediterranean ethnicities.

Veblen suggests that society progressed from a peaceful, native state, to a barbarian state, before shifting back towards the more peaceful modern society. Peaceful traits were selected for in the native state, and predatory traits selected for in the barbarian states. Veblen states, however, that selection did not eliminate all the peaceful traits in the barbarian era, allowing peaceful traits to be present in modern society.

As to how these traits are distributed at his time of writing, Veblen sees the leisure class as the predatory type and the industrial class as the peaceful type. The leisure class is not able to be violent in modern society, so they use more “peaceful predatory” methods, such as fraud. The industrial class is not in need of predatory habits, with Veblen suggesting that “economic man” in the sense of the selfish person (an indirect slight on Adam Smith) is useless for modern society. It is by being diligent and honest that the industrial man thrives.

Veblen’s shot at “economic man” is not particularly effective, and does not recognise that selfishness is required, in an evolutionary sense, for all people. The reason industrial man is diligent is because that is how he benefits. If he did not benefit, he would be selected against and disappear. That society benefits is the operation of Smith’s invisible hand.

Despite his categorisation of types between classes, Veblen later suggests that there are no broad character differences between the leisure class and the rest. Some predatory behaviour persists in the industrial class due to the behaviour of the leisure class. He also notes that people in the leisure class, by virtue of their resources, are not subject to harsh selection pressure, so peaceful characteristics can persist. What is most determinative of the traits in the leisure class are those traits which lead to admission to the class. While these have changed over time (say, from raw violence to fraud), they are generally of a predatory nature. It is not easy to square this position of no difference with his earlier statements, and I am not sure they can be reconciled. My one suggestion is that the differences will grow if the current institutional framework continues to exist.

Put together, Veblen’s use of evolutionary theory is a strange mix of group selection and broad statements on inherent traits. There is little detailed consideration of the selection process that might have occurred. If nothing else, it appears that Veblen simply wanted to critique the leisure class and would use whatever tools were at his disposal. Through his evolutionary discussion, Veblen also manages to avoid addressing the basis for the desire for reputation and status.

Sport, religion and education

The rest of the book largely involves Veblen applying his framework to sport, religion and education.

Sports reflect the predatory skills of the leisure class and delinquents. Veblen disagreed with the common view that sports build temperament, arguing instead that they involve chicanery, falsehood and browbeating. That is why we need umpires. For the industrial classes, Veblen felt that sport is more a diversion than a habit, although the role of sport for the industrial class seems somewhat different today.

Veblen considered that the temperament that inclines one to sport inclines one to religion (and vice versa). Religion, and the conspicuous leisure and consumption associated with it, change the patterns of consumption in the community and lowers its vitality. As an example, Veblen referred to the religious Southern United States. He considered that their industry was more handicraft than industrial. Their range of habits, such as duels, cock-fighting and male sexual incontinence (shown by the presence of mulattoes) were evidence of barbarian traits.

On education, Veblen saw the alignment of education institutions with sport and religion as evidence of education’s status as a leisure class activity. Higher education has many rituals and ceremonies and encourages proper speech and spelling (conspicuous leisure), while lower schools tend to be more practical. The teaching of the classics and dead languages was, in particular, conspicuous consumption.

One interesting sideline is Veblen’s view on how industrialisation has affected the status of women. Industrialisation allows women to revert to a more primitive type (Veblen’s primitive type being peaceful and industrial). The leisure class, however, needs to keep women in their place to engage in vicarious conspicuous leisure (they are, after all, a signal for the man). As a result, when educational institutions finally began to admit women, they were primarily enrolled in courses with a quasi-artistic quality, which help women in performing vicarious conspicuous leisure.

[This post is a combined and edited version of three previous posts exploring the book. Those old posts are here, here and here.]

How I focus (and live)

This post is a record of some strategies that I use to focus and be mildly productive. It also records a few other features of my lifestyle.

Why develop these strategies? On top of delivering in my day job, I have always tried to invest heavily in my human capital, and that takes a degree of focus.

The need to adopt many of the below also reflects how easily distracted I am. I have horrible habits when I get in front of a device. The advent of the web has been a mixed blessing for me.

My approaches can shift markedly over time, so it will be interesting to see which of the below are still reflected in my behaviour in a couple of years (and which continue to be supported by the evidence as effective).

If there is a common theme to the below, it is that creating the right environment, not reliance on willpower, is the path to success.

Periods of focus: Most of my productive output occurs in two places. One is on the train, with an hour commute at the beginning and end of each day that I travel to work. The only activities I do on the train are reading (books or articles) and writing. Internet is turned off. This is now an ingrained habit. The train is largely empty for most of the journey, with half of it through a national park, so it’s a pleasant way to work.

The rest of my output occurs in productive blocks (pomodoros) during the day. At the beginning of each day I schedule a set of half-hour blocks in my diary around my other commitments. In these blocks, I will turn off or close everything I don’t need for the task. I am typically less successful at putting up barriers to human (as opposed to digital) interruptions, except for occasionally closing my office door.

Ideally I will have several blocks in a row (in the morning), with a couple of minutes to stretch in between. I aim for at least 20 half-hour sessions each week. I average maybe 30. I block out the occasional morning in my diary to make sure each week is not completely filled with meetings (with eight direct reports and working in a bureaucracy, that is a real risk).

I also read whenever I can, and that fills a lot of the other space in my life. I read around 100 books per year (about 70-80 non-fiction).

Phone: My iPhone is used for four main purposes: as a phone; as a train timetable; as a listening device (podcasts, audiobooks and music); and for my meditation apps (more on meditation below). It also has a few utilities such as Uber that I rarely use. I don’t use my phone for social media, as a diary, or for email. Most of the day it stays in my pocket or on my desk. All notifications, except calls and text messages, are turned off. I rarely have any reason to look at it.

Even when I do look at my phone, the view is sparse. These are the two screens I see.

One thing you can’t see in these screenshots (for some strange technical reason) is that my phone is in greyscale. There is little colour to get me excited (although I am colour blind…). Except when I make a phone call, message someone, or (loosely) lock the phone with Forest, I use search to find the app I need. The apps are hidden in the Dump folder. When I go to my phone, there is little to divert me from my original intention.

iPad: I have an iPad, and it is similarly constrained. All notifications are turned off. It has email, but the account is turned off in settings, with account changes restricted. It takes me about a minute to disable restrictions to turn email on, which slows me down enough to make sure I am checking it for a reason. More on email below.

I also use the iPad for reading and writing (including these posts) on the train. When reading, I use my Kindle in preference to my iPad when I can, as the Kindle has far fewer rabbit holes.

Internet: I subscribe to Freedom which cuts off internet for certain apps and certain times. Among other things, I use it to block the internet from 8pm through to 7am (I don’t want to be checking email or browsing when I first get up), and on Sundays (generally a screen free day). I also use Freedom to shut off internet or certain apps at ad hoc times when I want to focus.

I try not to randomly browse at other times. I have little interest in news (see below), so that reduces the probability of messing around. I have previously used RescueTime to track my time online, but don’t currently as I can’t install it on my work computer, phone or iPad. The tracking had a subtle but limited effect on my behaviour on my home computer when I tried it.

Email: Currently my biggest failure, particularly when I am in the office. I aim to batch my email to a few times per day, but I check and am distracted by new emails more often than I would like. Partly that is because part of my workflow occurs through email, so it is hard not to look.

Social media: I have a Facebook account, but zero friends, so it provides little distraction. (I also like that when I run into people who I haven’t seen for a while, I don’t already know what they have been up to.) I only have the account because this blog has a Facebook page. I try to limit my visits to Twitter and LinkedIn to once a week (normally successful with Twitter, less so with LinkedIn as direct messages sometimes draw me in). Freedom helps constrain this.

Paper diary: My paper diary is an attempt to keep myself away from distracting devices. I also find it faster than the electronic alternative. I have an electronic calendar for work, but it is replicated in the paper diary.

News: I consume little news. I don’t have a television, don’t purchase newspapers and don’t visit internet news sites unless I follow a link based on a recommendation. I rarely miss anything important. If something big happens, someone will normally tell me.

I used to apply a filter to political news of “if this was happening in Canada, would I care?” That eliminated most political news, but I have found that after a few years, I have become so disconnected from Australian politics that most of it flows around me. I don’t recognise most politicians, and I feel unconnected to any of the personalities. Voting is compulsory in Australia, so to avoid being fined or voting for people I know nothing about, I get my name ticked off the electoral roll at a polling place, take the voting slip, but don’t bother filling it out. (And I have almost no idea what Trump is up to.)

I am in a similar place for sports news. Now that I have been disconnected for a while, I have no interest. Any names I overhear mean nothing to me. I couldn’t tell you who won any of the tennis grand slams last year or who the World Series champion is. I don’t think I could recognise a current Australian cricketer on sight.

Blogs: As a substitute for news sources, I subscribe to around 25 blogs using a feed reader (Feedly). I scan them around once a day. They provide more reading material than I can get through (through the posts themselves or links), so I have a backlog of reading material in Instapaper (I used to use Pocket, but dumped it when the ads appeared).

Sleep and rest: The evidence on the effect of lack of sleep is strong. I need eight hours a night and generally get it (children permitting). I don’t use screens (except for the Kindle) after 8pm at night. I also subscribe to the broader need for rest and the declining productivity that comes from overwork.

Meditation: Meditation is new for me (around four months), and I am still in the experimental phase. I meditate for around 15 to 20 minutes every day. I find it puts me on the right track at the start of the day (which is when I meditate, children permitting). It also acts as a daily reminder of what I am trying to do.

The evidence of increased concentration and emotional control seems strong enough to give it a go. I suspect I would have dismissed the idea a few years ago (maybe even a year ago), and pending changes in the evidence in favour and my own experience, I am prepared to dismiss it again in the future.

A benchmark I’d like to be able to compare meditation to is focused reading. If I shifted the meditation time to reading, that’s 15 to 20 additional books a year. What is the balance of costs and benefits?
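The 15 to 20 books figure is simple arithmetic. A minimal check, where the reading speed (around 250 words per minute) and book length (around 80,000 words) are illustrative assumptions rather than figures from this post:

```python
# Rough check: does 15-20 minutes a day equate to 15-20 books a year?
# Reading speed and book length are assumed, not measured.
minutes_per_day = 17.5        # midpoint of 15 to 20 minutes
words_per_minute = 250        # assumed reading speed
words_per_book = 80_000       # assumed length of a typical non-fiction book

minutes_per_year = minutes_per_day * 365
books_per_year = minutes_per_year * words_per_minute / words_per_book
print(round(books_per_year))  # → 20
```

A slower reading pace or longer books pulls the estimate toward the bottom of the 15 to 20 range, so the claim is in the right ballpark.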

I use three apps to meditate: Insight Timer, Headspace and 10% Happier. I find 10% Happier most useful as a teacher. Headspace is convenient and easy to use, but I don’t like the gamification element to it, and the packages seem relatively shallow and repetitive (although the repetitive nature is not necessarily a bad thing). At the end of the year when it is time to re-subscribe, I suspect I will drop Headspace and stick with 10% Happier if I am still learning something from it. Insight Timer will otherwise give me what I need.

I will post more on my thoughts on meditation in the near future – likely through a review of Sam Harris’s Waking Up in the first instance, as that was the book that pushed me across the line.

I give myself a 60% chance of still meditating when I write my next post on what I do to focus (I plan to do this roughly annually). My lapsing could be due to either changing my mind or failing to sustain the habit.

Diet: I see diet as closely linked to the ability to focus and be productive. I eat well. My diet might best be described as three parts Paleo, one part early agriculturalist, and 5% rubbish. My diet is mainly fruit (lots), vegetables, tubers, nuts, eggs (a dozen a week), meat, legumes and dairy (a lot of yogurt). I eat grains occasionally, largely in the form of rice (a few times a week) and porridge (once or twice a week). I’ll eat bread maybe once or twice a month (I love hamburgers and eggs on toast). A heuristic I often fall back onto is no processed grains, industrial seed oils or added sugar. There’s some arbitrariness to it, but it works. Stephan Guyenet is my most trusted source on diet.

It’s easy to stick to this diet because this is what is in my house. There are no cookies, ice cream or sugar-based snacks. I don’t have to go down the aisles of the supermarket when shopping (although my groceries are normally home delivered). If I want to binge, rice crackers and toast are as exciting as I can find in the cupboard.

Exercise: As with diet, part of the productivity package. My major filter for choosing exercise is the desire to still be able to surf and get off the toilet when I’m 80. I surf a couple of times a week. Living within five minutes’ walk of a beach with good surf is a basic lifestyle criterion.

I did Crossfit for a few years, but don’t live near a Crossfit gym at the moment. However, I don’t think Crossfit is a sustainable long-term approach – at least if I trained as regularly as expected in the gyms I have been to. The intensity would have me falling apart in old age.

That said, I still keep Crossfit elements to my exercise – heavy compound lifts once or twice a week, and a short high intensity burst around once a week (so I’m in the gym once to twice a week). I also walk a lot, including trying to get out of the office for a decent walk at lunch each day. While walking, I consume a lot of audiobooks and podcasts. I stretch for 10 to 15 minutes most days.

Michael Mauboussin’s More Than You Know: Finding Financial Wisdom in Unconventional Places

Michael Mauboussin’s message in More Than You Know: Finding Financial Wisdom in Unconventional Places is that we need an interdisciplinary toolkit to give us the diversity to make good decisions. This is not diversity in groups, but diversity in thinking. You need diverse cognitive tools to deal with diverse problems.

The book is a series of essays that Mauboussin wrote for a newsletter over a dozen years or so when he was at CSFB. Given his background in investment management, there is a heavy focus on investment decisions. However, the tools he discusses are relevant for most decision-making domains, be that as a manager, parent, employee, or so on.

Mauboussin draws his interdisciplinary tools from four main areas, around which the essays in the book are arranged.

The first set of essays, on investment philosophy, largely concern probabilistic thinking. Focus on the process, not the outcome. If you judge solely on results, you will be deterred from taking the risks necessary to make the right decision.

In this vein, don’t set target prices for shares – an estimate of how you believe a company will perform. Rather, provide a range of prices with associated probabilities. This allows you to invest knowing the downside probability, and to assess your choices in the knowledge that some decisions will have unfavourable outcomes.

One interesting thread to these essays is what amounts to a defence of Mauboussin’s occupation, investment management. Many people (myself included) see investment management performance as largely the outcome of luck. Mauboussin argues for the presence of skill (in at least some cases), with long streaks requiring (to paraphrase Stephen Jay Gould) extraordinary luck imposed on great skill.

One limb of Mauboussin’s argument is the 15 consecutive years of market out-performance by Bill Miller, a fund manager at Legg Mason (where Mauboussin worked at the time the book was published). Getting 15 consecutive heads when tossing a coin is a one in 32,000 proposition. If your coin has only a 44% chance of coming up heads (the average probability of a fund outperforming the market over that stretch), a streak of 15 has a probability of one in 223,000. That number balloons to one in 2.3 million if you take the average probability of a fund beating the market in each individual year (in a couple of years less than 10% of funds beat the market).

Given these odds, Mauboussin argues that it is unlikely that Miller was effectively flipping a coin. Miller’s skills meant the odds were actually less daunting. Yes, he needed luck, but there needed to be skill underneath to realise the streak.

I’m not sure I buy this argument. This 15-year window is only one of many available, and there are many funds. (And I have just found this – someone doing the numbers, arriving at odds of around 3 in 4 that some fund would produce a 15-year streak at some point.)
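The single-fund arithmetic above checks out, and the multiple-comparisons objection can be sketched with a quick simulation. The fund count and horizon below are my illustrative assumptions, not figures from the book or the linked calculation:

```python
import random

# A single fund, a single 15-year window
print(f"Fair coin: 1 in {1 / 0.5 ** 15:,.0f}")   # 1 in 32,768
print(f"44% fund:  1 in {1 / 0.44 ** 15:,.0f}")  # roughly 1 in 223,000

# But many funds each get many overlapping 15-year windows.
# Simulate whether at least one fund manages a 15-year streak.
def any_streak(n_funds, n_years, p, target=15):
    for _ in range(n_funds):
        streak = 0
        for _ in range(n_years):
            if random.random() < p:
                streak += 1
                if streak >= target:
                    return True
            else:
                streak = 0
    return False

random.seed(0)
trials = 200
hits = sum(any_streak(n_funds=1000, n_years=40, p=0.44) for _ in range(trials))
print(f"P(some fund has a 15-year streak) ≈ {hits / trials:.2f}")
```

Even with these assumed parameters, the chance that some fund somewhere produces a streak is orders of magnitude higher than the one-in-hundreds-of-thousands single-fund figure, which is the thrust of the objection.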

A contrast to the Miller story comes later in the book, when Mauboussin notes the trading success of Victor Niederhoffer in a different light. Niederhoffer averaged 35% per year returns from 1972 to 1996 (says Wikipedia), but this all came crashing down to nothing in 1997. He built another fortune to then lose in the global financial crisis. Mauboussin uses Niederhoffer’s story as an example of the fat tails of asset price movements, a pattern of many small changes, and a small but larger than expected number of large changes. To use Nassim Taleb’s framing, Niederhoffer was picking up pennies in front of a steamroller. (And on that point, Miller’s record since his streak is not so great.)

The second set of essays draws on psychology. They draw partly on the heuristics and biases program of Daniel Kahneman and friends, but Mauboussin ranges over wider territory. He draws in literature on animal behaviour, such as the herding behaviour of ants and the stress response of animals, and on the literature on naturalistic decision-making. He also has a keen appreciation of the fact that many of these decisions occur in systems, meaning that individual decision-making flaws don’t necessarily lead to poor aggregate outcomes.

The third set of essays, on innovation and competition, has a game theory and evolutionary thread. The last set is on complexity, which contains both a warning about seeing cause and effect in complex systems, and a suggestion that some of the work in the complexity field gives a lens to understand the patterns we see.

Some of these essays deserve posts of their own, so I won’t go into any in depth except to make a general observation. I am a fan of interdisciplinary approaches to problems, but parts of the book, particularly these latter sections, hint at why they aren’t adopted. Many times you get an interesting angle on an issue, but it is not clear what you should do differently.

Partly this is a result of the origins of the book. Each essay is around two thousand words (guessing), so each gives a taste of a topic but little depth. One essay tends not to build on another.

That said, much of the advice is to effectively do nothing. That is valuable advice. If you want to read media accounts about share market moves, recognise that this is entertainment, not information. Disentangling cause and effect in a complex system like the share market is difficult, if not impossible, so stop telling stories.

Mauboussin also warns in the introduction that some ideas may not be useful right away. Some may never be useful. You are building a toolkit for future problems that you haven’t seen yet.

As a closing note, Mauboussin references many other popular science books. Given some of the essays must be 20 years old, I had not heard of most (which might say something for the longevity of popular science books). I’ve added a few to the reading list, but it will be interesting to see how they have held up through time.

Susan Cain’s Quiet: The Power of Introverts in a World That Can’t Stop Talking

I have mixed views about Susan Cain’s Quiet: The Power of Introverts in a World That Can’t Stop Talking.

Cain makes an important point that many of our environments, social structures and workplaces are unsuited to “introverts” (and possibly even humans in general). We could design more productive and inclusive workplaces, schools and organisations if we considered the spectrum of personality types who will work, live and learn in them.

On the flip side, Cain expanded the definition of introversion to include a host of positive attributes that wouldn’t normally (at least by me) be grouped with introversion. This led to a degree of cheer-leading for introverts that was somewhat off-putting (despite my own introverted nature). The last couple of chapters of the book also fall into evidence-free story-telling.

But to the good first. I enjoyed Cain’s filleting of open workplaces. Open plan workplaces or “activity-based working” are often dressed up as a means to seed creativity and collaboration, but they are more accurately described as a shift to lower floor space per employee to save costs. The evidence for increased collaboration or creativity is scant. Innovation may occur in teams, but it also requires quiet.

Cain suggests the trend toward these open workspaces is built on a misunderstanding of some of the classic examples of collaboration associated with the rise of the web. Yes, Linux and Wikipedia were built by teams, not individuals. But these people did not share offices or even countries. Regardless, the collaboration ideal was extended to our physical spaces.

Cain catalogues the research on the poor productivity in open workplaces. I had seen the following research before, but it is a great case study:

… DeMarco and his colleague Timothy Lister devised a study called the Coding War Games. The purpose of the games was to identify the characteristics of the best and worst computer programmers; more than six hundred developers from ninety-two different companies participated. Each designed, coded, and tested a program, working in his normal office space during business hours. Each participant was also assigned a partner from the same company. The partners worked separately, however, without any communication, a feature of the games that turned out to be critical.

When the results came in, they revealed an enormous performance gap. The best outperformed the worst by a 10:1 ratio. The top programmers were also about 2.5 times better than the median. When DeMarco and Lister tried to figure out what accounted for this astonishing range, the factors that you’d think would matter—such as years of experience, salary, even the time spent completing the work—had little correlation to outcome. Programmers with ten years’ experience did no better than those with two years. The half who performed above the median earned less than 10 percent more than the half below—even though they were almost twice as good. The programmers who turned in “zero-defect” work took slightly less, not more, time to complete the exercise than those who made mistakes.

It was a mystery with one intriguing clue: programmers from the same companies performed at more or less the same level, even though they hadn’t worked together. That’s because top performers overwhelmingly worked for companies that gave their workers the most privacy, personal space, control over their physical environments, and freedom from interruption. Sixty-two percent of the best performers said that their workspace was acceptably private, compared to only 19 percent of the worst performers; 76 percent of the worst performers but only 38 percent of the top performers said that people often interrupted them needlessly.

One pillar to the case for quiet spaces comes from how we build expertise. Work by Anders Ericsson, of deliberate practice fame, has identified studying alone or practicing in solitude as the prime way to gain skill. You need to be alone to engage in deliberate practice, as this allows you to go directly to the part that is challenging you. Open workspaces are a poor place to tackle challenging problems.

Cain also includes some interesting material on the extension of this “collaborative” space design to schooling. Children are increasingly schooled in pods as part of a shift to “cooperative learning”. We’re preparing children for the sub-optimal workplaces they are about to enter by replicating that sub-optimal environment in their schools. What is particularly problematic is that there is little opportunity in school to opt out, whereas adults have more opportunity to choose their workplace and shape their environment.

One thread in the book, which features in the opening, is that society is in the thrall of an “extrovert ideal”. Cain argues that we have become more interested in how people perceive us than the content of our character – a shift from a culture of character to one of personality. Self-help guides used to focus on concepts such as citizenship, duty, work, honour, morals, manners and integrity. They now focus on being magnetic, fascinating, attractive and energetic. Being quiet is now a problem.

This is particularly reflected in what we look for in leaders. People who talk more tend to be rated as more intelligent. Good presenters often get ahead. But talking more or presentation skills might be weak indicators of the actual capabilities you want in your leaders.

Cain briefly touches on the genetics of introversion. Unsurprisingly, as for every behavioural trait, introversion is heritable. Around 40% to 50% of the variation in introversion is due to differences in genes. Cain also hints at cross-racial differences in introversion, noting that the waves of emigrants to a new continent would have the more extroverted traits of world travellers.

The least satisfying element to the book was Cain’s definition of introvert. At times, Cain’s definition seemed to expand to capture all that is good. From a typical definition of being reserved, reflective, or interested in one’s own mental self, her definition includes everyone who is thoughtful, cerebral, willing to listen to others, and immune to the pull of wealth and fame. Introverts are needed to save us from climate change. (“Without people like you, we will, quite literally, drown.”) Extroverts, in contrast, are thoughtless risk seekers with no self control. Extroverts caused the global financial crisis.

Cain does note her broad definition of introvert in an appendix to the book, A Note on the Words Introvert and Extrovert. It would have helped me a lot if this note had been at the front (or if I had realised it was there before reading the book). There she clarifies that she is not using the standard definition of introversion captured by the well-established Big 5 taxonomy. She states that she is extending introversion to include people with “a cerebral nature, a rich inner life, a strong conscience, some degree of anxiety (especially shyness), and a risk-averse nature”.

These traits would normally be considered to relate to the other Big 5 traits of openness, conscientiousness and neuroticism. This is particularly confusing, as in parts of the book she talks about the other Big 5 traits as separate concepts. Cain's definition also appears broader than that used by Carl Jung and in the Myers-Briggs test, which seem to be her foundation. (Although she never explicitly endorses it, I get the feeling that Cain is a Myers-Briggs advocate.)

Once the definition is expanded to include these other dimensions, it is hard to see how one third of the population can be described as introverts. It also means that many parts of the book feel more an ode to conscientiousness, and possibly even intelligence, than to introversion.

This was most stark in the chapter on the differences between Asians and Americans. Cain attributes Asian achievement – such as high scores in international tests and their superior academic results – to the higher introversion of Asians. There is not one mention of the higher conscientiousness of East Asians, nor their higher IQ scores. Instead these seem bundled into the introvert basket of traits.

I also struggled with the final two substantive chapters of the book – on relationships and children. There Cain shifts from an approach generally built on research to one that is little more than storytelling. The chapters are full of unsourced statements and recommendations. For instance, she recommends that you gradually introduce your kids to new situations, which supposedly produces more confident kids than the alternatives of overprotection or pushing too hard – a claim that sits somewhat awkwardly with the established literature on the limited effect of parenting.

*Disclosure of interest: Here are the percentiles for the last time I did a Big 5 test. I’m not far from Cain’s introvert ideal (possibly a touch low on neuroticism):

Openness: 88
Conscientiousness: 80
Extroversion: 29
Agreeableness: 51
Neuroticism: 48

Cass Sunstein and Reid Hastie’s Wiser: Getting Beyond Groupthink to Make Groups Smarter

Cass Sunstein and Reid Hastie’s Wiser: Getting Beyond Groupthink to Make Groups Smarter is not an exciting read. However, it is a good catalogue of group decision-making research (which makes this post somewhat of a catalogue too) and worth reading for an overview.

The book’s theme is that group decisions are often better than individual decisions, but that groups have weaknesses that can impair outcomes. Much of the analysis of failures in group decision-making follows a similar theme to the research into individual judgement and decision-making, in that the research has generated a long list of “biases” that groups are subject to. Most of the book, however, focuses on how to get better decisions, and (thankfully) many of its suggestions don’t rest on the identification of particular biases.

Two types of groups

Sunstein and Hastie look at two types of groups – statistical and deliberating groups.

In a statistical group, members give their inputs individually. Those inputs are then aggregated. Think voting (which works well as long as the majority is right).

There is no shortage of material on the wisdom of statistical groups. Francis Galton’s account of a weight-judging competition, in which fair-goers guessed the weight of an ox, is the classic example: the average of the individual guesses was right on the mark.
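The mechanics of a statistical group are easy to sketch in code. Here is a minimal Python illustration using made-up guesses (invented for illustration, not Galton’s data): the mean of the individual guesses lands closer to the truth than the typical individual guess does.

```python
# Hypothetical guesses of an ox's weight in pounds (invented for
# illustration; not Galton's data).
true_weight = 1198
guesses = [1050, 1120, 1180, 1250, 1300]

# Statistical aggregation: average the independent individual inputs.
crowd_estimate = sum(guesses) / len(guesses)
crowd_error = abs(crowd_estimate - true_weight)

# Compare against how far off a typical individual is.
mean_individual_error = sum(abs(g - true_weight) for g in guesses) / len(guesses)

print(crowd_estimate)         # 1180.0
print(crowd_error)            # 18.0
print(mean_individual_error)  # 79.6
```

The individual errors largely cancel in the aggregate, which is the crux of the wisdom-of-crowds result – provided the errors are independent.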

In deliberating groups, individuals provide input during deliberations. Those inputs can affect and be affected by the inputs of other group members. People aim to influence others. People might change their minds.

Even if most members of a group have the wrong answer or belief, you can picture a scenario where reason and discussion allow the right answer to emerge. That is sometimes the case, but the evidence is that deliberating groups do not necessarily converge on the truth.

In one experiment, people answered questions individually before answering those same questions in groups. If the majority of the group knew the correct answer to a problem, the group’s decision was correct 79% of the time. (It’s impressive that the incorrect minority were able to derail the group 21% of the time.) If the majority of the group answered a question incorrectly when answering individually, the group converged on the right answer only 44% of the time. The result of this dynamic was that the average group decision was only marginally better than the average individual decision (66% versus 62%).

As a result, it may be easier to simply elicit people’s individual views and average them (or combine them in some other way) than to go through the effort of group discussion. A statistical group may be a more efficient solution.

Why deliberating groups go wrong (or right)

Why do we get results such as this? Sunstein and Hastie describe plenty of problems that can derail deliberating groups. Group decisions can be poor due to both the rational conduct of group members and because of their “biases”. Here are a few problems that can occur for “rational” reasons:

  • Informational signals: It is sensible to take into account what others have said in a group deliberation. If you know Jane is knowledgeable and has good judgment, hearing that she supports a project is evidence that can affect your support. But if she is wrong, she can derail the group. Seeing other people make errors can also provide “social proof” to an error.
  • Self-censorship: People tend not to share information that contradicts their preferred outcome. In one study of over 500 mock jury trials, the experimenters never once observed someone volunteering such information.
  • Reputational cascades: People might know what is right (or what they think is right), but they go along with the group or certain members of the group due to concern for their reputation or standing.

Then there are the “irrational” (a lot of these points are based on single studies, so take with a grain of salt):

  • Deliberating groups are more likely to escalate commitment to a failing course of action. They are also more susceptible to the sunk cost fallacy, the consideration of past costs that should be irrelevant to the decision about future action.
  • Groups can amplify the representativeness heuristic, where we judge probability based on resemblance or similarity.
  • People in deliberating groups have more unrealistic “overconfidence” (looking at the abstract of the paper cited for this point – I can’t access the full paper – I think they are talking about over-precision).
  • Groups are more vulnerable to framing effects, varying their decision based on how a choice is framed (although looking at the paper Sunstein and Hastie cite, it states that there is little consistency between studies).
  • Group deliberation can make both groups and the individuals in those groups more extreme.
  • Shared information has a disproportionate effect on group members. If information is distributed so that key material is unshared (held by only a few group members), this can cause deliberating groups to perform worse.

That said, deliberating groups can temper some biases:

  • Groups tend to rely less on the availability heuristic, by which we judge probability by how readily examples come to mind. The heuristic is tempered possibly because the group members have different memories; across the group, the available memories may be somewhat more realistic. That said, groups can be subject to availability cascades: an idea held by one person can spread through the group, eventually producing a widespread belief.
  • Groups have a lower tendency to anchor, the over-reliance on the first piece of information with which they are presented (even if it is irrelevant to the decision at hand).
  • Groups tend to have reduced hindsight bias, possibly because not everyone revises their views in the same way.
  • Groups tend to have reduced egocentric biases, the belief that others think like you. A group typically has a wider set of tastes to draw on, so you are more likely to have someone point out that your tastes are not shared.

Improving deliberation

The most interesting part of the book is when Sunstein and Hastie turn to their tactics to improve group decisions. There are two groups of tactics: those designed to improve deliberation, and alternative decision-making methods. A common thread across these is diversity, although this is “not necessarily along demographic lines, but in terms of ideas and perspectives.”

They list eight ways to avoid problems in deliberating groups: (1) inquisitive and self-silencing leaders; (2) “priming” critical thinking (although we have seen how the priming literature is holding up); (3) rewarding group success (incentives are important, particularly to counter self-censorship and reputational cascades); (4) role assignment; (5) perspective changing; (6) devil’s advocates; (7) red teams; and (8) the Delphi method. A few are worth mentioning.

Role assignment involves giving people discrete roles, such as labelling someone as an “expert”. The purpose is to bring out unshared information by making it clear that the individual expert has a role to play.

Devil’s advocacy involves appointing some group members to deliberately advocate against the group’s inclinations. Sunstein and Hastie suggest that the research behind devil’s advocates is mixed. There is some evidence that devil’s advocacy can be helpful and can enhance group performance. But it requires genuine dissent. If the dissent is insincere (which is often the case if the role is assigned), people discount the dissent accordingly. The advocate also has little to gain by zealously challenging the dominant view. This means it may be better for groups to encourage real dissent.

Sunstein and Hastie are more optimistic about red teaming, the creation of a team tasked with criticising or defeating the preferred solution or plan. I can see how they might be occasionally useful, such as in mock trials, but it wasn’t clear where their optimism came from as they provided little evidence in support.

One option I find useful is the Delphi method. You ask people to state their opinions anonymously and independently before deliberation. These opinions are then made available to others. It is effectively a secret ballot plus reasons, and provides a basis for hidden information to emerge without reputational or informational cascades. Several rounds of this process can be held as the group converges on a solution. It’s a great way to flush out doubts and dissent.
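A toy simulation of those rounds may make the process concrete. This is a sketch with invented numbers and an invented update rule (not from the book): I assume each participant shifts partway toward the group median after seeing the round’s anonymous summary.

```python
import statistics

def delphi_rounds(estimates, rounds=3, pull=0.5):
    """Run anonymous estimation rounds, nudging opinions toward the median.

    pull is a hypothetical parameter: the fraction of the gap to the
    shared median that each participant closes per round.
    """
    for _ in range(rounds):
        median = statistics.median(estimates)
        estimates = [e + pull * (median - e) for e in estimates]
    return estimates

opinions = [10, 40, 55, 60, 90]   # round-one anonymous opinions (made up)
final = delphi_rounds(opinions)
print(max(final) - min(final))    # spread shrinks from 80 to 10.0
```

Under this toy rule each round preserves the group’s central view (the median stays put) while narrowing the disagreement, which matches the book’s description of the group converging on a solution over several rounds.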

Better decisions without deliberation

Much of the book is dedicated to methods to arrive at good decisions outside of, rather than within, the deliberation process. These include design thinking (as a way of eliciting as much information and as many ideas as possible), cost-benefit analysis, asking the public (public comment or consultation), tournaments, prediction markets, and harnessing experts. Some of these are effectively statistical groups with different models for combining inputs.

Unsurprisingly given Sunstein’s background, the authors are positive on cost-benefit analysis. Having seen some cost-benefit sausages being made for government decision-making, I don’t quite share their optimism, but can see the benefits in the right place.

Sunstein and Hastie are also boosters of tournaments. The dispersion of competitors leads to independence in their inputs. The winner-take-all nature of tournaments incentivises divergent strategies. And tournaments can promote elite performance at the top of competitors’ capabilities.

A question not addressed in the book is to what extent tournaments can scale into a widely used solution. There is a waste of resources inherent in tournaments – the effort of the losing teams. A Kaggle competition uses a massive amount of data science capability, far more than the “prize”. At the moment, many competitors are happy to contribute this effort because there are other benefits, such as reputation. Could it be the standard way of doing things? Governments running tournaments would want to reserve them for the projects of most value to avoid over-stretching this resource.

As a tournament example, Sunstein and Hastie were underwhelmed by the IARPA prediction tournament, where teams competed to predict political and economic events. They felt that the winning solution from the Good Judgement Project was more focused on reducing noise and bias than on developing game-changing methods that increase signal (tough crowd). (See my post on Superforecasting for more on that tournament.) Perhaps the new hybrid forecasting tournament will be more to their liking.

The final technique I’ll note is the effective harnessing of experts. This could mean using experts who use statistics to develop accurate predictions or make decisions (often in turn drawing on other sources). It could involve identifying fields where expert knowledge is genuine (as identified in the work of Gary Klein). When doing this, however, it is often best to look at statistical groups of experts rather than to chase a single expert. The average of the experts is likely the best prediction. And there is no need to weight by an expert’s confidence in developing that average – it has no correlation with their accuracy.
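As a sketch of that last point (with forecasts and confidence scores invented for illustration – nothing here is from the book’s data), the combination rule is just an unweighted mean that deliberately ignores each expert’s self-reported confidence:

```python
# Hypothetical probability forecasts from three experts, each paired
# with a self-reported confidence score (both invented for illustration).
forecasts = {
    "expert_a": {"forecast": 0.60, "confidence": 9},
    "expert_b": {"forecast": 0.30, "confidence": 2},
    "expert_c": {"forecast": 0.45, "confidence": 5},
}

# Equal-weight average: since confidence tells us nothing about
# accuracy, the confidence scores play no part in the aggregation.
values = [e["forecast"] for e in forecasts.values()]
combined = sum(values) / len(values)
print(round(combined, 2))  # 0.45
```

The design choice is the absence of one: resisting the temptation to upweight the loudest or most confident expert is the whole trick.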


Postscript 1: Sunstein and Hastie explore the question of collective intelligence (the “c factor”). That deserves to be the subject of another post.

Postscript 2: Sunstein and Hastie talk of “eureka” problems, where the right answer is clear to all once announced. Groups are good at these. They give the “trivial” example of “Why are manhole covers round?” Because “if they were almost any other shape, a loose cover could shift orientation and fall through the hole, potentially causing damage and injuries.” Is that really the logic behind their design? Or is this just a benefit? (I ask not least because most manhole covers in Australia are square or rectangular, and I have never seen a cover fall through the hole.) This example is famous for being used in Microsoft job interviews, but it is a question more focused on making the interviewer feel clever than on actually predicting, say, good job performance.