Has the behavioural economics pendulum swung too far?

Over at Behavioral Scientist, as part of their “Nudge Turns 10” special issue, is my latest article When Everything Looks Like a Nail: Building Better “Behavioral Economics” Teams. Here’s the opening:

As someone who became an economist via a brief career as a lawyer, I did notice that my kind had privileged access to the halls of government and business. Whether this was because economics can speak the language of dollars, or that we simply claimed that we had all the answers, the economists were often the first consulted (though not necessarily listened to) on how we priced, regulated, and designed our policies, services, and products.

What I lacked, however, was a privileged understanding of behavior. So about a decade ago, with the shortcomings of economics as an academic discipline top of mind, I commenced a Ph.D. to develop that understanding. It was fortuitous timing. Decades of research by psychologists and “misbehaving” economists was creating a new wave of ideas that would wash out of academia and into the public and corporate spheres. Famously encapsulated in Richard Thaler and Cass Sunstein’s Nudge, there was now a recognized need to design our world for humans, not “econs.”

Following Nudge, a second wave found many organizations creating their own “nudge units.” The Behavioural Insights Team (BIT) within 10 Downing Street was the precursor to government behavioral teams around the world. Although the first dedicated corporate behavioral units predated the BIT, a similar, albeit less visible, pattern of growth can be seen in the private sphere. These teams are now tackling problems in areas as broad as tax policy, retail sales, app design, and social and environmental policy.

On net, these teams have been a positive and resulted in some excellent outcomes. But my experience working in and alongside nudge units has me asking: Has the pendulum swung too far? My education and experience have proven to me that economics and the study of human behavior are complements rather than substitutes. But I worry that in many government departments and businesses, behavioral teams have replaced rather than complemented economics teams. Policymakers and corporate executives, their minds rushing to highly available examples of “nudge team” successes, often turn first to behavioral units when they have a problem.

A world in which we take advice only from economists risks missing the richness of human behavior, designing for people who don’t exist. But a world in which policymakers and corporate executives turn first to behavioral units has not been without costs. A major source of these costs comes from how we have been building behavioral teams.

We have been building narrow teams. We have been building teams with only a subset of the skills required to solve the problems at hand. When you form a team with a single skillset, there is the risk that everything will start to look like a nail.

It’s now time for a third wave. We need to build multidisciplinary behavioral units. Otherwise we may have results such as those reflected in the observations below. Some of the observations relate to my own experiences and errors, some are observations by others. To protect identities, confidential projects, and egos (including my own), I have tweaked the stories. However, the lessons remain the same.

You can read the rest here.

I considered a few alternative angles for the special issue article. One was around the question of whether behavioural interventions that look impressive in isolation are less so if we consider the system-wide effects. Another angle I considered, hinted at in the published piece, is around replicability and publication bias in the public sphere. Maybe they can be topics for future articles.

I also considered an alternative introduction, but changed my approach on feedback from a friend who reviewed the first draft. Here’s the old introduction, which takes too long to get to the point and is too narrow for the ultimate thread of the article, but which makes the point about narrow approaches in a stronger way:

Economists have never been shy about applying the economic toolkit to what are normally considered the non-economic aspects of life. They have tackled discrimination, the family, crime, culture, religion, altruism, sports and war, to name a few.

This “economics imperialism” has often been controversial, but (at least in this author’s opinion) left many subjects better off. Some of the subjects benefited from a different approach, with the effort to repel the imperialists creating more robust disciplines.

But at times the economics imperialists simply missed the mark. Often this was because they lacked domain knowledge. Complexities invalidated their underlying assumptions or created a dynamic that they simply didn’t foresee.

Some economists also have a habit of leaving the complexities of their own body of work behind when they wander into new domains. A rich understanding of moral hazard, adverse selection, information asymmetries and principal-agent problems often becomes a simple declaration to let the price mechanism do its job.

One (almost caricatured) illustration of this occurred when Freakonomics authors Steven Levitt and Stephen Dubner met with Prime Minister David Cameron to discuss increasing health expenditure in the United Kingdom’s free healthcare system. As described in Think Like A Freak, they posed a thought experiment:

What if, for instance, every Briton were also entitled to a free, unlimited, lifetime supply of transportation? That is, what if everyone were allowed to go down to the car dealership whenever they wanted and pick out any new model, free of charge, and drive it home?

We expected him to light up and say, “Well, yes, that’d be patently absurd—there’d be no reason to maintain your old car, and everyone’s incentives would be skewed. I see your point about all this free health care we’re doling out!”

Instead, Cameron said nothing, offered a quick handshake and disappeared to “find a less-ridiculous set of people with whom to meet.”

Could Levitt and Dubner have expected a different response? Even if Levitt had a more serious proposal up his sleeve, their failure to engage seriously with the particular features of the healthcare market rendered the message useless. They had effectively ignored the complexities of the problem and hammered away in the hope they had found a nail.

A few years before the visit by the Freakonomics team, David Cameron’s Conservative Government established the Behavioural Insights Team, or “Nudge unit” within 10 Downing Street. The team was tasked with realising the Government’s intention to find “intelligent ways to encourage, support and enable people to make better choices for themselves”.

Now spun out of the Cabinet Office, the Behavioural Insights Team was the precursor to government based behavioural teams around the world. Although the first dedicated corporate behavioural units pre-dated the Behavioural Insights Team, a similar, albeit slower pattern of growth can be seen in the private sphere. These teams are now tackling issues as broad as tax evasion, customer conversion, domestic violence and climate change.

While the development of these teams has been a positive and resulted in some excellent outcomes, these teams have not been without weaknesses – in fact, some of the same weaknesses suffered by the economics imperialists. The primary one is that when you form a team around a central idea, there is the risk that everything will start to look like a nail.

The three faces of overconfidence

I have complained before about people being somewhat quick to label poor decisions as being due to “overconfidence”. For one, overconfidence has several distinct forms. It is a mistake to treat each as the same. Further, these forms vary in their pervasiveness.

The last time I made this complaint I drew on an article by Don Moore and Paul Healy, “The Trouble with Overconfidence” (pdf). A more recent article by Don Moore and Derek Schatz (pdf) provides some further colour on this point (HT: Julia Galef). It’s worth pulling out a few excerpts.

So what are these distinct forms? Overestimation, overplacement and overprecision. (It’s also useful to disambiguate overoptimism, which I’ll touch on at the end of this post.)

Overestimation is thinking that you’re better than you are. Donald Trump’s claim to be worth $10 billion (White, 2016) represented an overestimate relative to a more credible estimate of $4.5 billion by Forbes magazine (Peterson-Withorn, 2016). A second measure of overconfidence is overplacement: the exaggerated belief that you are better than others. When Trump claimed to have achieved the largest electoral victory since Ronald Reagan (Cummings, 2017), he was overplacing himself relative to other presidents. A third form of overconfidence is overprecision: being too sure you know the truth. Trump displays overprecision when he claims certainty about views which are contradicted by reality. For example, Trump claimed that thousands of Arab Americans in New Jersey publicly celebrated the fall of the World Trade Center on September 11th, 2001, without evidence supporting the certainty of his assertion (Fox News, 2015).

When people are diagnosing overconfidence, they can conflate the three. Pointing out that 90% of people believe they are better than average drivers (overplacement) is not evidence that a CEO was overconfident in their decision to acquire a competitor (possibly overestimation).


People tend to overestimate their performance on hard tasks. But when a task is easy, they tend to underestimate it.

In contrast to the widespread perception that the psychological research is rife with evidence of overestimation (Sharot, 2011), the evidence is in fact thin and inconsistent. Most notably, it is easy to find reversals in which people underestimate their performance, how good the future will be, or their chances of success (Moore & Small, 2008). When a task is easy, research finds that people tend to underestimate performance (Clark & Friesen, 2009). If you ask people to estimate their chances of surviving a bout of influenza, they will radically underestimate this high probability (Slovic, Fischhoff, & Lichtenstein, 1984). If you ask smokers their chances of avoiding lung cancer, they will radically underestimate this high probability (Viscusi, 1990).

The powerful influence of task difficulty (or the commonness of success) on over- and underestimations of performance has long been known as the hard-easy effect (Lichtenstein & Fischhoff, 1977). People tend to overestimate their performance on hard tasks and underestimate it on easy tasks. Any attempt to explain the evidence on overestimation must contend with the powerful effect of task difficulty.


In a reverse of the pattern for overestimation, people tend to overplace on easy tasks, but underplace on harder ones.

The evidence for “better-than-average” beliefs is so voluminous that it has led a number of researchers to conclude that overplacement is nearly universal (Beer & Hughes, 2010; Chamorro- Premuzic, 2013; Dunning, 2005; Sharot, 2011; Taylor, 1989). However, closer examination of this evidence suggests it suffers from a few troubling limitations (Harris & Hahn, 2011; Moore, 2007). Most of the studies measuring better-than-average beliefs use vague response scales that make it difficult to compare beliefs with reality. The most common measure asks university students to rate themselves relative to the average student of the same sex on a 7-point scale running from “Much worse than average” to “Much better than average.” Researchers are tempted to conclude that respondents are biased if more than half claim to be above average. But this conclusion is unwarranted (Benoît & Dubra, 2011). After all, in a skewed distribution the majority will be above average. Over 99% of the population has more legs than average.

Within the small set of studies not vulnerable to these critiques, the prevalence of overplacement shrinks. Indeed, underplacement is rife. People think they are less likely than others to win difficult competitions (Moore & Kim, 2003). When the teacher decides to make the exam harder for everyone, students expect their grades to be worse than others’ even when it is common knowledge that the exam will be graded on a forced curve (Windschitl, Kruger, & Simms, 2003). People believe they are worse jugglers than others, that they are less likely than others to win the lottery, and less likely than others to live past 100 (Kruger, 1999; Kruger & Burrus, 2004; Moore, Oesch, & Zietsma, 2007; Moore & Small, 2008). These underplacement results are striking, not only because they vitiate claims of universal overplacement, but also because they seem to contradict the hard-easy effect in overestimation, which finds that people most overestimate their performance on difficult tasks.
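The point about skewed distributions is easy to check with a few lines of arithmetic. This is only a sketch: the population below is made up for illustration (995 people with two legs, 5 with one), and the function name is my own.

```python
from statistics import mean


def share_above_average(values):
    """Return the average of `values` and the share of entries above it."""
    average = mean(values)
    share = sum(1 for v in values if v > average) / len(values)
    return average, share


# Hypothetical population: 995 people with two legs, 5 people with one leg.
legs = [2] * 995 + [1] * 5

average, share = share_above_average(legs)
print(f"average number of legs: {average:.3f}")
print(f"share with more legs than average: {share:.1%}")
```

In this made-up population the average is just under two, so everyone with two legs is "above average". A majority claiming to be above average is not, on its own, evidence of bias.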

Moore and Healy offer an explanation for the different effects of task difficulty on overestimation and overplacement – myopia. I wrote about that in the earlier post.


Overprecision is pervasive but poorly understood.

A better approach to the study of overprecision asks people to specify a confidence interval around their estimates, such as a confidence interval that is wide enough that there is a 90% chance the right answer is inside it and only a 10% chance the right answer is outside it (Alpert & Raiffa, 1982). Results routinely find that hit rates inside 90% confidence intervals are below 50%, implying that people set their ranges too precisely—acting as if they are inappropriately confident their beliefs are accurate (Moore, Tenney, & Haran, 2016). This effect even holds across levels of expertise (Atir, Rosenzweig, & Dunning, 2015; McKenzie, Liersch, & Yaniv, 2008). However, one legitimate critique of this approach is that ordinary people are unfamiliar with confidence intervals (Juslin, Winman, & Olsson, 2000). That is not how we express confidence in our everyday lives, so maybe unfamiliarity contributes to errors.

Overprecision is the most pervasive but least understood form of overconfidence. Unfortunately, researchers use just a few paradigms to study it, and they rely on self-reports of beliefs using questions people are rarely called on to answer in daily life.
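To make the confidence interval test concrete, here is a minimal sketch of how a hit rate could be scored. The intervals and true answers are invented for illustration, and `hit_rate` is my own helper, not something from the paper.

```python
def hit_rate(intervals, truths):
    """Share of true values that fall inside the stated (low, high) intervals."""
    hits = sum(
        1 for (low, high), truth in zip(intervals, truths) if low <= truth <= high
    )
    return hits / len(truths)


# Hypothetical 90% confidence intervals from one respondent, with the true answers.
intervals = [(10, 20), (100, 150), (1900, 1950), (5, 8), (30, 40)]
truths = [25, 120, 1969, 7, 55]

rate = hit_rate(intervals, truths)
print(f"stated confidence: 90%, actual hit rate: {rate:.0%}")
```

A well-calibrated respondent would capture the truth in roughly nine out of ten intervals. A hit rate well below that, as in this invented example, is what overprecision looks like in the data.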

(Although not covered in Moore and Schatz’s paper, Gigerenzer also offers a critique that I’ll discuss in a forthcoming post.)


Moore and Schatz don’t touch on overoptimism directly in their paper, but in an interview with Julia Galef on the Rationally Speaking podcast, Moore touches on this point:

Julia: Before we conclude this disambiguation portion of the podcast I want to ask about optimism, which I am using to mean thinking that some project of yours has a greater chance of success than you’re justified in thinking it does. How does that fit into that three‐way taxonomy?

Don: It is an excellent question, and optimism has been studied a great deal. Perhaps the most famous scholars of optimism are Charles Carver and Mike Scheier who have a scale that assesses the personality trait of optimism. Their usage of the term is actually not that far from the colloquial usage of the term, where to be optimistic is just to believe that good things are going to happen. Optimism is distinctively about a forecast for the future, and whether you think good things or bad things are going to happen to you.

Interestingly, this trait of optimism seems very weakly related to actual specific measures of overconfidence. When I ask Mike Scheier why his optimistic personality trait didn’t correlate with any of my measures of overconfidence he said, “Oh, I wouldn’t expect it to.”

Julia: I would expect it to!

Don: Yeah. My [reaction] actually was, “Well, what the heck does it mean, if it doesn’t correlate with any specific beliefs?”

I think it’s hard to reconcile those in any sort of coherent or rational framework of beliefs. But I have since had to concede that there is a real psychological phenomenology, wherein you can have this free floating, positive expectation that doesn’t commit you to any specific delusional beliefs.

Concern about the “tyranny of choice”? Or condescension towards others’ preferences?

I have been reading Robert Sugden’s book The Community of Advantage: A Behavioural Economist’s Defence of the Market in preparation for an upcoming webinar with Robert about the book, facilitated by Henry Leveson-Gower.

The webinar will be held at 1pm London time and 10pm Sydney time on Monday 3 September. Details about the webinar are here and you can register here. A video will be posted afterward.

I’ll also post an in-depth review later, but the book is a mix of philosophy, technical economics, and critique of applied behavioural economics. The critiques are great reading, the philosophy is interesting but tougher, and the technical economic sections are for aficionados only.

Here’s one snippet of critique as a taster:

An extreme version of the claim that choice overload is a serious problem in developed economies has been popularized by Barry Schwartz (2004) in a book whose premise is that when the number of options becomes too large, ‘choice no longer liberates, but debilitates. It may even be said to tyrannize’ (2004: 2). Researchers who investigate choice overload sometimes suggest that their findings reveal a fundamental failure of the market system—that it provides too much choice.

[T]he idea that markets offer too much choice seems to have some resonance in public debate, as evidenced by the success of Schwartz’s book and by the fame of Iyengar and Lepper’s experiment with jams. My sense is that it appeals to culturally conservative or snobbish attitudes of condescension towards some of the preferences to which markets cater. This may seem harmless fogeyism, as when Schwartz (2004: 1–2) begins his account of the tyranny of choice by complaining that Gap allows him to choose between too many different types of pairs of jeans (‘The jeans I chose turned out just fine, but it occurred to me that buying a pair of pants should not be a daylong project’). But it often reflects a misunderstanding of the facts of economic life, and a concealed interest in restricting other people’s opportunities to engage in mutually beneficial transactions.

Imagine you are asked to describe your ideal shopping environment. For many people, and I suspect for Schwartz, the description would be something like this. Your Perfect Shop is a small business, conveniently located in your own neighbourhood (perhaps just far enough away that you are not inconvenienced by other customers who might want to park their cars in front of your house). It stocks a small product range, tailored to your particular tastes and interests, but at prices that are similar to those charged by large supermarkets. There are some categories of goods (such as jeans if you are Schwartz) which you sometimes need to buy but whose detailed features do not much interest you. The Perfect Shop stocks a small but serviceable range of such items. There are other categories of goods (breakfast cereal might be an example) for which you have a strong preference for a specific brand and feel no need to try anything different; the Perfect Shop sells a limited range of this type of good, but your favourite brand is always on sale. However, there are a few categories of goods in which you are something of a connoisseur and like to experiment with different varieties. Here, the Perfect Shop offers a wide range of options, imaginatively selected to appeal to people who want to experiment in just the kinds of ways that you do. No shelf space is wasted on categories of goods which you have no desire to buy.

Compared with such an ideal, real shopping may well seem to offer too much choice, not to mention clutter and vulgarity. But, of course, in a world in which there are economies of scale in retailing and people have different tastes and interests, the idea that each of us can have a Perfect Shop is an economic fantasy. A less fantastic possibility is that there are Perfect Shops for some people, but everyone is constrained to use them. Because these shops are well-used, prices can be kept low. But then the viability of what are some people’s Perfect Shops depends on the absence of opportunities for other people to buy what they want. Restricting other people’s opportunities to buy goods that have no appeal to you can be a way of conserving your preferred shopping environment without your having to pay for it. Describing these restrictions as defences against the tyranny of choice can be a convenient camouflage for a form of protectionism.

Gerd Gigerenzer’s Gut Feelings: Short Cuts to Better Decision Making

For many years I have been influenced by Gerd Gigerenzer’s arguments about the power of simple heuristics and the underlying rationality to many human decisions. But I have contrasting reactions to different parts of Gerd Gigerenzer’s body of work.

His published collections of essays – Simple Heuristics That Make Us Smart (with Peter Todd and the ABC research group), Adaptive Thinking and Rationality for Mortals – are fantastic, although some people might find them a touch academic.

Gigerenzer’s popular books are more accessible, but the loss of some of the nuance, plus his greater stridency of argument, push them to a point where I find a fair bit to disagree with.

In his most recent book, Risk Savvy, I struggled with how far Gigerenzer extended his arguments about the power of human decision-making. I agree that the heuristics and biases approach can lead us to be overeager in labelling decisions as “irrational” or sub-optimal. “Biased” heuristics can find a better point on the bias-variance trade-off. They are designed to operate in an uncertain world, not in a lab. But there is little doubt that humans err in some cases – particularly in environments with no resemblance to those in which we evolved. Gigerenzer can be somewhat quick to disparage use of data and praise gut instinct in environments where there is little evidence that these instincts work.

Gigerenzer’s earlier Gut Feelings: Short Cuts to Better Decision Making strikes perhaps the best balance between nuance and accessibility. While it still leaves an impression about the accuracy of our instincts that I’m not completely in agreement with, it provides a good overview of how our gut feelings can lead to good decisions.

Gigerenzer defines a gut feeling – which you might also call an intuition or hunch – as a feeling that appears quickly in consciousness, with us unaware of the underlying reasons, but strong enough for us to act on. Gut feelings work through simple rules of thumb that take advantage of the evolved capacities of the brain. The skill of the unconscious is knowing what rule to apply at what time.

Let’s break this down.

The power of simple rules

Gut feelings can be powerful tools despite (and because of) their reliance on rules of thumb. Often in decision-making, “less is more”, in that there is a beneficial degree of ignorance, or benefits to excluding information from consideration. The recognition heuristic is an example of this: if you recognise one option but not the other, infer that the recognised option has the higher value. The recognition heuristic only works if you recognise one option but not the other.
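The recognition heuristic is simple enough to write down in a few lines. This is a sketch of the rule as described above, not Gigerenzer’s own implementation. The function name and the set of recognised cities are hypothetical.

```python
def recognition_heuristic(option_a, option_b, recognised):
    """Return the option inferred to have the higher value, or None.

    The heuristic only discriminates when exactly one option is recognised.
    """
    a_known = option_a in recognised
    b_known = option_b in recognised
    if a_known and not b_known:
        return option_a
    if b_known and not a_known:
        return option_b
    return None  # both or neither recognised: the heuristic cannot be applied


# Hypothetical: the cities this person has heard of.
recognised = {"Munich", "Berlin"}

print(recognition_heuristic("Munich", "Bielefeld", recognised))  # one recognised
print(recognition_heuristic("Munich", "Berlin", recognised))     # both recognised
```

The second call returns None. That is the “less is more” point: someone who recognises every option gets nothing from the heuristic and has to fall back on something else.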

In contrast, complex strategies can explain too much in hindsight. In an uncertain world where only part of the information is useful for the future, a simple rule that focuses on only the best or a limited subset of information has a good chance of hitting that useful information. Gigerenzer provides plenty of examples of the superiority or utility of simple rules of thumb, a point that many advocates of complex statistical methods and machine learning should hear.

But sometimes Gigerenzer’s examples drift toward becoming straw man competitions. For instance, he describes a competition between two models – multinomial regression and a heuristic called “Take the best” – in predicting school drop-out rates. Take the best compares two schools by looking first at the cue which has the strongest relationship with drop-out rates (such as the attendance rate). If one school has a higher value on that cue than the other, you make a decision at that point. If the two schools have the same value, you move to the next cue and repeat.
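A rough sketch of Take the best, following the description above. The cue names, their ordering and the coding (1 meaning the cue points to a higher drop-out rate) are my assumptions for illustration, not the cues used in the original study.

```python
def take_the_best(a, b, cues):
    """Compare two options cue by cue and stop at the first cue that differs.

    `cues` lists cue names from strongest to weakest relationship with the
    outcome. Returns "a", "b", or None if no cue discriminates.
    """
    for cue in cues:
        if a[cue] > b[cue]:
            return "a"
        if b[cue] > a[cue]:
            return "b"
    return None  # every cue tied, so the heuristic has to guess


# Hypothetical dichotomised cues: 1 points to a higher drop-out rate.
cue_order = ["low_attendance", "low_test_scores", "large_classes"]
school_a = {"low_attendance": 1, "low_test_scores": 0, "large_classes": 1}
school_b = {"low_attendance": 1, "low_test_scores": 1, "large_classes": 0}

# Which school is predicted to have the higher drop-out rate?
print(take_the_best(school_a, school_b, cue_order))
```

The two schools tie on the first cue, so the decision is made on the second and the third is never consulted. Ignoring the remaining cues is what makes the heuristic frugal.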

The two models were trained on half the data, and tested against the other half of the data. Take the best achieved 65% accuracy in the training data, and 60% on the test data. In contrast, multinomial regression achieved 72% on training data, but this plunged to 54% on test data. (Gigerenzer only shows a chart in the book – I got the numbers from the related chapter of Simple Heuristics That Make Us Smart.) Multinomial regression overfit the training data.

This victory for Take the best sounds impressive, but there were observations for only 57 schools, with half the data used in training. Of course basing a prediction on a regression with 18 variables and twenty-odd observations is rubbish. I wouldn’t expect anything else. Gigerenzer often frames the victory of simple rules such as Take the Best as surprising to others (and originally to him), which it might be at a general level. But when you look at many of the specific examples and the numbers involved, the surprise doesn’t last long.

There is some more subtlety in the reporting of these results in Simple Heuristics That Make Us Smart, where the prediction of drop out rates was one of 20 “competitions” between Take the Best and multiple regression. The overall gap between Take the Best and multiple regression on the test data was 71% versus 68%, an impressive but narrow victory for Take the Best despite its reliance on far fewer cues.

That said, most of the competitions involved small samples – an area where the simple heuristics excel. Only three of the 20 had more than 30 examples available for training the model. The models also had access to dichotomised, not numerical, values, further decreasing the utility of regression. When numerical values were used, the two tied at 76% apiece. The tie is still an impressive result for the simple Take the Best heuristic, but this is now some way from the headline story we get in Gut Feelings. (Conversely, I should also note that the territory of these competitions was fairly stable, which might give more complex techniques an edge. Move to an uncertain dynamic environment, and the simple heuristics may gain an advantage even if the datasets are much larger.)

How humans use these heuristics

An important part of Gigerenzer’s argument is that these simple heuristics are used by humans. An example he provides is a picture of a boy’s head surrounded by four chocolate bars. Which bar does Charlie want? The one he is looking at. The simple heuristic is that “If a person looks at one alternative (longer than at others), it is likely the one the person desires.”

The gaze heuristic is another example. Someone seeking to catch a ball will run so as to maintain the angle of the ball in their gaze. The gaze heuristic will eventually lead them to where the ball will land. They don’t simply compute where the ball will land and then run there.

The question of whether humans use these heuristics has been tested in the lab. People have been demonstrated to rely heavily on the recognition heuristic when picking winners of tennis matches and football games, particularly where they are unfamiliar with the teams, or in determining which of two cities is larger. Less is more, as if you know all the teams or cities, you can’t use the recognition heuristic. This gives the people using these heuristics surprising predictive power, close (or superior) to more knowledgeable experts.

An interesting question about these heuristics is how someone knows when they should apply a particular heuristic. Gigerenzer notes that the skill of the unconscious is knowing, without thinking, what rule to apply at what time. This is the least satisfactory piece of the book, with little discussion as to how this process might work or be effective. It is fair to say the selection is unconscious – people are particularly poor at explaining what rule they applied – but are they skilful at this selection?

The other question mark relates to the inconsistency of our decisions. As Daniel Kahneman and friends have written about recently, human decisions are often noisy, with decisions varying across occasions. If we are applying heuristics, why do our decisions appear so haphazard in some environments? Does our selection of heuristics only work where we have had the right experience with feedback? More on that below.

Applied gut feelings

A point that Gigerenzer highlights – one of his important contributions to how we should think about the heuristics and biases approach – is that the structure of the environment is central to how well a rule of thumb works. A rule of thumb is not good or bad in itself, but depends on the environment in which it is used.

This point was earlier made by Herbert Simon, with his description of the capabilities of the decision maker, and the environment in which they are used, as blades on a pair of scissors. You cannot assess one without the other.

Where I find the discussion of rules of thumb becomes most interesting is in complex environments where we need to learn the rules of thumb to be applied. The heuristic of following someone else’s gaze to determine what they are talking about is something that one-year-olds do. But consider a hospital, where a doctor is trying to determine whether someone is having a heart attack. Or a CEO deciding whether to support a merger.

Gigerenzer points out – as you can also see in work by others such as Gary Klein – that you need feedback to develop expertise. Absent feedback you are likely to fall back on rules that don’t work or that achieve other purposes. Gigerenzer gives the example of judges who are not given feedback on their parole decisions. They then fall back on the heuristic of protecting themselves from criticism by simply following the police and prosecution recommendation.

Gigerenzer offered a few examples where I was not clear on how that expertise could develop. One involves discussion of the benefits of strategies that involve incremental change toward a solution, rather than first computing the ideal solution and acting on it. The gaze heuristic is a good example of this, whereby someone seeking to catch a ball maintains the angle of the ball in their gaze, with this heuristic eventually leading them to where it will land. They don’t simply compute where the ball will land and then run there.

Gigerenzer extends this argument to the setting of company budgets:

Strategies relying on incremental changes also characterize how organizations decide on their yearly budgets. At the Max Planck Institute where I work, my colleagues and I make slight adjustments to last year’s budget, rather than calculating a new budget from scratch. Neither athletes nor business administrators need to know how to calculate the trajectory of the ball or the business. An intuitive “shortcut” will typically get them where they would like to be, and with a smaller chance of making grave errors.

The idea of lower probability of “grave error” might be right. But how does someone learn this skill? And here is Dan Lovallo and Olivier Sibony writing on the same concept:

It has been another long, exhausting budget meeting. As the presenters showed you their plans, you challenged every number, explored every assumption. In the end you raised their targets a little, but, if you’re honest, you have to admit it: the budget this unit will have to deliver next year is not very different from the one they proposed at the beginning of the budget process, which in turn is not very different from the latest forecast for this year.

What happened? The short answer is, you’ve been anchored. Anchoring is the psychological phenomenon that makes a number stick in your mind and influence you — even though you think you’re disregarding it.

I have some sympathy for the Lovallo and Sibony assessment, having sat in numerous organisations where it was near unanimously agreed that the budget needed to be reallocated, but the status quo prevailed. But I’m not overly convinced it was due to anchoring, rather than the trenchant self-interest of those who might be affected, and timidity and a desire to avoid conflict on the part of the decision makers. It would be interesting to see a study on this. (Maybe it’s out there – I briefly searched, but not particularly hard.)

An interesting story in the chapter about medical environments concerned doctors who were required to judge whether someone was having a heart attack. The doctors were doing a generally poor job, defensively sending 90% of people with chest pain to the coronary care unit.

Some researchers developed a process whereby doctors would use a complicated chart with 50-odd probabilities, a long formula and a pocket calculator to determine whether a patient should be admitted to the coronary care unit. The doctors didn’t like it and didn’t understand it, but its use improved their decision-making and reduced overcrowding in the coronary care unit.

The researchers then took the chart and calculator away from the doctors, with the expectation that the decision-making quality would decline back to what it was previously. But the decision quality did not drop. Exposure to the calculator had improved their intuition permanently. What the doctors needed was the cues that they could not learn from experience, but when provided with them, they applied them in a fast and frugal way that matched the accuracy of the more complicated procedure.

As an aside, the above is how the story is told in Gut Feelings, which might have been coloured by some discussion between Gigerenzer and the researchers. My reading of the related article (pdf minus charts) has a different chain of events. The researchers first developed the tool using patient data, and presented their results to the doctors. Seven months later, the tool was trialed. They found that admissions to the coronary care unit had declined following the presentation, but not on introduction of the tool, suggesting the doctors started using the cues after the presentation and could match the tool’s accuracy through their own decision processes. The paper notes that “Take the Best” and tallying – simply adding up the number of cues – would be good strategies. Gigerenzer takes the analysis further here.
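For readers unfamiliar with the two strategies the paper mentions, here is a minimal sketch in Python. The cue names, their ordering and the two patient records are hypothetical and not taken from the paper. Take the Best is shown in its usual paired-comparison form, which assumes the cues have already been ranked by validity.

```python
def tally(cues):
    """Tallying: count the cues that are present, giving each equal weight."""
    return sum(1 for present in cues.values() if present)


def take_the_best(a, b, cue_order):
    """Take the Best: walk through the cues from most to least valid and
    decide on the first cue that discriminates between the two options."""
    for cue in cue_order:
        if a[cue] != b[cue]:
            return "a" if a[cue] else "b"
    return None  # no cue discriminates, so the choice is a guess


# Hypothetical cues in an assumed order of validity (illustration only).
CUE_ORDER = ["ecg_abnormal", "chest_pain", "prior_heart_attack"]

patient_a = {"ecg_abnormal": False, "chest_pain": True, "prior_heart_attack": True}
patient_b = {"ecg_abnormal": True, "chest_pain": False, "prior_heart_attack": False}

tally_a = tally(patient_a)  # counts chest_pain and prior_heart_attack
tally_b = tally(patient_b)  # counts ecg_abnormal only
higher_risk = take_the_best(patient_a, patient_b, CUE_ORDER)  # settled by the first cue
```

With these made-up records the two strategies disagree: tallying ranks patient A higher because more cues are present, while Take the Best stops at the first cue and picks patient B. Which strategy does better depends on how the cues behave in the environment at hand.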

As a second aside, this story is similar to one Daniel Kahneman tells in Thinking, Fast and Slow, where military recruiters were asked to use a mechanical process to select candidates. After protesting that they were not robots, Kahneman suggested that after collecting the required data, the recruiters close their eyes, imagine the recruit as a soldier and assign a score of one to five. It turned out the “close your eyes” score was as accurate as the sum of the six factors that were collected, both being much better than the useless interviewing technique they had replaced. Intuition worked, but only after disciplined collection of data (cues).

And as a third aside and contrast, here’s a story from another study (quoted text from here):

During an 18 month period the authors used a computer-based diagnosis system which surpassed physicians in diagnostic accuracy. During the course of this research after each physician made a diagnosis, he or she was informed of the computer’s diagnosis. The diagnostic accuracy of the physicians gradually rose toward that of the computer during the 18 month period. The authors attributed this improvement in part to the “discipline” forced upon the physicians, the constraint of carefully collecting patient information, the “constant emphasis on reliability of clinical data collected, and the rapid ‘feedback’ from the computer,” which may have promoted learning. When the computer system was terminated, the physicians very quickly reverted to their previous lower level of diagnostic accuracy. Apparently discipline and reliability fell victim to creativity and inconsistency.

The rest of the book

Gigerenzer provides plenty of other thought-provoking material about the role of heuristics and gut feeling in various domains. Sometimes it feels a bit shallow. Advertising is put down to the recognition heuristic. What about signalling, discussed shortly after in another context? The final couple of chapters relating to moral behaviour and social instincts seemed somewhat out-of-date when looked at next to the burgeoning literature on cultural transmission and learning. But there are enough interesting ideas in those chapters to make them worthwhile. And you can’t expect someone to pin every point down in-depth in a popular book.

So, if you want a dose of Gigerenzer, Gut Feelings is interesting and worth reading. But if you have the patience, I recommend starting with Simple Heuristics That Make Us Smart, Adaptive Thinking and Rationality for Mortals. Then if you want a slightly less “academic” Gigerenzer, move on to Gut Feelings.

Gerd Gigerenzer’s Rationality for Mortals: How People Cope with Uncertainty

Gerd Gigerenzer’s collection of essays Rationality for Mortals: How People Cope with Uncertainty covers most of Gigerenzer’s typical turf: ecological rationality, heuristics that make us smart, understanding risk and so on.

Below are observations on three of the more interesting essays: the first on different approaches to decision making, the second on the power of simple heuristics, and the third on how biologists treat decision making.

Four ways to analyse decision making

In the first essay, Gigerenzer provides four approaches to decision making – unbounded rationality, optimisation under constraints, cognitive illusions (heuristics and biases) and ecological rationality.

1. Unbounded rationality

Unbounded rationality is the territory of neoclassical economics. Omniscient and omnipotent people optimise. They are omniscient in that they can see the future – or at least live in a world of risk where they can assign probabilities. They are omnipotent in that they have all the calculating power they need to make perfect decisions. With that foresight and power, they make optimal decisions.

Possibly the most important point about this model is that it is not designed to describe precisely how people make decisions, but rather to predict behaviour. And in many dimensions, it does quite well.

2. Optimisation under constraints

Under this approach, people are no longer omniscient. They need to search for information. As Gigerenzer points out, however, this attempt to inject realism creates another problem. Optimisation with constraints can be even harder to solve than optimisation with unbounded rationality. As a result, the cognitive power required is even greater.

Gigerenzer is adamant that optimisation under constraints is not bounded rationality – and if we use Herbert Simon’s definition of the term, I would agree – but analysis of this type commonly attracts the “boundedly rational” label.

3. Cognitive illusions – logical irrationality

The next category is the approach in much of behavioural science and behavioural economics. It is often labelled as the “heuristics and biases” program. This program looks to understand the processes under which people make judgments, and in many cases, seeks to show errors of judgment or cognitive illusions.

Gigerenzer picks two main shortcomings of this approach. First, although the program successfully shows failures of logic, it does not look at the underlying norms. Second, it tends not to produce testable theories of heuristics. As Gigerenzer states, “mere verbal labels for heuristics can be used post hoc to “explain” almost everything.”

An example is analysis of overconfidence bias. People are asked a question such as “Which city is farther north – New York or Rome?”, and asked to give their confidence that their answer is correct. When participants are 100 per cent certain of the answer, fewer than 100 per cent of their answers tend to be correct. That pattern of apparent overconfidence continues through lower probabilities.

There are several critiques of this analysis, but one of the common suggestions is that people are presented with questions that are unrepresentative of a typical sample. People typically use alternative cues to answer a question such as the above. In the case of latitude, temperature is a plausible cue. The overconfidence bias occurs because the selected cities are a biased sample where the cue fails more often than expected. If the cities are randomly sampled from the real world, the overconfidence disappears. The net result is that what appears to be a bias may be better explained by the nature of the environment in which the decision is made. (Kahneman and Tversky contest this point, suggesting that even when you take a representative sample, the problem remains.)
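The sampling argument can be made concrete with a small sketch. The numbers below are made up for illustration: a cue that points to the right answer for 80 per cent of city pairs in the environment, and a person whose stated confidence matches that validity. It is not a reconstruction of any of the studies mentioned.

```python
def accuracy(items):
    """Share of questions answered correctly by someone who always follows the cue."""
    return sum(items) / len(items)


# Hypothetical environment: the cue (say, temperature) gives the right answer
# for 80 of every 100 city pairs. True means the cue points to the correct city.
environment = [True] * 80 + [False] * 20

# A person who has learned the cue's validity in the environment reports
# that level of confidence.
confidence = accuracy(environment)

# Representative sample: the same mix of pairs as the environment.
representative = [True] * 16 + [False] * 4

# Selected sample: the experimenter over-represents pairs where the cue misleads.
selected = [True] * 10 + [False] * 10

# Apparent overconfidence = stated confidence minus the share actually correct.
gap_representative = confidence - accuracy(representative)
gap_selected = confidence - accuracy(selected)
```

In this toy setup the gap is zero on the representative sample and roughly 30 percentage points on the selected sample, even though the person's confidence and decision rule are identical in both cases.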

4. Ecological rationality

Ecological rationality departs from the heuristics and biases program by examining the relationship between mind and environment, rather than the mind and logic. Human behaviour is shaped by scissors with two blades – the cognitive capabilities of the actor, and the environment. You cannot understand human behaviour without understanding both the capabilities of the decision maker and the environment in which those capabilities are exercised. Gigerenzer would apply the bounded rationality label to this work.

There are three goals to the ecological rationality program. The first is to understand the adaptive toolbox – the heuristics of the decision maker and their building blocks. The second is to understand the environmental structures in which different heuristics are successful. The third is to use this analysis to improve decision making through designing better heuristics or changing the environment. This can only be done once you understand the adaptive toolbox and the environments in which different tools are successful.

Gigerenzer provides a neat example of how ecological rationality departs from the heuristics and biases program in its analysis of a problem – in this case, optimal asset allocation. Harry Markowitz, who received a Nobel Memorial Prize in Economics for his work on optimal asset allocation, did not use the results of his analysis in his own investing. Instead, he invested his money using the 1/N rule – spread your assets equally across N assets.

The heuristics and biases program might look at this behaviour and note Markowitz is not following the optimal behaviour determined by himself. He is making important decisions without using all the available information. Perhaps it is due to cognitive limitations?

As Gigerenzer notes, optimisation is not always the best solution. Where the problem is computationally intractable or the optimisation solution lacks robustness due to estimation errors, heuristics may outperform. In the case of asset allocation, Gigerenzer notes work showing that 500 years of data would have been required for Markowitz’s optimisation rule to outperform his practice of 1/N. In a world of uncertainty, it can be beneficial to leave information on the table. Markowitz was using a simple heuristic for an important decision, but rightfully so as it is superior for the environment in which he is making the decision.

Simple heuristics make us smart

Gerd Gigerenzer is a strong advocate of the idea that simple heuristics can make us smart. We don’t need complex models of the world to make good decisions.

The classic example is the gaze heuristic. Rather than solving a complex equation to catch a ball, which requires us to know the ball’s speed and trajectory and the effect of the wind, a catcher can simply run to keep the ball at a constant angle in the air, leading them to the point where it will land.
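A rough sketch of the gaze heuristic in Python follows. It is an idealised illustration rather than a model of a real fielder: it assumes no air resistance, a catcher who fixes their gaze once the ball is already descending, and a catcher who can always move as fast as the rule demands. The launch velocity and starting position are hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2


def ball_position(t, vx, vy):
    """Horizontal distance and height of a ball launched from the origin."""
    return vx * t, vy * t - 0.5 * G * t * t


def gaze_heuristic_path(vx, vy, catcher_start, t_start, steps=100):
    """Positions of a catcher who keeps the gaze angle to the ball constant.

    The catcher never computes the landing point. At each instant they stand
    where the ball appears at the same angle as when they first fixed their gaze.
    """
    flight_time = 2 * vy / G
    x0, h0 = ball_position(t_start, vx, vy)
    tan_angle = h0 / (catcher_start - x0)  # gaze angle fixed at the start
    path = []
    for i in range(steps + 1):
        t = t_start + (flight_time - t_start) * i / steps
        x, h = ball_position(t, vx, vy)
        path.append(x + h / tan_angle)  # keep height / distance constant
    return path


vx, vy = 10.0, 15.0                # hypothetical launch velocity in m/s
landing_point = vx * (2 * vy / G)  # used only to check the result
path = gaze_heuristic_path(vx, vy, catcher_start=32.0, t_start=2.0)
miss_distance = abs(path[-1] - landing_point)
```

Because the ball's height shrinks to zero while the angle stays fixed, the horizontal gap between catcher and ball must shrink to zero too, so `miss_distance` is zero up to rounding. The list `path` also records every intermediate position, which is the extra prediction the heuristic gives you over a model that only says where the catcher ends up.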

Gigerenzer’s faith in heuristics is often taken to be based on the idea that people have limited processing capacity and are unable to solve the complex optimisation problems that would be needed in the absence of these rules. However, Gigerenzer points out this is perhaps the weakest argument for heuristics:

[W]e will start off by mentioning the weakest reason. With simple heuristics we can be more confident that our brains are capable of performing the necessary calculations. The weakness of this argument is that it is hard to judge what complexity of calculation or memory a brain might achieve. At the lower levels of processing, some human capabilities apparently involve calculations that seem surprisingly difficult (e.g., Bayesian estimation in a sensorimotor context: Körding & Wolpert, 2004). So if we can perform these calculations at that level in the hierarchy (abilities), why should we not be able to evolve similar complex strategies to replace simple heuristics?

Rather, the advantage of heuristics lies in their low information requirements, their speed and, importantly, their accuracy:

One answer is that simple heuristics often need access to less information (i.e. they are frugal) and can thus make a decision faster, at least if information search is external. Another answer – and a more important argument for simple heuristics – is the high accuracy they exhibit in our simulations. This accuracy may be because of, not just in spite of, their simplicity. In particular, because they have few parameters they avoid overfitting data in a learning sample and, consequently, generalize better across other samples. The extra parameters of more complex models often fit the noise rather than the signal. Of course, we are not saying that all simple heuristics are good; only some simple heuristics will perform well in any given environment.

As the last sentence indicates, Gigerenzer is careful not to make any claims that heuristics generally outperform. A statement that a heuristic is “good” is ill-conceived without considering the environment in which it will be used. This is the major departure of Gigerenzer’s ecological rationality from the standard approach in the behavioural sciences, where the failure of a heuristic to perform in an environment is taken as evidence of bias or irrationality.
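The overfitting point in the quote can be illustrated with a generic sketch. This is not one of Gigerenzer's simulations or heuristics: it swaps in a two-parameter straight line as the "simple" model and a nearest-neighbour memoriser as the "complex" one, in a hypothetical environment where the outcome is twice the cue plus noise.

```python
import random


def fit_line(xs, ys):
    """Least-squares slope and intercept: a model with just two parameters."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def nearest_neighbour(xs, ys, x_new):
    """Memorise every training point and return the outcome of the closest one."""
    closest = min(range(len(xs)), key=lambda i: abs(xs[i] - x_new))
    return ys[closest]


def mean_squared_error(predictions, ys):
    return sum((p - y) ** 2 for p, y in zip(predictions, ys)) / len(ys)


def sample(rng, n):
    """Hypothetical environment: outcome = 2 * cue + noise."""
    xs = [rng.random() for _ in range(n)]
    ys = [2 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


rng = random.Random(0)
train_x, train_y = sample(rng, 200)   # the learning sample
test_x, test_y = sample(rng, 1000)    # fresh data from the same environment

slope, intercept = fit_line(train_x, train_y)

# In the learning sample the memorising model reproduces every point, noise included.
line_in = mean_squared_error([slope * x + intercept for x in train_x], train_y)
memory_in = mean_squared_error(
    [nearest_neighbour(train_x, train_y, x) for x in train_x], train_y
)

# On new data the memorised noise becomes a liability.
line_out = mean_squared_error([slope * x + intercept for x in test_x], test_y)
memory_out = mean_squared_error(
    [nearest_neighbour(train_x, train_y, x) for x in test_x], test_y
)
```

The memoriser has zero error on the learning sample but a larger error than the line on the fresh sample. Change the environment, for example to one where the relationship is far from linear, and the ranking could reverse, which is the point about matching the tool to the environment.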

Once you have noted what heuristic is being used in what environment, you can have more predictive power than in a well-solved optimisation model. For example, an optimisation model to catch a ball will simply predict that the catcher will be at the place and time where the ball lands. Once you understand that they use the gaze heuristic to catch the ball, you can also predict the path that they will take to get to the ball – including that they won’t simply run in a straight line to catch it. If a baseball or cricket coach took the optimisation model too seriously, they would tell the catcher that they are running inefficiently by not going straight to where it will land. Instructions telling them to run in a straight line will likely make their performance worse.

Biologists and decision making

Biologists are usually among the first to tell me that economists rely on unrealistic assumptions about human decision making. They laugh at the idea that people are rational optimisers who care only about maximising consumption.

But the funny thing is, biologists often do the same. Biologists tend to treat their subjects as optimisers.

Gigerenzer has a great chapter considering how biologists treat decision making, and in particular, to what extent biologists consider that animals use simple decision-making tools such as heuristics. Gigerenzer provides a few examples where biologists have examined heuristics, but much of the chapter asks whether biologists are missing something with their typical approach.

As a start, Gigerenzer notes that biologists are seeking to make predictions rather than accurate descriptions of decision making. However, Gigerenzer questions whether this “gambit” is successful.

Behavioral ecologists do believe that animals are using simple rules of thumb that achieve only an approximation of the optimal policy, but most often rules of thumb are not their interest. Nevertheless, it could be that the limitations of such rules of thumb would often constrain behavior enough to interfere with the fit with predictions. The optimality modeler’s gambit is that evolved rules of thumb can mimic optimal behavior well enough not to disrupt the fit by much, so that they can be left as a black box. It turns out that the power of natural selection is such that the gambit usually works to the level of accuracy that satisfies behavioral ecologists. Given that their models are often deliberately schematic, behavioral ecologists are usually satisfied that they understand the selective value of a behavior if they successfully predict merely the rough qualitative form of the policy or of the resultant patterns of behavior.

You could write a similar paragraph about economists. If you were to give the people in an economic model objectives shaped by evolution, it would be almost the same.

But Gigerenzer has another issue with the optimisation approach in biology. As for most analysis of human decision making, “missing from biology is the idea that simple heuristics may be superior to more complex methods, not just a necessary evil because of the simplicity of animal nervous systems.” Gigerenzer writes:

There are a number of situations where the optimal solution to a real-world problem cannot be determined. One problem is computational intractability, such as the notorious traveling salesman problem (Lawler et al., 1985). Another problem is if there are multiple criteria to optimize and we do not know the appropriate way to convert them into a common currency (such as fitness). Thirdly, in many real-world problems it is impossible to put probabilities on the various possible outcomes or even to recognize what all those outcomes might be. Think about optimizing the choice of a partner who will bear you many children; it is uncertain what partners are available, whether each one would be faithful, how long each will live, etc. This is true about many animal decisions too, of course, and biologists do not imagine their animals even attempting such optimality calculations.

Instead the behavioral ecologist’s solution is to find optima in deliberately simplified model environments. We note that this introduces much scope for misunderstanding, inconsistency, and loose thinking over whether “optimal policy” refers to a claim of optimality in the real world or just in a model. Calculating the optima even in the simplified model environments may still be beyond the capabilities of an animal, but the hope is that the optimal policy that emerges from the calculations may be generated instead, to a lesser level of accuracy, by a rule that is simple enough for an animal to follow. The animal might be hardwired with such a rule following its evolution through natural selection, or the animal might learn it through trial and error. There remains an interesting logical gap in the procedure: There is no guarantee that optimal solutions to simplified model environments will be good solutions to the original complex environments. The biologist might reply that often this does turn out to be the case; otherwise natural selection would not have allowed the good fit between the predictions and observations. Success with this approach undoubtedly depends on the modeler’s skill in simplifying the environment in a way that fairly represents the information available to the animal.

Again, Gigerenzer could equally be writing about economics. I think we should be thankful, however, that biologists don’t take their results and develop policy prescriptions on how to get the animals to behave in ways we believe they should.

One interesting question Gigerenzer asks is whether humans and animals use similar heuristics. Consideration of this question might uncover evidence of the parallel evolution of heuristics in other lineages facing similar environmental structures, or even indicate a common evolutionary history. This could form part of the evidence as to whether these human heuristics are evolved adaptations.

But are animals more likely to use heuristics than humans? Gigerenzer suggests the answer is not clear:

It is tempting to propose that since other animals have simpler brains than humans they are more likely to use simple heuristics. But a contrary argument is that humans are much more generalist than most animals and that animals may be able to devote more cognitive resources to tasks of particular importance. For instance, the memory capabilities of small food-storing birds seem astounding by the standards of how we expect ourselves to perform at the same task. Some better-examined biological examples suggest unexpected complexity. For instance, pigeons seem able to use a surprising diversity of methods to navigate, especially considering that they are not long-distance migrants. The greater specialism of other animals may also mean that the environments they deal with are more predictable and thus that the robustness of simple heuristics may not be such an advantage.

Another interesting question is whether animals are also predisposed to the “biases” of humans. Is it possible that “animals in their natural environments do not commit various fallacies because they do not need to generalize their rules of thumb to novel circumstances”? The equivalent for humans is mismatch theory, which proposes that a lot of modern behaviour (and likely the “biases” we exhibit) is due to a mismatch between the environment in which our decision-making tools evolved and the environments we exercise them in today.

The difference between knowing the name of something and knowing something

In an excellent article over at Behavioral Scientist (read the whole piece), Koen Smets writes:

A widespread misconception is that biases explain or even produce behavior. They don’t—they describe behavior. The endowment effect does not cause people to demand more for a mug they received than a mug-less counterpart is prepared to pay for one. It is not because of the sunk cost fallacy that we hang on to a course of action we’ve invested a lot in already. Biases, fallacies, and so on are no more than labels for a particular type of observed behavior, often in a peculiar context, that contradicts traditional economics’ simplified view of behavior.

A related point was made by Owen Jones in his paper Why Behavioral Economics Isn’t Better, and How it Could Be:

[S]aying that the endowment effect is caused by Loss Aversion, as a function of Prospect Theory, is like saying that human sexual behavior is caused by Abstinence Aversion, as a function of Lust Theory. The latter provides no intellectual or analytic purchase, none, on why sexual behavior exists. Similarly, Prospect Theory and Loss Aversion – as valuable as they may be in describing the endowment effect phenomena and their interrelationship to one another – provide no intellectual or analytic purchase, none at all, on why the endowment effect exists. …

[Y]ou can’t provide a satisfying causal explanation for a behavior by merely positing that it is caused by some psychological force that operates to cause it. That’s like saying that the orbits of planets around the sun are caused by the “orbit-causing force.” …

[L]oss aversion rests on no theoretical foundation. Nothing in it explains why, when people behave irrationally with respect to exchanges, they would deviate in a pattern, rather than randomly. Nor does it explain why, if any pattern emerges, it should have been loss aversion rather than gain aversion. Were those two outcomes equally likely? If not, why not?

And here’s Richard Feynman on the point more generally (from What Do You Care What Other People Think?):

We used to go to the Catskill Mountains, a place where people from New York City would go in the summer. The fathers would all return to New York to work during the week, and come back only for the weekend. On weekends, my father would take me for walks in the woods and he’d tell me about interesting things that were going on in the woods. When the other mothers saw this, they thought it was wonderful and that the other fathers should take their sons for walks. They tried to work on them but they didn’t get anywhere at first. They wanted my father to take all the kids, but he didn’t want to because he had a special relationship with me. So it ended up that the other fathers had to take their children for walks the next weekend.

The next Monday, when the fathers were all back at work, we kids were playing in a field. One kid says to me, “See that bird? What kind of bird is that?”

I said, “I haven’t the slightest idea what kind of a bird it is.”

He says, “It’s a brown-throated thrush. Your father doesn’t teach you anything!”

But it was the opposite. He had already taught me: “See that bird?” he says. “It’s a Spencer’s warbler.” (I knew he didn’t know the real name.) “Well, in Italian, it’s a Chutto Lapittida. In Portuguese, it’s a Bom da Peida. In Chinese, it’s a Chung-long-tah, and in Japanese, it’s a Katano Tekeda. You can know the name of that bird in all the languages of the world, but when you’re finished, you’ll know absolutely nothing whatever about the bird. You’ll only know about humans in different places, and what they call the bird. So let’s look at the bird and see what it’s doing—that’s what counts.” (I learned very early the difference between knowing the name of something and knowing something.)

Knowing the name of a “bias” such as loss aversion isn’t zero knowledge – at least you know it exists. But knowing something exists is a very shallow understanding.

And back to Koen Smets:

Learning the names of musical notes and of the various signs on a staff doesn’t mean you’re capable of composing a symphony. Likewise, learning a concise definition of a selection of cognitive effects, or having a diagram that lists them on your wall, does not magically give you the ability to analyze and diagnose a particular behavioral issue or to formulate and implement an effective intervention.

Behavioral economics is not magic: it’s rare for a single, simple nudge to have the full desired effect. And being able to recite the definitions of cognitive effects does not magically turn a person into a competent behavioral practitioner either. When it comes to understanding and influencing human behavior, there is no substitute for experience and deep knowledge. Nor, perhaps even more importantly, is there a substitute for intellectual rigor, humility, and a healthy appreciation of complexity and nuance.

Michael Mauboussin’s Think Twice: Harnessing the Power of Counterintuition

Michael Mauboussin’s Think Twice: Harnessing the Power of Counterintuition is a multi-disciplinary book on how to improve your decision making. Framed around eight common decision-making mistakes, Mauboussin draws on disciplines including psychology, complexity theory and statistics.

Given the scope of the book, it does not reach great depth for most of its subject areas. But the interdisciplinary nature of the book means that most people are likely to find something new. I gained pointers to a lot of interesting reading, plus some new ways of thinking about familiar material. Below are a few interesting parts.

One early chapter contrasts the inside and outside views when making a judgement or prediction, a perspective I have often found helpful. The inside view uses the specific information about the problem at hand. The outside view looks at whether there are similar situations – a reference class – that can provide a statistical basis for the judgement. The simplest statistical basis is the “base rate” for that event – the probability of it generally occurring. The outside view, even a simple base rate, is typically a better indicator of the outcome than an estimate derived from the inside view.

Mauboussin points out that ignorance of the outside view is not the sole obstacle to its use. People will often ignore base rate information even when it is right in front of them. Mauboussin discusses an experiment by Freymuth and Ronan (pdf) where the experimental participants selected treatment for a fictitious disease. When the participants were able to choose a treatment with a 90% success rate that was paired with a positive anecdote, they chose it 90% of the time (choosing a control treatment with 50% efficacy the remaining 10% of the time). But when paired with a negative anecdote, only 39% chose the 90% efficacy treatment. Similarly, a treatment with 30% efficacy paired with a negative anecdote was chosen only 7% of the time, but this increased to 78% when it was paired with a positive anecdote. The stories drowned out the base rate information.

To elicit an outside view, Mauboussin suggests the simple trick of pretending you are predicting for someone else. Think about how the event will turn out for others. This will abstract you from the distracting inside view information and bring you closer to the more reliable outside view.

Mauboussin is at his most interesting, and differs from most standard examinations of decision making, when he considers decision making in complex systems (which happens to be the environment of many of our decisions).

One of his themes is that it is nearly impossible to manage a complex system. Understanding any individual part may be of limited use in understanding the whole, and interfering with that part may have many unintended consequences. The century of bungling in Yellowstone National Park (via Alston Chase’s book Playing God in Yellowstone) provides an example. In an increasingly connected world, more of our decisions are going to be in these types of systems.

One barrier to understanding a complex system is that the agents in an apparently intelligent system may not be that intelligent themselves. Mauboussin quotes biologist Deborah Gordon:

If you watch an ant try to accomplish something, you’ll be impressed by how inept it is. Ants aren’t smart, ant colonies are.

Complex systems often perform well at a system level despite the dumb agents. No single ant understands what the colony is doing, yet the colony does well.

Mauboussin turns this point into a critique of behavioural finance, suggesting it is a mistake to look at individuals rather than the market:

Regrettably, this mistake also shows up in behavioral finance, a field that considers the role of psychology in economic decision making. Behavioral finance enthusiasts believe that since individuals are irrational—counter to classical economic theory—and markets are made up of individuals, then markets must be irrational. This is like saying, “We have studied ants and can show that they are bumbling and inept. Therefore, we can reason that ant colonies are bumbling and inept.” But that conclusion doesn’t hold if more is different—and it is. Market irrationality does not follow from individual irrationality. You and I both might be irrationally overconfident, for example, but if you are an overconfident buyer and I am an overconfident seller, our biases may cancel out. In dealing with systems, the collective behavior matters more. You must carefully consider the unit of analysis to make a proper decision.

Mauboussin’s discussion of the often misunderstood concept of reversion (regression) to the mean is also useful. Here are some snippets:

“Mediocrity tends to prevail in the conduct of competitive business,” wrote Horace Secrist, an economist at Northwestern University, in his 1933 book, The Triumph of Mediocrity in Business. With that stroke of the pen, Secrist became a lasting example of the second mistake associated with reversion to the mean—a misinterpretation of what the data says. Secrist’s book is truly impressive. Its four hundred-plus pages show mean-reversion in series after series in an apparent affirmation of the tendency toward mediocrity.

In contrast to Secrist’s suggestion, there is no tendency for all companies to migrate toward the average or for the variance to shrink. Indeed, a different but equally valid presentation of the data shows a “movement away from mediocrity and [toward] increasing variation.” A more accurate view of the data is that over time, luck reshuffles the same companies and places them in different spots on the distribution. Naturally, companies that had enjoyed extreme good or bad luck will likely revert to the mean, but the overall system looks very similar through time. …

A counterintuitive implication of mean reversion is that you get the same result whether you run the data forward or backward. So the parents of tall children tend to be tall, but not as tall as their children. Companies with high returns today had high returns in the past, but not as high as the present. …

Here’s how to think about it. Say results are part persistent skill and part transitory luck. Extreme results in any given period, reflecting really good or bad luck, will tend to be less extreme either before or after that period as the contribution of luck is less significant. …

On this last point, a simple test of whether your activity involves skill is whether you can lose on purpose. For example, try to build a stock portfolio that will do worse than the benchmark.
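The skill-plus-luck framing lends itself to a quick simulation. The sketch below is an illustration rather than anything from Mauboussin’s book: it assumes each firm’s result is a fixed skill term plus fresh, independent luck in each period, both normally distributed with equal spread (the function names and parameter values are made up for the example). It picks the top decile in one period and checks how that group did in the other period, running the comparison both forward and backward, and also compares the overall spread of results across the two periods.

```python
import random
import statistics

def simulate(n_firms=10_000, skill_sd=1.0, luck_sd=1.0, seed=0):
    """Two periods of results: persistent skill plus fresh luck each period."""
    rng = random.Random(seed)
    skill = [rng.gauss(0, skill_sd) for _ in range(n_firms)]
    period1 = [s + rng.gauss(0, luck_sd) for s in skill]
    period2 = [s + rng.gauss(0, luck_sd) for s in skill]
    return period1, period2

def top_decile_means(selected_on, other):
    """Mean result of the top 10% in one period, and of the same firms in the other."""
    cutoff = sorted(selected_on)[int(0.9 * len(selected_on))]
    idx = [i for i, x in enumerate(selected_on) if x >= cutoff]
    return (statistics.mean(selected_on[i] for i in idx),
            statistics.mean(other[i] for i in idx))

p1, p2 = simulate()
fwd = top_decile_means(p1, p2)   # select on period 1, look forward to period 2
bwd = top_decile_means(p2, p1)   # select on period 2, look back to period 1
spread = (statistics.pstdev(p1), statistics.pstdev(p2))

print("forward :", fwd)
print("backward:", bwd)
print("spread  :", spread)
```

Under these assumptions the selected group’s average in the other period should sit roughly halfway back toward the mean in both directions, while the spread across all firms stays about the same. That is the point Secrist missed.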

Mauboussin links reversion to the mean to the “halo effect” (I recommend reading Phil Rosenzweig’s book of that name). The halo effect is the tendency of impressions from one area to influence impressions of another. In business, if people see a company with good profits, they will tend to assess the CEO’s management style, communications, organisational structure and strategic direction as all being positive.

When the company’s performance later reverts to the mean, people then interpret all of these things as going bad, when it is quite possible nothing has changed. The result is that great results tend to be followed by glowing stories in the media, and then by the fall:

Tom Arnold, John Earl, and David North, finance professors at the University of Richmond, reviewed the cover stories that Business-Week, Forbes, and Fortune had published over a period of twenty years. They categorized the articles about companies from most bullish to most bearish. Their analysis revealed that in the two years before the cover stories were published, the stocks of the companies featured in the bullish articles had generated abnormal positive returns of more than 42 percentage points, while companies in the bearish articles underperformed by nearly 35 percentage points, consistent with what you would expect. But for the two years following the articles, the stocks of the companies that the magazines criticized outperformed the companies they praised by a margin of nearly three to one.

And to close, Mauboussin provides a great example of bureaucratic kludge preventing the use of a checklist in medical treatment:

Toward the end of 2007, a federal agency called the Office for Human Research Protections charged that the Michigan program violated federal regulations. Its baffling rationale was that the checklist represented an alteration in medical care similar to an experimental drug and should continue only with federal monitoring and the explicit written approval of the patient. While the agency eventually allowed the work to continue, concerns about federal regulations needlessly delayed the program’s progress elsewhere in the United States. Bureaucratic inertia triumphed over a better approach.

Robert Sapolsky’s Why Zebras Don’t Get Ulcers

Before tackling Robert Sapolsky’s new book Behave: The Biology of Humans at Our Best and Worst, I decided to read Sapolsky’s earlier, well-regarded book Why Zebras Don’t Get Ulcers. I have been a fan of Sapolsky’s for some time, largely through his appearances on various podcasts. (This discussion with Sam Harris is excellent.)

Why Zebras Don’t Get Ulcers is a wonderful book. Sapolsky is a great writer, and the science is interesting. That Sapolsky did not sugarcoat the introduction to every chapter with a cute story, as seems to be a common formula today, made the book a pleasant contrast to a lot of my recent reading.

The core theme of the book is that chronic stress is bad for your health. It can lead to cardiovascular disease, destroy your sleep, age you faster, and so on. The one positive (relative to common beliefs) is that stress probably doesn’t cause cancer (with the possible exception of colon cancer).

The story linking stress with these health problems largely revolves around the hormones that trigger the stress response. I’ll give a quick synopsis of this story, as it helps give context to some of the snippets below.

When the stressor first arises, CRH (corticotropin-releasing hormone) is released from the hypothalamus in the brain. CRH helps to turn on the sympathetic nervous system, with the nerve endings of the sympathetic nervous system releasing adrenaline (called epinephrine throughout the book). This all leads to increased heart rate, vigilance and arousal. It triggers the cessation of many bodily functions, such as digestion, repair and reproductive processes, and suppresses immunity, mobilising the body’s resources to solve the stressor at hand.

Fast forward 15 seconds, and the CRH has triggered the pituitary at the base of the brain to release ACTH (also known as corticotropin). A few minutes later the ACTH in turn triggers the release of glucocorticoids by the adrenal gland. The glucocorticoids increase the stress response, further arousing the sympathetic nervous system and raising circulating glucose. The glucocorticoids are also involved in recovery and the preparation for the next stressor. For instance, they stimulate appetite.

Many of the costs of stress arise through the actions of these hormones when the stress is intermittent or chronic. CRH is cleared from the body a couple of minutes after the end of the stressor. It can take hours for glucocorticoids to be cleared. Continued intermittent or chronic stressors result in permanently elevated glucocorticoid levels, subjecting the body to a stress response without pause. For instance, the stress response makes the heart work harder. If you are in chronic stress, this increased work effort is constant, leading to high blood pressure and wearing out your blood vessels.

There is a raft of other hormones and processes involved in the stress response, each with its own roles, costs and benefits, but this basic picture, particularly the cost of ongoing high levels of glucocorticoids, forms the book’s central thread.

Although this sounds like a somewhat mechanical process, an important theme in the book is that the cost of stress is not just a mechanical equation, whereby stress causes a bodily response with various costs. The book balances a reductive view of biology, in which you can trace everything back to physical factors such as bacteria, viruses, genes, hormones and so on, with another view that is more psychologically grounded. In that latter view, stress can be purely psychological, affected by someone’s sense of control and so on.

The one part of the book that I found mildly unsatisfying was the chapter on the link between stress, poverty and health. Naturally, poverty and poor health are closely linked, with poverty associated with greater stress. Sapolsky asks about the direction of causality: does poverty harm health, or does poor health lead to poverty? But, as in some other chapters, Sapolsky does not delve deeply into whether there might be other causal factors. I felt that that chapter deserves another book.

More generally, I don’t have the subject expertise to critique the book, but I highlighted a lot of interesting passages. Below is a selection.

On sex differences in stress response:

Taylor argues convincingly that the physiology of the stress-response can be quite different in females, built around the fact that in most species, females are typically less aggressive than males, and that having dependent young often precludes the option of flight. Showing that she can match the good old boys at coming up with a snappy sound bite, Taylor suggests that rather than the female stress-response being about fight-or-flight, it’s about “tend and befriend”—taking care of her young and seeking social affiliation.

A few critics of Taylor’s influential work have pointed out that sometimes the stress-response in females can be about fight-or-flight, rather than affiliation. For example, females are certainly capable of being wildly aggressive (often in the context of protecting their young), and often sprint for their lives or for a meal (among lions, for example, females do most of the hunting). Moreover, sometimes the stress-response in males can be about affiliation rather than fight-or-flight. This can take the form of creating affiliative coalitions with other males or, in those rare monogamous species (in which males typically do a fair amount of the child care), some of the same tending and befriending behaviors as seen among females. Nevertheless, amid these criticisms, there is a widespread acceptance of the idea that the body does not respond to stress merely by preparing for aggression or escape, and that there are important gender differences in the physiology and psychology of stress.

On stress making us both eat more and less:

The official numbers are that stress makes about two-thirds of people hyperphagic (eating more) and the rest hypophagic. Weirdly, when you stress lab rats, you get the same confusing picture, where some become hyperphagic, others hypophagic. So we can conclude with scientific certainty that stress can alter appetite. Which doesn’t teach us a whole lot, since it doesn’t tell us whether there’s an increase or decrease. …

The confusing issue is that one of the critical hormones of the stress-response stimulates appetite, while another inhibits it. … CRH inhibits appetite, glucocorticoids do the opposite. Yet they are both hormones secreted during stress. Timing turns out to be critical. …

Suppose that something truly stressful occurs, and a maximal signal to secrete CRH, ACTH, and glucocorticoids is initiated. If the stressor ends after, say, ten minutes, there will cumulatively be perhaps a twelve-minute burst of CRH exposure (ten minutes during the stressor, plus the seconds it takes to clear the CRH afterward) and a two-hour burst of exposure to glucocorticoids (the roughly eight minutes of secretion during the stressor plus the much longer time to clear the glucocorticoids). So the period where glucocorticoid levels are high and those of CRH are low is much longer than the period of CRH levels being high. A situation that winds up stimulating appetite. In contrast, suppose the stressor lasts for days, nonstop. In other words, days of elevated CRH and glucocorticoids, followed by a few hours of high glucocorticoids and low CRH, as the system recovers. The sort of setting where the most likely outcome is suppression of appetite. The type of stressor is key to whether the net result is hyper-or hypophagia. …

Take some crazed, maze-running rat of a human. He sleeps through the alarm clock first thing in the morning, total panic. Calms down when it looks like the commute isn’t so bad today, maybe he won’t be late for work after all. Gets panicked all over again when the commute then turns awful. Calms down at work when it looks like the boss is away for the day and she didn’t notice he was late. Panics all over again when it becomes clear the boss is there and did notice. So it goes throughout the day. … What this first person is actually experiencing is frequent intermittent stressors. And what’s going on hormonally in that scenario? Frequent bursts of CRH release throughout the day. As a result of the slow speed at which glucocorticoids are cleared from the circulation, elevated glucocorticoid levels are close to nonstop. Guess who’s going to be scarfing up Krispy Kremes all day at work?

So a big reason why most of us become hyperphagic during stress is our westernized human capacity to have intermittent psychological stressors throughout the day.
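The timing argument is mostly arithmetic, so it can be laid out minute by minute. The following is a toy sketch, not a physiological model: it treats each hormone as simply high or low, and the three constants are rough figures read off the passage above (CRH cleared about two minutes after a stressor ends, glucocorticoids rising about two minutes after it starts and lingering for about two hours). The function name and the three example schedules are invented for illustration.

```python
# Rough, assumed figures taken from the passage; not measured values.
CRH_CLEAR_MIN = 2     # minutes for CRH to clear once the stressor ends
GC_LAG_MIN = 2        # minutes before glucocorticoids (GC) start to rise
GC_CLEAR_MIN = 120    # minutes for glucocorticoids to clear afterwards

def exposure(stressors, horizon_min):
    """Count minutes with CRH high, and minutes with GC high but CRH low.

    stressors is a list of (start_minute, duration_minutes) pairs.
    """
    crh = [False] * horizon_min
    gc = [False] * horizon_min
    for start, duration in stressors:
        end = start + duration
        for t in range(start, min(end + CRH_CLEAR_MIN, horizon_min)):
            crh[t] = True
        for t in range(start + GC_LAG_MIN, min(end + GC_CLEAR_MIN, horizon_min)):
            gc[t] = True
    crh_minutes = sum(crh)
    gc_only_minutes = sum(g and not c for g, c in zip(gc, crh))
    return crh_minutes, gc_only_minutes

DAY = 24 * 60

# One ten-minute scare.
single = exposure([(0, 10)], DAY)
# Three days of nonstop stress, plus four hours of recovery.
nonstop = exposure([(0, 3 * DAY)], 3 * DAY + 240)
# A ten-minute panic every 90 minutes across a 16-hour waking day.
intermittent = exposure([(start, 10) for start in range(0, 16 * 60, 90)], DAY)

for label, (crh_min, gc_only_min) in [("single", single),
                                      ("nonstop", nonstop),
                                      ("intermittent", intermittent)]:
    print(f"{label:12s} CRH high: {crh_min:5d} min   GC high, CRH low: {gc_only_min:4d} min")
```

With these assumed figures, the single ten-minute stressor and the intermittent day both leave far more glucocorticoid-only time than CRH time, while the three-day stressor reverses the ratio, matching the pattern Sapolsky describes.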

On the link between the brain and immunity:

The evidence for the brain’s influence on the immune system goes back at least a century, dating to the first demonstration that if you waved an artificial rose in front of someone who is highly allergic to roses (and who didn’t know it was a fake), they’d get an allergic response. … [T]he study that probably most solidified the link between the brain and the immune system used a paradigm called conditioned immunosuppression.

Give an animal a drug that suppresses the immune system. Along with it, provide, à la Pavlov’s experiments, a “conditioned stimulus”—for example, an artificially flavored drink, something that the animal will associate with the suppressive drug. A few days later, present the conditioned stimulus by itself—and down goes immune function. … The two researchers experimented with a strain of mice that spontaneously develop disease because of overactivity of their immune systems. Normally, the disease is controlled by treating the mice with an immunosuppressive drug. Ader and Cohen showed that by using their conditioning techniques, they could substitute the conditioned stimulus for the actual drug—and sufficiently alter immunity in these animals to extend their life spans.

Does acupuncture rely on a placebo effect?

[S]cientists noted that Chinese veterinarians used acupuncture to do surgery on animals, thereby refuting the argument that the painkilling characteristic of acupuncture was one big placebo effect ascribable to cultural conditioning (no cow on earth will go along with unanesthetized surgery just because it has a heavy investment in the cultural mores of the society in which it dwells).

On the anticipatory stress when you set an early alarm:

In the study, one group of volunteers was allowed to sleep for as long as they wanted, which turned out to be until around nine in the morning. As would be expected, their stress hormone levels began to rise around eight. How might you interpret that? These folks had enough sleep, happily restored and reenergized, and by about eight in the morning, their brains knew it. Start secreting those stress hormones to prepare to end the sleep. But the second group of volunteers went to sleep at the same time but were told that they would be woken up at six in the morning. And what happened with them? At five in the morning, their stress hormone levels began to rise. This is important. Did their stress hormone levels rise three hours earlier than the other group because they needed three hours less sleep? Obviously not. … Their brains were feeling that anticipatory stress while sleeping, demonstrating that a sleeping brain is still a working brain.

On the importance of having outlets for stress, even if that outlet is someone else:

An organism is subjected to a painful stimulus, and you are interested in how great a stress-response will be triggered. The bioengineers had been all over that one, mapping the relationship between the intensity and duration of the stimulus and the response. But this time, when the painful stimulus occurs, the organism under study can reach out for its mommy and cry in her arms. Under these circumstances, this organism shows less of a stress-response. …

Two identical stressors with the same extent of allostatic disruption can be perceived, can be appraised differently, and the whole show changes from there. …

The subject of one experiment is a rat that receives mild electric shocks (roughly equivalent to the static shock you might get from scuffing your foot on a carpet). Over a series of these, the rat develops a prolonged stress-response: its heart rate and glucocorticoid secretion rate go up, for example. For convenience, we can express the long-term consequences by how likely the rat is to get an ulcer, and in this situation, the probability soars. In the next room, a different rat gets the same series of shocks—identical pattern and intensity; its allostatic balance is challenged to exactly the same extent. But this time, whenever the rat gets a shock, it can run over to a bar of wood and gnaw on it. The rat in this situation is far less likely to get an ulcer. You have given it an outlet for frustration. Other types of outlets work as well—let the stressed rat eat something, drink water, or sprint on a running wheel, and it is less likely to develop an ulcer. …

A variant of Weiss’s experiment uncovers a special feature of the outlet-for-frustration reaction. This time, when the rat gets the identical series of electric shocks and is upset, it can run across the cage, sit next to another rat and… bite the hell out of it. Stress-induced displacement of aggression: the practice works wonders at minimizing the stressfulness of a stressor.

On how predictability can make stressors less stressful:

During the onset of the Nazi blitzkrieg bombings of England, London was hit every night like clockwork. Lots of stress. In the suburbs the bombings were far more sporadic, occurring perhaps once a week. Fewer stressors, but much less predictability. There was a significant increase in the incidence of ulcers during that time. Who developed more ulcers? The suburban population. (As another measure of the importance of unpredictability, by the third month of the bombing, ulcer rates in all the hospitals had dropped back to normal.)

On the link between low socioeconomic status (SES) and poor health – it is more about someone’s beliefs than their actual level of poverty:

[T]he SES/ health gradient is not really about a distribution that bottoms out at being poor. It’s not about being poor. It’s about feeling poor, which is to say, it’s about feeling poorer than others around you. …

Instead of just looking at the relationship between SES and health, Adler looks at what health has to do with what someone thinks and feels their SES is—their “subjective SES.” Show someone a ladder with ten rungs on it and ask them, “In society, where on this ladder would you rank yourself in terms of how well you’re doing?” Simple. First off, if people were purely accurate and rational, the answers across a group should average out to the middle of the ladder’s rungs. But cultural distortions come in—expansive, self-congratulatory European-Americans average out at higher than the middle rung (what Adler calls her Lake Wobegon Effect, where all the children are above average); in contrast, Chinese-Americans, from a culture with less chest-thumping individualism, average out to below the middle rung. …

Amazingly, it is at least as good a predictor of these health measures as is one’s actual SES, and, in some cases, it is even better.

Tom Griffiths on Gigerenzer versus Kahneman and Tversky. Plus a neat explanation of why the availability heuristic can be optimal

From an interview of Tom Griffiths by Julia Galef on the generally excellent Rationally Speaking podcast (transcript here):

Julia: There’s this ongoing debate in the heuristics and biases field and related fields. I’ll simplify here, but between, on the one hand, the traditional Kahneman and Tversky model of biases as the ways that human reasoning deviates from ideal reasoning, systematic mistakes that we make, and then on the other side of the debate are people, like for example Gigerenzer, who argue, “No, no, no, the human brain isn’t really biased. We’re not really irrational. These are actually optimal solutions to the problems that the brain evolved to face and to problems that we have limited time and processing power to deal with, so it’s not really appropriate to call the brain irrational, it’s just optimized for particular problems and under particular constraints.”

It sounds like your research is pointing towards the second of those positions, but I guess it’s not clear to me what the tension actually is with Kahneman and Tversky in what you’ve said so far.

Tom: Importantly, I think, we were using pieces of both of those ideas. I don’t think there’s necessarily a significant tension with the Kahneman and Tversky perspective.

Here’s one way of characterizing this. Gigerenzer’s argument has focused on one particular idea which comes from statistics, which is called the bias‐variance trade off. The basic idea of this principle is that you don’t necessarily want to use the most complex model when you’re trying to solve a problem. You don’t necessarily want to use the most complex algorithm.

If you’re trying to build a predictive model, including more predictors into the model can be something which makes the model actually worse, provided you are doing something like trying to minimize the errors that you’re making in accounting for the data that you’ve seen so far. The problem is that, as your model gets more complicated, it can overfit the data. It can end up producing predictions which are driven by noise that appears in the data that you’re seeing, because it’s got such a greater expressive capacity.

The idea is, by having a simpler model, you’re not going to get into that problem of ending up doing a good job of modeling the noise, and as a consequence you’re going to end up making better predictions and potentially doing a better job of solving those problems.

Gigerenzer’s argument is that some of these heuristics, which you can think about as strategies that end up being perhaps simpler than other kinds of cognitive strategies you can engage in, they’re going to work better than a more complex strategy ‐‐ precisely because of the bias‐variance trade off, precisely because they take us in that direction of minimizing the amount that we’re going to be overfitting the data.

The reason why it’s called the bias‐variance trade off is that, as you go in that direction, you add bias to your model. You’re going to be able to do a less good job of fitting data sets in general, but you’re reducing variance ‐‐ you’re reducing the amount which the answers you’re going to get are going to vary around depending on the particular data that you see. Those two things are things that are both bad for making predictions, and so the idea is you want to find the point which is the right trade off between those two kinds of errors.

What’s interesting about that is that you basically get this one explanatory dimension where it says making things simpler is going to be good, but it doesn’t necessarily explain why you get all the way to the very, very simple kinds of strategies that Gigerenzer tends to advocate. Because basically what the bias‐ variance trade off tells you is that you don’t want to use the most complex thing, but you probably also don’t want to use the simplest thing. You actually want to use something which is somewhere in between, and that might end up being more complex than perhaps the simpler sorts of strategies that Gigerenzer has identified, things that, say, rely on just using a single predictor when you’re trying to make a decision.

Kahneman and Tversky, on the other hand, emphasized heuristics as basically a means of dealing with cognitive effort, or the way that I think about it is computational effort. Doing probabilistic reasoning is something which, as a computational problem, is really hard. It’s Bayesian inference… It falls into the categories of problems which are things that we don’t have efficient algorithms to get computers to do, so it’s no surprise that they’d be things that would be challenging for people as well. The idea is, maybe people can follow some simpler strategies that are reducing the cognitive effort they need to use to solve problems.

Gigerenzer argued against that. He argued against people being, I think the way he characterized it was being “lazy,” and said instead, “No, we’re doing a good job with solving these problems.”

I think the position that I have is that I think both of those perspectives are important and they’re going to be important for explaining different aspects of the heuristics that we end up using. If you add in this third factor of cognitive effort, that’s something which does maybe push you a little bit further in terms of going in the direction of simplicity, but it’s also something that we can use to explain other kinds of heuristics.
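The bias-variance trade-off that Griffiths describes can be seen in a small simulation. This sketch is an assumption-laden stand-in rather than anything from the interview: it uses k-nearest-neighbour averaging, where a small k plays the role of the flexible, complex model and a large k the very simple one, and the sine-shaped “truth”, noise level and sample sizes are all invented for the example.

```python
import math
import random

def true_f(x):
    """The assumed underlying relationship the learner is trying to recover."""
    return math.sin(2 * math.pi * x)

def knn_predict(train, x, k):
    """Predict at x by averaging the outcomes of the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def expected_error(k, n_train=50, n_sets=200, noise_sd=0.5, seed=1):
    """Average squared error against the truth, over many noisy training sets."""
    rng = random.Random(seed)
    test_xs = [i / 20 for i in range(21)]
    total = 0.0
    for _ in range(n_sets):
        train = []
        for _ in range(n_train):
            x = rng.random()
            train.append((x, true_f(x) + rng.gauss(0, noise_sd)))
        total += sum((knn_predict(train, x, k) - true_f(x)) ** 2 for x in test_xs)
    return total / (n_sets * len(test_xs))

# k=1 chases the noise (high variance); k=50 ignores x entirely (high bias).
errors = {k: expected_error(k) for k in (1, 5, 50)}
for k, err in errors.items():
    print(f"k={k:2d}  mean squared error: {err:.3f}")
```

In this setup the intermediate choice should come out ahead of both extremes, which is Griffiths’s point that the trade-off argues against the most complex strategy without by itself justifying the simplest one.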

Griffiths later provides a great explanation of why the availability heuristic can be a good decision-making tool:

Tom: The basic idea behind availability is that if I ask you to judge the probability of something, to make a decision which depends on probabilities of outcomes, and then you do that by basically using those outcomes which come to mind most easily.

An example of this is, say, if you’re going to make a decision as to whether you should go snorkeling on holiday. You might end up thinking not just about the colorful fish you’re going to see, but also about the possibility of shark attacks. Or, if you’re going to go on a plane flight, you’ll probably end up thinking about terrorists more than you should. These are things which are very salient to us and jump out at us, and so as a consequence we end up overestimating their probabilities when we’re trying to make decisions.

What Falk did was look at this question from the perspective of trying to think about a computational solution to the problem of calculating an expected utility. If you’re acting rationally, what you should be doing when you’re trying to make a decision as to whether you want to do something or not, is to work out what’s the probabilities of all of the different outcomes that could happen? What’s the utility that you assign to those outcomes? And then average together those utilities weighted by their probabilities. Then that gives you the value of that particular option.

That’s obviously a really computationally demanding thing, particularly for the kinds of problems that we face as human beings where there could be many possible outcomes, and so on and so on.

A reasonable way that you could try and solve that problem instead is by sampling, by generating some sample of outcomes and then evaluating utilities of those outcomes and then adding those up.

Then you have this question, which is, well, what distribution should you be sampling those outcomes from? I think the immediate intuitive response is to say, “Well, you should just generate those outcomes with the probability that they occur in the world. You should just generate an unbiased sample.” Indeed, if you do that, you’ll get an unbiased estimate of the expected utility.

The problem with that is that if you are in a situation where there are some outcomes that are extreme outcomes ‐‐ that, say, occur with relatively lower probability, which is I think the sort of context that we often face in the sorts of decisions that we make as humans ‐‐ then that strategy is going to not work very well. Because there’s a chance that you don’t generate those extreme outcomes, because you’re sampling from this distribution, and those things might have relatively low chance of happening.

The answer is, in order to deal with that problem, you probably want to generate from a different distribution. And we can ask, what’s the best distribution to generate from, from the perspective of minimizing the variance in the estimates? Because in this case it’s the variance which really kills you, it’s the variability across those different samples. The answer is: Add a little bit of bias. It’s the bias‐variance trade off again. You generate from a biased distribution, that results in a biased estimate.

The optimal distribution to generate from, from the perspective of minimizing variance, is the distribution where the probability of generating an outcome is proportional to the probability of that outcome occurring in the world, multiplied by the absolute value of its utility.

Basically, the idea is that you want to generate from a distribution where those extreme events that are either extremely good or extremely bad are given greater weight ‐‐ and that’s exactly what we end up doing when we’re answering questions using those available examples. Because the things that we tend to focus on, and the things that we tend to store in our memory, are those things which really have extreme utilities.
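The sampling argument can be sketched with a two-outcome version of the snorkelling decision. Everything here is hypothetical: the probabilities, the utilities and the function names are made up, and the code uses the textbook importance-sampling correction with a known normalising constant. That keeps the estimate unbiased, so it does not reproduce the bias Griffiths mentions; it only shows how sampling in proportion to probability times absolute utility cuts the variability of the estimate when only a handful of outcomes are sampled.

```python
import random

# Hypothetical snorkelling trip: (probability, utility) of each outcome.
OUTCOMES = [(0.99, 1.0),       # pleasant swim
            (0.01, -1000.0)]   # shark attack
TRUE_EU = sum(p * u for p, u in OUTCOMES)   # 0.99 - 10 = -9.01

def plain_estimate(rng, n):
    """Sample outcomes at their real-world frequencies and average the utilities."""
    probs = [p for p, _ in OUTCOMES]
    utils = [u for _, u in OUTCOMES]
    return sum(rng.choices(utils, weights=probs, k=n)) / n

def utility_weighted_estimate(rng, n):
    """Sample in proportion to probability * |utility|, then reweight by p / q."""
    raw = [p * abs(u) for p, u in OUTCOMES]
    z = sum(raw)                       # normalising constant, assumed known here
    q = [w / z for w in raw]
    draws = rng.choices(range(len(OUTCOMES)), weights=q, k=n)
    return sum(OUTCOMES[i][1] * OUTCOMES[i][0] / q[i] for i in draws) / n

def rmse(estimator, n=10, reps=20_000, seed=0):
    """Root mean squared error of an estimator that sees only n sampled outcomes."""
    rng = random.Random(seed)
    squared = sum((estimator(rng, n) - TRUE_EU) ** 2 for _ in range(reps))
    return (squared / reps) ** 0.5

plain_rmse = rmse(plain_estimate)
weighted_rmse = rmse(utility_weighted_estimate)
print("plain sampling RMSE           :", plain_rmse)
print("utility-weighted sampling RMSE:", weighted_rmse)
```

With ten samples, plain sampling usually never draws the shark attack at all and so swings between an optimistic estimate and a wildly pessimistic one, whereas the utility-weighted sampler draws it most of the time and its error should come out several times smaller.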

Can we make the availability heuristic work better for us?

I think the other idea is that, to the extent that we’ve already adopted these algorithms and these end up being strategies that we end up using, you can also ask the question of how we might structure our environments in ways that we end up doing a better job of solving the problems we want to solve, because we’ve changed the nature of the inputs to those algorithms. If intervening on the algorithms themselves is difficult, intervening on our environments might be easier, and might be the kind of thing that makes us able to do a better job of making these sorts of inferences.

To return to your example of shark attacks and so on, I think you could expect that there’s even more bias than the optimal amount of bias in availability‐based decisions because what’s available to us has changed. One of the things that’s happened is you can hear about shark attacks on the news, and you can see plane crashes and you can see all of these different kinds of things. The statistics of the environment that we operate in are also just completely messed up with respect to what’s relevant for making our own decisions.

So a basic recommendation that would come out of that is, if this is the way that your mind tends to work, try and put yourself in an environment where you get exposed to the right kind of statistics. I think the way you were characterizing that was in terms of you find out what the facts are on shark attacks and so on.

Listen to the full episode – or in fact, much of the Rationally Speaking back catalogue. I’m still only partway through, but recommend the interviews with Daniel Lakens on p-hacking, Paul Bloom on empathy, Bryan Caplan on parenting, Phil Tetlock on forecasting, Tom Griffiths’s reappearance with his Algorithms to Live By co-author, Brian Christian, and Don Moore on overconfidence. Julia Galef is a great interviewer – I like the sceptical manner in which she probes her guests and digs into the points she wants to understand.

Opposing biases

From the preface of one print of Philip Tetlock’s Expert Political Judgment (hat tip to Robert Wiblin who quoted this passage in the introduction to an 80,000 Hours podcast episode):

The experts surest of their big-picture grasp of the deep drivers of history, the Isaiah Berlin–style “hedgehogs,” performed worse than their more diffident colleagues, or “foxes,” who stuck closer to the data at hand and saw merit in clashing schools of thought. That differential was particularly pronounced for long-range forecasts inside experts’ domains of expertise.

Hedgehogs were not always the worst forecasters. Tempting though it is to mock their belief-system defenses for their often too-bold forecasts—like “off-on-timing” (the outcome I predicted hasn’t happened yet, but it will) or the close-call counterfactual (the outcome I predicted would have happened but for a fluky exogenous shock)—some of these defenses proved quite defensible. And, though less opinionated, foxes were not always the best forecasters. Some were so open to alternative scenarios (in chapter 7) that their probability estimates of exclusive and exhaustive sets of possible futures summed to well over 1.0. Good judgment requires balancing opposing biases. Over-confidence and belief perseverance may be the more common errors in human judgment but we set the stage for over-correction if we focus solely on these errors and ignore the mirror image mistakes, of under-confidence and excessive volatility.

I can see why this idea of opposing biases makes correction of “biases” difficult.

But before we get to the correction of biases, this concept of opposing biases points at a major difficulty with behavioural analyses of decision making. When you have, say, both loss aversion and overconfidence in your bag of explanations for poor decision making, you can explain almost anything after the fact. The gamble turned out poorly? Overconfidence. Didn’t take the gamble? Loss aversion.

Recently I’ve heard a lot of people talking of action bias. There is also a status quo bias. Again, a pair of biases with which we can explain anything.