A problem in the world or a problem in the model

In reviewing Michael Lewis’s The Undoing Project, John Kay writes:

Since Paul Samuelson’s Foundations of Economic Analysis, published in 1947, mainstream economics has focused on an axiomatic approach to rational behaviour. The overriding requirement is for consistency of choice: if A is chosen when B is available, B will never be selected when A is available. If choices are consistent in this sense, their outcomes can be described as the result of optimisation in the light of a well-defined preference ordering.

In an impressive feat of marketing, economists appropriated the term “rationality” to describe conformity with these axioms. Such consistency is not, however, the everyday meaning of rationality; it is not rational, though it is consistent, to maintain the belief that there are fairies at the bottom of the garden in spite of all evidence to the contrary. …

… In the 1970s, however, Kahneman and Tversky began research that documented extensive inconsistency with those rational choice axioms.

What they did, as is common practice in experimental psychology, was to set puzzles to small groups of students. The students often came up with what the economics of rational choice would describe as the “wrong” answer. These failures of the predictions of the theory clearly demand an explanation. But Lewis—like many others who have written about behavioural economics—does not progress far beyond compiling a list of these so-called “irrationalities.”

This taxonomic approach fails to address crucial issues. Is rational choice theory intended to be positive—a description of how people do in fact behave—or normative—a recommendation as to how they should behave? Since few people would wish to be labelled irrational, the appropriation of the term “rationality” conflates these perspectives from the outset. Do the observations of allegedly persistent irrationality represent a wide-ranging attack on the quality of human decision-making—or a critique of the economist’s concept of rationality? The normal assumption of economists is the former; the failure of observation to correspond with theory identifies a problem in the world, not a problem in the model. Kahneman and Tversky broadly subscribe to that position; their claim is that people—persistently—make stupid mistakes.

I have seen many presentations open with the line “economists assume we are rational”, quickly followed by conclusions about poor human decision-making, the two being conflated. More often than not, it’s better to ignore economics as a starting point and to simply examine the evidence for poor decision-making. That evidence is, of course, much richer – and more debatable – than a simple refutation of the basic economics axioms.
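Kay’s consistency axiom is easy to state operationally. As a minimal sketch with invented choice data (the function name and observations are illustrative, not from any source above), a violation occurs when A is chosen over B from one menu while B is chosen over A from another:

```python
# Each observation pairs a menu of available options with the chosen item.
observations = [
    ({"A", "B"}, "A"),
    ({"B", "C"}, "B"),
    ({"A", "B", "C"}, "A"),
]

def violates_consistency(observations):
    """True if some item is chosen over another in one menu while the
    reverse choice appears in another menu."""
    revealed = {(chosen, other)
                for menu, chosen in observations
                for other in menu
                if other != chosen}
    return any((b, a) in revealed for (a, b) in revealed)

print(violates_consistency(observations))  # False: these choices are consistent
```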

One of those debates concerns the Linda problem. Kay continues:

Take, for example, the famous “Linda Problem.” As Kahneman frames it: “Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Which of the following is more likely? ‘Linda is a bank teller,’ ‘Linda is a bank teller and is active in the feminist movement.’”

The common answer is that the second alternative—that Linda is more likely to be a feminist bank teller than a bank teller—is plainly wrong, because the rules of probability state that a compound probability of two events cannot exceed the probability of either single event. But to the horror of Kahneman and his colleagues, many people continue to assert that the second description is the more likely even after their “error” is pointed out.

But it does not require knowledge of the philosopher Paul Grice’s maxims of conversation—although perhaps it helps—to understand what is going on here. The meaning of discourse depends not just on the words and phrases used, but on their context. The description that begins with Linda’s biography and ends with “Linda is a bank teller” is not, without more information, a satisfactory account. Faced with such a narrative in real life, one would seek further explanation to resolve the apparent incongruity and, absent such explanation, be reluctant to believe, far less act on, the information presented.

Kahneman and Tversky recognised that we prefer to tell stories than to think in terms of probability. But this should not be assumed to represent a cognitive failure. Storytelling is how we make sense of a complex world of which we often know, and understand, little.

So we should be wary in our interpretation of the findings of behavioural economists. The environment in which these experiments are conducted is highly artificial. A well-defined problem with an identifiable “right” answer is framed in a manner specifically designed to elucidate the “irrationality” of behaviour that the experimenter triumphantly identifies. This is a very different exercise from one which demonstrates that people make persistently bad decisions in real-world situations, where the issues are typically imperfectly defined and where it is often not clear even after the event what the best course of action would have been.
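As an aside, the probability rule at stake in the Linda problem – the conjunction rule – takes only a couple of lines of arithmetic, and it holds for any numbers you care to plug in (those below are invented):

```python
p_teller = 0.05                  # hypothetical P(Linda is a bank teller)
p_feminist_given_teller = 0.60   # hypothetical P(active feminist | bank teller)

# The conjunction rule: P(A and B) = P(A) * P(B | A) <= P(A).
p_both = p_teller * p_feminist_given_teller
print(p_both <= p_teller)  # True, whatever the conditional probability is
```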

Kay also touches on the more general criticisms:

Lewis’s uncritical adulation of Kahneman and Tversky gives no credit to either of the main strands of criticism of their work. Many mainstream economists would acknowledge that people do sometimes behave irrationally, but contend that even if such irrationalities are common in the basements of psychology labs, they are sufficiently unimportant in practice to matter for the purposes of economic analysis. At worst, a few tweaks to the standard theory can restore its validity.

From another perspective, it may be argued that persistent irrationalities are perhaps not irrational at all. We cope with an uncertain world, not by attempting to describe it with models whose parameters and relevance we do not know, but by employing practical rules and procedures which seem to work well enough most of the time. The most effective writer in this camp has been the German evolutionary psychologist Gerd Gigerenzer, and the title of one of his books, Simple Heuristics That Make Us Smart, conveys the flavour of his argument. The discovery that these practical rules fail in some stylised experiments tells us little, if anything, about the overall utility of Gigerenzer’s “fast and frugal” rules of behaviour.

Perhaps it is significant that I have heard some mainstream economists dismiss the work of Kahneman in terms not very different from those in which Kahneman reportedly dismisses the work of Gigerenzer. An economic mainstream has come into being in which rational choice modelling has become an ideology rather than an empirical claim about the best ways of explaining the world, and those who dissent are considered not just wrong, but ignorant or malign. An outcome in which people shout at each other from inside their own self-referential communities is not conducive to constructive discourse.

The Rhetoric of Irrationality

From the opening of Lola Lopes’s 1991 article The Rhetoric of Irrationality (pdf) on the heuristics and biases literature:

Not long ago, Newsweek ran a feature article describing how researchers at a major midwestern business school are exploring the process of choice in hopes of helping business executives and business students improve their ‘often rudimentary decision-making skills’

[T]he researchers have, in the author’s words, ‘sadly’ concluded that ‘most people’ are ‘woefully muddled information processors who stumble along ill-chosen shortcuts to reach bad conclusions’. Poor ‘saps’ and ‘suckers’ that we are, a list of our typical decision flaws would be so lengthy as to ‘demoralize’ Solomon.

This is a powerful message, sweeping in its generality and heavy in its social and political implications. It is also a strange message, for it concerns something that we might suppose could not be meaningfully studied in the laboratory, that being the fundamental adequacy or inadequacy of people’s capacity to choose and plan wisely in everyday life. Nonetheless, the message did originate in the laboratory, in studies that have no greater claim to relevance than hundreds of others that are published yearly in scholarly journals. My goal in this article is to trace how this message of irrationality has been selected out of the literature and how it has been changed and amplified in passing through the logical and expository layers that exist between experimental conception and popularization.

Below are some of the more interesting passages. First:

Prior to 1970 or so, most researchers in judgment and decision-making believed that people are pretty good decision-makers. In fact, the most frequently cited summary paper of that era was titled ‘Man as an intuitive statistician’ (Peterson & Beach, 1967). Since then, however, opinion has taken a decided turn for the worse, though the decline was not in any sense demanded by experimental results. Subjects did not suddenly become any less adept at experimental tasks nor did experimentalists begin to grade their performance against a tougher standard. Instead, researchers began selectively to emphasize some results at the expense of others.

The Science article [Kahneman and Tversky’s 1974 article (pdf)] is the primary conduit through which the laboratory results made their way out of psychology and into other branches of the social sciences. … About 20 percent of the citations were in sources outside psychology. Of these, all used the citation to support the unqualified claim that people are poor decision-makers.

Acceptance of this sort is not the norm for psychological research. Scholars from other fields in the social sciences such as sociology, political science, law, economics, business and anthropology look with suspicion on the tightly controlled experimental tasks that psychologists study in the laboratories, particularly when the studies are carried out using student volunteers. In the case of the biases and heuristics literature, however, the issue of generalizability is seldom raised and it is rarely so much as mentioned that the cited conclusions are based on laboratory research. Human incompetence is presented as a fact, like gravity.

If you think of it, this is a great trick, for the studies in question have managed to shed their experimental details without sacrificing scientific authority. Somehow the message of irrationality has been sprung free of its factual supports, allowing it to be seen entire, unobstructed by the hopeful assumptions and tedious methodologies that brace up all laboratory research.

One interesting thread concerns the purpose of the experiments and the contrasting conclusions drawn from them. For this discussion, Lopes looks at six of the experiments in four of Kahneman and Tversky’s papers published between 1971 and 1973, plus a summary article in Science from 1974. One example involved this question:

Consider the letter R. Is R more likely to appear in the first position of a word or the third position of a word?

This problem involves the availability heuristic, the tendency to estimate the probability of an event by the ease with which instances of the event can be remembered or constructed in the imagination. Under the availability hypothesis, people will see how many words they can generate with R in the first or third position. It is easier to think of words with R in the first position than the third, leading them to conclude – in error – that R is more common in the first.
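The frequency claim itself is easy to check mechanically. A toy sketch with an invented ten-word sample (a real analysis would use a large corpus, ideally weighted by word frequency):

```python
words = ["rain", "rock", "care", "born", "word",
         "fire", "hard", "park", "burn", "farm"]  # illustrative sample only

def position_counts(letter, words):
    first = sum(1 for w in words if w[:1] == letter)
    third = sum(1 for w in words if w[2:3] == letter)
    return first, third

first, third = position_counts("r", words)
print(first, third)  # 2 8: even in this toy sample, R is commoner in the third position
```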

Lopes writes:

[T]he question is posed so that there are only two possible results. One of these will occur if the subject reasons in accord with probability theory, and the other, if the subject reasons heuristically. …

By this logic, the implications of Figure 1 [a summary of the results] are clear: subjects reason heuristically and not according to probability theory. That is the result, signed, sealed and delivered, courtesy of strong inference. But the main contribution of the research is not this result since few would have supposed that naive people know much about combinations or variances of binomial proportions or how often R appears in the third position of words. Instead, the research commands attention and respect because the various problems function as thought experiments, strengthening our grasp of the task domain by revealing critical psychological variables that do not show up in the normative analysis. …

There is, however, another way to construe this set of studies and that is by considering the predictions of the two processing modes at a higher level of abstraction. If we think about performance in terms of correctness, we see that in every case the probability mode predicts correct answers and the heuristic mode predicts errors. … [T]he sheer weight of all the wrong answers tends to deform the basic conclusion, bending it away from an evaluatively neutral description of the process and toward something more like ‘people use heuristics to judge probabilities and they are wrong’, or even ‘people make mistakes when they judge probabilities because they use heuristics’.

Happily, conclusions like these do not hold up. This is because the tuning that is necessary for constructing problems that allow strong inference on processing questions is systematically misleading when it comes to asking evaluative questions. For example, consider the letter R problem. Why was R chosen for study and not, say, B? … Of the 20 possible consonants, 12 are more common in the first position and 8 are more common in the third position. All of the consonants that Kahneman and Tversky studied were taken from the third-position group even though there are more consonants in the first-position group.

The selection of consonants was not malicious. Their use is dictated by the strong inference logic since only they yield unambiguous answers to the processing question. In other words, when a subject says that R occurs more frequently in the first position, we know that he or she must be basing the judgment on availability, since the actual frequency information would lead to the opposite conclusion. Had we used B, instead, and had the subject also judged it to occur more often in the first position, we would not be able to tell whether the judgment reflected availability or factual knowledge since B is, in fact, more likely to occur in the first position.

We see, then, that the experimental logic constrains the interpretation of the data. We can conclude that people use heuristics instead of probability theory but we cannot conclude that their judgments are generally poor. All the same, it is the latter, unwarranted conclusion that is most often conveyed by this literature, particularly in settings outside psychology.

Lopes then turns her attention to Kahneman and Tversky’s famous Science article.

In the original experimental reports, there is plenty of language to suggest that human judgments are often wrong, but the exposition focuses mostly on the delineation of process. In the Science article, however, Tversky and Kahneman (1974) shift their attention from heuristic processing to biased processing. In the introduction they tell us: ‘This article shows that people rely on a limited number of heuristic principles which reduce the complex tasks of assessing probabilities and predicting values to simpler judgmental operations’ (p. 1124). By the time we get to the discussion, however, the emphasis has changed. Now they say: ‘This article has been concerned with cognitive biases that stem from the reliance on judgmental heuristics’ (p. 1130).

Examination of the body of the paper shows that the retrospective account is the correct one: the paper is more concerned with biases than with heuristics even though the experiments bear more on heuristics than on biases.

There is plenty more of interest in Lopes’s article. I recommend reading the full article (pdf).

Genoeconomics and designer babies: The rise of the polygenic score

When genome-wide association studies (GWAS) were first used to study complex polygenic traits, the results were underwhelming. Few genes with any predictive power were found, and those that were found typically explained only a fraction of the genetic effects that twin studies suggested were present.

This led to divergent responses, ranging from continued resistance to the idea that genes affect anything, to a quiet confidence that once sample sizes became large enough those genetic effects would be found.

Increasingly large samples are now showing that the quiet confidence was justified, with a steady flow of papers finding material genetic effects on traits including educational attainment, intelligence and height.

One source of this work is the “genoeconomists”. From Jacob Ward in the New York Times:

Once a G.W.A.S. shows genetic effects across a group, a “polygenic score” can be assigned to individuals, summarizing the genetic patterns that correlate to outcomes found in the group. Although no one genetic marker might predict anything, this combined score based on the entire genome can be a predictor of all sorts of things. And here’s why it’s so useful: People outside that sample can then have their DNA screened, and are assigned their own polygenic score, and the predictions tend to carry over. This, Benjamin realized, was the sort of statistical tool an economist could use.

As an economist, however, Benjamin wasn’t interested in medical outcomes. He wanted to see if our genes predict social outcomes.

In 2011, with a grant from the National Science Foundation, Benjamin launched the Social Science Genetic Association Consortium, an unprecedented effort to gather unconnected genetic databases into one enormous sample that could be studied by researchers from outside the world of genetic science. In July 2018, Benjamin and four senior co-authors, drawing on that database, published a landmark study in Nature Genetics. More than 80 authors from more than 50 institutions, including the private company 23andMe, gathered and studied the DNA of over 1.1 million people. It was the largest genetics study ever published, and the subject was not height or heart disease, but how far we go in school.

The researchers assigned each participant a polygenic score based on how broad genetic variations correlated with what’s called “educational attainment.” (They chose it because intake forms in medical offices tend to ask patients what education they’ve completed.) The predictive power of the polygenic score was very small — it predicts more accurately than the parents’ income level, but not as accurately as the parents’ own level of educational attainment — and it’s useless for making individual predictions.

One of the most interesting possibilities for polygenic scores is using them to control for heterogeneity in research subjects. Ward writes:

Several researchers involved in the project mentioned to me the possibility of using polygenic scores to sharpen the results of studies like the ongoing Perry Preschool Project, which, starting in the early 1960s, began tracking 123 preschool students and suggested that early education plays a large role in determining a child’s success in school and life. Benjamin and other co-authors say that perhaps sampling the DNA of the Perry Preschool participants could improve the accuracy of the findings, by controlling for those in the group that were genetically predisposed to go further in school.

In a world with easy access to genetic samples, it could become common to include genetic controls in analysis of interesting societal outcomes, in the same way we now control for parental traits.

A couple of times in the article, Ward notes that “scores aren’t individually predictive”. He writes that “The predictive power of the polygenic score was very small — it predicts more accurately than the parents’ income level, but not as accurately as the parents’ own level of educational attainment — and it’s useless for making individual predictions.”

I’m not sure what Ward’s definition of “predictive” is for an individual, but take this example from the article:

The authors calculated, for instance, that those in the top fifth of polygenic scores had a 57 percent chance of earning a four-year degree, while those in the bottom fifth had a 12 percent chance. And with that degree of correlation, the authors wrote, polygenic scores can improve the accuracy of other studies of education.

That looks like predictive power to me. Take an individual from the sample or an equivalent population, look at their polygenic score, and then assign a probability of whether they will obtain a four-year degree.
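To put those quintile figures in more familiar terms, they imply an odds ratio of almost ten between the top and bottom fifths:

```python
# Figures quoted in the article: 57% of the top fifth and 12% of the
# bottom fifth of polygenic scores earned a four-year degree.
p_top, p_bottom = 0.57, 0.12

def odds(p):
    return p / (1 - p)

odds_ratio = odds(p_top) / odds(p_bottom)
print(round(odds_ratio, 1))  # 9.7
```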

I recommend reading the whole article.

A related story getting ample press is that Genomic Prediction has started to offer intelligence screening for embryos. Polygenic scores have been used with success in livestock breeding for a while now, and that work is often a better guide to the future possibilities than the fears of those preoccupied with the human implications of genetic research. From The Guardian:

The company says it is only offering such testing to spot embryos with an IQ low enough to be classed as a disability, and won’t conduct analyses for high IQ. But the technology the company is using will permit that in principle, and co-founder Stephen Hsu, who has long advocated for the prediction of traits from genes, is quoted as saying: “If we don’t do it, some other company will.”

The development must be set, too, against what is already possible and permitted in IVF embryo screening. The procedure called pre-implantation genetic diagnosis (PGD) involves extracting cells from embryos at a very early stage and “reading” their genomes before choosing which to implant. It has been enabled by rapid advances in genome-sequencing technology, making the process fast and relatively cheap. In the UK, PGD is strictly regulated by the Human Fertilisation and Embryology Authority (HFEA), which permits its use to identify embryos with several hundred rare genetic diseases of which the parents are known to be carriers. PGD for other purposes is illegal.

In the US it’s a very different picture. Restrictive laws about what can be done in embryo and stem-cell research using federal funding sit alongside a largely unregulated, laissez-faire private sector, including IVF clinics. PGD to select an embryo’s sex for “family balancing” is permitted, for example. There is nothing in US law to prevent PGD for selecting embryos with “high IQ”.

Ball also expresses scepticism about the value of the polygenic scores:

These relationships are, however, statistical. If you have a polygenic score that places you in the top 10% of academic achievers, that doesn’t mean you will ace your exams without effort. Even setting aside the substantial proportion of intelligence (typically around 50%) that seems to be due to the environment and not inherited, there are wide variations for a given polygenic score, one reason being that there’s plenty of unpredictability in brain wiring during growth and development.

So the service offered by Genomic Prediction, while it might help to spot extreme low-IQ outliers, is of very limited value for predicting which of several “normal” embryos will be smartest. Imagine, though, the misplaced burden of expectation on a child “selected” to be bright who doesn’t live up to it. If embryo selection for high IQ goes ahead, this will happen.

Despite Ball’s scepticism about comparing “normal” embryos, I expect it won’t be long before Genomic Prediction or a counterpart is doing just that.

Steve Hsu, co-founder of Genomic Prediction, comments on the press here (and provides some links to other articles). He closes by saying:

“Expert” opinion seems to have evolved as follows:

1. Of course babies can’t be “designed” because genes don’t really affect anything — we’re all products of our environment!

2. Gulp, even if genes do affect things it’s much too complicated to ever figure out!

3. Anyone who wants to use this technology (hmm… it works) needs to tread carefully, and to seriously consider the ethical issues.

Only point 3 is actually correct, although there are still plenty of people who believe 1 and 2 :-(

How happy is a paraplegic a year after losing the use of their legs?

From Dan Gilbert’s 2004 TED talk, now viewed over 16 million times:

Let’s see how your experience simulators are working. Let’s just run a quick diagnostic before I proceed with the rest of the talk. Here’s two different futures that I invite you to contemplate. You can try to simulate them and tell me which one you think you might prefer. One of them is winning the lottery. This is about 314 million dollars. And the other is becoming paraplegic.

Just give it a moment of thought. You probably don’t feel like you need a moment of thought.

Interestingly, there are data on these two groups of people, data on how happy they are. And this is exactly what you expected, isn’t it? But these aren’t the data. I made these up!

These are the data. You failed the pop quiz, and you’re hardly five minutes into the lecture. Because the fact is that a year after losing the use of their legs, and a year after winning the lotto, lottery winners and paraplegics are equally happy with their lives.

And here’s Dan Gilbert reflecting on this statement 10 years later:

The first mistake occurred when I misstated the facts about the 1978 study by Brickman, Coates and Janoff-Bulman on lottery winners and paraplegics.

At 2:54 I said, “… a year after losing the use of their legs, and a year after winning the lotto, lottery winners and paraplegics are equally happy with their lives.” In fact, the two groups were not equally happy: Although the lottery winners (M=4.00) were no happier than controls (M=3.82), both lottery winners and controls were slightly happier than paraplegics (M=2.96).

So why has this study become the poster child for the concept of hedonic adaptation? First, most of us would expect lottery winners to be much happier than controls, and they weren’t. Second, most of us would expect paraplegics to be wildly less happy than either controls or lottery winners, and in fact they were only slightly less happy (though it is admittedly difficult to interpret numerical differences on rating scales like the ones used in this study). As the authors of the paper noted, “In general, lottery winners rated winning the lottery as a highly positive event, and paraplegics rated their accident as a highly negative event, though neither outcome was rated as extremely as might have been expected.” Almost 40 years later, I suspect that most psychologists would agree that this study produced rather weak and inconclusive findings, but that the point it made about the unanticipated power of hedonic adaptation has now been confirmed by many more powerful and methodologically superior studies. You can read the original study here.

It’s great that he is able to step back and admit his mistakes. One thing that perplexes me, however, is that he purports to show the real data on a slide:


As you can see, this runs on a scale reaching up to 70, with both measured at 50. The actual measure was on a 5-point scale. Where did these numbers come from? Did Gilbert simply make these data up?

If this were just a case of misstating the point of the study, I would feel much sympathy. As he states:

When I gave this talk in 2004, the idea that videos might someday be “posted on the internet” seemed rather remote. There was no Netflix or YouTube, and indeed, it would be two years before the first TED Talk was put online. So I thought I was speaking to a small group of people who’d come to a relatively unknown conference in Monterey, California, and had I realized that ten years later more than 8 million people would have heard what I said that day, I would have (a) rehearsed and (b) dressed better.

That’s a lie. I never dress better. But I would have rehearsed. Back then, TED talks were considerably less important events and therefore a lot more improvisational, so I just grabbed some PowerPoint slides from previous lectures, rearranged them on the airplane to California, and then took the stage and winged it. I had no idea that on that day I was delivering the most important lecture of my life.

But if that chart was made up, my sympathy fades somewhat.

How likely is “likely”?

From Andrew Mauboussin and Michael Mauboussin:

In a famous example (at least, it’s famous if you’re into this kind of thing), in March 1951, the CIA’s Office of National Estimates published a document suggesting that a Soviet attack on Yugoslavia within the year was a “serious possibility.” Sherman Kent, a professor of history at Yale who was called to Washington, D.C. to co-run the Office of National Estimates, was puzzled about what, exactly, “serious possibility” meant. He interpreted it as meaning that the chance of attack was around 65%. But when he asked members of the Board of National Estimates what they thought, he heard figures from 20% to 80%. Such a wide range was clearly a problem, as the policy implications of those extremes were markedly different. Kent recognized that the solution was to use numbers, noting ruefully, “We did not use numbers…and it appeared that we were misusing the words.”

Not much has changed since then. Today people in the worlds of business, investing, and politics continue to use vague words to describe possible outcomes.

To examine this problem in more depth, team Mauboussin asked 1700 people to attach probabilities to a range of words or phrases. For instance, if a future event is likely to happen, what percentage of the time would you estimate it ends up happening? Or what if the future event has a real possibility of happening?

Unsurprisingly, the answers are all over the place. The HBR article has a nice chart of the distribution of responses, and you can see more detailed results here. (You can also take the survey there.)

What is the range of answers for an event that is “likely”? The 90% probability range for “likely” – that is, the range containing 90% of the answers, with 5% above and 5% below – was 55% to 90%. “Real possibility” had a probability range of 20% to 80% – the phrase is close to meaningless. Even “always” is ambiguous, with a probability range of 90% to 100%.
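Extracting such a range from survey responses is a simple percentile calculation. A sketch using a nearest-rank percentile and hypothetical responses (the survey’s actual data are not reproduced here):

```python
import math

def percentile(data, p):
    """Nearest-rank percentile: the smallest value with at least p% of the
    data at or below it."""
    data = sorted(data)
    k = max(0, math.ceil(p / 100 * len(data)) - 1)
    return data[k]

# Hypothetical answers (percent) to: how probable is an event that is "likely"?
responses = [50, 55, 55, 60, 60, 65, 65, 70, 70, 70,
             75, 75, 75, 80, 80, 80, 85, 85, 90, 95]
print(percentile(responses, 5), percentile(responses, 95))  # 50 90
```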

An interesting finding of the survey was that men and women differ in their interpretations. Women are more likely to take a phrase as indicating a higher probability.

So what does team Mauboussin suggest we should do? Use numbers. Pin down those subjective probabilities using objective benchmarks. Practice.

And to close with another piece of Sherman Kent wisdom:

Said R. Jack Smith:  Sherm, I don’t like what I see in our recent papers. A 2-to-1 chance of this; 50-50 odds on that. You are turning us into the biggest bookie shop in town.

Replied Kent:  R.J., I’d rather be a bookie than a [blank-blank] poet.

Avoiding trite lists of biases and pictures of human brains on PowerPoint slides

From a book chapter by Greg Davies and Peter Brooks, Practical Challenges of Implementing Behavioral Finance: Reflections from the Field (quotes taken from a pre-print):

Taken in isolation, the ideas and concepts that comprise the field of behavioral finance are of very little practical use. Indeed, many of the attempts to apply these ideas amount to little more than a trite list of biases and pictures of human brains on PowerPoint slides. Talking a good game in the arena of behavioral finance is easy, which often leads to the misperception that it is superficial. Yet, making behavioral finance work in practice is much more challenging: it requires integrating these ideas with working models, information technology (IT) systems, business processes, and organizational culture.

Substitute the term “behavioural finance” with “behavioural economics” and its kin, and the message reads the same.

On the “bias” bias:

Today, extremely long lists of biases are available, which do little to convey the underlying sophistication, complexity, and thoroughness of more than half a century of highly robust experimental and theoretical work. These lists provide no real framework for potential practitioners to deploy when approaching a tangible problem. And many of these biases appear to overlap or conflict with each other, which can make behavioral finance appear either very superficial or highly confused.

The easily accessible examples that academics have used to illustrate these biases to wide audiences have sometimes led to the impression that behavioral economics is an easy field to master. This misrepresentation leads to inevitable disappointment when categorizing biases proves not to be an easy panacea. A perception of the field as “just anecdotes and parlor games” reduces the willingness of the commercial world to put substantial investments of time and resource into building applications grounded on the underlying ideas. Building behavioral finance ideas into commercial applications requires both depth and breadth of understanding of the theory and, in many cases, large resource commitments.

On whether there is a grand unified theory:

A commonly expressed concern, at least in the mainstream press, is that there exists no grand unified theory of behavioral economics, and that the field is thus merely a chaotic collection of unconnected and often contradictory findings. For the purpose of practical implementation, the notion that this is, or needs to be, a clearly defined field should be eliminated, reducing the desire to erode it with arbitrary labels and definitions. Human behavior operates at multiple levels from the neurological to complex social interactions. Any quest for a grand unified theory to mirror that of physical sciences may well be entirely misguided, together with the notion that such a theory is necessary for the broad field to be useful. Much more effective is an approach of treating the full range of behavioral findings as a rich toolbox that can be applied to, and tested on, a range of practical concerns.

On the superficial application:

The first major challenge is that behavioral finance is not particularly effective if applied superficially. Yet, superficial attempts are commonplace. Some seek to do little more than offer a checklist of biases, hoping that informing people of poor decision-making can solve the problem. Instead, a central theme of decision science is the consistent finding that merely informing people of their adverse behavioral proclivities is very seldom effective in combating them.

Because behavioral finance is both topical and fascinating to many people, it attracts ‘hobbyists’ who can readily recite a number of biases, but who neither have the depth of knowledge of the field overall, nor a solid grasp of the theoretical underpinnings of the more technical aspects of the field. …

This chapter is not an attempt to erect barriers to entry amongst behavioral practitioners and claim that only those with advanced degrees in the field should be taken seriously. On the contrary, the effect of greater academic training can cause its beneficiaries to hold on too closely to narrow and technical interpretations of the field to make them effective practitioners. Indeed, some of the most effective practitioners do not have an extensive academic background in the field. However, they have invested considerable time and effort getting to know and deeply understand the breadth and depth of the field.

And on naive buyers:

Limited study of behavioral finance through reading the popular books on the topic may equip one to sound knowledgeable and appear convincing. However, as a relatively new field, the purchasers of behavioral expertise are seldom equipped to know the difference and may be unable to tell a superficially convincing approach from approaches that embody true understanding. This leaves the field open to consultants peddling ‘behavioral expertise’ but having in their toolkit little more than a list of biases that they apply sequentially and with little variation to each problem encountered. Warning flags should go up whenever the proposal rests heavily on catalogues of behavioral biases or contains a preponderance of pictures of brains.

Chris Voss’s Never Split the Difference: Negotiating as if your life depended on it

Summary: Interesting ideas on how to approach negotiation, but I don’t know how much weight to give them. How much expertise could be developed in hostage negotiations? Can that expertise be distilled into principles, or is much of it tacit knowledge?

Chris Voss’s Never Split the Difference: Negotiating as if your life depended on it (written with Tahl Raz) is a distillation of Voss’s approach to negotiation, developed through 15 years negotiating hostage situations for the FBI. Voss was the FBI’s lead international kidnapping negotiator, and for the last decade he has run a consulting firm that guides organisations through negotiations.

I am not sure how I should rate the book. There are elements I like, elements that seem logical, and yet a sense that much is just storytelling. I don’t know enough of the negotiation literature to understand what other support there might be for Voss’s approach – and Voss generally doesn’t draw on the literature – so it is not clear what weight I should give to his arguments.

Voss’s central thread is that we should not approach negotiation as though it is a purely rational exercise. No matter how you frame the negotiation in advance, there is no escaping the humans that will be engaging in that negotiation.

This argument seems obvious, as in many negotiations you will be dealing with emotional people. Yet a flip through some of the classic negotiating texts, such as Getting to Yes, shows that the consideration of emotion is often shallow. Emotion is largely discussed as something to be overcome so that a mutually beneficial deal can be reached.

A deeper understanding of the role of emotion comes from seeing how integral it is to the negotiating process. Emotion and decision-making cannot be disentangled.

In the opening chapter, Voss links this need to consider emotions to the work of Daniel Kahneman and Amos Tversky (unfortunately described as University of Chicago professors who discovered more than 150 cognitive biases). Voss draws on Kahneman’s distinction between the two modes of thought described in Thinking, Fast and Slow: the fast, instinctive and emotional System 1, and the slow, deliberative and logical System 2. If you go into a negotiation with all the tools to deal with System 2 but without the tools to read, understand and manipulate System 1, you are trying to make an omelette without cracking an egg.

Despite being prominent in the opening, Kahneman and Tversky’s work is only briefly considered in other parts of the book, mainly in one chapter that includes examination of anchoring and loss aversion. By manipulating someone’s reference point and capitalising on their fear of loss, you can shift the terms of what they will agree to.

For instance, Voss suggests that you might initially anchor the other side’s expectations through an “accusation audit”, whereby you list every terrible thing the other side could say about you in advance. You then create a frame so that the agreement is to avoid loss. Putting those together, you might start out by saying that you have a horrible deal for them, but still want to bring it to them before you give it to somebody else. By taking the sting out of the low offer and framing acceptance of that offer as an opportunity to avoid loss, you might induce acceptance.

Voss also discusses the idea of setting a very high or low anchor early in negotiations, although he notes that this comes at a cost. It might be effective against the inexperienced, but you lose the opportunity of learning from the other side when they go first. If prepared, you can resist their anchor, and if you are in a low information environment, you might be pleasantly surprised.

Voss recognises the human desire for fairness as another important factor. While Voss draws on the academic literature to demonstrate that desire, his proposed approaches to fairness in negotiation are not put in the context of that literature. As a result, I don’t have much of a grip on whether his ideas – such as avoiding accusations of unfairness, and giving the other side permission to stop you at any time if they feel you are being unfair – are effective. It’s polite and sounds reasonable, but does it work?

The concept that gets the most attention in the book is tactical empathy. This involves active listening, with tools such as mirroring (repeating the last few words someone said to induce them to keep explaining), labelling (giving a name to their feelings) and summarising their position back to them. I am partial to these ideas. By listening, you can learn a lot. I have always found that simple repetition of concepts, whether through mirroring, labelling or summarising, is a powerful tool to get people to open up and to understand their position.

Another thread to the book is the idea of saying no without saying no, generally through the use of calibrated questions. Calibrated questions are questions that have no fixed answer and can’t be answered with a yes or no. They typically start with “how” or “what”, rather than “is” or “does”. They can be used to give the other side the illusion of control while at the same time pushing them to think about solving your problem. If the price is higher than you want to pay, you might say “how am I supposed to pay that?” Calibrated questions also have broader use through the negotiation to learn more from your counterpart.

Ideas such as this seem attractive, but I don’t know how much weight I should put on Voss’s arguments. This is largely because I don’t know how much expertise you could develop in hostage negotiation, and the degree to which that expertise is tacit knowledge. Voss notes that his expertise is built from experience, not from textbooks, and that his approach is designed for the real world. Can a human build skills for this real world? Is there rapid feedback on decisions, with an opportunity to learn?

In one sense there is feedback, with the hostages released or not, and the terms of that release known. But each negotiation would involve a multitude of decisions and factors. Conversations might extend for days or weeks. How effectively can you isolate the cause of the outcome? How stable is that cause-effect relationship across different negotiations?

In a podcast episode with Sam Harris, Voss mentioned that he had been involved in around 150 hostage negotiations around the world. That would seem a fair number to start to be able to identify patterns, particularly if you consider that through a negotiation there might be many smaller opportunities for feedback, such as extracting information. But as Voss’s stories through the book show, these negotiations span many different countries and contexts. How many of those elements are common and stable enough for true expertise to develop? Most of his experience involved international kidnapping – a commodity business involving financial transactions. Can the lessons from these be applied elsewhere?

Voss (and the FBI more generally) would have had a broader range of examples to draw on, and Voss’s more recent experience in consulting on negotiation could provide further opportunities to develop expertise. But it’s not obvious how that experience is incorporated into expertise that in turn can be effectively distilled into a book.

Me on Rationally Speaking, plus some additional thoughts

My conversation with Julia Galef on Rationally Speaking is out, exploring how behavioural economics and its applications could be better.

There are links to a lot of the academic articles we discuss on the Rationally Speaking site. We also talk about several of my own articles, including:

Please not another bias! An evolutionary take on behavioural economics (This article is my presentation script for a marketing conference. I’ve been meaning to rewrite it as an article for some time to remove the marketing flavour and to replace the evolutionary discussion with something more robust. Much of the evolutionary psychology experimental literature relies on priming, and I’m not confident the particular experiments I reference will replicate.)

Rationalizing the “Irrational”

When Everything Looks Like a Nail: Building Better “Behavioral Economics” Teams

The difference between knowing the name of something and knowing something

There were a couple of questions for which I could have given different (better) answers, so here are some additional thoughts.

An example of evolutionary biology “explaining” a bias: I gave an example of the availability heuristic, but one for which more work has been done explicitly on the evolutionary angle is loss aversion. Let me quote from an article by Owen Jones, who has worked directly on this:

To test the idea that a variety of departures from rational choice predictions might reflect evolved adaptations, I had the good fortune to team up with primatologist Sarah Brosnan.

The perspective from behavioral biology on the endowment effect is simple: in environments that lacked today’s novel features (such as reliable property rights, third-party enforcement mechanisms, and the like) it is inherently risky to give up what you have for something that might be slightly better. Nothing guarantees that your trading partner will perform. So in giving up one thing for another you might wind up with neither.

So the hypothesis is that natural selection would historically have favored a tendency to discount what you might acquire or trade for, compared to what you already have in hand, even if that tendency leads to irrational outcomes in the current (demonstrably different) environment. The basis of the hypothesis is a variation of the maxim that a bird in the hand has been, historically, worth two in the bush.

First, we predicted that if the endowment effect were in fact an example of Time-Shifted Rationality then the endowment effect would likely be observable in at least some other species. Here’s why, in a nutshell. … If the endowment effect tends to lead on average to behaviors that were adaptive when there are asymmetric risks of keeping versus exchanging, then this isn’t likely to be true only for humans. It should at a minimum be observable in some or all of our closest primate relatives, i.e. the other 4 of the 5 great apes: chimpanzees, orangutans, gorillas, and bonobos.

Second, we predicted that if the endowment effect were in fact an example of Time-Shifted Rationality, the prevalence of the endowment effect in other species is likely to vary across categories of items. This follows because selection pressures can, and very often do, narrowly tailor behavioral predispositions that vary as a function of the evolutionary salience (i.e., significance) of the circumstance. Put another way, evolved behavioral adaptations can be “facultative,” consisting of a hierarchical set of “if-then” predispositions, which lead to alternate behaviors in alternate circumstances. Because no animal evolved to treat all objects the same, there’s no reason to expect that they, or humans, would exhibit the endowment effect equally for all objects, or equally in all circumstances. Some classes of items are obviously more evolutionarily significant than others – simply because value is not distributed evenly across all the items a primate encounters.

Third (and as a logical consequence of prediction (2)), we predicted that the prevalence of the endowment effect will correlate – increasing or decreasing, respectively – with the increasing or decreasing evolutionary salience of the item in question. Evolutionary salience refers to the extent to which the item, under historical conditions, would contribute positively to the survival, thriving, and reproducing of the organism acquiring it.

To test these predictions, we conducted a series of experiments with chimpanzee subjects. No other extant theory generated this set of three predictions. And the results of our experiments corroborated all three.

Our results provided the first evidence of an endowment effect in another species. Specifically, and as predicted, chimpanzees exhibit an endowment effect consonant with many of the human studies that find the effect. As predicted, the prevalence of the effect varies considerably according to the class of item. And, as predicted most specifically, the prevalence was far greater (fourteen times greater, in fact) when trading within a class of evolutionarily salient items – here, food items – for each other than it was when trading within a class of non-evolutionarily salient items – here, toys. Put another way, our subjects were fourteen times more likely to hang onto their less-preferred evolutionarily salient item, when they could trade it for their more-preferred evolutionarily salient item, than they were to hang onto their less-preferred item with corresponding, but not evolutionarily salient, items.

On the role of signalling: Costly signalling theory is the idea that for a signal of quality to be honest, it should impose a cost on the signaller that someone without that quality would not be able to bear. For instance, peacocks have large unwieldy tails that consume a lot of resources, with only the highest quality males able to incur this cost.

To understand how signalling might affect our understanding of human behaviour, I tend to categorise the possible failures to understand people’s behaviour in the following three ways.

First, we can simply fail to understand the objective. If a person’s objective is status, and we try to understand their actions as attempts to maximise income, we might mistakenly characterise their decisions as errors.

Second, we might understand the proximate objective, but fail to realise that there is an underlying ultimate objective. Someone might care about, say, income, but if it is in the context of achieving another objective, such as getting enough income to make a specific purchase, we might similarly fail to properly assess the “rationality” of their decisions. For example, there might be a minimum threshold that leads us to take “risky” actions in relation to our income.

Signalling sometimes falls into this second basket, as the proximate objective is the signal for the ultimate objective. For instance, if we use education as a signal of our cognitive and social skills to get a job, viewing the objective as getting a good education misses the point.

Third, even if we understand the proximate and ultimate objectives, we might fail to understand the mechanism by which the objective is achieved. Signalling can lead to complicated mechanisms that are often overlooked.

To illustrate: even if we know that someone is seeking further education only to increase their employment prospects, we would expect different behaviour if education were a signal rather than a source of skills for use on the job. If education is purely a signal, people may only care about getting the credential, not what they learn. If education serves a more direct purpose, we would expect students to invest much more in learning.

I discuss a couple of these points in my Behavioral Scientist article Rationalizing the “Irrational”. Evolutionary biology is a great source of material on signalling, although as I have written about before, economists did have at least one crucial insight earlier.

Finally, I’ve plugged Rationally Speaking before, and here is a list of some of the episodes I enjoyed the most (there are transcripts if you prefer to read):

Tom Griffiths on how our biases might be signs of rationality

Daniel Lakens on p-hacking

Paul Bloom on empathy

Bryan Caplan on parenting

Phil Tetlock on forecasting

Tom Griffiths and Brian Christian on Algorithms to Live By

Don Moore on overconfidence

Jason Brennan on “Against Democracy”

Jessica Flanigan on self-medication

Alison Gopnik on parenting

Christopher Chabris on collective intelligence

I limited myself to eleven here – there are a lot of other great episodes worth listening to.

An evolutionary projection of global fertility and population: My new paper (with Lionel Page) in Evolution & Human Behavior

Forecasting fertility is a mug’s game. Here is a picture of fertility forecasts by the US Census Bureau through the baby boom and subsequent fertility drop (from Lee and Tuljapurkar’s Population Forecasting for Fiscal Planning: Issues and Innovations). The dark line is actual, the dotted line the various forecasts.

US Census forecasts

I am not sure that the science of fertility forecasting in developed countries has made substantial progress since any of those forecasts were made. But that doesn’t stop a lot of people from trying.

One of the most high profile forecasts of fertility and population comes from the United Nations, which publishes global population forecasts through to 2100. Individual country forecasts are currently developed using a Bayesian methodology and then aggregated to form a global picture. The development of this methodology led to a heavily cited 2014 paper titled “World population stabilization unlikely this century” (pdf) and the conclusion that there was only a 30% probability that global population growth would cease this century.

These projections contain an important fertility assumption. For countries that have undergone the demographic transition to low fertility, the assumption is that their fertility rate will oscillate around a long-term mean. While there has been some debate around whether this long-term mean would be the replacement rate or lower, the (almost theory-free) assumption of oscillation around a long-term level dominates the forecasts.

There is at least one theoretical basis for doubting this assumption. In a 2013 working paper (co-authored with Oliver Richards), we argued that as fertility is heritable, this would tend to increase fertility and population growth. Those with a preference for higher fertility would have more children, with their children in turn having a preference for more children. This high-fertility type would eventually come to dominate the population, leading to a markedly higher population than forecast.

As I noted when the working paper was released, we were hardly the first to propose this idea. Fisher noted the power of higher fertility groups in The Genetical Theory of Natural Selection. I had seen Razib Khan, Robin Hanson and John Hawks mention the idea. Murphy and Wang examined the concept in a microsimulation. Many papers on the heritability of fertility hint at it. Rowthorn’s paper on fertility and religiosity also points in this direction. We simply added a touch of quantitative modelling to explore the speed of the change, and have now been followed by others with different approaches (such as this).

Shortly after I posted about the working paper, I received an email from Lionel Page suggesting that we should turn this idea into more detailed simulation of world population. Five years after Lionel’s email, that simulation has just been released in a paper published in Evolution & Human Behavior. Here is the abstract:

The forecasting of the future growth of world population is of critical importance to anticipate and address a wide range of global challenges. The United Nations produces forecasts of fertility and world population every two years. As part of these forecasts, they model fertility levels in post-demographic transition countries as tending toward a long-term mean, leading to forecasts of flat or declining population in these countries. We substitute this assumption of constant long-term fertility with a dynamic model, theoretically founded in evolutionary biology, with heritable fertility. Rather than stabilizing around a long-term level for post-demographic transition countries, fertility tends to increase as children from larger families represent a larger share of the population and partly share their parents’ trait of having more offspring. Our results suggest that world population will grow larger in the future than currently anticipated.

Our methodology is almost identical to the United Nations methodology, except we substitute the equation by which fertility converges to a long-term mean with the breeder’s equation, which captures the response to selection of a trait.
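The breeder’s equation itself is simple: the change in a trait’s mean from one generation to the next is R = h²S, where h² is the trait’s heritability and S is the selection differential – here, the gap between the fertility of parents (weighted by how many children they have) and the population mean. A toy sketch of how this produces rising fertility, with invented parameter values and no connection to the paper’s actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

h2 = 0.3        # assumed heritability of fertility (illustrative only)
mean_tfr = 1.7  # starting total fertility rate (illustrative only)
sd_tfr = 0.8    # spread of fertility across individuals (illustrative only)

for generation in range(5):
    # Draw individual fertility for the current generation (no negative values).
    fertility = np.clip(rng.normal(mean_tfr, sd_tfr, 100_000), 0, None)
    # Selection differential S: the offspring-weighted parental mean minus the
    # population mean. Parents with more children contribute proportionally
    # more of themselves (and their fertility preference) to the next generation.
    parental_mean = np.average(fertility, weights=fertility)
    S = parental_mean - fertility.mean()
    # Breeder's equation: response to selection R = h^2 * S.
    mean_tfr += h2 * S
    print(f"generation {generation + 1}: mean TFR = {mean_tfr:.2f}")
```

Because the offspring-weighted mean always exceeds the simple mean when fertility varies, S is positive every generation and mean fertility drifts upward – which is the dynamic that replaces the United Nations’ convergence-to-a-long-term-mean assumption.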

And here are a few charts showing the simulation results: grey is the base United Nations simulation, black the evolutionary simulations, the dashed lines the 90% confidence intervals. First, European total fertility rate (TFR) and population, which shifts from terminal decline to growth:


Next, North America, which increases its rate of growth:


Next, Asia:


And finally, the global result:


The punchline is that the probability of global population stabilisation this century becomes less than 5%. Europe and North America are the most affected within this century. Asia is less affected, but still shifts from a scenario of decline to one of growth, and due to its size has the largest effect on the global projections.

Having opened by saying that fertility forecasting is a mug’s game, should the same be said about these forecasts? The answer to that question is largely yes. Cultural and technological change, environmental shocks and the like will almost certainly lead to a different outcome to the one the United Nations or we have forecast. We effectively argue this in the section of the paper on cultural evolution (which was added following some helpful reviewer comments).

But to get lost in the specific numbers is to lose sight of the exercise. We are arguing that an important assumption underpinning the United Nations exercise should be reconsidered. We’ve given a rough idea of how far that assumption could shift the fertility and population outcomes, and they are of a magnitude that would see some parts of the world looking quite different by the end of the century. If we assume constant fertility despite this evolutionary dynamic, we risk a material downward bias in projecting future fertility and population.

As an aside, the freely available methodology and R packages that underpin the United Nations forecasts greatly facilitated our efforts. We spent a lot of time considering how to implement the simulations, but on discovering the openness of the United Nations approach, we had a ready-made base on which to build our tweaked approach. In that spirit, you can access our modified packages and the data used to generate them here at OSF.

If you can’t access the paper through the paywall and would like me to email you a copy, hit me up in the comments below.

The Paradox of Trust

In a chapter of Robert Sugden’s The Community of Advantage: A Behavioural Economist’s Defence of the Market, he makes some interesting arguments about how we should interpret the results of the trust game. (This is the last in a series of posts on the book.)

First, what is the trust game:

The ‘Trust Game’ was first investigated experimentally by Joyce Berg, John Dickhaut, and Kevin McCabe (1995). … In Berg et al.’s game, two players (A and B) are in separate rooms and never know one another’s identity. Each player is given $10 in one-dollar bills as a ‘show up fee’. A puts any number of these bills, from zero to ten, in an envelope which will be sent to B; he keeps the rest of the money for himself. The experimenter supplements this transfer so that B receives three times what A chose to send. B then puts any number of the bills she has received into another envelope, which is returned to A; she keeps the rest of the money for herself. The game is played once only, and the experiment is set up so that no one (including the experimenter) can know what any other identifiable person chooses to do. The game is interesting to theorists of rational choice because it provides the two players with an opportunity for mutual gain, but if the players are rational and self-interested, and if each knows that this is true of the other, no money will be transferred. (It is rational for B to keep everything she is sent; knowing this, it is rational for A to send nothing.)
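The backward-induction logic in that last parenthetical can be made concrete. A minimal sketch of the payoff structure – my own illustration, not code from Sugden’s book:

```python
def trust_game_payoffs(sent, returned, endowment=10, multiplier=3):
    """Payoffs in the Berg et al. Trust Game: A sends `sent` dollars,
    the experimenter triples it, and B sends `returned` dollars back."""
    received = sent * multiplier
    assert 0 <= sent <= endowment and 0 <= returned <= received
    payoff_a = endowment - sent + returned
    payoff_b = endowment + received - returned
    return payoff_a, payoff_b

# A self-interested B keeps everything: her payoff falls with each dollar returned.
best_return = max(range(31), key=lambda r: trust_game_payoffs(10, r)[1])

# Knowing B will return nothing, a self-interested A sends nothing.
best_send = max(range(11), key=lambda s: trust_game_payoffs(s, 0)[0])

print(best_send, best_return)      # 0 0 -> no money is transferred
print(trust_game_payoffs(0, 0))    # (10, 10): the "rational" outcome
print(trust_game_payoffs(10, 15))  # (15, 25): cooperation beats it for both
```

The last line is the puzzle in a nutshell: the cooperative outcome leaves both players better off than the self-interested equilibrium, yet individually rational play never reaches it.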

There is a sizeable body of empirical evidence that player A often does send money and B often returns money. How can this be explained? One option is to draw on the concept of reciprocity.

In this literature, it is a standard modelling strategy to follow Matthew Rabin (1993) in characterizing intentions as kind or unkind. … The greater the degree to which one player benefits the other by forgoing his own payoffs, the kinder he is. Rabin’s hypothesis is that individuals derive utility from their own payoffs, from being kind towards people who are being kind to them, and from being unkind towards people who are being unkind to them.

But if you think this hypothesis through, there is a problem, which Sugden calls the Paradox of Trust.

[I]t seems that any reasonable extension of Rabin’s theory will have the following implication for the Trust Game: It cannot be the case that A plays send, expecting B to play return with probability 1, and that B, knowing that A has played send, plays return. To see why not, suppose that A chooses send, believing that B will choose return with probability 1.

A has not faced any trade-off between his payoffs and B’s, and so has not had the opportunity to display kindness or unkindness.

Since Rabin often describes positive reciprocity as ‘rewarding’ kind behaviour (and describes negative reciprocity as ‘punishing’ unkind behaviour), the idea seems to be that B’s choice of return is her way of rewarding A for the goodness of send. But if A’s action was self-interested, it is not clear why it deserves reward.

It may seem paradoxical that, in a theory in which individuals are motivated by reciprocity, two individuals cannot have common knowledge that they will both participate in a practice of trust. Nevertheless, this conclusion reflects the fundamental logic of a modelling strategy in which pro-social motivations are represented as preferences that are acted on by individually rational agents. It is an essential feature of (send, return), understood as a practice of trust, that both players benefit from both players’ adherence to the practice. If A plays his part in the practice, expecting B to play hers, he must believe and intend that his action will lead to an outcome that will in fact benefit both of them. Thus, if pro-sociality is interpreted as kindness—as a willingness to forgo one’s own interests to benefit others—A’s choice of send cannot signal pro-social intentions, and so cannot induce reciprocal kindness from B. I will call this the Paradox of Trust.

Is there an alternative way of seeing this problem? Sugden turns to the idea of mutually beneficial exchange.

The escape route from the Paradox of Trust is to recognize that mutually beneficial cooperation between two individuals is not the same thing as the coincidence of two acts of kindness. When A chooses send in the Trust Game, his intention is not to be kind to B: it is to play his part in a mutually beneficial scheme of cooperation, defined by the joint action (send, return). … If A is completely confident that B will reciprocate, and if that confidence is in fact justified, A’s choice of send is in his own interests, while B’s choice of return is not in hers. Nevertheless, both players can understand their interaction as a mutually beneficial cooperative scheme in which each is playing his or her part.

This interpretation has implications for how we should view market exchange.

Theorists of social preferences sometimes comment on the fact that behaviour in market environments, unlike behaviour in Trust and Public Good Games, does not seem to reveal the preferences for equality, fairness and reciprocity that their models are designed to represent. The explanation usually offered is that people have social preferences in all economic interactions, but the rules of the market are such that individuals with such preferences have no way of bringing about the fair outcomes that they really desire.

Could it be that behaviour in markets expresses the same intentions for reciprocity as are expressed in Trust and Public Good Games, but that these intentions are misrepresented in theories of social preference?