The case against loss aversion

Summary: Much of the evidence for loss aversion is weak or ambiguous. The endowment effect and status quo bias are subject to multiple alternative explanations, including inertia. There is possibly better evidence for loss aversion in the response to risky bets, but what emerges does not appear to be a general principle of loss aversion. Rather, “loss aversion” is a conditional effect that most typically emerges when rejecting the bet is not the status quo and the stakes are material.

[As a postscript, a week after publishing this post, a working paper for a forthcoming Journal of Consumer Psychology article was released. That paper addresses some of the below points. A post on that paper is in the works.]

In a previous post I flagged three critiques of loss aversion that had emerged in recent years. The focus of that post was Eldad Yechiam’s analysis of the assumption of loss aversion in Kahneman and Tversky’s classic 1979 prospect theory paper.

The second critique, and the focus of this post, is an article by David Gal and Derek Rucker, The Loss of Loss Aversion: Will It Loom Larger Than Its Gain? Its abstract:

Loss aversion, the principle that losses loom larger than gains, is among the most widely accepted ideas in the social sciences. The first part of this article introduces and discusses the construct of loss aversion. The second part of this article reviews evidence in support of loss aversion. The upshot of this review is that current evidence does not support that losses, on balance, tend to be any more impactful than gains. The third part of this article aims to address the question of why acceptance of loss aversion as a general principle remains pervasive and persistent among social scientists, including consumer psychologists, despite evidence to the contrary. This analysis aims to connect the persistence of a belief in loss aversion to more general ideas about belief acceptance and persistence in science. The final part of the article discusses how a more contextualized perspective of the relative impact of losses versus gains can open new areas of inquiry that are squarely in the domain of consumer psychology.

The release of Gal and Rucker’s paper was accompanied by a Scientific American article by Gal, Why the Most Important Idea in Behavioral Decision-Making Is a Fallacy. It uses somewhat stronger language. Here’s a snippet:

[T]here is no general cognitive bias that leads people to avoid losses more vigorously than to pursue gains. Contrary to claims based on loss aversion, price increases (ie, losses for consumers) do not impact consumer behavior more than price decreases (ie, gains for consumers). Messages that frame an appeal in terms of a loss (eg, “you will lose out by not buying our product”) are no more persuasive than messages that frame an appeal in terms of a gain (eg, “you will gain by buying our product”).

People do not rate the pain of losing $10 to be more intense than the pleasure of gaining $10. People do not report their favorite sports team losing a game will be more impactful than their favorite sports team winning a game. And people are not particularly likely to sell a stock they believe has even odds of going up or down in price (in fact, in one study I performed, over 80 percent of participants said they would hold on to it).

This critique of loss aversion is not completely new. David Gal has been making related arguments since 2006. In this more recent article, however, Gal and Rucker draw on a larger literature and some additional experiments to expand the critique.

To frame their argument, they describe three potential versions of loss aversion:

  • The strong version: losses always loom larger than gains
  • The weak version: losses on balance loom larger than gains
  • The contextual version: depending on context, losses can loom larger than gains, they can have equal weighting, or gains can loom larger than losses

The strong version appears to be a straw man that few would defend, but there is some subtlety in Gal and Rucker’s definition. They write:

This strong version does not require that losses must outweigh gains in all circumstances, as factors such as measurement error and boundary conditions might obscure or reduce the fundamental propensity for losses to be weighted more than equivalent gains.

An interesting point by Gal and Rucker is that for most research on the boundaries or moderators of loss aversion, loss aversion is the general principle around which the exceptions are framed. If people don’t exhibit loss aversion, it is usually argued that the person is not encoding the transaction as a loss, so loss aversion does not apply. The alternative that the gains have equal weight to (or greater weight than) the loss is not put forward. So although few would defend a blunt reading of the strong version, many researchers proceed as though people are loss averse unless certain circumstances are present.

Establishing the weak version seems difficult. Tallying studies in which losses loom larger and where gains dominate would provide evidence more on the focus of research than the presence of a general principle of loss aversion. It’s not even clear how you would compare across different contexts.

Despite this difficulty (or possibly because of it), Gal and Rucker come down firmly in favour of the contextual version. They do this not through tallying or comparing the contexts in which losses or gains loom larger, but by arguing that most evidence of loss aversion is ambiguous at best.

Loss aversion as evidence for loss aversion

The principle of loss aversion is descriptive. It is a label applied to an empirical phenomenon. It is not an explanation. Similarly, the endowment effect, our tendency to ascribe more value to items that we have than to those we don’t, is a label applied to an empirical phenomenon.

Although loss aversion is descriptive, Gal and Rucker note that it is often used as an explanation for choices. For example, loss aversion is often used as an explanation for the endowment effect. But using a descriptive label as an explanation provides no analytical value, as what appears to be an explanation is simply the application of a different label. (Owen Jones suggests that stating the endowment effect is due to loss aversion is no more useful than labelling human sexual behaviour as being due to abstinence aversion. I personally think it is marginally more useful, if only for the fact there is now a debate as to whether loss aversion and the endowment effect are related. The transfer of label shows that you believe these empirical phenomena have the same psychological basis.)

Gal and Rucker argue that the application of the loss aversion label to the endowment effect leads to circular arguments. The endowment effect is used as evidence for loss aversion, and, as noted above, loss aversion is commonly used to explain the endowment effect. This results in an unjustified reinforcement of the concept, and a degree of neglect of alternative explanations for the phenomena.

I have some sympathy for this claim, although I am not overly concerned by it. The endowment effect has multiple explanations (as will be discussed below), so it is weak evidence of loss aversion at best. However, it is rare that the endowment effect is the sole piece of evidence presented for the existence of loss aversion. It is more often one of a series of stylised facts for which a common foundation is sought. So although there is circularity, the case for loss aversion does not rest solely on that circular argument.

Risky versus riskless choice

Much of Gal and Rucker’s examination of the evidence for loss aversion is divided between riskless and risky choice. Riskless choice involves known options and payoffs with certainty. Would you like to keep your chocolate or exchange it for a coffee mug? In risky choice, the result of the choice involves a payoff that becomes known only after the choice. Would you like to accept a 50:50 bet to win $110 or lose $100?

Below is a collection of their arguments as to why loss aversion is not the best explanation for many observed empirical results sorted across those two categories.

Riskless choice – status quo bias and the endowment effect

Gal and Rucker’s examination of riskless choice centres on the closely related concepts of status quo bias and the endowment effect. Status quo bias is the propensity for someone to stick with the status quo option. The endowment effect is the propensity for someone to value an object they own over an object that they would need to acquire.

Status quo bias and the endowment effect are often examined in an exchange paradigm. You have been given a coffee mug. Would you like to exchange it for a chocolate? The propensity to retain the coffee mug (or the chocolate if that was what they were given first) is labelled as either status quo bias or the endowment effect. Loss aversion is often used to explain this decision, as the person would lose the status quo option or their current endowment when they choose an alternative.

Gal and Rucker suggest that rather than being driven by loss aversion, status quo bias in this exchange paradigm is instead due to a preference for inaction over action (call this inertia). A person needs a psychological motive for action. Gal examined this in his 2006 paper when he asked experimental subjects to imagine that they had a quarter minted in one city, and then whether they would be willing to exchange it for a quarter minted in another. Following speculation by Kahneman and others that people do not experience loss aversion when exchanging identical goods, Gal considered that a propensity for the status quo absent loss aversion would indicate the presence of inertia.

Gal found that despite there being no loss in the exchange of quarters, the experimental subjects preferred the status quo of keeping the coin they had. Gal and Rucker replicated this result on Amazon Mechanical Turk, offering to exchange one hypothetical $20 bill for another. They took this as evidence of inertia.

Apart from the question of what weight you should give an experiment involving hypothetical coins, notes and exchanges, I don’t find this a convincing demonstration that inertia lies behind the status quo bias. Exchange does involve some transaction costs (in the form of effort, however minimal, even if you are told to assume they are insignificant). In his 2006 paper, Gal reports other research where people traded identical goods when paid a nickel to cover “transaction costs”. The token amount spurred action.

Those experiments, however, involved transactions of goods with known values. The value of a quarter is clear. In contrast, Gal’s exploration of the status quo bias in his 2006 paper involved goods without an obvious face value. This is important, as Gal argued that people have “fuzzy preferences” that are often ill-defined and constructed on an ad hoc basis. If we do not precisely judge the attractiveness of chocolate or a mug, we may not have a precise ordering of preference between the two that would justify choosing one over another. Under Gal’s concept of inertia, the lack of a psychological motive to change results in us sticking with the status quo mug.

Contrasting this with the exchange of quarters, there the addition of a nickel to cover trading expenses allows for a precise ordering of the two options, as they are easily comparable monetary sums. In the case of a mug and chocolate, addition of a nickel is unlikely to make the choice any easier as the degree of fuzziness extends over a much larger range.

The other paradigm under which the endowment effect is explored is the valuation paradigm. The valuation paradigm involves asking someone what they would be willing to pay to purchase or acquire an item, or how much they would require to be paid to accept an offer to purchase an item in their possession. The gap between this willingness to pay and the typically larger willingness to accept is the additional value given to the endowed good. (For some people this is how status quo bias and the endowment effect are differentiated. Status quo bias is the maintenance of the status quo in an exchange paradigm, the endowment effect is the higher valuation of endowed goods in the valuation paradigm. However, many also label the exchange paradigm outcome as being due to the endowment effect. Across the literature they are often used interchangeably.)

This difference between willingness to pay and accept in the valuation paradigm is often cited as evidence of loss aversion. But as Gal and Rucker argue, this difference has alternative explanations. Fundamentally different questions are asked when seeking an individual’s willingness to accept (what is the market value?) and their willingness to pay (what is their personal utility?). Only willingness to pay is affected by budget constraints.

Although not mentioned in the 2018 paper, Gal’s 2006 paper suggests this gap may also be due to fuzzy preferences, with the willingness to pay and willingness to accept representing the two end points of the fuzzy range of valuation. Willingness to pay is the lower bound. For any higher amount, they are either indifferent (the range of fuzzy preferences) or would prefer the monetary sum in their hand. Willingness to accept is the upper bound. For any lower amount they are either indifferent (the range of fuzzy preferences) or would prefer to keep the good.
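One way to picture this reading of the valuation gap is as an interval of prices over which the person has no clear preference. The sketch below is illustrative only: the function name, the interval representation and the dollar figures are assumptions made for the example, not anything taken from Gal’s paper.

```python
def response_to_price(price, fuzzy_low, fuzzy_high):
    """Classify a price against a hypothetical fuzzy valuation range.

    fuzzy_low plays the role of willingness to pay (the lower bound) and
    fuzzy_high the role of willingness to accept (the upper bound).
    """
    if price < fuzzy_low:
        # Cheap enough that the good is clearly worth more than the money.
        return "prefers the good"
    if price > fuzzy_high:
        # Enough money that it is clearly worth more than the good.
        return "prefers the money"
    # Inside the fuzzy range there is no clear motive to trade either way.
    return "indifferent"


# Hypothetical mug valued somewhere between $3 and $7.
wtp, wta = 3.0, 7.0
for price in (2.0, 5.0, 8.0):
    print(price, response_to_price(price, wtp, wta))
```

On this picture, a buyer offered the mug at $5 does not buy and an owner offered $5 does not sell, so a gap between willingness to pay and willingness to accept appears without any extra weight on losses.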

There are plenty of experiments in the broader literature seeking to tease out whether the endowment effect is due to loss aversion or alternative explanations of the type above. Gal and Rucker report their own (unpublished) set of experiments where they seek to isolate inertia as the driver of the difference between willingness to pay and willingness to accept. They asked for experimental subjects’ willingness to pay to obtain a good, versus their willingness to retain a good. For example, they compared subjects’ willingness to pay to fix a phone versus their willingness to pay to get a repaired phone. They asked about their willingness to expend time to drive to get a new notebook they left behind versus their willingness to drive to get a brand new notebook. They asked about their willingness to pay for fibre optic internet versus their willingness to pay to retain fibre optic internet that they already had. For each choice the subject needs to act to get the good, so inertia is removed as a possible explanation of a preference to retain an endowed good.

With fuzzy preferences under this experimental set up, both willingness to pay and willingness to retain would be the lower bound, as any higher payment would lead to indifference or preference of the monetary sum. Here Gal and Rucker found little difference between willingness to pay and willingness to retain.

Gal and Rucker characterise each of the options as involving choices between losses and gains, and survey questions put to the experimental subjects confirmed that most were framing the choices in that way. This allowed them to point to this experiment as evidence against loss aversion driving the endowment effect. Remove inertia but leave the loss/gain framing, and the effect disappears.

However, the experimental implementation of this idea is artificial. Importantly, the decisions are hypothetical and unincentivised. Whether coded as a loss or gain, the experimental subjects were never endowed with the good and weren’t experiencing a loss.

More convincing evidence, however, came from Gal and Rucker’s application of this idea in an exchange paradigm. In one scenario, people were endowed with a pen or chocolate bar. They were then asked to choose between keeping the pen or swapping for the chocolate bar, so an active choice was required for either option. Gal and Rucker found that regardless of the starting point, roughly the same proportion chose the pen or chocolate bar. This contrasts with a more typical endowment effect experimental setup that they also ran, in which they simply asked people given a pen or chocolate bar whether they would like to exchange. Here the usual endowment effect pattern emerged, with people more likely to keep the endowed good.

Like the endowment effect experiments they critique, this result is subject to alternative explanations, the simplest (although not necessarily convincing) being that the reference point has been changed by the framing of the question. By changing the status quo, you also change the reference point. (I should say this type of argument involving ad hoc stories about changes in reference points is one of the least satisfactory elements of prospect theory.)

Despite the potential for alternative explanations, these experiments are the beginning of a body of evidence for inertia driving some results currently attributed to loss aversion. Gal and Rucker’s argument against use of the endowment effect as evidence of loss aversion is even stronger. There are many alternative explanations to loss aversion for the status quo bias and endowment effect. The evidence for loss aversion is better found elsewhere.

Risky choice

Gal and Rucker’s argument concerning risky bets resembles that for riskless choice. Many experiments in the literature involve an offer of a bet, such as a 50:50 chance to win $100 or lose $100, which the experimental subject can accept or reject. Rejection is the status quo, so inertia could be an explanation for the decision to reject.

Gal and Rucker describe an alternative experiment in which people can choose between a certain return of 3% or a risky bet with expected value of zero. As they must make a choice, there is not a status quo option. 80% of people allocate at least some money to the risky bet, suggesting an absence of loss aversion. This type of finding is reflected across a broader literature.

They also report a body of research where the risky bet is not the sole option to opt into, but rather one of two options for which an active choice must be made. For example, would you like $0 with certainty, or a 50:50 bet to win $10 or lose $10? In this case, little evidence for loss aversion emerges unless the stakes are large.

This framing of the safe option as the status quo is one of many conditions under which loss aversion tends to emerge. Gal and Rucker reference a paper by Eyal Ert and Ido Erev, who identified that in addition to emerging when the safe option is the status quo, loss aversion also tends to emerge:

  • with high nominal payoffs
  • when the stakes are large
  • when there are bets present in the choice list that create a contrast effect, and
  • in long experiments without feedback where the computation of the expected payoff is difficult.

Ert and Erev described a series of experiments where they remove these features and eliminate loss aversion.

Gal and Rucker also reference a paper by Yechiam and Hochman, who surveyed the loss aversion literature involving balanced 50:50 bets. For experiential tasks, where decision makers are required to repeatedly select between options with no prior description of the outcomes or probabilities (effectively learning the probabilities with experience), there is no evidence of loss aversion. For descriptive tasks, where a choice is made between fully-described options, loss aversion most typically arises for “high-stakes” hypothetical amounts, and is often absent for lower sums (which are also generally hypothetical).

For the higher stakes bets, Yechiam and Hochman suggest risk aversion may explain the choices. However, what Yechiam and Hochman call high stakes aren’t that high; for example $600 versus $500. As I described in my previous post on the Rabin Paradox, risk aversion at stakes of that size can only be shoehorned into the traditional expected utility model with severe contortions (although it can be done). Rejecting that bet is a high level of risk aversion for anyone with more than minimal wealth (although these experimental subjects may have low wealth as they are generally students). Loss aversion is one alternative explanation.
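To get a feel for how much risk aversion is needed to reject a bet of that size, here is a rough sketch under expected utility. Logarithmic utility of total wealth is an assumption made purely for illustration (it is not a functional form taken from any of the papers discussed), and the function names are hypothetical.

```python
import math


def accepts_bet(wealth, gain, loss, utility=math.log):
    """True if a 50:50 bet raises expected utility relative to declining it."""
    expected = 0.5 * utility(wealth + gain) + 0.5 * utility(wealth - loss)
    return expected > utility(wealth)


def breakeven_wealth_log(gain, loss):
    """Wealth at which a log-utility agent is indifferent to the 50:50 bet.

    Indifference means (w + gain) * (w - loss) == w**2, which rearranges
    to w = gain * loss / (gain - loss).
    """
    return gain * loss / (gain - loss)


# 50:50 bet to win $600 or lose $500.
print(breakeven_wealth_log(600, 500))   # 3000.0
print(accepts_bet(2_900, 600, 500))     # False
print(accepts_bet(3_100, 600, 500))     # True
```

Under this assumed utility function, only someone with total wealth below about $3,000 would turn the bet down, which is one way of seeing why rejection by anyone wealthier implies a high level of risk aversion.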

Regardless, under the concept of loss aversion as presented in prospect theory, we should see loss aversion for low stakes bets. Once you are arguing that “loss aversion” will emerge if the bet is large enough, this is a different conception of loss aversion to that in the academic literature.

Other phenomena that may not involve loss aversion

At the end of the paper, Gal and Rucker mention a couple of other phenomena incorrectly attributed to or not necessarily caused by loss aversion.

The first of these is the Asian disease problem. In this problem, experimental subjects are asked:

Imagine that the U.S. is preparing for the outbreak of an unusual Asian disease, which is expected to kill 600 people. Two alternative programs to combat the disease have been proposed. Assume that the exact scientific estimate of the consequences of the programs are as follows:

If Program A is adopted, 200 people will be saved.

If Program B is adopted, there is 1/3 probability that 600 people will be saved, and 2/3 probability that no people will be saved.

Which of the two programs would you favor?

Most people tend to prefer program A.

Then ask another set of people the following:

If Program C is adopted 400 people will die.

If Program D is adopted there is 1/3 probability that nobody will die, and 2/3 probability that 600 people will die.

Which of the two programs would you favor?

Most people prefer program D, despite C and D being reframed versions of programs A and B. The reason for this change is usually attributed to the second set of options being a loss frame, with people preferring to gamble to avoid the loss.

This, however, is not loss aversion. There is, after all, no potential for gain in the second set of questions against which the strength of the losses can be compared. Rather, this is the “reflection effect”.

Tversky and Kahneman recognised this when they presented the Asian disease problem in their 1981 Science article, but the translation into public discourse has missed this difference, with the Asian disease problem often presented as an example of loss aversion.
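That the two framings describe the same outcomes can be checked with a few lines of arithmetic. The sketch below converts every program into expected survivors out of the 600 people at risk; the variable and function names are illustrative.

```python
from fractions import Fraction

AT_RISK = 600


def expected_survivors(outcomes):
    """Expected number of survivors from (probability, survivors) pairs."""
    return sum(p * s for p, s in outcomes)


programs = {
    # Gain frame: outcomes stated as lives saved.
    "A": [(Fraction(1), 200)],
    "B": [(Fraction(1, 3), 600), (Fraction(2, 3), 0)],
    # Loss frame: outcomes stated as deaths, converted to survivors.
    "C": [(Fraction(1), AT_RISK - 400)],
    "D": [(Fraction(1, 3), AT_RISK - 0), (Fraction(2, 3), AT_RISK - 600)],
}

for name, outcomes in programs.items():
    print(name, expected_survivors(outcomes))
```

Every program comes out at 200 expected survivors. A matches C exactly, and B matches D exactly, so the shift in preference comes from the framing rather than from any difference in the options.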

Gal and Rucker point out some other examples of phenomena that may be incorrectly attributed to loss aversion. The disposition effect – people tend to sell winning investments and retain losing investments – could also be explained by the reflection effect, or by lay beliefs about mean reversion. The sunk cost effect involves a refusal to recognise losses rather than a greater impact of losses relative to gains, as no comparison to a gain is made.

Losses don’t hurt more than gains

Beyond the thoughtful argument in the paper, Gal’s Scientific American article goes somewhat further. For instance, Gal writes:

People do not rate the pain of losing $10 to be more intense than the pleasure of gaining $10. People do not report their favorite sports team losing a game will be more impactful than their favorite sports team winning a game.

I find it useful to distinguish two points. The first is the question of the psychological impact of a loss. Does a loss generate a different feeling, or level of attention, to an equivalent gain? The second is how that psychological response manifests itself in a decision. Do people treat losses and gains differently, resulting in loss aversion of the type described in prospect theory?

The lack of differentiation between these two points often clouds the discussion of loss aversion. The first point accords with our instinct. We feel the pain of a loss. But that pain does not necessarily mean that we will be loss averse in our decisions.

Gal and Rucker’s article largely focuses on the second of these points through its examination of a series of choice experiments. Yet the types of claims in the Scientific American article, as in the above quote, are more about the first.

This is the point where I disagree with Gal. Although contextual (isn’t everything?), the evidence of the greater psychological impact of losses appears solid. In fact, the Yechiam and Hochman article, cited by Gal and Rucker for its survey of the loss aversion literature, was an attempt to reconcile the disconnect between the evidence for the effect of losses on performance, arousal, frontal cortical activation, and behavioral consistency and the lack of evidence for loss aversion. Yechiam’s article on the assumption of loss aversion by Kahneman and Tversky (the subject of a previous post) closes with a section reconciling his argument with the evidence of the effect of small stake losses on learning and performance.

To be able to make claims that the evidence of psychological impact of losses is as weak and contextual as the evidence for loss aversion, Gal and Rucker would need to provide a much deeper review of the literature. But in the absence of that, my reading of the literature does not support those claims.

Unfortunately, these points in the Scientific American article have been the focus of the limited responses to Gal and Rucker’s article, leaving us with a somewhat unsatisfactory debate (as I discuss further below).

Hey, we’re overthrowing the old paradigm!

The third part of Gal and Rucker’s paper concerns what they call the “Sociology of Loss Aversion”. I don’t have much to say on their particular arguments in this section, except that I have a gut reaction against authors discussing Thomas Kuhn and contextualising their work as overthrowing the entrenched paradigm. Maybe it’s the lack of modesty in failing to acknowledge they could be wrong (like most outsiders complaining about their ideas being ignored and quoting Kuhn). Just build your case and overthrow the damn paradigm!

That said, the few responses to Gal and Rucker’s paper that I have seen are underwhelming. Barry Ritholtz wrote a column, labelled by Richard Thaler as a “Good takedown of recent overwrought editorial”, which basically said an extraordinary claim such as this requires extraordinary evidence, and that that standard has not been met.

Unfortunately, the lines in Gal’s Scientific American article on the psychological effect of losses were the focus of Ritholtz’s response, rather than the evidence in the Gal and Rucker article. Further, Ritholtz didn’t show much sign of having read the paper. For instance, in response to Gal’s claim that “people are not particularly likely to sell a stock they believe has even odds of going up or down in price”, Ritholtz responded that “the endowment effect easily explains why we place greater financial value on that which we already possess”. But, as noted above, (a solid) part of Gal and Rucker’s argument is that the endowment effect may not be the result of loss aversion. (It’s probably worth noting here that Gal and Rucker did effectively replicate the endowment effect many times over. The endowment effect is a solid phenomenon.)

Another thread of response, linked by Ritholtz, came from Albert Bridge Capital’s Drew Dickson. One part of Dickson’s 20-tweet thread runs as follows:

13| So, sure, a billionaire will not distinguish between a $100 loss and a $100 gain as much as Taleb’s at-risk baker with a child in college; but add a few zeros, and the billionaire will start caring.

14| Critics can pretend that Kahneman, Tversky and @R_Thaler haven’t considered this, but they of course have. From some starting point of wealth, there is some other number where loss aversion matters. For everyone. Even Gal. Even Rucker. Even Taleb.

15| Losses (that are significant to the one suffering the loss) feel much worse than similarly-sized gains feel good. Just do the test on yourself.

But this idea that you will be loss averse if the stakes are high enough is not “loss aversion”, or at least not the version of loss aversion from prospect theory, which applies to even the smallest of losses. It’s closer to the concept of “minimal requirements”, whereby people avoid bets that would be ruinous, not because losses hurt more than gains.

Thaler himself threw out a tweet in response, stating that:

No. Minor point about terminology. Nothing of substance. WTA > WTP remains.

That willingness to accept (WTA) is greater than willingness to pay (WTP) when framed as the status quo is not a point Gal and Rucker would disagree with. But is it due to loss aversion?

Thankfully, the publication of Gal and Rucker’s article was accompanied by two responses, one of which tackled some of the substantive issues (the other response built on rather than critiqued Gal and Rucker’s work). That substantive response, by Itamar Simonson and Ran Kivetz, would best be described as supporting the weak version of loss aversion.

Simonson and Kivetz largely agreed that status quo bias and the endowment effect do not offer reliable support for loss aversion, particularly given the alternative explanations for the phenomena. However, they were less convinced of Gal and Rucker’s experiments to identify inertia as the basis of these phenomena, suggesting the experiments involved “unrealistic experimental manipulations that are susceptible to confounds and give rise to simple alternative explanations”, although they leave those simple alternative explanations unspecified.

Simonson and Kivetz also disagreed with Gal and Rucker on the evidence concerning risky bets, describing as ad hoc and unsupported the assumption that not accepting the bet is the status quo. It’s not clear to me how they could describe that assumption as unsupported given Gal and Rucker’s experimental evidence (and the evidence Gal and Rucker cite) about the absence of loss aversion for small stakes when rejecting the bet is not framed as the status quo. Loss aversion only emerges for larger bets.

I should say, however, that I do have some sympathy for Simonson and Kivetz’s resistance to accepting Gal and Rucker’s sweeping of the risky bet premium into the status quo bucket. Even those larger bets for which loss aversion arises aren’t that large (as noted above, they’re often in the range of $500). Risk aversion is a somewhat unsatisfactory alternative explanation (a topic I discuss in my post on Rabin’s Paradox), and I sense that some form of loss aversion kicks in, although here we may again be talking about a minimal requirements type of loss aversion, not the loss aversion of prospect theory.

Despite their views on risky bets, Simonson and Kivetz were more than willing to approve of Gal and Rucker’s case that loss aversion was a contingent phenomenon. They would simply argue that loss aversion occurs “on average”. As noted above, I’m not sure how you would weight the relative instances of gains or losses having greater weight, so I’ll leave that debate for now.

Funnily enough, a final comment by Simonson and Kivetz on risky bets is that “the notion that losses do tend to loom larger than gains is most likely correct; it certainly resonates and “feels” consistent with personal experience, though intuitive reactions are a weak form of evidence.” As noted above, we should distinguish feelings and a decision exhibiting loss aversion.

Unfortunately, I haven’t found anything else that attempts to pick apart Gal and Rucker’s article, so it is hard to gauge the broader reception to the article or whether it has resonated in academic circles at all.

Where does this leave us on loss aversion?

Putting this together, I would summarise the case for loss aversion as follows:

  • The conditions for loss aversion are more restrictive than is typically thought or stated in discussion outside academia
  • Some of the claimed evidence for loss aversion, such as the endowment effect, has alternative explanations. The evidence is better found elsewhere
  • There is sound evidence for the psychological impact of losses, but this does not necessarily manifest itself in loss aversion
  • Most of the loss aversion literature does a poor job of distinguishing between loss aversion in its pure sense and what might be called a “minimal requirements” effect, whereby people are avoiding the gamble due to the threat of ruin.

This is a more restricted conception of loss aversion than I held when I started writing this post.

The loss aversion series of posts

My next post will be on the topic of ergodicity, which involves the concept that people are not maximising the expected value of a series of gambles, but rather the time average (explanation on what that means to come). If people maximise the latter, not the former as many approaches assume, you don’t need risk or loss aversion to explain their decisions.

My other posts on loss aversion can be found here:

  1. Kahneman and Tversky’s debatable loss aversion assumption
  2. What can we infer about someone who rejects a 50:50 bet to win $110 or lose $100? The Rabin paradox explored
  3. The case against loss aversion (this post)
  4. Ergodicity – to come.

What can we infer about someone who rejects a 50:50 bet to win $110 or lose $100? The Rabin paradox explored

Consider the following claim:

We don’t need loss aversion to explain a person’s decision to reject a 50:50 bet to win $110 or lose $100. That’s just simple risk aversion as in expected utility theory.

Risk aversion is the concept that we prefer certainty to a gamble with the same expected value. For example, a risk averse person would prefer $100 for certain over a 50-50 gamble between $0 and $200, which has an expected value of $100. The higher their risk aversion, the less they would value the 50:50 bet. They would also be willing to reject some positive expected value bets.

Loss aversion is the concept that losses loom larger than gains. If the loss is weighted more heavily than the gain – it is often said that losses hurt twice as much as gains bring us joy – then this could also explain the decision to reject a 50:50 bet of the type above. Loss aversion is distinct from risk aversion as its full force applies to the first dollar either side of the reference point from which the person is assessing the change (and at which point risk aversion should be negligible).
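To make the “losses hurt twice as much” claim concrete, here is a minimal sketch in R. It assumes a piecewise-linear value function around a reference point of zero with a loss multiplier of 2. Both the functional form and the names value and lambda are illustrative assumptions of mine, not something taken from the papers discussed in this post.

```r
# Hypothetical piecewise-linear value function around a reference point of 0.
# lambda = 2 encodes the rule of thumb that losses hurt twice as much as gains.
value <- function(x, lambda = 2) {
    ifelse(x >= 0, x, lambda * x)
}

# Subjective value of a 50:50 bet to win $110 or lose $100
bet_value <- 0.5 * value(110) + 0.5 * value(-100)

print(bet_value)                                   # 55 - 100 = -45
print(if (bet_value < 0) "REJECT" else "ACCEPT")   # negative value, so the bet is rejected
```

Under these assumptions the bet is rejected with no curvature in the value function at all: the $100 loss simply counts double.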

So, do we need loss aversion to explain the rejection of this bet, or does risk aversion suffice?

One typical response to the above claim is loosely based on the Rabin Paradox, which comes from a paper published in 2000 by Matthew Rabin:

An expected utility maximiser who rejects this bet is exhibiting a level of risk aversion that would lead them to reject bets that no one in their right mind would reject. It can’t be the case that this is simply risk aversion.

For the remainder of this post I am going to pull apart Rabin’s argument from his justifiably famous paper Risk Aversion and Expected-Utility Theory: A Calibration Theorem (pdf). A more readable version of this argument was also published in 2001 in an article by Rabin and Richard Thaler.

To understand Rabin’s point, I have worked through the math in his paper. You can see my mathematical workings in an Appendix at the bottom of this post. There were quite a few minor errors in the paper – and some major errors in the formulas – but I believe I’ve captured the crux of the argument. (I’d be grateful for some second opinions on this).

I started working through these two articles with an impression that Rabin’s argument was a fatal blow to the idea that expected utility theory accurately describes the rejection of bets such as that above. I would have been comfortable making the above response. However, after playing with the numbers and developing a better understanding of the paper, I would say that the above response is not strictly true. Rabin’s paper makes an important point, but it is far from a fatal blow by itself. (That fatal blow does come, just not solely from here.)

Describing Rabin’s argument

Rabin’s argument starts with a simple bet: suppose you are offered a 50:50 bet to win $110 or lose $100, and you turn it down. Suppose further that you would reject this bet no matter what your wealth (this is an assumption we will turn to in more detail later). What can you infer about your response to other bets?

This depends on what decision making model you are using.

For an expected utility maximiser – someone who maximises the probability weighted subjective value of these bets – we can infer that they will turn down any 50:50 bet of losing $1,000 and gaining any amount of money. For example, they would reject a 50:50 bet to lose $1,000, win one billion dollars.

On its face value, that is ridiculous, and that is the crux of Rabin’s argument. Rejection of the low value bet to win $110 and lose $100 would lead to absurd responses to higher value bets. This leads Rabin to argue that risk aversion or the diminishing value of money has nothing to do with rejection of the low value bets.

The intuition behind Rabin’s argument is relatively simple. Suppose we have someone that rejects a 50:50 bet for gain $11, lose $10. They are an expected utility maximiser with a weakly concave utility curve: that is, they are risk neutral or risk averse at all levels of wealth.

From this, we can infer that they weight the average of each dollar between their current wealth (W) and their wealth if they win the bet (W+11) only 10/11 as much as they weight the average dollar of the last $10 of their current wealth (between W-10 and W). We can also say that they therefore weight their W+11th dollar at most 10/11 as much as their W-10th dollar (relying on the weak concavity here).

Suppose their wealth is now W+21. We have assumed that they will reject the bet at all levels of wealth, so they will also reject at this wealth. Iterating the previous calculations, we can say that they will weight their W+32nd dollar only 10/11 as much as their W+11th dollar. This means they value their W+32nd dollar only (10/11)^2 as much as their W-10th dollar.

Keep iterating in this way and you end up with some ridiculous results. You value the 210th dollar above your current wealth only 40% as much as the last dollar of your current wealth [reducing by a constant factor of 10/11 every $21 – (10/11)^10]. Or you value the 900th dollar above your current wealth at only 2% of the last dollar of your current wealth [(10/11)^40]. This is an absurd rate of discounting.

Those numbers are from the 2001 Rabin and Thaler paper. In his 2000 paper, Rabin gives figures of 3/20 for the 220th and 1/2000 for the 880th dollar, effectively calculating (10/11)^20 and (10/11)^80, which is a reduction by a factor of 10/11 every 11 dollars. This degree of discounting could be justified and reflects the equations provided in the Appendix to his paper, but it requires a slightly different intuition than the one relating to the comparison between every 21st dollar. If instead you note that the $11 above a reference point are valued less than the $10 below, you only need iterate up $11 to get another discount of 10/11, as the next $11 is valued at most as much as the previous $10.
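The discount factors quoted from the two papers can be checked with a few lines of R. This is only arithmetic on the 10/11 ratio; the variable names are mine.

```r
ratio <- 10 / 11

# Rabin and Thaler (2001): one discount of 10/11 for every $21 of extra wealth
rt_2001 <- ratio^c(10, 40)    # the 210th dollar, and roughly the 900th dollar

# Rabin (2000): one discount of 10/11 for every $11 of extra wealth
r_2000 <- ratio^c(20, 80)     # the 220th and 880th dollars

print(rt_2001)    # about 0.39 and 0.022, i.e. roughly 40% and 2%
print(r_2000)     # about 0.149 and 0.0005, i.e. roughly 3/20 and 1/2000
```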

Regardless of whether you use the numbers from the 2000 or 2001 paper, taking this iteration to the extreme, it doesn’t take long for additional money to have effectively zero value. Hence the result, reject the 50:50 win $110, lose $100 and you’ll reject the win any amount, lose $1,000 bet.

What is the utility curve of this person?

This argument sounds compelling, but we need to examine the assumption that you will reject the bet at all levels of wealth.

If someone rejects the bet at all levels of wealth, what is the least risk averse they could be? They would be close to indifferent to the bet at all levels of wealth. If that was the case across the whole utility curve, their absolute level of risk aversion is constant.

The equation used to represent utility with constant absolute risk aversion is exponential utility (with a>0). A feature of the exponential utility function is that, for a risk averse person, utility caps out at a maximum. Beyond a certain level of wealth, they gain no additional utility – hence Rabin’s ability to define bets where they reject infinite gains.

The need for utility to cap out is also apparent from the fact that someone might reject a bet that involves the potential for infinite gain. The utility of infinite wealth cannot be infinite, as any bet involving the potential for infinite utility would be accepted.
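As a rough illustration of the capping-out point, the sketch below assumes the utility function u(W) = 1 - exp(-aW), one common way of writing exponential utility, and solves for the value of a at which the person is exactly indifferent to the win $110, lose $100 bet. The function name cara_net and the search interval are my own assumptions. Under this function current wealth cancels out of the comparison, so the same a applies at every level of wealth.

```r
# Net expected utility of a 50:50 win g / lose l bet under u(W) = 1 - exp(-a*W),
# rescaled by exp(a*W) so that current wealth drops out. Positive means accept.
cara_net <- function(a, g, l) {
    1 - 0.5 * exp(-a * g) - 0.5 * exp(a * l)
}

# The least risk averse person who still rejects win $110 / lose $100 is the one
# who is exactly indifferent, so solve cara_net(a, 110, 100) = 0 for a.
# A tight tolerance is needed because the root is itself very small.
a_star <- uniroot(function(a) cara_net(a, 110, 100),
                  interval = c(1e-6, 0.01), tol = 1e-12)$root
print(a_star)

# With that a, even an unbounded gain cannot offset a 50% chance of losing $1,000
print(cara_net(a_star, g = Inf, l = 1000) < 0)
```

With an infinite gain the first exponential term vanishes, so the person's utility from winning has hit its ceiling, and the loss term alone is enough to make the bet unattractive.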

In the 2000 paper, Rabin brings the constant absolute risk aversion function into his argument more explicitly when he examines what proportion of their portfolio a person with an exponential utility function would invest in stocks (under some particular return assumptions). There he shows a ridiculous level of risk aversion and states that “While it is widely believed that investors are too cautious in their investment behavior, no one believes they are this risk averse.”

However, this effective (or explicit) assumption of constant absolute risk aversion is not particularly well grounded. Most empirical evidence is that people exhibit decreasing absolute risk aversion, not constant. Exponential utility functions are used more for mathematical tractability than for realistically reflecting the decision making processes that people use.

Yet, under Rabin’s assumption of rejecting the bet at all levels of wealth, constant absolute risk aversion and a utility function such as the exponential is the most accommodating assumption we can make. While Rabin states that “no one believes they are this risk averse”, it’s not clear that anyone believes Rabin’s underlying assumption either.

This ultimately means that the ridiculous implications of rejecting low-value bets are the result of Rabin’s unrealistic assumption that the bet is rejected no matter what the person’s wealth.

Relaxing the “all levels of wealth” assumption

Rabin is, of course, aware that the assumption of rejecting the bet at all levels of wealth is a weakness, so he provides a further example that applies to someone who only rejects this bet for all levels of wealth below $300,000.

This generates less extreme, but still clearly problematic bets that the bettor can be inferred to also reject.

For example, consider someone who rejects the 50:50 bet to win $110, lose $100 when they have $290,000 of wealth, and who would also reject that bet up to a wealth of $300,000. As for the previous example, each time you iterate up $110, each dollar in that $110 is valued at most 10/11 of the previous $110. It takes 90 iterations of $110 to cover that $10,000, meaning that a dollar around wealth $300,000 will be valued at only (10/11)^90 (about 0.02%) of a dollar at wealth $290,000. Each dollar above $300,000 is not discounted any further, but by then the damage has already been done, with that money having almost no utility.

For instance, this person will reject a bet of gain $718,190, lose $1,000. Again, this person would be out of their mind.
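The size of that discount is easy to verify. The snippet below is a quick check of the arithmetic only, with variable names of my own choosing.

```r
# Whole $110 steps that fit in the $10,000 between wealth of $290,000 and $300,000
steps <- as.integer(10000 / 110)

# Upper bound on the value of a dollar at $300,000 relative to a dollar at $290,000
discount <- (10 / 11)^steps

print(steps)             # 90
print(discount * 100)    # as a percentage: roughly 0.02
```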

You might now ask whether a person with a wealth of $290,000 to $300,000 actually rejects bets of this nature? If not, isn’t this just another unjustifiable assumption designed to generate a ridiculous result?

It is possible to make this scenario more realistic. Rabin doesn’t mention this in his paper (nor do Rabin and Thaler), but we can generate the same result at much lower levels of wealth. All we need to find is someone who will reject that bet over a range of $10,000, and still have enough wealth to bear the loss – say someone who will reject that bet up to a wealth of $11,000. That person will also reject a win $718,190 lose $1,000 bet.

Rejection of the win $110, lose $100 bet over that range does not seem as unrealistic, and I could imagine a person with that preference existing. If we empirically tested this, we would also need to examine liquid wealth and cash flow, but the example does provide a sense that we could find some people whose rejection of low value bets would generate absurd results under expected utility maximisation.

The log utility function

Let’s compare Rabin’s example utility function with a more commonly assumed utility function, that of log utility. Log utility has decreasing absolute risk aversion (and constant relative risk aversion), so is both more empirically defensible and does not generate utility that asymptotes to a maximum like the exponential utility function.

A person with log utility would reject the 50:50 bet to win $110, lose $100 up to a wealth of $1,100. Beyond that, they would accept the bet. So, for log utility we should see most people accept this bet.

A person with log utility will reject some quite unbalanced bets: such as a 50:50 bet to win $1 million, lose $90,900, but only up to a wealth of $100,000, beyond which they would accept. Rejection only occurs when a loss is near ruinous.
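The wealth thresholds in the two log utility examples above follow from a short piece of algebra. For a 50:50 bet to win g or lose l, the person is indifferent at the wealth w^{*} where the expected utility of the bet equals the utility of current wealth:

```latex
\begin{aligned}
\tfrac{1}{2}\ln(w+g)+\tfrac{1}{2}\ln(w-l) &= \ln(w) \\
(w+g)(w-l) &= w^{2} \\
w(g-l) &= gl \\
w^{*} &= \frac{gl}{g-l}
\end{aligned}
% win $110, lose $100:           w^{*} = (110 \times 100)/10 = 1,100
% win $1,000,000, lose $90,900:  w^{*} = (1,000,000 \times 90,900)/909,100 \approx 99,989
```

Below w^{*} the bet is rejected and above it the bet is accepted, which is why the second bet is accepted at a wealth of $100,000.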

The result is that log utility does not generate the types of rejected bets that Rabin labels as ridiculous, but would also fail to provide much of an explanation for the rejection of low-value bets with positive expected value.

The empirical evidence

Do people actually turn down 50:50 bets of win $110, lose $100? Surprisingly, I couldn’t find an example of this bet (if someone knows a paper that directly tests this, let me know).

Most examinations of loss aversion examine symmetric 50:50 bets where the potential gain and the loss are the same. They compare a bet centred around 0 (e.g. gain $100 or lose $100) and a similar bet in a gain frame (e.g. gain $100 or gain $300, or take $200 for certain). If more people reject the first bet than the latter, then this is evidence of loss aversion.

It makes sense that this is the experimental approach. If the bet is not symmetric, it becomes hard to tease out loss aversion from risk aversion.

However, there is a pattern in the literature that people often reject risky bets with a positive expected value in the ranges explored by Rabin. We don’t know a lot about their wealth (or liquidity), but Rabin’s illustrative numbers for rejected bets don’t seem completely unrealistic. It’s the range of wealth over which the rejection occurs that is questionable.

Rather than me floundering around on this point, there are papers that explicitly ask whether we can observe a set of bets for a group of experimental subjects and map a curve to those choices that resembles expected utility.

For instance, Holt and Laury’s 2002 AER paper (pdf) examined a set of hypothetical and incentivised bets over a range of stakes (finding among other things that hypothetical predictions of their response to incentivised high-stakes bets were not very accurate). They found that if you are flexible about the form of the expected utility function that is used, rejection of small gambles does not result in absurd conclusions on large gambles. The pattern of bets could be made consistent with expected utility, assuming you correctly parameterise the equation. Over subsequent years there was some back and forth on whether this finding was robust [see here (pdf) and here (pdf)], but the basic result seemed to hold.

The utility curve that best matched Holt and Laury’s experimental findings had increasing relative risk aversion, and decreasing absolute risk aversion. By having decreasing absolute risk aversion, the absurd implications of Rabin’s paper are avoided.

Papers such as this suggest that while Rabin’s paper makes an important point, its underlying assumptions are not consistent with empirical evidence. It is possible to have an expected utility maximiser reject low value bets without generating ridiculous outcomes.

So what can you infer about our bettor who has rejected the win $110, lose $100 bet?

From the argument above, I would say not much. We could craft a utility function to accommodate this bet without leading to ridiculous consequences. I personally feel this defence is laboured (that’s a subject for another day), but the bet is not in itself fatal to the argument that they are an expected utility maximiser.


The utility of a gain

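Let’s suppose someone will reject a 50:50 bet with gain g and loss l for any level of wealth. What utility will they get from a gain of x ? Rabin defines an upper bound of the utility of gaining x to be:

U(w+x)-U(w)\leq\sum_{i=0}^{k^{**}(x)}\left(\frac{l}{g}\right)^{i}r(w)

where k^{**}(x)=int\left(\frac{x}{g}\right) .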

This formula effectively breaks down x into g size components, successively discounting each additional g at \frac{l}{g} of the previous g .

You need k^{**}(x)+1 lots of g to cover x . For instance, if x was 32 and we had a 50:50 bet for win $11, lose $10, k^{**}(32)=int\left(\frac{32}{11}\right)=2 . You need 2+1 lots of 11 to fully cover 32. It actually covers a touch more than 32, hence the calculation being for an upper bound.

In the paper, Rabin defines k^{**}(x)=int\left(\left(\frac{x}{g}\right)+1\right) . This seems to better capture the required number of g to fully cover x , but the iterations in the above formula start at i=0 . The calculations I run with my version of the formula replicate Rabin’s, supporting the suggestion that the addition of 1 in the paper is an error.

r(w) is shorthand for the amount of utility sacrificed from losing the gamble (i.e. losing l  ). We know that the utility of the gain g is less than this, as the bet is rejected. If we let r(w)=1 , the equation can be thought of as giving you the maximum utility you could get from the gain of x relative to the utility of the loss of l .

Putting this together for x=32 , the upper bound of the utility of the possible gain is the sum of three terms: the upper bound of the relative utility from the first $11, \left(\frac{10}{11}\right)^0r(w)=r(w) ; the upper bound of the utility from the next $11, \left(\frac{10}{11}\right)^1r(w) ; and the upper bound of the utility from the remaining $10, which, taking a conservative approach, is calculated as though it were a full $11: \left(\frac{10}{11}\right)^2r(w) .
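The x = 32 example can be reproduced in a few lines of R. This is a sketch of the arithmetic only, using the int(x/g) version of the step count discussed above; the variable names are mine.

```r
g <- 11; l <- 10; x <- 32

k_2star <- as.integer(x / g)      # 2, so 2 + 1 = 3 lots of $11 are needed to cover $32
terms <- (l / g)^(0:k_2star)      # 1, 10/11 and (10/11)^2
upper_bound <- sum(terms)         # upper bound on the gain, in units of r(w)

print(terms)
print(upper_bound)
```

The sum comes to a little over 2.7, so gaining $32 is worth at most about 2.7 times the utility given up by losing $10.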

The utility of a loss

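Rabin also gives us a lower bound of the utility of a loss of x for this person who will reject a 50:50 bet with gain g and loss l for any level of wealth:

U(w)-U(w-x)\geq 2\sum_{i=1}^{k^{*}(x)}\left(\frac{g}{l}\right)^{i-1}r(w)

where k^{*}(x)=int\left(\frac{x}{2l}\right) .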

The intuition behind k^{*}(x)  comes from Rabin’s desire to provide a relatively uncomplicated proof for the proposition. Effectively, the utility scales down with each step of g by at least \frac{g}{l} . Since Rabin wants to express this in terms of losses, he defines 2l\geq{g}\geq{l} . He can thereby say that utility scales down by at least \frac{g}{l} every 2 lots of l .

Otherwise, the intuition for this loss formula is the same as that for the gain. The summation starts at i=1 as this formula is providing a lower bound, so does not require the final iteration to fully cover x . The formula is also multiplied by 2 as each iteration covers two lots of l , whereby r(w) is for a single span of l .
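As a worked example, take the win $110, lose $100 bet and a loss of x = 1000. I am assuming here that the bound takes the form 2\sum_{i=1}^{k^{*}(x)}(g/l)^{i-1}r(w) with k^{*}(x)=int(x/2l) , which is the form that appears in the general formula later in this appendix:

```latex
\begin{aligned}
k^{*}(1000) &= int\left(\frac{1000}{2\times 100}\right) = 5 \\
U(w)-U(w-1000) &\geq 2\sum_{i=1}^{5}\left(\frac{110}{100}\right)^{i-1} r(w) \\
&= 2\,(1 + 1.1 + 1.21 + 1.331 + 1.4641)\, r(w) \\
&= 12.2102\, r(w)
\end{aligned}
```

So losing $1,000 costs this person at least 12.2 times the utility they give up by losing $100.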

Running some numbers

The below R code implements the above two formulas as a function, calculating the potential utility gain for a win of G or a loss of L for a person who rejects a 50:50 bet win g , lose l at all levels of wealth. It then states whether we know the person will reject a win G , lose L bet – we can’t state they will accept as we have upper and lower bounds of the utility change from the gain and loss.

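Rabin_bet <- function(g, l, G, L){

    # Number of steps of size g (gains) and 2*l (losses) used to span G and L
    k_2star <- as.integer(G/g)
    k_star <- as.integer(L/(2*l))

    # Upper bound on the utility of gaining G, in units of r(w)
    U_gain <- 0
    for (i in 0:k_2star) {
        U_step <- (l/g)^i
        U_gain <- U_gain + U_step
    }

    # Lower bound on the utility cost of losing L, in units of r(w)
    # seq_len() skips the loop when k_star is 0, which 1:k_star would not
    U_loss <- 0
    for (i in seq_len(k_star)) {
        U_step <- 2*(g/l)^(i-1)
        U_loss <- U_loss + U_step
    }

    # We can only be sure of rejection if the most the gain could be worth
    # is less than the least the loss could cost
    if (U_gain < U_loss) print("REJECT") else print("CANNOT CONFIRM REJECT")

    print(paste0("Max U from gain =", U_gain))
    print(paste0("Min U from loss =", U_loss))
}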

Take a person who will reject a 50:50 bet to win $110, lose $100. Taking the table from the paper, they would reject a win $1,000,000,000, lose $1,000 bet.

Rabin_bet(110, 100, 1000000000, 1000)
[1] "REJECT"
[1] "Max U from gain =11"
[1] "Min U from loss =12.2102"
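The value of 11 for the gain is no accident of the billion dollar figure. The upper bound is a geometric series, and with l<g it can never exceed a fixed ceiling however large the gain:

```latex
\sum_{i=0}^{k}\left(\frac{l}{g}\right)^{i}
  = \frac{1-\left(\frac{l}{g}\right)^{k+1}}{1-\frac{l}{g}}
  < \frac{g}{g-l}
  = \frac{110}{110-100}
  = 11
```

Since the lower bound on the cost of losing $1,000 is 12.2102, which is above 11, no gain of any size can make the bet acceptable.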

Relaxing the wealth assumption

In the Appendix of his paper, Rabin defines his proof where the bet is rejected over a range of wealth w\in(\underline{w}, \bar w) . In that case, relative utility for each additional gain of size g is \frac{l}{g} of the previous g until \bar w . Beyond that point, each additional gain of g gives constant utility until x is reached. The formula for the upper bound on the utility gain is:

U(w+x)-U(w)\leq \begin{cases} \sum_{i=0}^{k^{**}(x)}\left(\frac{l}{g}\right)^ir(w) & if\quad x\leq{\bar w}-w\\ \\ \sum_{i=0}^{k^{**}(\bar w)}\left(\frac{l}{g}\right)^{i}r(w)+\left[\frac{x-(\bar w-w)}{g}\right]\left(\frac{l}{g}\right)^{k^{**}(\bar w)}r(w) & if\quad x\geq{\bar w}-w \end{cases}

The first term of the equation where x\geq\bar w-w involves iterated discounting as per the situation where the bet is rejected for all levels of wealth, but here the iteration is only up to wealth \bar w . The second term of that equation captures the gain beyond \bar w discounted at a constant rate.

There is an error in Rabin’s formula in the paper. Rather than the term \left[\frac{x-(\bar w-w)}{g}\right] in the second equation, Rabin has it as [x-\bar w] . As for the previous equations, we need to know the number of iterations of the gain, not total dollars, and we need this between \bar w and w+x .

When Rabin provides the examples in Table II of the paper, from the numbers he provides I believe he actually uses a formula of the type int\left[\frac{x-(w-\underline w)}{g}+1\right] , which reflects a desire to calculate the upper-bound utility across the stretch above \bar w in a similar manner to below, although this is not strictly necessary given the discount is constant across this range. I have implemented as per my formula, which means that a bet for gain G is rejected g higher than for Rabin (which given their scale is not material).

Similarly, for the loss:

U(w)-U(w-x)\geq \begin{cases} {2}\sum_{i=1}^{k^{*}(x)}\left(\frac{g}{l}\right)^{i-1}{r(w)} & if\quad {w-\underline w+2l}\geq{x}\geq{2l}\\ \\ {2}\sum_{i=1}^{k^{*}(w-\underline w+2l)}\left(\frac{g}{l}\right)^{i-1}{r(w)}+\ \quad\left[\frac{x-(w-\underline w+l)}{2l}\right]\left(\frac{g}{l}\right)^{k^{*}(w-\underline w+2l)}{r(w)} & if\quad x\geq{w-\underline w+2l} \end{cases}

There is a similar error here, with Rabin using the term \left[x-(w-\underline w+l)\right] rather than \left[\frac{x-(w-\underline w+l)}{2l}\right] . We can’t determine how this was implemented by Rabin as his examples do not examine behaviour below a lower bound \underline w .

Running some more numbers

The below code implements the above two formulas as a function, calculating the potential utility gain for a win of G or a loss of L for a person who rejects a 50:50 bet win g , lose l at wealth w\in(\underline{w}, \bar w) . It then states whether we know the person will reject a win G , lose L bet – as before, we can’t state they will accept as we have upper and lower bounds of the utility change from the gain and loss.

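Rabin_bet_general <- function(g, l, G, L, w, w_max, w_min){

    # Do the gain and loss stay within the range of wealth over which
    # the small bet is rejected?
    gain_in_range <- G <= (w_max-w)
    loss_in_range <- (w-w_min+2*l) >= L

    # Number of discounting steps, capped at the edge of the rejection range
    k_2star <- if (gain_in_range) as.integer(G/g) else as.integer((w_max-w)/g)
    k_star <- if (loss_in_range) as.integer(L/(2*l)) else as.integer((w-w_min+2*l)/(2*l))

    # Upper bound on the utility of gaining G, in units of r(w)
    U_gain <- 0
    for (i in 0:k_2star){
        U_step <- (l/g)^i
        U_gain <- U_gain + U_step
    }
    # Above w_max, each further lot of g is valued at the last discounted rate
    if (!gain_in_range) {
        U_gain <- U_gain + ((G-(w_max-w))/g)*(l/g)^k_2star
    }

    # Lower bound on the utility cost of losing L, in units of r(w)
    U_loss <- 0
    for (i in seq_len(k_star)) {
        U_step <- 2*(g/l)^(i-1)
        U_loss <- U_loss + U_step
    }
    # Below w_min, each further lot of 2*l is valued at the last rate reached
    if (!loss_in_range) {
        U_loss <- U_loss + ((L-(w-w_min+l))/(2*l))*(g/l)^k_star
    }

    if (U_gain < U_loss) print("REJECT") else print("CANNOT CONFIRM REJECT")

    print(paste0("Max U from gain =", U_gain))
    print(paste0("Min U from loss =", U_loss))
}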

Imagine someone who turns down the win $110, lose $100 bet with a wealth of $290,000, but who would only reject this bet up to $300,000. They will reject a win $718,190, lose $1000 bet.

Rabin_bet_general(110, 100, 718190, 1000, 290000, 300000, 0)
[1] "REJECT"
[1] "Max U from gain =12.2098745626936"
[1] "Min U from loss =12.2102"

The nature of Rabin’s calculation means that we can scale this calculation to anywhere on the wealth curve. We need only say that someone who rejects this bet over (roughly) a range of $10,000 plus the size of the potential loss will exhibit the same decisions. For example a person with $10,000 wealth who would reject the bet up to $20,000 wealth would also reject the win $718,190, lose $1000 bet.

Rabin_bet_general(110, 100, 718190, 1000, 10000, 20000, 0)
[1] "REJECT"
[1] "Max U from gain =12.2098745626936"
[1] "Min U from loss =12.2102"

Comparison with log utility

The below is an example with log utility, which is U(W)=\ln(W) . This function determines whether someone of wealth w will reject or accept a 50:50 bet for gain g and loss l .

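log_utility <- function(g, l, w){

    # Utility of wealth after winning and after losing the bet
    log_gain <- log(w+g)
    log_loss <- log(w-l)

    EU_bet <- 0.5*log_gain + 0.5*log_loss
    EU_certain <- log(w)

    # Exact comparison, so results at the indifference point are sensitive
    # to floating-point rounding. The "INDIFFERENT" label is assumed.
    if (EU_certain == EU_bet) {
        print("INDIFFERENT")
    } else if (EU_certain > EU_bet) {
        print("REJECT")
    } else {
        print("ACCEPT")
    }

    print(paste0("Expected utility of bet = ", EU_bet))
    print(paste0("Utility of current wealth = ", EU_certain))
}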

Testing a few numbers, someone with log utility is indifferent about a 50:50 win $110, lose $100 bet at wealth $1100. They would accept for any level of wealth above that level.

log_utility(110, 100, 1100)
[1] "Expected utility of bet = 7.00306545878646"
[1] "Utility of current wealth = 7.00306545878646"

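That same person is indifferent about a 50:50 win $1100, lose $1000 bet at $11,000 in wealth, and will always accept it above that level. (The ACCEPT below, at exactly $11,000, reflects floating-point rounding in the comparison: the two utilities agree to every printed digit.)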

log_utility(1100, 1000, 11000)
[1] "ACCEPT"
[1] "Expected utility of bet = 9.30565055178051"
[1] "Utility of current wealth = 9.30565055178051"

Can we generate any bets that don’t seem quite right? It’s quite hard unless you have a bet that will bring the person to ruin or near ruin. For instance, for a 50:50 bet with a chance to win $1 million, a person with log utility and $100,000 wealth would still accept the bet with a potential loss of $90,900, which brings them to less than 10% of their wealth.

log_utility(1000000, 90900, 100000)
[1] "ACCEPT"
[1] "Expected utility of bet = 11.5134252151368"
[1] "Utility of current wealth = 11.5129254649702"

The problem with log utility is not the ability to generate ridiculous bets that would be rejected. Rather, it’s that someone with log utility would tend to accept most positive value bets (in fact, they would always take a non-zero share if they could). Only if the bet brings them near ruin (either through size or their lack of wealth) would they turn down the bet.
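The claim about taking a non-zero share can be shown directly. Assume, for illustration, that the person can choose any fraction f of a 50:50 bet to win g or lose l (the symbol f and this set-up are my own). They choose f to maximise expected log utility:

```latex
\begin{aligned}
\max_{f}\quad & \tfrac{1}{2}\ln(w+fg)+\tfrac{1}{2}\ln(w-fl) \\
\text{first-order condition:}\quad & \frac{g}{w+fg} = \frac{l}{w-fl} \\
f^{*} &= \frac{w(g-l)}{2gl}
\end{aligned}
% win $110, lose $100 at wealth $1,100:  f^{*} = (1100 \times 10)/(2 \times 110 \times 100) = 0.5
```

Because f^{*} is positive whenever g>l and w>0 , some share of any positive expected value bet is always taken. At the $1,100 indifference point for the full win $110, lose $100 bet, the preferred share is half the bet.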

The isoelastic utility function – of which log utility is a special case – is a broader class of function that exhibits constant relative risk aversion:

U(W)=\frac{W^{1-\rho}-1}{1-\rho}

If \rho=1 , this simplifies to log utility (you need to use L’Hôpital’s rule to get this, as the fraction is undefined when \rho=1 ). The higher \rho , the higher the level of risk aversion. We implement this function as follows:

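CRRA_utility <- function(g, l, w, rho=2){

    # The formula divides by (1-rho), so rho = 1 (log utility) is not handled here
    if (rho == 1) {
        print("function undefined")
        return(invisible(NULL))
    }

    # Utility of wealth after winning and after losing the bet
    U_gain <- ((w+g)^(1-rho)-1)/(1-rho)
    U_loss <- ((w-l)^(1-rho)-1)/(1-rho)

    EU_bet <- 0.5*U_gain + 0.5*U_loss
    EU_certain <- (w^(1-rho)-1)/(1-rho)

    # Exact comparison, so results at the indifference point are sensitive
    # to floating-point rounding. The "INDIFFERENT" label is assumed.
    if (EU_certain == EU_bet) {
        print("INDIFFERENT")
    } else if (EU_certain > EU_bet) {
        print("REJECT")
    } else {
        print("ACCEPT")
    }

    print(paste0("Expected utility of bet = ", EU_bet))
    print(paste0("Utility of current wealth = ", EU_certain))
}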
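For completeness, the \rho=1 limit mentioned above can be checked by hand. Taking the function in the form used in the code, (W^{1-\rho}-1)/(1-\rho) , both numerator and denominator go to zero as \rho approaches 1, so we differentiate each with respect to \rho :

```latex
\lim_{\rho\to 1}\frac{W^{1-\rho}-1}{1-\rho}
  = \lim_{\rho\to 1}\frac{-W^{1-\rho}\ln(W)}{-1}
  = \ln(W)
```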

If we increase \rho , we can increase the proportion of low value bets that are rejected.

For example, a person with \rho=2 will reject the 50:50 win $110, lose $100 bet up to a wealth of $2200. The rejection point scales with \rho .

CRRA_utility(110, 100, 2200, 2)
[1] "Expected utility of bet = 0.999545454545455"
[1] "Utility of current wealth = 0.999545454545455"
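The $2,200 figure can also be derived by hand. With \rho=2 the utility function in the code reduces to U(W)=1-\frac{1}{W} , so indifference between taking and declining a 50:50 bet to win g or lose l requires:

```latex
\begin{aligned}
\tfrac{1}{2}\cdot\frac{1}{w+g}+\tfrac{1}{2}\cdot\frac{1}{w-l} &= \frac{1}{w} \\
w(2w+g-l) &= 2(w+g)(w-l) \\
w(g-l) &= 2gl \\
w^{*} &= \frac{2gl}{g-l} = \frac{2\times 110\times 100}{10} = 2200
\end{aligned}
```

This is exactly double the log utility threshold of gl/(g-l)=1100 .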

For a 50:50 chance to win $1 million at wealth $100,000, the person with \rho=2 is willing to risk a far smaller loss, and rejects even when the loss is only $48,000, or less than half their wealth (which admittedly is still a fair chunk).

CRRA_utility(1000000, 48000, 100000, 2)
[1] "REJECT"
[1] "Expected utility of bet = 0.99998993006993"
[1] "Utility of current wealth = 0.99999"

Higher values of \rho start to become completely unrealistic as utility is almost flat beyond an initial level of wealth.

It is also possible to have values of \rho between 0 (risk neutrality) and 1. These would result in even fewer rejected low value bets than log utility, and fewer rejected bets with highly unbalanced potential gains and losses.
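To see how the rejection threshold moves with \rho , including a value below 1, the sketch below solves numerically for the wealth at which a person is indifferent to the win $110, lose $100 bet. The function names and the search interval are my own assumptions, and this is an illustration rather than part of the workings above.

```r
# CRRA utility, with log utility as the rho = 1 case
crra_u <- function(W, rho) {
    if (rho == 1) log(W) else (W^(1 - rho) - 1) / (1 - rho)
}

# Wealth at which a 50:50 win g / lose l bet is exactly as good as not betting.
# The search interval (just above l, up to 1e5) is an assumption for illustration.
indifference_wealth <- function(g, l, rho) {
    net <- function(w) {
        0.5 * crra_u(w + g, rho) + 0.5 * crra_u(w - l, rho) - crra_u(w, rho)
    }
    uniroot(net, interval = c(l + 1, 1e5), tol = 1e-9)$root
}

for (rho in c(0.5, 1, 2)) {
    print(paste0("rho = ", rho, ": reject below wealth of about ",
                 round(indifference_wealth(110, 100, rho))))
}
```

This gives thresholds of roughly $551, $1,100 and $2,200 for \rho of 0.5, 1 and 2, consistent with the hand calculations for log utility and \rho=2 , and showing that a person with \rho below 1 rejects the bet over an even narrower range of wealth.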

My latest article at Behavioral Scientist: Principles for the Application of Human Intelligence

I am somewhat slow in posting this – the article has been up more than a week – but my latest article is up at Behavioral Scientist.

The article is basically an argument that the scrutiny we are applying to algorithmic decision making should also be applied to human decision making systems. Our objective should be good decisions, whatever the source of the decision.

The introduction to the article is below.

Principles for the Application of Human Intelligence

Recognition of the powerful pattern matching ability of humans is growing. As a result, humans are increasingly being deployed to make decisions that affect the well-being of other humans. We are starting to see the use of human decision makers in courts, in university admissions offices, in loan application departments, and in recruitment. Soon humans will be the primary gateway to many core services.

The use of humans undoubtedly comes with benefits relative to the data-derived algorithms that we have used in the past. The human ability to spot anomalies that are missed by our rigid algorithms is unparalleled. A human decision maker also allows us to hold someone directly accountable for the decisions.

However, the replacement of algorithms with a powerful technology in the form of the human brain is not without risks. Before humans become the standard way in which we make decisions, we need to consider the risks and ensure implementation of human decision-making systems does not cause widespread harm. To this end, we need to develop principles for the application of human intelligence to decision making.

Read the rest of the article here.

Kahneman and Tversky’s “debatable” loss aversion assumption

Loss aversion is the idea that losses loom larger than gains. It is one of the foundational concepts in the judgment and decision making literature. In Thinking, Fast and Slow, Daniel Kahneman wrote “The concept of loss aversion is certainly the most significant contribution of psychology to behavioral economics.”

Yet, over the last couple of years several critiques have emerged that question the foundations of loss aversion and whether loss aversion is a phenomenon at all.

One is an article by Eldad Yechiam, titled Acceptable losses: the debatable origins of loss aversion (pdf). Framed in one case as a spread of the replication crisis to loss aversion, the abstract reads as follows:

It is often claimed that negative events carry a larger weight than positive events. Loss aversion is the manifestation of this argument in monetary outcomes. In this review, we examine early studies of the utility function of gains and losses, and in particular the original evidence for loss aversion reported by Kahneman and Tversky (Econometrica  47:263–291, 1979). We suggest that loss aversion proponents have over-interpreted these findings. Specifically, the early studies of utility functions have shown that while very large losses are overweighted, smaller losses are often not. In addition, the findings of some of these studies have been systematically misrepresented to reflect loss aversion, though they did not find it. These findings shed light both on the inability of modern studies to reproduce loss aversion as well as a second literature arguing strongly for it.

A second, The Loss of Loss Aversion: Will It Loom Larger Than Its Gain (pdf), by David Gal and Derek Rucker, attacks the concept of loss aversion more generally (supposedly the “death knell”):

Loss aversion, the principle that losses loom larger than gains, is among the most widely accepted ideas in the social sciences. The first part of this article introduces and discusses the construct of loss aversion. The second part of this article reviews evidence in support of loss aversion. The upshot of this review is that current evidence does not support that losses, on balance, tend to be any more impactful than gains. The third part of this article aims to address the question of why acceptance of loss aversion as a general principle remains pervasive and persistent among social scientists, including consumer psychologists, despite evidence to the contrary. This analysis aims to connect the persistence of a belief in loss aversion to more general ideas about belief acceptance and persistence in science. The final part of the article discusses how a more contextualized perspective of the relative impact of losses versus gains can open new areas of inquiry that are squarely in the domain of consumer psychology.

A third strain of criticism relates to the concept of ergodicity. Put forward by Ole Peters, the basic claim is that people are not maximising the expected value of a series of gambles, but rather the time average. If people maximise the latter, not the former as many approaches assume, you don’t need risk or loss aversion to explain the decisions. (I’ll leave explaining what exactly this means to a later post.)

I’m as sceptical and cynical about some of the findings in the behavioural sciences as most (here’s my critical behavioural economics and behavioural science reading list), but I’m not sure I am fully on board with these arguments, particularly the stronger statements of Gal and Rucker. This post is the first of a few rummaging through these critiques to make sense of the debate, starting with Yechiam’s paper on the foundations of loss aversion in prospect theory.

Acceptable losses: the debatable origins of loss aversion

One of the most cited papers in the social sciences is Daniel Kahneman and Amos Tversky’s 1979 paper Prospect Theory: An Analysis of Decision under Risk (pdf). Prospect theory is intended to be a descriptive model of how people make decisions under risk, and an alternative to expected utility theory.

Under expected utility theory, people assign a utility value to each possible outcome of a lottery or gamble, with that outcome typically relating to a final level of wealth. The expected utility for a decision under risk is simply the probability weighted sum of these utilities. The expected utility of a 50% chance of $0 and a 50% chance of $200 is 50% of the utility of $0 plus 50% of the utility of $200.

When utility is assumed to increase at a decreasing rate with each additional dollar of wealth – as is typically the case – it leads to risk averse behaviour, with a certain sum preferred to a gamble with an equivalent expected value. For example, a risk averse person would prefer $100 for certain than the 50-50 gamble for $0 or $200.
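As a rough illustration of the arithmetic, the sketch below computes the expected utility of the two options. The square-root utility function is an assumption chosen only because it is concave; nothing in expected utility theory requires that particular form, and the function names are hypothetical.

```python
import math


def expected_utility(lottery, utility):
    """Probability-weighted sum of utilities over (probability, amount) pairs."""
    return sum(p * utility(x) for p, x in lottery)


# Assumption: square-root utility, used here only as a simple concave example.
utility = math.sqrt

gamble = [(0.5, 0.0), (0.5, 200.0)]  # 50-50 chance of $0 or $200
sure_thing = [(1.0, 100.0)]          # $100 for certain

eu_gamble = expected_utility(gamble, utility)
eu_sure = expected_utility(sure_thing, utility)

# Certainty equivalent: the sure amount that gives the same utility as the gamble.
# Squaring inverts the square-root utility assumed above.
certainty_equivalent = eu_gamble ** 2

print(f"Expected utility of the gamble:   {eu_gamble:.2f}")
print(f"Expected utility of $100 certain: {eu_sure:.2f}")
print(f"Certainty equivalent of gamble:   ${certainty_equivalent:.2f}")
```

Under this assumed utility function the certain $100 has the higher expected utility, and the gamble is worth only about $50 for certain, even though both options have an expected value of $100.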

In their 1979 paper, Kahneman and Tversky described a number of departures from expected utility theory. These included:

  • The certainty effect: People overweight outcomes that are considered certain, relative to outcomes which are merely probable.
  • The reflection effect: Relative to a reference point, people are risk averse when considering gains, but risk seeking when facing losses.
  • The isolation effect: People focus on the elements that differ between options rather than those components that are shared.
  • Loss aversion: Losses loom larger than gains – relative to a reference point, a loss is more painful than a gain of the same magnitude.

Loss aversion and the reflection effect result in the following famous diagram of how people weight losses and gains under prospect theory. Loss aversion leads to a kink in the utility curve at the reference point. The curve is steeper below the reference point than above. The reflection effect results in the curve being concave above the reference point, and convex below.
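The shape of that curve can be sketched with a simple power function. The functional form and the parameters below (curvature of 0.88 and a loss aversion coefficient of 2.25) are assumptions borrowed from Tversky and Kahneman's 1992 estimates rather than anything specified in the 1979 paper, and the function name is hypothetical.

```python
def value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Illustrative prospect theory value function around a reference point of 0.

    Assumed parameters: alpha and beta set the curvature for gains and losses,
    lam is the loss aversion coefficient (the kink at the reference point).
    """
    if x >= 0:
        return x ** alpha            # concave for gains
    return -lam * (-x) ** beta       # convex and steeper for losses


gain = value(100.0)
loss = value(-100.0)

print(f"Value of gaining $100: {gain:.2f}")
print(f"Value of losing $100:  {loss:.2f}")
print(f"Loss/gain ratio:       {-loss / gain:.2f}")
```

With equal curvature on both sides, the ratio of the loss to the gain equals the loss aversion coefficient at every magnitude. That is precisely the feature Yechiam questions: the early evidence suggests any asymmetry appears only for large amounts.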

Through the paper, Kahneman and Tversky describe experiments on each of the certainty effect, reflection effect, and isolation effect. However, as pointed out by Eldad Yechiam in his paper Acceptable losses: the debatable origins of loss aversion, loss aversion is taken as a stylised fact. Yechiam writes:

[I]n their 1979 paper, Kahneman and Tversky (1979) strongly argued for loss aversion, even though, at the time, they had not reported any experiments to support it. By indicating that this was a robust finding in earlier research, Kahneman and Tversky (1979) were able to rely upon it as a stylized fact. They begin their discussion on losses by stating that “a salient characteristic of attitudes to changes in welfare is that losses loom larger than gains” (p. 279), which suggests that this stylized fact is based on earlier findings. They then follow with the (much cited) sentence that “the aggravation that one experiences in losing a sum of money appears to be greater than the pleasure associated with gaining the same amount [17]” (p. 279). Most people who cite this sentence do so without the end quote of Galenter and Pliner (1974). Galenter and Pliner (1974) are, therefore, the first empirical study used to support the notion of loss aversion.

So what did Galenter and Pliner find? Yechiam writes:

Summing up their findings, Galenter and Pliner (1974) reported as follows: “We now turn to the question of the possible asymmetry of the positive and negative limbs of the utility function. On the basis of intuition and anecdote, one would expect the negative limb of the utility function to decrease more sharply than the positive limb increases… what we have observed if anything is an asymmetry of much less magnitude than would have been expected … the curvature of the function does not change in going from positive to negative” (p. 75).

Thus, our search for the historical foundations of loss aversion turns into a dead end on this particular branch: Galenter and Pliner (1974) did not observe such an asymmetry; and their study was quoted erroneously.

Effectively, the primary reference for the claim that we are loss averse does not support it.

So what other sources did Kahneman and Tversky rely on? Yechiam continues:

They argue that “the main properties ascribed to the value function have been observed in a detailed analysis of von Neumann–Morgenstern utility functions for changes of wealth [14].” (p. 281). The citation refers to Fishburn and Kochenberger’s forthcoming paper (at the time; published 1979). Fishburn and Kochenberger’s (1979) study reviews data of five other papers (Grayson, 1960; Green, 1963; Swalm, 1966; Halter & Dean, 1971; Barnes & Reinmuth, 1976) also cited by Kahneman and Tversky (1979). Summing up all of these findings, Kahneman and Tversky (1979) argue that “with a single exception, utility functions were considerably steeper for losses than for gains.” (p. 281). The “single exception” refers to a single participant who was reported not to show loss aversion, while the remaining one apparently did.

These five studies all used very small samples, with a total of 30 subjects between them.

Yechiam walks through three of the studies. On Swalm (1966):

The results of the 13 individuals examined by Swalm … appear at the first glance to be consistent with an asymmetric utility function implying overweighting of losses compared to gains (i.e., loss aversion). Notice, however, that amounts are in the thousands, such that the smallest amount used was set above $1000 and typically above $5000, because it was derived from the participant’s “planning horizon”. Moreover, for more than half of the participants, the utility curve near the origin …, which spans the two smallest gains and two smallest losses for each person, was linear. This deviates from the notion of loss aversion which implies that asymmetries should also be observed for small amounts as well.

This point reflects an argument that Yechiam and others have made in several papers (including here and here) that loss aversion is only apparent in high-stakes gambles. When the stakes are low, loss aversion does not appear.

On Grayson (1960):

A similar pattern is observed in Grayson’s utility functions … The amounts used were also extreme high, with only one or two points below the $50,000 range. For the points above $100,000, the pattern seems to show a clear asymmetry between gains and losses consistent with loss aversion. However, for 2/9 participants …, the utility curve for the points below 100,000 does not indicate loss aversion, and for 2/9 additional participants no loss aversion is observed for the few points below $50,000. Thus, it appears that in Grayson (1960) and Swalm (1966), almost all participants behaved as if they gave extreme losses more weight than corresponding gains, yet about half of them did not exhibit a similar asymmetry for the lower losses (e.g., below $50,000 in Grayson, 1960).

Again, loss aversion is stronger for extreme losses.

On Green (1963):

… Green (1963) did not examine any losses, making any interpretation concerning loss aversion in this study speculative as it rests on the authors’ subjective impression.

The results from Swalm (1966), Grayson (1960) and Green (1963) cover 26 of the 30 participants aggregated by Fishburn and Kochenberger. Halter and Dean (1971) and Barnes and Reinmuth (1976) only involved two participants each.

So what of other studies that were available to Kahneman and Tversky at the time?

In 1955, Davidson, Siegel, and Suppes conducted an experiment in which participants were presented with heads or tails bets which they could accept or refuse. …

… Outcomes were in cents and ran up to a gain or loss of 50 cents. The results of 15 participants showed that utility curves for gains and losses were symmetric …, with a loss/ gain utility ratio of 1.1 (far below than the 2.25 estimated by Tversky and Kahneman, 1992). The authors also re-analyzed an earlier data set by Mosteller and Nogee (1951) involving bets for amounts ranging from − 30 to 30 cents, and it too showed utility curves that were symmetric for gains and losses.

Lichtenstein (1965) similarly used incentivized bets and small amounts. … Lichtenstein (1965) argued that “The preference for low V [variance] bets indicates that the utility curve for money is not symmetric in its extreme ranges; that is, that large losses appear larger than large wins.” (p. 168). Thus, Lichtenstein (1965) interpreted her findings not as a general aversion to losses (which would include small losses and gains), but only as a tendency to overweight large losses relative to large gains.

… Slovic and Lichtenstein (1968) developed a regression-based approach to examine whether the participants’ willingness to pay (WTP) for a certain lottery is predicted more strongly by the size of its gains or the size of its losses. Their results showed that size of losses predicted WTP more than sizes of gains. … Moreover, in a follow-up study, Slovic (1969) found a reverse effect in hypothetical lotteries: Choices were better predicted by the gain amount than the loss amount. In the same study, he found no difference for incentivized lotteries in this respect.

Similar findings of no apparent loss aversion were observed in studies that used probabilities that are learned from experience (Katz, 1963; Katz, 1964; Myers & Suydam, 1964).

In sum, the evidence for loss aversion at the time of the publication of prospect theory was relatively weak and limited to high-stakes gambles.

As Yechiam notes, Kahneman and Tversky only turned their attention to specifically investigating loss aversion in 1992 – and even there it tended to involve large amounts.

Only in 1992 did Tversky and Kahneman (1992) and Redelmeier and Tversky (1992) start to empirically investigate loss aversion, and when they did, they used either very large amounts (Redelmeier & Tversky, 1992) or the so-called “list method” in which one chooses between lotteries with changing amounts up until choices switch from one alternative to the other (Tversky & Kahneman, 1992). This usage of high amounts would come to characterize most of the literature later arguing for loss aversion (e.g., Redelmeier & Tversky, 1992; Abdellaoui et al., 2007; Rabin & Weizsäcker, 2009) as would be the usage of decisions that are not incentivized (i.e., hypothetical; as discussed below).

I’ll examine the post-1979 evidence in more detail in a future post, but in the interim will note this observation from Yechiam on the more recent experiments.

In a review of the literature, Yechiam and Hochman (2013a) have shown that modern studies of loss aversion seem to be binomially distributed into those who used small or moderate amounts (up to $100) and large amounts (above $500). The former typically find no loss aversion, while the latter do. For example, Yechiam and Hochman (2013a) reviewed 11 studies using decisions from description (i.e., where participants are given exact information regarding the probability of gaining and losing money). From these studies, seven did not find loss aversion and all of them used loss/gain amounts of up to $100. Four did find loss aversion, and three of them used very high amounts (above $500 and typically higher). Thus, the usage of high amounts to produce loss aversion is maintained in modern studies.

The presence of loss aversion for only large stakes gambles raises some interesting questions. In particular, are we actually observing the effect of “minimal requirements”, whereby a loss would push a person below some minimum threshold for, say, survival or other basic necessities? (Or at least a heuristic that operates with that intent?) This is a distinct concept from loss aversion as presented in prospect theory.

Finally – and a minor point on the claim that Yechiam’s paper was the beginning of the spread of the replication crisis to loss aversion – there is of course no direct experiment on loss aversion in the initial prospect theory paper to be replicated. A recent replication of the experiments in the 1979 paper had positive results (excepting some mixed results concerning the reflection effect). Replication of the 1979 paper doesn’t, however, provide any evidence on the replicability of loss aversion itself, nor on the appropriate interpretation of the experiments.

On that point, in my next post on the topic I’ll turn to some of the alternative explanations for what appears to be loss aversion, particularly the claims of Gal and Rucker that losses do not loom larger than gains.

David Leiser and Yhonatan Shemesh’s How We Misunderstand Economics and Why it Matters: The Psychology of Bias, Distortion and Conspiracy

From a new(ish) book by David Leiser and Yhonatan Shemesh, How We Misunderstand Economics and Why it Matters: The Psychology of Bias, Distortion and Conspiracy:

Working memory is a cognitive buffer, responsible for the transient holding, processing, and manipulation of information. This buffer is a mental store distinct from that required to merely hold in mind a number of items and its capacity is severely limited. The complexity of reasoning that can be handled mentally by a person is bounded by the number of items that can be kept active in working memory and the number of interrelationships between elements that can be kept active in reasoning. Quantifying these matters is complicated, but the values involved are minuscule, and do not exceed four distinct elements …

LTM [long-term memory] suffers from a different failing.  … It seems there is ample room for our knowledge in the LTM. The real challenge relates to retrieval: people routinely fail to use knowledge that they possess – especially when there is no clear specification of what might be relevant, no helpful retrieval cue. …

The two flaws … interact with one another. Ideas and pieces of knowledge accumulate in LTM, but those bits often remain unrelated. Leiser (2001) argues that, since there is no process active in LTM to harmonize inconsistent parts, coordination between elements can only take place in working memory. And in view of its smallness, the scope of explanations is small too. …

Limited knowledge, unavailability of many of the relevant economic concepts and variables, and restricted mental processing power mean that incoherencies are to be expected, and they are indeed found. One of the most egregious is the tendency, noted by Furnham and Lewis (1986) who examined findings from the US, the UK, France, Germany, and Denmark, to demand both reductions in taxation and increased public expenditure (especially on schools, the sick, and the old). You can of course see why people would rather pay less in taxes, and also that they prefer to benefit from more services, but it is still surprising how often the link between the two is ignored. This is only possible because, to most people, taxes and services are two unrelated mental concepts, sitting as it were in different parts of LTM, a case of narrow scoping, called by McCaffery and Baron (2006) in this context an “isolation effect.”

Bastounis, Leiser, and Roland-Levy (2004) ran an extensive survey on economic beliefs in several countries (Austria, France, Greece, Israel, New Zealand, Slovenia, Singapore, and Turkey) among nearly 2000 respondents, and studied the correlations between answers to the different questions. No such broad clustering of opinions as that predicted by Salter was in evidence. Instead, the data indicate that lay economic thinking is organized around circumscribed economic phenomena, such as inflation and unemployment, rather than by integrative theories. Simply put, knowing their answers about one question about inflation was a fair predictor of their answer to another, but was not predictive of their views regarding unemployment.

A refreshing element of the book is that it draws on a much broader swathe of psychology than just the heuristics and biases literature, which often becomes the focus of stories on why people err. However, I was surprised by the lack of mention of intelligence.

A couple of other interesting snippets, the first on the ‘halo effect’:

The tendency to oversimplify complex judgments also manifests in the “halo” effect. … [K]nowing a few positive traits of a person leads us to attribute additional positive traits to them. … The halo effect comes from the tendency to rely on global affect, instead of discriminating among conceptually distinct and potentially independent attributes.

This bias is unfortunate enough by itself, as it leads to the unwarranted attribution of traits to individuals. But it becomes even more pernicious when it blinds people to the possibility of tradeoffs, where two of the features are inversely correlated. To handle a tradeoff situation rationally, it is essential to disentangle the attributes, and to realize that if one increases the other decreases. When contemplating an investment, for instance, a person must decide whether to invest in stocks (riskier, but with a greater potential return) or in bonds (safer, but offering lower potential returns). Why not go for the best of both worlds – and buy a safe investment that also yields high returns? Because no such gems are on offer. A basic rule in investment pricing is that risk and return are inversely related, and for a good reason. …

Strikingly, this relation is systematically violated when people are asked for an independent evaluation of their risk perception and return expectations. Shefrin (2002) asked portfolio managers, analysts, and MBA students for such assessments, and found, to his surprise, that expected return correlates inversely with perceived risk. Respondents appear to expect that riskier stocks will also produce lower returns than safer stocks. This was confirmed experimentally by Ganzach (2000). In the simplest of his several experiments, participants received a list of (unfamiliar) international stock markets. One group of participants was asked to judge the expected return of the market portfolio of these stock markets, and the other was asked to judge the level of risk associated with investing in these portfolios. … The relationship between judgments of risk and judgments of expected return, across the financial assets evaluated, was large and negative (Pearson r = −0.55). Ganzach interprets this finding as showing that both perceived risk and expected return are derived from a global preference. If an asset is perceived as good, it will be judged to have both high return and low risk, whereas if it is perceived as bad, it will be judged to have both low return and high risk.

And on whether some examinations of economic comprehension are actually personality tests:

Leiser and Benita (in preparation) asked 300 people in the US for their view concerning economic fragility or stability, by checking the extent to which they agreed with the following sentences:

1. The economy is fundamentally sound, and will restore itself after occasional
2. The economy is capable of absorbing limited shocks, but if the shocks are excessive, a major crisis and even collapse will ensue.
3. Deterioration in the economy, when it occurs, is a very gradual process.
4. The economy’s functioning is delicate, and always at a risk of collapse.
5. The economy is an intricate system, and it is all but impossible to predict how it will evolve.
6. Economic experts can ensure that the economy will regain stability even after major crises.

These questions relate to the economy, and respondents answered them first. But we then asked corresponding questions, with minimal variations of wording, about three other widely disparate domains: personal relationships, climate change, and health. Participants rated to what extent they agree with each of the statements about each additional domain. The findings were clear: beliefs regarding economic stability are highly correlated with parallel beliefs in unrelated social and natural domains. People who believe that “The economy’s functioning is delicate, and always at a risk of collapse” tend to agree that “Close interpersonal relationships are delicate, and always at a risk of collapse” … And people who hold that “The economy is capable of absorbing limited shocks, but if the shocks are excessive, a major crisis will occur” also tend to judge that “The human body is capable of absorbing limited shocks, but beyond a certain intensity of illness, body collapse will follow.”

What we see in such cases is that people don’t assess the economy as an intelligible system. Instead, they express their general feelings towards dangers. … [T]hose who believe that the world is dangerous and who see an external locus of control see all four domains (economics, personal relations, health, and the environment) as unstable and unpredictable. Such judgments have little to do with an evaluation of the domain assessed, be it economic or something else. They attest personal traits, not comprehension.

Nick Chater’s The Mind is Flat: The Illusion of Mental Depth and the Improvised Mind

Nick Chater’s The Mind is Flat: The Illusion of Mental Depth and the Improvised Mind is a great book.

Chater’s basic argument is that there are no ‘hidden depths’ to our minds. The idea that we have an inner mental world with beliefs, motives and fears is just a work of imagination. As Chater puts it:

no one, at any point in human history, has ever been guided by inner beliefs or desires, any more than any human being has been possessed by evil spirits or watched over by a guardian angel.

The book represents Chater’s reluctant acceptance that much experimental psychological data can no longer be accommodated by simply extending and modifying existing theories of reasoning and decision making. These theories are built on an intuitive conception of the mind, in which our thoughts and behaviour are rooted in reasoning and built on our deeply held beliefs and desires. As Chater argues, this intuitive conception is simply an illusion. This leads him to take his somewhat radical departure from many theories of perception, reasoning and decision making.

I have one major disagreement with the book, which turns out to be a fundamental disagreement with Chater’s central claim, but I’ll come to that later.

The visual illusion

Chater starts by examining visual perception. This is in part because visual perception is a (relatively) well understood area of psychology and neuroscience, and in part because Chater sees the whole of thought as being an extension of perception.

Consider our sense of colour vision. The sensitivity of colour vision falls rapidly outside of the fovea, the area of the retina responsible for our sharp central vision. The rod cells that capture most of our visual field are only able to capture light and dark. This means that outside of a few degrees of where you are looking, you are effectively colour blind. Despite this, we feel that our entire visual world is coloured. That is an illusion.

Similarly, our visual periphery is fuzzy. Our visual acuity plunges as cone density falls with increasing angle from the fovea. Yet, again, we have a sense that we can capture the entire scene before us.

That limited vision is highlighted in experiments using gaze-contingent eye-tracking. In one experiment, participants were asked to read lines of text. Rather than showing the full text, the computer displayed only a window of text where the participant was looking, with all letters outside of that window replaced by blocks of ‘x’s.

When someone is reading this text, they feel they are looking at a page or screen full of text. How small can the window of text be before this illusion is shattered? It turns out the window can be shrunk to around 10 to 15 characters (centred slightly to the right of the fixation point) without the reader sensing anything is amiss. This is despite the page being almost completely covered in ‘x’s. The sense that they are looking at a full page of text is an illusion, as most of the text isn’t there.

Chater walks through a range of other interesting experiments showing similar points. For instance, we can only encode one colour or shape or object at a time. The idea we are looking at a rich coloured world, taking in all of the colours and shapes at once, is also an illusion.

Our brain is not simultaneously grasping a whole, but is rather piecing together a stream of information. Yet we are fooled into believing we are having a rich sensory experience. We don’t actually see a broad, rich multi-coloured world. The sense that we do is a hoax.

So how can the mind execute this hoax? Chater suggests the answer is simply that as soon as we wonder about any aspect of the world, we can flick our eyes over and instantly provide an answer. The fluency of this process suggests to us that we already had the answers stored, but the experimental and physiological evidence suggests this cannot be the case.

Put another way, the sense of a rich sensory world is actually just the potential to explore a rich sensory world. This potential is misinterpreted as actually experiencing that world.

An interesting question posed by Chater later in the book is why don’t we have any awareness of the brain’s mode of thought. Why don’t we sense the continually flickering snapshots generated by our visual system? His answer is that the brain’s goal is to inform us of the world around us. It is not to inform us about the working of our own mechanisms to understand it.

The inner world

So does the story change when we move from visual perception to our inner thoughts?

Chater asks us to think of a tiger as clearly and distinctly as we can. Consider the pattern of stripes on the tiger. Count them. What way do they flow over the body? Along the length or vertically? What about on the legs?

Visually, we can only grasp fragments at a time, but each visual feature is available on demand, giving the impression that our vision encompasses the whole scene. A similar dynamic is at work for the imaginary tiger. Here the mind improvises the answer as soon as you ask for it. Until you ask the question, those details are entirely absent.

What happens when you compare your answer about the tiger’s stripes with a real tiger? For the real tiger, the front legs don’t have stripes. At the back legs the stripes rotate from horizontal around the leg to vertical around the body. The belly and inner legs are white. Were they part of the image in your mind?

As we considered the tiger, we invented the answers to the questions we asked. What appeared to be a coherent image was constructed on the fly in the same way our system of visual perception gives us answers as we need them.

In one chapter, Chater also argues that we invent our feelings. He describes experimental participants dosed with either adrenaline or a placebo and then placed  in a waiting room with a stooge. The stooge was either manic (flying paper aeroplanes) or angry (reacting to a questionnaire they had to fill in while waiting). Those who had been adrenalised had stronger reactions to both stooges, but in opposite directions: euphoric with the manic stooge and irritated in the presence of the angry stooge. Chater argues that we interpret our emotions in the moment based on both the situation we are in and our own physiological state. By being an act of interpretation, having an emotion is an act of reasoning.

Improvising our preferences and beliefs

The core of Chater’s argument comes when he turns to our preferences and beliefs.  And here he argues that we are still relentless improvisers.

The famous split brain research of Michael Gazzaniga provides evidence for the improvisation. A treatment for severe epilepsy is surgical severance of the corpus callosum that links the two hemispheres of the brain. This procedure prevents seizures from spreading from one hemisphere to the other, but also results in the two halves of the cortex functioning independently.

What if you show different images to the right and left halves of the visual field, which are processed in the opposite hemispheres of the brain (the crossover wiring to the brain means that the right hemisphere processes information in the left visual field, and vice versa)? In one experiment Gazzaniga showed two images to a split brain patient, P.S. On the left hand side was a picture of a snowy scene. On the right was a picture of a chicken’s foot.  P.S., like most of us, had his language abilities focused in the left hemisphere of the brain, so P.S. could report seeing the chicken foot but was unable to say anything about the snowy scene.

P.S. was asked to pick one of four pictures associated with each of the images. The right hand, controlled by the left hemisphere, picked a chicken head to match the claw. The left hand picked out a shovel for the snow. And how did P.S. explain the choice of the shovel? ‘Oh that’s simple. The chicken claw goes with the chicken. And you need a shovel to clean out the chicken shed.’ An invented explanation. With no insight into the reason, the left hemisphere invents the explanation.

This fluent explanation by split brain patients presents the possibility that after-the-fact explanation might also be the case for people with normal brains. Rather than explanations expressing inner preferences and beliefs, we make up reasons in retrospect to interpret our actions.

Chater proceeds to build his case that we don’t have such inner beliefs and preferences with some of the less convincing research in the book, much of which looks and feels like a lot of what has been questioned during the replication crisis. It is interesting all the same.

In one experiment, voters in Sweden were asked whether they intended to vote for the left or right-leaning coalition. They were then given a questionnaire on various campaign topics. When the responses were handed to the experimenter, the experimenter changed some of the responses by sleight of hand. When they were handed back for checking, just under a quarter of voters spotted and corrected the error. But the majority were happy to explain political opinions that moments ago they did not hold.

Chater also reports an experiment where the experimenters got a similar effect when asking people which of two faces they prefer. When the face was switched before asking for the explanation, the fluent explanation still emerged.

An interesting twist to this experiment comes when people who have justified a choice of face they didn’t make are asked to choose again. These people tend to choose the face that they didn’t choose previously but were asked to justify. The explanation helped shape future decisions.

A similar effect occurred in another experiment in which participants took a web-based survey on political attitudes, with half the participants presented with an American flag in the corner of the screen. The flag caused a shift in political attitudes. But more interestingly, this effect persisted eight months later.

Chater’s interpretation of this experiment is not that Republicans should cover everything with flags. Rather, if people are exposed to a flag at a moment when they are contemplating their political views, this will have a long-lasting effect from the ‘memory traces’ that are laid down at the time.

When I read Chater’s summary of the experiment, my immediate reaction was that this was unlikely to replicate – and my reading of the original paper (PDF) firmed my view. And it turns out there was a replication of the first flag priming experiment in the Many Labs project – no effect. (My reaction to the paper might have been shaped by previously reading the Many Labs paper but not immediately recalling that this particular experiment was included.) So let’s scrub this experiment from the list of evidence in support. If there’s no immediate effect, it’s hard to make a case for an effect eight months later. (Chater should have noted this given the replication was published in 2014.)

This isn’t the only experiment reported by Chater with a failed replication in this section, although the other dates from after publication of the book. An experiment by Eldar Shafir that makes an appearance failed to replicate in Many Labs 2.

One other piece of evidence called on by Chater is the broad (and strong) evidence of the inconsistency of our risk preferences and how susceptible they are to the framing of the risk and the domain in which they are realised. Present the same gamble in a loss rather than a gain frame, and risk-seeking choices spike.

But putting these pieces together, I am not convinced Chater has made his case. The split brain experiments demonstrate our willingness to improvise explanations in the absence of any evidence. But this does not extend to an unequivocal case that we don’t call on any “hidden depths” that are there. They are variable, but are they so variable that they have no deeper basis at all? Chater thinks so.

[N]o amount of measuring and re-measuring is going to help. The problem with measuring risk preferences is not that measurement is difficult and inaccurate; it is that there are no risk preferences to measure – there is simply no answer to how, ‘deep down’, we wish to balance risk and reward. And, while we’re at it, the same goes for the way people trade off the present against the future; how altruistic we are and to whom; how far we display prejudice on gender or race, and so on.

But this brings me to my major disagreement with Chater. For all Chater’s sweeping statements about our lack of hidden depths, he didn’t spend much effort trying to find them. Rather, he took a lot of evidence on how manipulable we can be (which we certainly are to a degree) and our willingness to improvise explanations when we have no idea (more robust), and then turned this into a finding that there is no hidden depth.

One place Chater could have looked is behavioural genetics. The first law of behavioural genetics is that all behavioural traits are heritable. That is, a proportion of the variation in these characteristics between people is due to genetic variation. These traits include risk preferences, the way we trade off the present and the future, and political preferences. These are among the characteristics that Chater suggests have no hidden depth. If there is no hidden depth, why are identical twins (even those raised apart) so similar for these traits? Chater is likely right that when asked to explain why we made a certain risky choice we are likely to improvise an explanation with little connection to reality. We rarely point to our genes. But that does not mean the hidden depth is not there.

We can only have one thought at a time

Once Chater has completed his argument about our lack of hidden depths, he turns to describing his version of how the mind actually works. And part of that answer is that the brain can only tackle one problem at a time.

This inability to take on multiple tasks comes from the way that our brain computes when facing a difficult problem. Computation in the brain occurs through cooperation across the brain, with coordinated neural activity occurring across whole networks or entire regions of the brain. This large cooperative activity between slow neurons means that a network can only work on one problem at a time. And the brain is close to one large network.

Chater turns this idea into an attack on the “myth of the unconscious”. This myth is the idea that our brain is working away in the background. If we step away from a problem, we might suddenly have the answer pop into our head as our unconscious has kept working at the problem while we tend to other things.

Chater argues that for all the stories about scientists suddenly having major breakthroughs in the shower, neuroscience has found no evidence of these hidden processes. Chater summarises the studies in this area as concluding that, first, the effects of breaks are either negligible or non-existent, and second, that the explanations for the minor effects of a break involve no unconscious thought at all.

As one example of the lack of effect, Chater describes an experiment in which subjects are asked to name both as many food items and as many countries as possible. Someone doing this task might switch back and forth between the two topics, changing to foods when they run out of countries and vice versa. How would the performance of a person able to switch back and forth compare to someone who has to first deal with one category, and only when finished move to the other? Would the former outperform as they could think about the second category in the background before coming back to it? The results suggest that when thinking about countries, there is no evidence that we are also thinking about food. When we switch from one category to the other, the search ceases abruptly.

So how did this myth of unconscious thought arise? Chater’s argument is that when we set a problem aside and return to it later, we are unencumbered by the past failures and patterns of thought in which we were trapped before. The new perspective may not be better than the old, but occasionally it will hit upon the angle that we need to solve the problem. So yes, the insight may emerge in a flash, but not because the unconscious had been grinding away at the problem.

This lack of unconscious thought is also demonstrated in the literature concerning inattentional blindness. If people are busy attending to a task, they can miss information that they are not attending to. The classic example of this (at least, before the gorilla experiment) is an experiment by Ulric Neisser, in which participants are asked to watch three people throwing a ball to each other and press a button each time there is a throw. When an unexpected event occurs – in this case a woman with an umbrella walking through the players – fewer than one quarter of the participants notice.

Chater takes the inattentional blindness studies as again showing that we can only lock onto and impose meaning on one fragment of sensory information at a time. If our brains are busy on one task, they can be utterly oblivious to other events.

One distinction Chater makes that I found useful is how to think about our unconscious thought processes. Chater’s argument is not that there is no processing in the brain outside our conscious knowledge. Rather, we have one type of thought, with unconscious processing resulting in a conscious result. Chater writes:

The division between the conscious and the unconscious does not distinguish between different types of thought. Instead, it is a division within individual thoughts themselves: between the conscious result of our thinking and the unconscious processes that create it.

There are no conscious thoughts and unconscious thoughts; and there are certainly no thoughts slipping in and out of consciousness. There is just one type of thought, and each such thought has two aspects: a conscious read-out, and unconscious processes generating the read-out.

So where do our actions come from?

So if there are no hidden depths, what drives us? Chater’s argument is that our thoughts come from memory traces created by previous thoughts and experiences. Each person is shaped by, and in effect unique due to, the uniqueness of their past thoughts and experiences. Thought follows channels carved by previous thoughts.

This argument does in some ways suggest that we have an inner world. But that inner world is a record of the effect of the past cycles of thought. It is not an inner world of beliefs, hopes and fears. As Chater states, the brain operates based on precedents, not principles.

Chater’s first piece of evidence in support of this point comes from chess. What makes grandmasters special? It is not because humans are lightning calculating machines. Rather it is because of their long experience and their ability to find meaning in chess positions with great fluency. They can link the current position with memory traces of past board positions. They do not succeed by looking further ahead, but rather by drawing on a deeper memory bank and then focusing on only the best moves.

Chater argues that this is how perception works more generally. We do not interpret sensory information afresh, but interpret based on memory traces from past experience. He gives the example of “found faces”, where people see faces in inanimate objects. Our interpretation of the inputs finds resonance with memory traces of past inputs. Similarly, recognising a friend, word or tune depends on a link with your memories. Successful perception requires us to deploy the right memory traces when we need them.

Chater’s argument about the role of memory in perception seems sound. But absent a clear case that there are no other sources of beliefs or motivations, I am not convinced these memory traces are all that there is.

What this means for intelligence and AI

The final chapter of the book is Chater’s attempt to put a positive gloss on his argument. It feels like the sort of chapter that the publisher might ask for to help with the promotion of the book.

That positive gloss is human creativity. Chater writes:

But the secret of human intelligence is the ability to find patterns in the least structured, most unexpected, hugely variable of streams of information – to lock onto a handbag and see a snarling face; to lock onto a set of black-and-white patches and discern a distinctive, emotion-laden, human being; to find mappings and metaphors through the complexity and chaos of the physical and psychological worlds. All this is far beyond the reach of modern artificial intelligence.

I am not sure I agree. Vision recognition systems regularly make errors through seeing patterns that aren’t there. Are these just the machine version of seeing a face in a handbag? Both are mismatches, but one is labelled as an imaginative leap, the other as an error. Should we endow this overactive human pattern matching with the title of intelligence and call similar matching errors a mistake when made by a computer? Chess is also instructive here, with great creativity now often being the sign of a machine move.

This final chapter is somewhat shallow relative to the rest of the book. Chater provides little in the way of evidence to support his case, although you can piece together some threads supporting Chater yourself from the examples discussed earlier in the book. It ends the book with a nice hook, but for me was a flat ending for an otherwise great book.


Debating the conjunction fallacy

From Eliezer Yudkowsky on Less Wrong (a few years old, but worth revisiting in the light of my recent Gigerenzer v Kahneman and Tversky post):

When a single experiment seems to show that subjects are guilty of some horrifying sinful bias – such as thinking that the proposition “Bill is an accountant who plays jazz” has a higher probability than “Bill is an accountant” – people may try to dismiss (not defy) the experimental data. Most commonly, by questioning whether the subjects interpreted the experimental instructions in some unexpected fashion – perhaps they misunderstood what you meant by “more probable”.

Experiments are not beyond questioning; on the other hand, there should always exist some mountain of evidence which suffices to convince you.

Here is (probably) the single most questioned experiment in the literature of heuristics and biases, which I reproduce here exactly as it appears in Tversky and Kahneman (1982):

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

Please rank the following statements by their probability, using 1 for the most probable and 8 for the least probable:

(5.2)  Linda is a teacher in elementary school.
(3.3)  Linda works in a bookstore and takes Yoga classes.
(2.1)  Linda is active in the feminist movement. (F)
(3.1)  Linda is a psychiatric social worker.
(5.4)  Linda is a member of the League of Women Voters.
(6.2)  Linda is a bank teller. (T)
(6.4)  Linda is an insurance salesperson.
(4.1)  Linda is a bank teller and is active in the feminist movement. (T & F)

(The numbers at the start of each line are the mean ranks of each proposition, lower being more probable.)

How do you know that subjects did not interpret “Linda is a bank teller” to mean “Linda is a bank teller and is not active in the feminist movement”? For one thing, dear readers, I offer the observation that most bank tellers, even the ones who participated in anti-nuclear demonstrations in college, are probably not active in the feminist movement. So, even so, Teller should rank above Teller & Feminist.  …  But the researchers did not stop with this observation; instead, in Tversky and Kahneman (1983), they created a between-subjects experiment in which either the conjunction or the two conjuncts were deleted. Thus, in the between-subjects version of the experiment, each subject saw either (T&F), or (T), but not both. With a total of five propositions ranked, the mean rank of (T&F) was 3.3 and the mean rank of (T) was 4.4, N=86. Thus, the fallacy is not due solely to interpreting “Linda is a bank teller” to mean “Linda is a bank teller and not active in the feminist movement.”

Another way of knowing whether subjects have misinterpreted an experiment is to ask the subjects directly. Also in Tversky and Kahneman (1983), a total of 103 medical internists … were given problems like the following:

A 55-year-old woman had pulmonary embolism documented angiographically 10 days after a cholecystectomy. Please rank order the following in terms of the probability that they will be among the conditions experienced by the patient (use 1 for the most likely and 6 for the least likely). Naturally, the patient could experience more than one of these conditions.

  • Dyspnea and hemiparesis
  • Calf pain
  • Pleuritic chest pain
  • Syncope and tachycardia
  • Hemiparesis
  • Hemoptysis

As Tversky and Kahneman note, “The symptoms listed for each problem included one, denoted B, that was judged by our consulting physicians to be nonrepresentative of the patient’s condition, and the conjunction of B with another highly representative symptom denoted A. In the above example of pulmonary embolism (blood clots in the lung), dyspnea (shortness of breath) is a typical symptom, whereas hemiparesis (partial paralysis) is very atypical.”

In indirect tests, the mean ranks of A&B and B respectively were 2.8 and 4.3; in direct tests, they were 2.7 and 4.6. In direct tests, subjects ranked A&B above B between 73% to 100% of the time, with an average of 91%.

The experiment was designed to eliminate, in four ways, the possibility that subjects were interpreting B to mean “only B (and not A)”. First, carefully wording the instructions:  “…the probability that they will be among the conditions experienced by the patient”, plus an explicit reminder, “the patient could experience more than one of these conditions”. Second, by including indirect tests as a comparison. Third, the researchers afterward administered a questionnaire:

In assessing the probability that the patient described has a particular symptom X, did you assume that (check one):
X is the only symptom experienced by the patient?
X is among the symptoms experienced by the patient?

60 of 62 physicians, asked this question, checked the second answer.

Fourth and finally, as Tversky and Kahneman write, “An additional group of 24 physicians, mostly residents at Stanford Hospital, participated in a group discussion in which they were confronted with their conjunction fallacies in the same questionnaire. The respondents did not defend their answers, although some references were made to ‘the nature of clinical experience.’  Most participants appeared surprised and dismayed to have made an elementary error of reasoning.”

Does the conjunction fallacy arise because subjects misinterpret what is meant by “probability”? This can be excluded by offering students bets with payoffs. In addition to the colored dice discussed yesterday, subjects have been asked which possibility they would prefer to bet $10 on in the classic Linda experiment. This did reduce the incidence of the conjunction fallacy, but only to 56% (N=60), which is still more than half the students.

But the ultimate proof of the conjunction fallacy is also the most elegant. In the conventional interpretation of the Linda experiment, subjects substitute judgment of representativeness for judgment of probability: Their feelings of similarity between each of the propositions and Linda’s description, determines how plausible it feels that each of the propositions is true of Linda. …

You just take another group of experimental subjects, and ask them how much each of the propositions “resembles” Linda. This was done – see Kahneman and Frederick (2002) – and the correlation between representativeness and probability was nearly perfect.  0.99, in fact.

The conjunction fallacy is probably the single most questioned bias ever introduced, which means that it now ranks among the best replicated. The conventional interpretation has been nearly absolutely nailed down.

There are a few additional experiments in Yudkowsky’s post that I have not replicated here.
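The rule the Linda subjects violate is that a conjunction can never be more probable than either of its conjuncts: every feminist bank teller is also a bank teller. A minimal sketch of that rule follows. The population of 100 "Lindas" is invented purely for illustration (it is not data from any of the experiments above), and the function name is my own.

```python
from itertools import product


def teller_vs_conjunction(population):
    """Return (P(teller), P(teller and feminist)) for a list of
    (is_teller, is_feminist) pairs, treating each person as equally likely."""
    n = len(population)
    p_teller = sum(1 for teller, _ in population if teller) / n
    p_both = sum(1 for teller, feminist in population if teller and feminist) / n
    return p_teller, p_both


# Hypothetical population: the counts below are made up for illustration.
population = (
    [(True, True)] * 3       # bank tellers active in the feminist movement
    + [(True, False)] * 2    # bank tellers who are not
    + [(False, True)] * 60   # feminists who are not bank tellers
    + [(False, False)] * 35  # neither
)
p_teller, p_both = teller_vs_conjunction(population)

# The inequality holds however the population is composed. As a sanity
# check, enumerate every possible population of four people.
person_types = list(product([False, True], repeat=2))
rule_always_holds = all(
    teller_vs_conjunction(list(pop))[1] <= teller_vs_conjunction(list(pop))[0]
    for pop in product(person_types, repeat=4)
)
```

Even in a population where most people fitting the description are feminists, the conjunction cannot overtake the single proposition, because the feminist bank tellers are counted in both.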

Three algorithmic views of human judgment, and the need to consider more than algorithms

From Gerd Gigerenzer’s The bounded rationality of probabilistic mental models (PDF) (one of the papers mentioned in my recent post on the Kahneman and Tversky and Gigerenzer debate):

Defenders and detractors of human rationality alike have tended to focus on the issue of algorithms. Only their answers differ. Here are some prototypical arguments in the current debate.

Statistical algorithms

Cohen assumes that statistical algorithms … are in the mind, but distinguishes between not having a statistical rule and not applying such a rule, that is, between competence and performance. Cohen’s interpretation of cognitive illusions parallels J.J. Gibson’s interpretation of visual illusions: illusions are attributed to non-realistic experimenters acting as conjurors, and to other factors that mask the subjects’ competence: ‘unless their judgment is clouded at the time by wishful thinking, forgetfulness, inattentiveness, low intelligence, immaturity, senility, or some other competence-inhibiting factor, all subjects reason correctly about probability: none are programmed to commit fallacies or indulge in illusions’ … Cohen does not claim, I think, that people carry around the collected works of Kolmogoroff, Fisher, and Neyman in their heads, and merely need to have their memories jogged, like the slave in Plato’s Meno. But his claim implies that people do have at least those statistical algorithms in their competence that are sufficient to solve all reasoning problems studied in the heuristics and biases literature, including the Linda problem.

Non-statistical algorithms: heuristics

Proponents of the heuristics-and-biases programme seem to assume that the mind is not built to work by the rules of probability:

In making predictions and judgments under uncertainty, people do not appear to follow the calculus of chance or the statistical theory of prediction. Instead they rely on a limited number of heuristics which sometimes yield reasonable judgments and sometimes lead to severe and systematic errors.

(Kahneman and Tversky, 1973:237)

Cognitive illusions are explained by non-statistical algorithms, known as cognitive heuristics.

Statistical and non-statistical heuristics

Proponents of a third position do not want to be forced to choose between statistical and non-statistical algorithms, but want to have them both. Fong and Nisbett … argue that people possess both rudimentary but abstract intuitive versions for statistical principles such as the law of large numbers, and non-statistical heuristics such as representativeness. The basis for these conclusions are the results of training studies. For instance, the experimenters first teach the subject the law of large numbers or some other statistical principle, and subsequently also explain how to apply this principle to a real-world domain such as sports problems. Subjects are then tested on similar problems from the same or other domains. The typical result is that more subjects reason statistically, but transfer to domains not trained in is often low.

However, Gigerenzer argues that we need to consider more than just the mental algorithms.

Information needs representation. In order to communicate information, it has to be represented in some symbol system. Take numerical information. This information can be represented by the Arabic numeral system, by the binary system, by Roman numbers, or other systems. These different representations can be mapped in a one-to-one way, and are in this sense equivalent representations. But they are not necessarily equivalent for an algorithm. Pocket calculators, for instance, generally work on the Arabic base-10 system, whereas general purpose computers work on the base-2 system. The numerals 100000 and 32 are representations of the number thirty-two in the binary and Arabic system, respectively. The algorithms of my pocket calculator will perform badly with the first kind of representation but work well on the latter.

The human mind finds itself in an analogous situation. The algorithms most Western people have stored in their minds – such as how to add, subtract and multiply – work well on Arabic numerals. But contemplate for a moment division in Roman numerals, without transforming them first into Arabic numerals.

There is more to the distinction between an algorithm and a representation of information. Not only are algorithms tuned to particular representations, but different representations make explicit different features of the same information. For instance, one can quickly see whether a number is a power of 10 in an Arabic numeral representation, whereas to see whether that number is a power of 2 is more difficult. The converse holds with binary numbers. Finally, algorithms are tailored to given representations. Some representations allow for simpler and faster algorithms than others. Binary representation, for instance, is better suited to electronic techniques than Arabic representation. Arabic numerals, on the other hand, are better suited to multiplication and elaborate mathematical algorithms than Roman numerals …
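Gigerenzer's point about representations can be made concrete with a few lines of code. The sketch below is my own illustration, not anything from his paper: it writes thirty-two in both systems and then applies one very simple "algorithm" (is the numeral a 1 followed only by zeros?) to each. The helper names are hypothetical.

```python
def to_binary(n: int) -> str:
    """Write a non-negative integer as a binary numeral."""
    return format(n, "b")


def is_one_followed_by_zeros(numeral: str) -> bool:
    """A numeral of the form 1, 10, 100, ... is a power of whatever base
    it is written in, so this check is trivial in the 'right' notation."""
    return numeral[0] == "1" and set(numeral[1:]) <= {"0"}


thirty_two_arabic = "32"
thirty_two_binary = to_binary(32)

# Same number, two representations.
same_number = int(thirty_two_binary, 2) == int(thirty_two_arabic, 10)

# In binary, the fact that thirty-two is a power of 2 is visible at a glance;
# in Arabic numerals the same simple check tells us nothing about powers of 2.
binary_shows_power_of_two = is_one_followed_by_zeros(thirty_two_binary)
arabic_shows_power_of_two = is_one_followed_by_zeros(thirty_two_arabic)

# The converse holds for one thousand, a power of 10.
arabic_shows_power_of_ten = is_one_followed_by_zeros("1000")
binary_shows_power_of_ten = is_one_followed_by_zeros(to_binary(1000))
```

The information is identical in each pair, but the same cheap check succeeds in one notation and fails in the other, which is the sense in which algorithms are tuned to representations.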

Gigerenzer versus Kahneman and Tversky: The 1996 face-off

Through the late 1980s and early 1990s, Gerd Gigerenzer and friends wrote a series of articles critiquing Daniel Kahneman and Amos Tversky’s work on heuristics and biases. They hit hard. As Michael Lewis wrote in The Undoing Project:

Gigerenzer had taken the same angle of attack as most of their other critics. But in Danny and Amos’s view he’d ignored the usual rules of intellectual warfare, distorting their work to make them sound even more fatalistic about their fellow man than they were. He also downplayed or ignored most of their evidence, and all of their strongest evidence. He did what critics sometimes do: He described the object of his scorn as he wished it to be rather than as it was. Then he debunked his description. … “Amos says we absolutely must do something about Gigerenzer,” recalled Danny. … Amos didn’t merely want to counter Gigerenzer; he wanted to destroy him. (“Amos couldn’t mention Gigerenzer’s name without using the word ‘sleazeball,’ ” said UCLA professor Craig Fox, Amos’s former student.) Danny, being Danny, looked for the good in Gigerenzer’s writings. He found this harder than usual to do.

Kahneman and Tversky’s response to Gigerenzer’s work was published in 1996 in Psychological Review. It was one of the blunter responses you will read in academic debates, as the following passages indicate. From the first substantive section of the article:

It is not uncommon in academic debates that a critic’s description of the opponent’s ideas and findings involves some loss of fidelity. This is a fact of life that targets of criticism should learn to expect, even if they do not enjoy it. In some exceptional cases, however, the fidelity of the presentation is so low that readers may be misled about the real issues under discussion. In our view, Gigerenzer’s critique of the heuristics and biases program is one of these cases.

And the close:

As this review has shown, Gigerenzer’s critique employs a highly unusual strategy. First, it attributes to us assumptions that we never made … Then it attempts to refute our alleged position by data that either replicate our prior work … or confirm our theoretical expectations … These findings are presented as devastating arguments against a position that, of course, we did not hold. Evidence that contradicts Gigerenzer’s conclusion … is not acknowledged and discussed, as is customary; it is simply ignored. Although some polemic license is expected, there is a striking mismatch between the rhetoric and the record in this case.

Below are my notes put together on a 16-hour flight on the claims and counterclaims across Gigerenzer’s articles, the Kahneman and Tversky response in Psychological Review, and Gigerenzer’s rejoinder in the same issue. This represents my attempt to get my head around this debate and to understand the degree to which the heat is justified, not to give final judgment (although I do show my leanings). I don’t go into work published after the 1996 articles, although that might be for another day.

I will use Gigerenzer or Kahneman and Tversky’s words to make their arguments when I can. The core articles I refer to are:

Gigerenzer (1991) How to Make Cognitive Illusions Disappear: Beyond “Heuristics and Biases” (pdf)

Gigerenzer (1993) The bounded rationality of probabilistic mental models (pdf)

Kahneman and Tversky (1996) On the Reality of Cognitive Illusions (pdf)

Gigerenzer (1996) On Narrow Norms and Vague Heuristics: A Reply to Kahneman and Tversky (1996) (pdf)

Kahneman and Tversky (1996) Postscript (at the end of their 1996 paper)

Gigerenzer (1996) Postscript (at the end of his 1996 paper)

I recommend reading those articles, along with Kahneman and Tversky’s classic Science article (pdf) as background. (And note that the below debate and Gigerenzer’s critique only relates to two of the 12 “biases” covered in that paper.)

I touch on four of Gigerenzer’s arguments (using most of my word count on the first), although there are numerous other fronts:

  • Argument 1: Does the use of frequentist rather than probabilistic representations make many of the so-called biases disappear? Despite appearances, Kahneman, Tversky and Gigerenzer largely agree on the answer to this question. However, it was largely Gigerenzer’s work that brought this to my attention, so there was clearly some value (for me) to Gigerenzer’s focus.
  • Argument 2: Can you attribute probabilities to single events? Gigerenzer says no. Here there is a fundamental disagreement. I largely agree with Kahneman and Tversky as to whether this point is fatal to their work.
  • Argument 3: Are Kahneman and Tversky’s norms content blind? For particular examples, yes. Generally? No.
  • Argument 4: Should more effort be expended in understanding the underlying cognitive processes or mental models behind these various findings? This is where Gigerenzer’s argument is strongest, and I agree that many of Kahneman and Tversky’s proposed heuristics have weaknesses that need examination.

Putting these four together, I have sympathy for Gigerenzer’s way of thinking and ultimate program of work, but I am much less sympathetic to his desire to pull down Kahneman and Tversky’s findings on the way.

Now into the details.

Argument 1: Does the use of frequentist rather than probabilistic representations make many of the so-called biases disappear?

Gigerenzer argues that many biases involving probabilistic decision-making can be “made to disappear” by framing the problems in terms of frequencies rather than probabilities. The back-and-forth on this point centres on three major biases: overconfidence, the conjunction fallacy and base-rate neglect. I’ll take each in turn.

Overconfidence

A typical question from the overconfidence literature reads as follows:

Which city has more inhabitants?

(a) Hyderabad, (b) Islamabad

How confident are you that your answer is correct?

50% 60% 70% 80% 90% 100%

After answering many questions of this form, the usual finding is that where people are 100% confident they had the correct answer, they might be correct only 80% of the time. When 80% confident, they might get only 65% correct. This discrepancy is often called “overconfidence”. [I’ve written elsewhere about the need to disambiguate different forms of overconfidence.]
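To make the measurement concrete, here is a minimal sketch of how such a calibration comparison might be computed. The answers are made up, constructed only so that the hit rates match the illustrative figures above (80% correct when 100% confident, 65% correct when 80% confident). They are not data from any study, and the function name is my own.

```python
from collections import defaultdict


def calibration_table(answers):
    """Group (stated_confidence, was_correct) pairs by stated confidence
    and return {confidence: proportion of those answers that were correct}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for confidence, correct in answers:
        totals[confidence] += 1
        hits[confidence] += int(correct)
    return {c: hits[c] / totals[c] for c in sorted(totals)}


# Hypothetical answers, built to reproduce the illustrative numbers in the text.
answers = (
    [(1.0, True)] * 16 + [(1.0, False)] * 4    # "100% confident": 16 of 20 correct
    + [(0.8, True)] * 13 + [(0.8, False)] * 7  # "80% confident": 13 of 20 correct
)

table = calibration_table(answers)

# "Overconfidence" at each level is stated confidence minus the hit rate.
overconfidence = {c: round(c - rate, 2) for c, rate in table.items()}
```

A perfectly calibrated subject would show a gap of zero at every confidence level. The usual finding is a positive gap.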

There are numerous explanations for this overconfidence, such as confirmation bias, although in Gigerenzer’s view this is “a robust fact waiting for a theory”.

But what if we take a different approach to this problem? Gigerenzer (1991) writes:

Assume that the mind is a frequentist. Like a frequentist, the mind should be able to distinguish between single-event confidences and relative frequencies in the long run.

This view has testable consequences. Ask people for their estimated relative frequencies of correct answers and compare them with true relative frequencies of correct answers, instead of comparing the latter frequencies with confidences.

He tested this idea as follows:

Subjects answered several hundred questions of the Islamabad-Hyderabad type … and in addition, estimated their relative frequencies of their correct answers. …

After a set of 50 general knowledge questions, we asked the same subjects, “How many of these 50 questions do you think you got right?”. Comparing their estimated frequencies with actual frequencies of correct answers made “overconfidence” disappear. …

The general point is (i) a discrepancy between probabilities of single events (confidences) and long-run frequencies need not be framed as an “error” and called “overconfidence bias”, and (ii) judgments need not be “explained” by a flawed mental program at a deeper level, such as “confirmation bias”.
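The comparison Gigerenzer describes involves two different yardsticks applied to the same set of answers. The sketch below shows the arithmetic for a single hypothetical subject. Every number in it is invented for illustration and it does not reproduce Gigerenzer's data; the function and key names are my own.

```python
def compare_judgments(confidences, correct, estimated_number_correct):
    """Compare two self-assessments against actual accuracy.

    confidence_gap: mean single-question confidence minus proportion correct.
    frequency_gap:  estimated proportion correct (from the 'how many did you
                    get right?' question) minus proportion correct.
    """
    n = len(correct)
    accuracy = sum(correct) / n
    mean_confidence = sum(confidences) / n
    return {
        "accuracy": accuracy,
        "confidence_gap": mean_confidence - accuracy,
        "frequency_gap": estimated_number_correct / n - accuracy,
    }


# One hypothetical subject answering 50 questions (all values made up).
confidences = [0.9] * 20 + [0.7] * 20 + [0.5] * 10
correct = (
    [True] * 15 + [False] * 5    # the 90%-confidence answers
    + [True] * 12 + [False] * 8  # the 70%-confidence answers
    + [True] * 5 + [False] * 5   # the 50%-confidence answers
)
result = compare_judgments(confidences, correct, estimated_number_correct=31)
```

For this invented subject the confidence gap is positive (mean confidence of 74% against 64% correct) while the frequency gap is slightly negative (an estimate of 31 out of 50 against 32 actually correct). That is the pattern under discussion: the same answers look "overconfident" on one yardstick and not on the other.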

Kahneman and Tversky agree:

May (1987, 1988) was the first to report that whereas average confidence for single items generally exceeds the percentage of correct responses, people’s estimates of the percentage (or frequency) of items that they have answered correctly is generally lower than the actual number. … Subsequent studies … have reported a similar pattern although the degree of underconfidence varied substantially across domains.

Gigerenzer portrays the discrepancy between individual and aggregate assessments as incompatible with our theoretical position, but he is wrong. On the contrary, we drew a distinction between two modes of judgment under uncertainty, which we labeled the inside and the outside views … In the outside view (or frequentistic approach) the case at hand is treated as an instance of a broader class of similar cases, for which the frequencies of outcomes are known or can be estimated. In the inside view (or single-case approach) predictions are based on specific scenarios and impressions of the particular case. We proposed that people tend to favor the inside view and as a result underweight relevant statistical data. …

The preceding discussion should make it clear that, contrary to Gigerenzer’s repeated claims, we have neither ignored nor blurred the distinction between judgments of single and of repeated events. We proposed long ago that the two tasks induce different perspectives, which are likely to yield different estimates, and different levels of accuracy (Kahneman and Tversky, 1979). As far as we can see, Gigerenzer’s position on this issue is not different from ours, although his writings create the opposite impression.

So we leave this point with a degree of agreement.

Conjunction fallacy

The most famous illustration of the conjunction fallacy is the “Linda problem”. Subjects are shown the following vignette:

Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in antinuclear demonstrations.

They are then asked which of the following two alternatives was more probable (either as just those two options, as part of a longer list of options, or across different experimental subjects):

Linda is a bank teller
Linda is a bank teller and is active in the feminist movement

In the original Tversky and Kahneman experiment, when shown only those two options, 85% of subjects chose the second. Tversky and Kahneman argued this was an error as the probability of the conjunction of two events can never be greater than one of its constituents.
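
The conjunction rule itself is simple arithmetic. As a rough sketch (the probabilities below are invented for illustration and do not come from any of the experiments), the probability of “bank teller and feminist” is the probability of “bank teller” scaled by a conditional probability that cannot exceed one:

```python
def conjunction_probability(p_a, p_b_given_a):
    """P(A and B) = P(A) * P(B | A)."""
    return p_a * p_b_given_a


# Hypothetical numbers: a 5% chance that Linda is a bank teller, and a range
# of guesses for how likely a bank teller like Linda is to be an active feminist.
p_teller = 0.05
for p_feminist_given_teller in (0.1, 0.5, 0.9, 1.0):
    p_both = conjunction_probability(p_teller, p_feminist_given_teller)
    # However representative "feminist" seems, the conjunction never exceeds
    # the probability of "bank teller" alone.
    assert p_both <= p_teller
    print(f"P(feminist | teller) = {p_feminist_given_teller:.1f} -> P(both) = {p_both:.3f}")
```

However strongly the description suggests feminism, the second option can at best tie the first.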

Once again Gigerenzer reframed for the frequentist mind (quoting from the 1996 article):

There are 100 persons who fit the description above (i.e. Linda’s). How many of them are:

(a) bank tellers
(b) bank tellers and active in the feminist movement.

As Gigerenzer states:

If the problem is phrased in this (or a similar) frequentist way, then the “conjunction fallacy” largely disappears.

The postulated representativeness heuristic cannot account for this dramatic effect.

Gigerenzer’s 1993 article expands on this latter point:

If the mind solves the problem using a representative heuristic, changes in representation should not matter, because they do not change the degree of similarity. … Subjects therefore should still exhibit the conjunction fallacy.

Kahneman and Tversky’s response starts with the note that their first demonstration of the conjunction fallacy involved judgments of frequency. They asked subjects:

to estimate the number of “seven-letter words of the form ‘_ _ _ _ _ n _’ in 4 pages of text.” Later in the same questionnaire, those subjects estimated the number of “seven-letter words of the form ‘_ _ _ _ i n g’ in 4 pages of text.” Because it is easier to think of words ending with “ing” than to think of words with “n” in the next-to-last position, availability suggests that the former will be judged more numerous than the latter, in violation of the conjunction rule. Indeed, the median estimate for words ending with “ing” was nearly three times higher than for words with “n” in the next-to-the-last position. This finding is a counter-example to Gigerenzer’s often repeated claim that conjunction errors disappear in judgments of frequency, but we have found no mention of it in his writings.

Here Gigerenzer stretches his defence of human consistency a step too far:

[T]he effect depends crucially on presenting the two alternatives to a participant at different times, that is, with a number (unspecified in their reports) of other tasks between the alternatives. This does not seem to be a violation of internal consistency, which I take to be the point of the conjunction fallacy.

Kahneman and Tversky also point out that they had studied the effect of frequencies in other contexts:

We therefore turned to the study of cues that may encourage extensional reasoning and developed the hypothesis that the detection of inclusion could be facilitated by asking subjects to estimate frequencies. To test this hypothesis, we described a health survey of 100 adult men and asked subjects, “How many of the 100 participants have had one or more heart attacks?” and “How many of the 100 participants both are over 55 years old and have had one or more heart attacks?” The incidence of conjunction errors in this problem was only 25%, compared to 65% when the subjects were asked to estimate percentages rather than frequencies. Reversing the order of the questions further reduced the incidence to 11%.

Kahneman and Tversky go on to state:

Gigerenzer has essentially ignored our discovery of the effect of frequency and our analysis of extensional cues. As primary evidence for the “disappearance” of the conjunction fallacy in judgments of frequency, he prefers to cite a subsequent study by Fiedler (1988), who replicated both our procedure and our findings, using the bank-teller problem. … In view of our prior experimental results and theoretical discussion, we wonder who alleged that the conjunction fallacy is stable under this particular manipulation.

Gigerenzer concedes, but then turns to Kahneman and Tversky’s lack of focus on this result:

It is correct that they demonstrated the effect on conjunction violations first (but not for overconfidence bias and the base-rate fallacy). Their accusation, however, is out of place, as are most others in their reply. I referenced their demonstration in every one of the articles they cited … It might be added that Tversky and Kahneman (1983) themselves paid little attention to this result, which was not mentioned once in some four pages of discussion.

A debate about who was first and how much focus each gave to the findings is not substantive, but Kahneman and Tversky (1996) do not leave the problem there. While the frequency representation can reduce error when direct comparison is possible (the same subject sees and provides frequencies for both alternatives), it has less effect in between-subjects experimental designs; that is, where one set of subjects sees one of the options and another set of subjects sees the other:

Linda is in her early thirties. She is single, outspoken, and very bright. As a student she majored in philosophy and was deeply concerned with issues of discrimination and social justice.

Suppose there are 1,000 women who fit this description. How many of them are

(a) high school teachers?

(b) bank tellers? or

(c) bank tellers and active feminists?

One group of Stanford students (N = 36) answered the above three questions. A second group (N = 33) answered only questions (a) and (b), and a third group (N = 31) answered only questions (a) and (c). Subjects were provided with a response scale consisting of 11 categories in approximately logarithmic spacing. As expected, a majority (64%) of the subjects who had the opportunity to compare (b) and (c) satisfied the conjunction rule. In the between-subjects comparison, however, the estimates for feminist bank tellers (median category: “more than 50”) were significantly higher than the estimates for bank tellers … Contrary to Gigerenzer’s position, the results demonstrate a violation of the conjunction rule in a frequency formulation. These findings are consistent with the hypothesis that subjects use representativeness to estimate outcome frequencies and edit their responses to obey class inclusion in the presence of strong extensional cues.

Gigerenzer in part concedes, and in part battles on:

Hence, Kahneman and Tversky (1996) believe that the appropriate reply is to show that frequency judgments can also fail. There is no doubt about the latter …

[T]he between subjects version of the Linda problem is not a violation of internal consistency, because the effect depends on not presenting the two alternatives to the same subject.

It’s right not to describe this as a violation of internal consistency, but for evidence of representativeness affecting judgement and doing so even with frequentist representations, it makes a good case. It is also difficult to argue that the subjects are making a good judgment. Kahneman and Tversky write:

Gigerenzer appears to deny the relevance of the between-subjects design on the ground that no individual subject can be said to have committed an error. In our view, this is hardly more reasonable than the claim that a randomized between-subject design cannot demonstrate that one drug is more effective than another because no individual subject has experienced the effects of both drugs.

Kahneman and Tversky write further in the postscript, possibly conceding on language but not on their substantive point:

This formula will not do. Whether or not violations of the conjunction rule in the between-subjects versions of the Linda and “ing” problems are considered errors, they require explanation. These violations were predicted from representativeness and availability, respectively, and were observed in both frequency and probability judgments. Gigerenzer ignores this evidence for our account and offers no alternative.

I’m with Kahneman and Tversky here.

Base-rate neglect

Base-rate neglect (or the base-rate fallacy) describes situations where a known base rate of an event or characteristic in a reference population is under-weighted, with undue focus given to specific information on the case at hand. An example is as follows:

If a test to detect a disease whose prevalence is 1/1000 has a false positive rate of 5%, what is the chance that a person found to have a positive result actually has the disease, assuming you know nothing about the person’s symptoms or signs?

The typical result is that around half of the people asked will guess a probability of 95% (even among medical professionals), with fewer than a quarter giving the correct answer of 2%. The positive result, which has associated errors, is weighted too heavily relative to the base rate of one in a thousand.
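
For readers who want to check the 2% figure, here is a minimal sketch of the Bayes’ rule calculation. The problem as posed does not state the test’s sensitivity, so I assume the test always detects the disease when it is present; the function name is mine.

```python
def posterior_given_positive(prevalence, false_positive_rate, sensitivity=1.0):
    """P(disease | positive test) via Bayes' rule.

    sensitivity=1.0 is an assumption: the problem does not say how often
    the test detects the disease when it is present.
    """
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


p = posterior_given_positive(prevalence=1 / 1000, false_positive_rate=0.05)
print(f"{p:.4f}")  # roughly 0.02, nowhere near 0.95
```

The false positives among the 999 healthy people in every thousand swamp the single true case, which is why the answer sits near 2% rather than 95%.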

Gigerenzer (1991) once again responds with the potential of a frequentist representation to eliminate the bias, drawing on work by Cosmides and Tooby (1990) [The 1990 paper was an unpublished conference paper, but this work was later published here (pdf)]:

One out of 1000 Americans has disease X. A test has been developed to detect when a person has disease X. Every time the test is given to a person who has the disease, the test comes out positive. But sometimes the test also comes out positive when it is given to a person who is completely healthy. Specifically, out of every 1000 people who are perfectly healthy, 50 of them test positive for the disease.

Imagine that we have assembled a random sample of 1000 Americans. They were selected by a lottery. Those who conducted the lottery had no information about the health status of any of these people. How many people who test positive for the disease will actually have the disease? — out of —.

The result:

If the question was rephrased in a frequentist way, as shown above, then the Bayesian answer of 0.02 – that is, the answer “one out of 50 (or 51)” – was given by 76% of the subjects. The “base-rate fallacy” disappeared.
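
In the frequency version the calculation reduces to counting heads. A minimal sketch, using Python’s `fractions` module for exact arithmetic (the variable names are mine):

```python
from fractions import Fraction

sample_size = 1000
sick = 1                        # one out of 1000 has disease X
true_positives = sick           # the test is always positive for the sick
healthy = sample_size - sick    # 999 healthy people in the sample

# 50 out of every 1000 healthy people test positive: 49.95 here, about 50.
false_positives = Fraction(50, 1000) * healthy

all_positives = true_positives + false_positives
answer = Fraction(true_positives) / all_positives

print(f"about {true_positives} out of {round(all_positives)} positives are sick")
print(f"exact share: {answer} (approximately {float(answer):.4f})")
```

Rounding the false positives to 50 gives the “one out of 51” answer; ignoring the one sick person in the denominator gives “one out of 50”. Either way, it is the same 2% as the probability version.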

Kahneman and Tversky (1996) do not respond to this particular example, beyond a footnote:

Cosmides and Tooby (1996) have shown that a frequentistic formulation also helps subjects solve a base-rate problem that is quite difficult when framed in terms of percentages or probabilities. Their result is readily explained in terms of extensional cues to set inclusion. These authors, however, prefer the speculative interpretation that evolution has favored reasoning with frequencies but not with percentages.

It seems we have agreement on the effect, although a differing interpretation.

Kahneman and Tversky, however, more directly attack the idea that people are natural frequentists.

He [Gigerenzer] offers a hypothetical example in which a physician in a nonliterate society learns quickly and accurately the posterior probability of a disease given the presence or absence of a symptom. … However, Gigerenzer’s speculation about what a nonliterate physician might learn from experience is not supported by existing evidence. Subjects in an experiment reported by Gluck and Bower (1988) learned to diagnose whether a patient has a rare (25%) or a common (75%) disease. For 250 trials the subjects guessed the patient’s disease on the basis of a pattern of four binary symptoms, with immediate feedback. Following this learning phase, the subjects estimated the relative frequency of the rare disease, given each of the four symptoms separately.

If the mind is “a frequency monitoring device,” as argued by Gigerenzer …, we should expect subjects to be reasonably accurate in their assessments of the relative frequencies of the diseases, given each symptom. Contrary to this naive frequentist prediction, subjects’ judgments of the relative frequency of the two diseases were determined entirely by the diagnosticity of the symptom, with no regard for the base-rate frequencies of the diseases. … Contrary to Gigerenzer’s unqualified claim, the replacement of subjective probability judgments by estimates of relative frequency and the introduction of sequential random sampling do not provide a panacea against base-rate neglect.

Gigerenzer (1996) responds:

Concerning base-rate neglect, Kahneman and Tversky … created the impression that there is little evidence that certain types of frequency formats improve Bayesian reasoning. They do not mention that there is considerable evidence (e.g., Gigerenzer & Hoffrage, 1995) and back their disclaimer principally with a disease-classification study by Gluck and Bower (1988), which they summarized thus: “subjects’ judgments of the relative frequency . . . were determined entirely by the diagnosticity of the symptom, with no regard for the base-rate frequencies of the diseases” … To set the record straight, Gluck and Bower said their results were consistent with the idea that “base-rate information is not ignored, only underused” (p. 235). Furthermore, their study was replicated and elaborated on by Shanks (1991), who concluded that “we have no conclusive evidence for the claim . . . that systematic base-rate neglect occurs in this type of situation” (p. 153). Adding up studies in which base-rate neglect appears or disappears will lead us nowhere.

Gigerenzer is right that Kahneman and Tversky were overly strong in their description of the findings of the Gluck and Bower study, but Gigerenzer’s conclusion seems close to that of Kahneman and Tversky. As Kahneman and Tversky wrote:

[I]t is evident that subjects sometimes use explicitly mentioned base-rate information to a much greater extent than they did in our original engineer-lawyer study [another demonstration of base-rate neglect], though generally less than required by Bayes’ rule.

Argument 2: Can you attribute probabilities to single events?

While I leave the question of frequency representations with a degree of agreement, Gigerenzer has a deeper critique of Kahneman and Tversky’s findings. From his 1993 article:

Is the conjunction fallacy a violation of probability theory? Has a person who chooses T&F violated probability theory? The answer is no, if the person is a frequentist such as Richard von Mises or Jerzy Neyman; yes, if he or she is a subjectivist such as Bruno de Finetti; and open otherwise.

The mathematician Richard von Mises, one of the founders of the frequency interpretation, used the following example to make his point:

We can say nothing about the probability of death of an individual even if we know his condition of life and health in detail. The phrase ‘probability of death’, when it refers to a single person, has no meaning at all for us. This is one of the most important consequences of our definition of probability.

(von Mises, 1957/1928: 11)

In this frequentist view, one cannot speak of a probability unless a reference class has been defined. … Since a person is always a member of many reference classes, no unique relative frequency can be assigned to a single person. … Thus, for a strict frequentist, the laws of probability are about frequencies and not about single events such as whether Linda is a bank teller. Therefore, in this view, no judgement about single events can violate probability theory.

… Seen from the Bayesian point of view, the conjunction fallacy is an error.

Thus, choosing T&F in the Linda problem is not a reasoning error. What has been labelled the ‘conjunction fallacy’ here does not violate the laws of probability. It only looks so from one interpretation of probability.

He writes in his 1991 article somewhat more strongly (here talking in the context of overconfidence):

For a frequentist like the mathematician Richard von Mises, the term “probability”, when it refers to a single event, “has no meaning at all for us” … Probability is about frequencies, not single events. To compare the two means comparing apples with oranges.

Even the major opponents of the frequentists – subjectivists such as Bruno de Finetti – would not generally think of a discrepancy between confidence and relative frequency as a “bias”, albeit for different reasons. For a subjectivist, probability is about single events, but rationality is identified with the internal consistency of subjective probabilities. As de Finetti emphasized, “however an individual evaluates the probability of a particular event, no experience can prove him right, or wrong; nor, in general, could any conceivable criterion give any objective sense to the distinction one would like to draw, here, between right and wrong” …

Kahneman and Tversky address this argument across a few of the biases under debate. First, on conjunction errors:

Whether or not it is meaningful to assign a definite numerical value to the probability of survival of a specific individual, we submit (a) that this individual is less likely to die within a week than to die within a year and (b) that most people regard the preceding statement as true—not as meaningless—and treat its negation as an error or a fallacy.

In response, Gigerenzer makes an interesting point that someone asked that question might make a different inference:

One can easily create a context, such as a patient already on the verge of dying, that would cause a sensible person to answer that this patient is more likely to die within a week (inferring that the question is next week versus the rest of the year, because the question makes little sense otherwise). In the same fashion, the Linda problem creates a context (the description of Linda) that makes it perfectly valid not to conform to the conjunction rule.

I think Gigerenzer is right that if you treat the problem as content-blind you might miss the inference the subjects are drawing from the question (more on content-blind norms below). But conversely, Kahneman and Tversky’s general point appears sound.

Kahneman and Tversky also address this frequentist argument in relation to overconfidence:

Proper use of the probability scale is important because this scale is commonly used for communication. A patient who is informed by his surgeon that she is 99% confident in his complete recovery may be justifiably upset to learn that when the surgeon expresses that level of confidence, she is actually correct only 75% of the time. Furthermore, we suggest that both surgeon and patient are likely to agree that such a calibration failure is undesirable, rather than dismiss the discrepancy between confidence and accuracy on the ground that “to compare the two means comparing apples and oranges”.

Gigerenzer’s response here is amusing:

Kahneman and Tversky argued that the reluctance of statisticians to make probability theory the norm of all single events “is not generally shared by the public” (p. 585). If this was meant to shift the burden of justification for their norms from the normative theory of probability to the intuitions of ordinary people, it is exceedingly puzzling. How can people’s intuitions be called upon to substitute for the standards of statisticians, in order to prove that people’s intuitions systematically violate the normative theory of probability?

Kahneman and Tversky did not come back on this particular argument, but several points could be made in their favour. First, and as noted above, there can still be errors under frequentist representations. Even if we discard the results with judgments of probability for single events, there is still a strong case for the use of heuristics leading to the various biases.

Second, if a surgeon states they are confident that someone has a 99% probability of complete recovery when they are right only 75% of the time, they are making one of two errors. Either they are making a probability estimate of a single event, which has no meaning at all according to Gigerenzer and von Mises, or they are poorly calibrated according to Kahneman and Tversky.

Third, whatever the philosophically or statistically correct position, we have a practical problem. We have judgements being made and communicated, with subsequent decisions based on those communications. To the extent there are weaknesses in that chain, we will have sub-optimal outcomes.

Putting this together, I feel this argument leaves us at a philosophical impasse, but Kahneman and Tversky’s angle provides scope for practical application and better outcomes. (Look at the training for the Good Judgment Project and the associated improvements in forecasting.)

Argument 3: Are Kahneman and Tversky’s norms content blind?

An interesting question about the norms against which Kahneman and Tversky assess the experimental subjects’ heuristics and biases is whether the norms are blind to the content of the problem. Gigerenzer (1996) writes:

[O]n Kahneman and Tversky’s (1996) view of sound reasoning, the content of the Linda problem is irrelevant; one does not even need to read the description of Linda. All that counts are the terms probable and and, which the conjunction rule interprets in terms of mathematical probability and logical AND, respectively. In contrast, I believe that sound reasoning begins by investigating the content of a problem to infer what terms such as probable mean. The meaning of probable is not reducible to the conjunction rule … For instance, the Oxford English Dictionary … lists “plausible,” “having an appearance of truth,” and “that may in view of present evidence be reasonably expected to happen,” among others. … Similarly, the meaning of and in natural language rarely matches that of logical AND. The phrase T&F can be understood as the conditional “If Linda is a bank teller, then she is active in the feminist movement.” Note that this interpretation would not concern and therefore could not violate the conjunction rule.

This is a case where I believe Gigerenzer makes an interesting point on the specific case but is wrong on the broader point. As a start, in discussing their initial results for their 1983 paper, Kahneman and Tversky asked whether people were interpreting the language in different ways, such as asking whether people are taking “Linda is a bank teller” to mean “Linda is a bank teller and not active in the feminist movement.” They considered the content of their problem and ran different experimental specifications to attempt to understand what was occurring.

But as Kahneman and Tversky state in their postscript, critiquing the Linda problem on this point – and only the within subjects experimental design at that – is a narrow view of their work. The point of the Linda problem is to test whether the representativeness of the description changes the assessment. As they write in their 1996 paper:

Perhaps the most serious misrepresentation of our position concerns the characterization of judgmental heuristics as “independent of context and content” … and insensitive to problem representation … Gigerenzer also charges that our research “has consistently neglected Feynman’s (1967) insight that mathematically equivalent information formats need not be psychologically equivalent” … Nothing could be further from the truth: The recognition that different framings of the same problem of decision or judgment can give rise to different mental processes has been a hallmark of our approach in both domains.

The peculiar notion of heuristics as insensitive to problem representation was presumably introduced by Gigerenzer because it could be discredited, for example, by demonstrations that some problems are difficult in one representation (probability), but easier in another (frequency). However, the assumption that heuristics are independent of content, task, and representation is alien to our position, as is the idea that different representations of a problem will be approached in the same way.

This is a point where you need to look across the full set of experimental findings, rather than critiquing them one by one. Other experiments have people violating the conjunction rule while betting on sequences generated by a die, where there were no such confusions to be had about the content.

Much of the issue is also one of focus. Kahneman and Tversky have certainly investigated the question of how representation changes the approach to a problem. However, they set it out in a different way from what Gigerenzer might have liked.

Argument 4: Should more effort be expended in understanding the underlying cognitive processes or mental models behind these various findings?

We now come to an important point: what is the cognitive process behind all of these results? Gigerenzer (1996) writes:

Kahneman and Tversky (1996) reported various results to play down what they believe is at stake, the effect of frequency. In no case was there an attempt to figure out the cognitive processes involved. …

Progress can be made only when we can design precise models that predict when base rates are used, when not, and why.

I can see why Kahneman and Tversky focus on the claims regarding frequency representations when Gigerenzer makes such strong statements about making biases “disappear”. The statement that in no case have they attempted to figure out the cognitive processes involved is also overly strong, as a case could be made that the heuristics are those processes.

However, Gigerenzer believes Kahneman and Tversky’s heuristics are too vague for this purpose. Gigerenzer (1996) wrote:

The heuristics in the heuristics-and-biases program are too vague to count as explanations. … The reluctance to specify precise and falsifiable process models, to clarify the antecedent conditions that elicit various heuristics, and to work out the relationship between heuristics have been repeatedly pointed out … The two major surrogates for modeling cognitive processes have been (a) one-word-labels such as representativeness that seem to be traded as explanations and (b) explanation by redescription. Redescription, for instance, is extensively used in Kahneman and Tversky’s (1996) reply. … Why does a frequency representation cause more correct answers? Because “the correct answer is made transparent” (p. 586). Why is that? Because of “a salient cue that makes the correct answer obvious” (p. 586), or because it “sometimes makes available strong extensional cues” (p. 589). Researchers are no closer to understanding which cues are more “salient” than others, much less the underlying process that makes them so.

The reader may now understand why Kahneman and Tversky (1996) and I construe this debate at different levels. Kahneman and Tversky centered on norms and were anxious to prove that judgment often deviates from those norms. I am concerned with understanding the processes and do not believe that counting studies in which people do or do not conform to norms leads to much. If one knows the process, one can design any number of studies wherein people will or will not do well.

This passage by Gigerenzer captures the state of the debate well. However, Kahneman and Tversky are relaxed about the lack of full specification, and sceptical that process models are the approach to provide that detail. They write in the 1996 postscript:

Gigerenzer rejects our approach for not fully specifying the conditions under which different heuristics control judgment. Much good psychology would fail this criterion. The Gestalt rules of similarity and good continuation, for example, are valuable although they do not specify grouping for every display. We make a similar claim for judgmental heuristics.

Gigerenzer legislates process models as the primary way to advance psychology. Such legislation is unwise. It is useful to remember that the qualitative principles of Gestalt psychology long outlived premature attempts at modeling. It is also unwise to dismiss 25 years of empirical research, as Gigerenzer does in his conclusion. We believe that progress is more likely to come by building on the notions of representativeness, availability, and anchoring than by denying their reality.

To me, this is the most interesting point of the debate. I have personally struggled to grasp the precise operation of many of Kahneman and Tversky’s heuristics and how their operation would change across various domains. But are more precisely specified models the way forward? Which are best at explaining the available data? We have now had over 20 years of work since this debate to see if this is an unwise or fruitful pursuit. But that’s a question for another day.

Barry Schwartz’s The Paradox of Choice: Why More Is Less

I typically find the argument that increased choice in the modern world is “tyrannising” us to be less than compelling. On this blog, I have approvingly quoted Jim Manzi’s warning against extrapolating the results of an experiment on two Saturdays in a particular store – the famous jam experiment – into “grandiose claims about the benefits of choice to society.” I recently excerpted a section from Bob Sugden’s excellent The Community of Advantage: A Behavioural Economist’s Defence of the Market on the idea that choice restriction “appeals to culturally conservative or snobbish attitudes of condescension towards some of the preferences to which markets cater.”

Despite this, I liked a lot of Barry Schwartz’s The Paradox of Choice: Why More Is Less. I still disagree with some of Schwartz’s recommendations, his view that the “free market” undermines our well-being, and that areas such as “education, meaningful work, social relations, medical care” should not be addressed through markets. I believe he shows a degree of condescension toward other people’s preferences. However, I found that for much of the diagnosis of the problem I agreed with Schwartz, even if that doesn’t always extend to recommending the same treatment.

Schwartz’s basic argument is that increased choice can negatively affect our wellbeing. It can damage the quality of our decisions. We often regret our decisions when we see the trade-offs involved in our choice, with those trade-offs often multiplying with increased choice. We adapt to the consequences of our choices, meaning that the high costs of search may not be recovered.

The result is that we are not satisfied with our choices. Schwartz argues that once our basic needs are met, much of what we are trying to achieve is satisfaction. So if the new car, phone or brand of salad dressing doesn’t deliver satisfaction, are we worse off?

The power of Schwartz’s argument varies with the domain. When he deals with shopping, it is easy to see that the choices would be overwhelming to someone who wanted to examine all of the options (do we need all 175 salad dressings that are on display?). People report that they are enjoying shopping less, despite shopping more. But it is hard to feel that a decline in our enjoyment of shopping or the confusion we face looking at a sea of salad dressings is a serious problem.

Schwartz spends little time examining the benefits of increased consumer choice for individuals whose preferences are met, or the effect of the accompanying competition on price and quality. Schwartz has another book in which he tackles the problems with markets, so having not read it I can’t say he doesn’t have a case. But that case is absent from The Paradox of Choice.

In fairness to Schwartz, he does state that it is a big jump to extrapolate the increased complexity of shopping into claims that too much choice can “tyrannise”. Schwartz even notes that we do OK with many consumer choices. We implicitly execute strategies such as picking the same product each time.

Schwartz’s argument is more compelling when we move beyond consumer goods into important high-stakes decisions such as those about our health, retirement or work. A poor choice there can have large effects on both outcomes and satisfaction. These choices are of a scale that genuinely challenges our wellbeing.

The experimental evidence that we struggle with high-stakes choices is more persuasive evidence of a problem than experiments involving people having difficulty choosing jam. For instance, when confronted with a multitude of retirement plans, people tend to simply split between them rather than consider the merits or appropriate allocation. Tweak the options presented to them and you can markedly change the allocations. When faced with too many choices, they may simply not choose.

Schwartz’s argument about our failures when choosing draws heavily from the heuristics and biases literature, and a relatively solid part of the literature at that: impatience and inter-temporal inconsistency, anchoring and adjustment, availability, framing and so on. But in some ways, this isn’t the heart of Schwartz’s case. People are susceptible to error even when there are few choices, which is the typical scenario in the experiments in which these findings are made. And much of Schwartz’s case would hold even if we were not susceptible to these biases.

Rather, much of the problem that Schwartz identifies comes when we approach choices as maximisers instead of satisficers. Maximisation is the path to disappointment in a world of massive choice, as you will almost certainly not select the best option. Maximisers may not even make a choice as they are not comfortable with compromises and will tend to want to keep looking.

Schwartz and some colleagues created a maximisation scale, where survey respondents rate themselves against statements such as “I never settle for second best.” Those who rated high on maximisation were less happy with life, less optimistic, more depressed and scored high on regret. Why this correlation? Schwartz believes there is a causal role and that learning how to satisfice could increase happiness.

What makes this result interesting is that maximisers make better decisions when assessed objectively. Is objective or subjective success more important? Schwartz considers that once we have met our basic needs, what matters most is how we feel. Subjective satisfaction is the most important criterion.

I am not convinced that the story of satisfaction from particular choices flows into overall satisfaction. Take a particular decision and satisfice, and perhaps satisfaction for that particular decision is higher. Satisfice for every life decision, and what does your record of accomplishment look like? What is your assessment of satisfaction then? At the end of the book, Schwartz does suggest that we need to “choose when to choose”, and leave maximisation for the important decisions, so it seems he feels maximisation is important on some questions.

I also wonder about the second-order effects. If everyone satisficed to achieve higher personal satisfaction, what would we lose? How much do we benefit from the refusal of maximisers such as Steve Jobs or Elon Musk to settle for second best? Would a more satisfied world have fewer of the amazing accomplishments that give us so much value? Even if there were a personal gain to taking the foot off the pedal, would this involve a broader cost?

An interesting thread relating to maximisation concerns opportunity costs. Any economist will tell you that opportunity cost – the opportunity you forgo by choosing an option – is the benchmark against which options should be assessed. But Schwartz argues that assessing opportunity costs has costs in itself. Being forced to make decisions with trade-offs makes people unhappy, and considering the opportunity costs makes those trade-offs salient.

The experimental evidence on considering trade-offs is interesting. For instance, in one experiment a group of doctors was given a case history and a choice between sending the patient to a specialist or trying one other medication first. 75% chose the medication. Give the same choice to another group of doctors, but with the addition of a second medication option, and this time only 50% chose medication. Choosing the specialist is a way of avoiding a decision between the two medications. When there are trade-offs, all options can begin to look unappealing.

Another problem lurking for maximisers is regret, as the only way to avoid regret is to make the best possible decision. People will often avoid decisions if they could cause regret, or they aim for the regret-minimising decision (which might be considered a form of satisficing).

There are some problems that arise even without the maximisation mindset. One is that expectations may increase with choice. Higher expectations create a higher benchmark to achieve satisfaction, and Schwartz argues that these expectations may lead to an inability to cope rather than more control. High expectations create the burden of meeting them. For example, job options are endless. You can live anywhere in the world. The nature of your relationships – such as decisions about marriage – has a flexibility far above that of our recent past. For many, this creates expectations that are unlikely to be met. Schwartz does note the benefits of these options, but the presence of a psychological cost means the consequences are not purely positive.

Then there is unanticipated adaptation. People tend to predict bigger hypothetical changes in their satisfaction than those reported by people who actually experienced the events. Schwartz draws on the often misinterpreted paper that compares the happiness of lottery winners with para- and quadriplegics. He notes that the long-term difference in happiness between the two groups is smaller than you would expect (although I am not sure what you would expect on a 5-point scale). The problem with unanticipated adaptation is that the cost of search does not get balanced by the subjective benefit that the chooser was anticipating.

So what should we do? Schwartz offers eleven steps to reduce the burden of choosing. Possibly the most important is the need to choose when to choose. Often it is not that any particular choice is problematic (although some experiments suggest they are). Rather, it is the cumulative effect that is most damaging. Schwartz suggests picking those decisions that you want to invest effort in. Choosing when to choose allows adequate time and attention when we really want to choose. I personally do this: a wardrobe full of identical work shirts (although this involved a material initial search cost), a regular lunch spot, and many other routines.

Schwartz also argues that we should consider the opportunity costs of considering opportunity costs. Being aware of all the possible trade-offs, particularly when no option can dominate on all dimensions, is a recipe for disappointment. Schwartz suggests being a satisficer and only considering other options when you need to.

The final recommendation I will note is the need to anticipate adaptation. I personally find this a useful tool. Whenever I am making a new purchase I tend to recall a paragraph in Geoffrey Miller’s Spent, which often changes my view on a purchase:

You anticipate the minor mall adventure: the hunt for the right retail environment playing cohort-appropriate nostalgic pop, the perky submissiveness of sales staff, the quest for the virgin product, the self-restraint you show in resisting frivolous upgrades and accessories, the universe’s warm hug of validation when the debit card machine says “Approved,” and the masterly fulfillment of getting it home, turned on, and doing one’s bidding. The problem is, you’ve experienced all this hundreds of times before with other products, and millions of other people will experience it with the same product. The retail adventure seems unique in prospect but generic in retrospect. In a week, it won’t be worth talking about.

Miller’s point in that paragraph was about the signalling benefits of consumerism, but I find a similar mindset useful when thinking about the adaptation that will occur.