Gigerenzer versus Kahneman and Tversky: The 1996 face-off

Author

Jason Collins

Published

April 1, 2019

Through the late 1980s and early 1990s, Gerd Gigerenzer and friends wrote a series of articles critiquing Daniel Kahneman and Amos Tversky’s work on heuristics and biases. They hit hard. As Michael Lewis wrote in The Undoing Project:

Gigerenzer had taken the same angle of attack as most of their other critics. But in Danny and Amos’s view he’d ignored the usual rules of intellectual warfare, distorting their work to make them sound even more fatalistic about their fellow man than they were. He also downplayed or ignored most of their evidence, and all of their strongest evidence. He did what critics sometimes do: He described the object of his scorn as he wished it to be rather than as it was. Then he debunked his description. … “Amos says we absolutely must do something about Gigerenzer,” recalled Danny. … Amos didn’t merely want to counter Gigerenzer; he wanted to destroy him. (“Amos couldn’t mention Gigerenzer’s name without using the word ‘sleazeball,’” said UCLA professor Craig Fox, Amos’s former student.) Danny, being Danny, looked for the good in Gigerenzer’s writings. He found this harder than usual to do.

Kahneman and Tversky’s response to Gigerenzer’s work was published in 1996 in Psychological Review. It was one of the blunter responses you will read in academic debates, as the following passages indicate. From the first substantive section of the article:

It is not uncommon in academic debates that a critic’s description of the opponent’s ideas and findings involves some loss of fidelity. This is a fact of life that targets of criticism should learn to expect, even if they do not enjoy it. In some exceptional cases, however, the fidelity of the presentation is so low that readers may be misled about the real issues under discussion. In our view, Gigerenzer’s critique of the heuristics and biases program is one of these cases.

And the close:

As this review has shown, Gigerenzer’s critique employs a highly unusual strategy. First, it attributes to us assumptions that we never made … Then it attempts to refute our alleged position by data that either replicate our prior work … or confirm our theoretical expectations … These findings are presented as devastating arguments against a position that, of course, we did not hold. Evidence that contradicts Gigerenzer’s conclusion … is not acknowledged and discussed, as is customary; it is simply ignored. Although some polemic license is expected, there is a striking mismatch between the rhetoric and the record in this case.

Below are my notes, put together on a 16-hour flight, on the claims and counterclaims across Gigerenzer’s articles, the Kahneman and Tversky response in Psychological Review, and Gigerenzer’s rejoinder in the same issue. This represents my attempt to get my head around this debate and to understand the degree to which the heat is justified, not to give final judgment (although I do show my leanings). I don’t go into work published after the 1996 articles, although that might be for another day.

I will use Gigerenzer or Kahneman and Tversky’s words to make their arguments when I can. The core articles I refer to are:

Gigerenzer (1991) How to Make Cognitive Illusions Disappear: Beyond “Heuristics and Biases” (pdf)
Gigerenzer (1993) The bounded rationality of probabilistic mental models (pdf)
Kahneman and Tversky (1996) On the Reality of Cognitive Illusions (pdf)
Gigerenzer (1996) On Narrow Norms and Vague Heuristics: A Reply to Kahneman and Tversky (1996) (pdf)
Kahneman and Tversky (1996) Postscript (at the end of their 1996 paper)
Gigerenzer (1996) Postscript (at the end of his 1996 paper)

I recommend reading those articles, along with Kahneman and Tversky’s classic Science article (pdf) as background. (And note that the debate below and Gigerenzer’s critique relate to only two of the 12 “biases” covered in that paper.)

I touch on four of Gigerenzer’s arguments (using most of my word count on the first), although there are numerous other fronts:

Argument 1: Does the use of frequentist rather than probabilistic representations make many of the so-called biases disappear? Despite appearances, Kahneman, Tversky and Gigerenzer largely agree on the answer to this question. However, it was largely Gigerenzer’s work that brought this to my attention, so there was clearly some value (for me) to Gigerenzer’s focus.
Argument 2: Can you attribute probabilities to single events? Gigerenzer says no. Here there is a fundamental disagreement. I largely agree with Kahneman and Tversky as to whether this point is fatal to their work.
Argument 3: Are Kahneman and Tversky’s norms content blind? For particular examples, yes. Generally? No.
Argument 4: Should more effort be expended in understanding the underlying cognitive processes or mental models behind these various findings? This is where Gigerenzer’s argument is strongest, and I agree that many of Kahneman and Tversky’s proposed heuristics have weaknesses that need examination.

Putting these four together, I have sympathy for Gigerenzer’s way of thinking and ultimate program of work, but I am much less sympathetic to his desire to pull down Kahneman and Tversky’s findings on the way.

Now into the details.

Argument 1: Does the use of frequentist rather than probabilistic representations make many of the so-called biases disappear?

Gigerenzer argues that many biases involving probabilistic decision-making can be “made to disappear” by framing the problems in terms of frequencies rather than probabilities. The back-and-forth on this point centres on three major biases: overconfidence, the conjunction fallacy and base-rate neglect. I’ll take each in turn.

Overconfidence

A typical question from the overconfidence literature reads as follows:

Which city has more inhabitants?

(a) Hyderabad, (b) Islamabad

How confident are you that your answer is correct?

50% 60% 70% 80% 90% 100%

After answering many questions of this form, the usual finding is that where people are 100% confident they had the correct answer, they might be correct only 80% of the time. When 80% confident, they might get only 65% correct. This discrepancy is often called “overconfidence”. [I’ve written elsewhere about the need to disambiguate different forms of overconfidence.]
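A rough sketch of how this kind of calibration check can be computed: group answers by stated confidence and compare each group's hit rate with the confidence level. The responses below are made up to mirror the figures in the paragraph above, and the function name calibration_table is only illustrative.

```python
from collections import defaultdict


def calibration_table(answers):
    """Map each stated confidence level to the share of answers that were correct.

    answers: iterable of (stated_confidence, was_correct) pairs.
    """
    buckets = defaultdict(list)
    for confidence, was_correct in answers:
        buckets[confidence].append(was_correct)
    return {c: sum(v) / len(v) for c, v in sorted(buckets.items())}


# Hypothetical responses, chosen only to mirror the pattern described in the text.
answers = (
    [(1.0, True)] * 8 + [(1.0, False)] * 2      # 100% confident, 8 of 10 correct
    + [(0.8, True)] * 13 + [(0.8, False)] * 7   # 80% confident, 13 of 20 correct
)

table = calibration_table(answers)
for confidence, hit_rate in table.items():
    print(f"stated {confidence:.0%}  actual {hit_rate:.0%}")
```

A well-calibrated judge would show hit rates close to the stated confidence in every row; the gap between the two columns is what the literature labels overconfidence.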

There are numerous explanations for this overconfidence, such as confirmation bias, although in Gigerenzer’s view this is “a robust fact waiting for a theory”.

But what if we take a different approach to this problem? Gigerenzer (1991) writes:

Assume that the mind is a frequentist. Like a frequentist, the mind should be able to distinguish between single-event confidences and relative frequencies in the long run.

This view has testable consequences. Ask people for their estimated relative frequencies of correct answers and compare them with true relative frequencies of correct answers, instead of comparing the latter frequencies with confidences.

He tested this idea as follows:

Subjects answered several hundred questions of the Islamabad-Hyderabad type … and in addition, estimated their relative frequencies of their correct answers. …

After a set of 50 general knowledge questions, we asked the same subjects, “How many of these 50 questions do you think you got right?”. Comparing their estimated frequencies with actual frequencies of correct answers made “overconfidence” disappear. …

The general point is (i) a discrepancy between probabilities of single events (confidences) and long-run frequencies need not be framed as an “error” and called “overconfidence bias”, and (ii) judgments need not be “explained” by a flawed mental program at a deeper level, such as “confirmation bias”.
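To make the two comparisons concrete, here is a minimal sketch with invented data for a single subject on a 50-question set. None of these numbers come from Gigerenzer's experiments; they only show how the single-event comparison and the frequency comparison are computed differently.

```python
# Hypothetical data for one subject (not taken from the paper).
confidences = [0.9] * 20 + [0.7] * 30        # stated confidence for each answer
correct = [True] * 15 + [False] * 5 + [True] * 18 + [False] * 12
estimated_number_correct = 32                # reply to "How many did you get right?"

n = len(correct)
actual_number_correct = sum(correct)
proportion_correct = actual_number_correct / n

# Comparison 1: mean single-event confidence against the true hit rate.
mean_confidence = sum(confidences) / n
single_event_gap = mean_confidence - proportion_correct

# Comparison 2: estimated frequency of correct answers against the true hit rate.
frequency_gap = estimated_number_correct / n - proportion_correct

print(f"confidence minus accuracy:          {single_event_gap:+.2f}")
print(f"estimated minus actual frequency:   {frequency_gap:+.2f}")
```

In this invented case the first gap is clearly positive while the second is close to zero, which is the shape of the result Gigerenzer describes.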

Kahneman and Tversky agree:

May (1987, 1988) was the first to report that whereas average confidence for single items generally exceeds the percentage of correct responses, people’s estimates of the percentage (or frequency) of items that they have answered correctly is generally lower than the actual number. … Subsequent studies … have reported a similar pattern although the degree of underconfidence varied substantially across domains.

Gigerenzer portrays the discrepancy between individual and aggregate assessments as incompatible with our theoretical position, but he is wrong. On the contrary, we drew a distinction between two modes of judgment under uncertainty, which we labeled the inside and the outside views … In the outside view (or frequentistic approach) the case at hand is treated as an instance of a broader class of similar cases, for which the frequencies of outcomes are known or can be estimated. In the inside view (or single-case approach) predictions are based on specific scenarios and impressions of the particular case. We proposed that people tend to favor the inside view and as a result underweight relevant statistical data. …

The preceding discussion should make it clear that, contrary to Gigerenzer’s repeated claims, we have neither ignored nor blurred the distinction between judgments of single and of repeated events. We proposed long ago that the two tasks induce different perspectives, which are likely to yield different estimates, and different levels of accuracy (Kahneman and Tversky, 1979). As far as we can see, Gigerenzer’s position on this issue is not different from ours, although his writings create the opposite impression.

So we leave this point with a degree of agreement.

Conjunction fallacy

The most famous illustration of the conjunction fallacy is the “Linda problem”. Subjects are shown the following vignette:

Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in antinuclear demonstrations.

They are then asked which of the following two alternatives was more probable (either as just those two options, as part of a longer list of options, or across different experimental subjects):

Linda is a bank teller

Linda is a bank teller and is active in the feminist movement

In the original Tversky and Kahneman experiment, when shown only those two options, 85% of subjects chose the second. Tversky and Kahneman argued this was an error as the probability of the conjunction of two events can never be greater than one of its constituents.
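The conjunction rule follows from writing the joint probability as P(teller and feminist) = P(teller) × P(feminist | teller), with the second factor never above 1. A small sketch, using arbitrary illustrative probabilities rather than anything estimated from data:

```python
def conjunction(p_teller, p_feminist_given_teller):
    """P(teller and feminist) = P(teller) * P(feminist | teller)."""
    return p_teller * p_feminist_given_teller


# Arbitrary illustrative values: however likely it is that a bank teller
# fitting the description is a feminist, the conjunction cannot be more
# probable than "bank teller" on its own.
checked = []
for p_teller in (0.01, 0.05, 0.5):
    for p_feminist_given_teller in (0.0, 0.5, 0.95, 1.0):
        joint = conjunction(p_teller, p_feminist_given_teller)
        checked.append(joint <= p_teller)

print(all(checked))
```

Equality holds only when every bank teller fitting the description is also a feminist.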

Once again Gigerenzer reframed for the frequentist mind (quoting from the 1996 article):

There are 100 persons who fit the description above (i.e. Linda’s). How many of them are:

bank tellers

bank tellers and active in the feminist movement.

As Gigerenzer states:

If the problem is phrased in this (or a similar) frequentist way, then the “conjunction fallacy” largely disappears.

…

The postulated representativeness heuristic cannot account for this dramatic effect.

Gigerenzer’s 1993 article expands on this latter point:

If the mind solves the problem using a representative heuristic, changes in representation should not matter, because they do not change the degree of similarity. … Subjects therefore should still exhibit the conjunction fallacy.

Kahneman and Tversky’s response starts with the note that their first demonstration of the conjunction fallacy involved judgments of frequency. They asked subjects:

to estimate the number of “seven-letter words of the form ‘-----n-’ in 4 pages of text.” Later in the same questionnaire, those subjects estimated the number of “seven-letter words of the form ‘----ing’ in 4 pages of text.” Because it is easier to think of words ending with “ing” than to think of words with “n” in the next-to-last position, availability suggests that the former will be judged more numerous than the latter, in violation of the conjunction rule. Indeed, the median estimate for words ending with “ing” was nearly three times higher than for words with “n” in the next-to-the-last position. This finding is a counter-example to Gigerenzer’s often repeated claim that conjunction errors disappear in judgments of frequency, but we have found no mention of it in his writings.

Here Gigerenzer stretches his defence of human consistency a step too far:

[T]he effect depends crucially on presenting the two alternatives to a participant at different times, that is, with a number (unspecified in their reports) of other tasks between the alternatives. This does not seem to be a violation of internal consistency, which I take to be the point of the conjunction fallacy.

Kahneman and Tversky also point out that they had studied the effect of frequencies in other contexts:

We therefore turned to the study of cues that may encourage extensional reasoning and developed the hypothesis that the detection of inclusion could be facilitated by asking subjects to estimate frequencies. To test this hypothesis, we described a health survey of 100 adult men and asked subjects, “How many of the 100 participants have had one or more heart attacks?” and “How many of the 100 participants both are over 55 years old and have had one or more heart attacks?” The incidence of conjunction errors in this problem was only 25%, compared to 65% when the subjects were asked to estimate percentages rather than frequencies. Reversing the order of the questions further reduced the incidence to 11%.

Kahneman and Tversky go on to state:

Gigerenzer has essentially ignored our discovery of the effect of frequency and our analysis of extensional cues. As primary evidence for the “disappearance” of the conjunction fallacy in judgments of frequency, he prefers to cite a subsequent study by Fiedler (1988), who replicated both our procedure and our findings, using the bank-teller problem. … In view of our prior experimental results and theoretical discussion, we wonder who alleged that the conjunction fallacy is stable under this particular manipulation.

Gigerenzer concedes, but then turns to Kahneman and Tversky’s lack of focus on this result:

It is correct that they demonstrated the effect on conjunction violations first (but not for overconfidence bias and the base-rate fallacy). Their accusation, however, is out of place, as are most others in their reply. I referenced their demonstration in every one of the articles they cited … It might be added that Tversky and Kahneman (1983) themselves paid little attention to this result, which was not mentioned once in some four pages of discussion.

A debate about who was first and how much focus each gave to the findings is not substantive, but Kahneman and Tversky (1996) do not leave the problem there. While the frequency representation can reduce error when there is the possibility of direct comparison (the same subject sees and provides frequencies for both alternatives), it has less effect in between-subjects designs; that is, where one set of subjects sees one of the options and another set of subjects sees the other:

Linda is in her early thirties. She is single, outspoken, and very bright. As a student she majored in philosophy and was deeply concerned with issues of discrimination and social justice.

Suppose there are 1,000 women who fit this description. How many of them are

(a) high school teachers?

(b) bank tellers? or

(c) bank tellers and active feminists?

One group of Stanford students (N = 36) answered the above three questions. A second group (N = 33) answered only questions (a) and (b), and a third group (N = 31) answered only questions (a) and (c). Subjects were provided with a response scale consisting of 11 categories in approximately logarithmic spacing. As expected, a majority (64%) of the subjects who had the opportunity to compare (b) and (c) satisfied the conjunction rule. In the between-subjects comparison, however, the estimates for feminist bank tellers (median category: “more than 50”) were significantly higher than the estimates for bank tellers … Contrary to Gigerenzer’s position, the results demonstrate a violation of the conjunction rule in a frequency formulation. These findings are consistent with the hypothesis that subjects use representativeness to estimate outcome frequencies and edit their responses to obey class inclusion in the presence of strong extensional cues.

Gigerenzer in part concedes, and in part battles on:

Hence, Kahneman and Tversky (1996) believe that the appropriate reply is to show that frequency judgments can also fail. There is no doubt about the latter …

[T]he between subjects version of the Linda problem is not a violation of internal consistency, because the effect depends on not presenting the two alternatives to the same subject.

Gigerenzer is right that this is not a violation of internal consistency. But as evidence that representativeness affects judgement, even with frequentist representations, the between-subjects result makes a good case. It is also difficult to argue that the subjects are making a good judgment. Kahneman and Tversky write:

Gigerenzer appears to deny the relevance of the between-subjects design on the ground that no individual subject can be said to have committed an error. In our view, this is hardly more reasonable than the claim that a randomized between-subject design cannot demonstrate that one drug is more effective than another because no individual subject has experienced the effects of both drugs.

Kahneman and Tversky write further in the postscript, possibly conceding on language but not on their substantive point:

This formula will not do. Whether or not violations of the conjunction rule in the between-subjects versions of the Linda and “ing” problems are considered errors, they require explanation. These violations were predicted from representativeness and availability, respectively, and were observed in both frequency and probability judgments. Gigerenzer ignores this evidence for our account and offers no alternative.

I’m with Kahneman and Tversky here.

Base-rate neglect

Base-rate neglect (or the base-rate fallacy) describes situations where a known base rate of an event or characteristic in a reference population is under-weighted, with undue focus given to specific information on the case at hand. An example is as follows:

If a test to detect a disease whose prevalence is 1/1000 has a false positive rate of 5%, what is the chance that a person found to have a positive result actually has the disease, assuming you know nothing about the person’s symptoms or signs?

The typical result is that around half of the people asked will guess a probability of 95% (even among medical professionals), with less than a quarter giving the correct answer of 2%. The positive result, which has associated errors, is weighted too heavily relative to the base rate of one in a thousand.
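The 2% figure comes from Bayes' rule. The problem as stated does not give the test's sensitivity, so the sketch below assumes the test always detects the disease when it is present (sensitivity of 1.0). That value, and the function and parameter names, are assumptions for illustration.

```python
def posterior(prevalence, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' rule."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Prevalence and false positive rate are from the problem text;
# sensitivity = 1.0 is an assumption, since the problem does not state it.
p = posterior(prevalence=1 / 1000, sensitivity=1.0, false_positive_rate=0.05)
print(f"{p:.3f}")  # 0.020
```

The common answer of 95% is what results from ignoring the prevalence term entirely and reading the false positive rate as if it were the error rate of a positive result.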

Gigerenzer (1991) once again responds with the potential of a frequentist representation to eliminate the bias, drawing on work by Cosmides and Tooby (1990) [The 1990 paper was an unpublished conference paper, but this work was later published here (pdf)]:

One out of 1000 Americans has disease X. A test has been developed to detect when a person has disease X. Every time the test is given to a person who has the disease, the test comes out positive. But sometimes the test also comes out positive when it is given to a person who is completely healthy. Specifically, out of every 1000 people who are perfectly healthy, 50 of them test positive for the disease.

Imagine that we have assembled a random sample of 1000 Americans. They were selected by a lottery. Those who conducted the lottery had no information about the health status of any of these people. How many people who test positive for the disease will actually have the disease? — out of —.

The result:

If the question was rephrased in a frequentist way, as shown above, then the Bayesian answer of 0.02 - that is, the answer “one out of 50 (or 51)” - was given by 76% of the subjects. The “base-rate fallacy” disappeared.
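In the frequency format the same calculation reduces to counting people. A minimal sketch using the numbers in the problem text (variable names are illustrative), which also shows where the "50 (or 51)" comes from:

```python
sample_size = 1000

# One in 1000 has the disease, and the test always detects it.
sick = 1
true_positives = sick

# 50 out of every 1000 healthy people test positive; with 999 healthy
# people in the sample this is 49.95, which rounds to 50.
healthy = sample_size - sick
false_positives = round(healthy * 50 / 1000)

total_positives = true_positives + false_positives
print(f"{true_positives} out of {total_positives}")  # 1 out of 51
```

Counting only the healthy positives gives the "one out of 50" version of the answer; adding the one true positive to the denominator gives "one out of 51". Either way the result is roughly 2%.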

Kahneman and Tversky (1996) do not respond to this particular example, beyond a footnote:

Cosmides and Tooby (1996) have shown that a frequentistic formulation also helps subjects solve a base-rate problem that is quite difficult when framed in terms of percentages or probabilities. Their result is readily explained in terms of extensional cues to set inclusion. These authors, however, prefer the speculative interpretation that evolution has favored reasoning with frequencies but not with percentages.

It seems we have agreement on the effect, although a differing interpretation.

Kahneman and Tversky, however, more directly attack the idea that people are natural frequentists.

He [Gigerenzer] offers a hypothetical example in which a physician in a nonliterate society learns quickly and accurately the posterior probability of a disease given the presence or absence of a symptom. … However, Gigerenzer’s speculation about what a nonliterate physician might learn from experience is not supported by existing evidence. Subjects in an experiment reported by Gluck and Bower (1988) learned to diagnose whether a patient has a rare (25%) or a common (75%) disease. For 250 trials the subjects guessed the patient’s disease on the basis of a pattern of four binary symptoms, with immediate feedback. Following this learning phase, the subjects estimated the relative frequency of the rare disease, given each of the four symptoms separately.

If the mind is “a frequency monitoring device,” as argued by Gigerenzer …, we should expect subjects to be reasonably accurate in their assessments of the relative frequencies of the diseases, given each symptom. Contrary to this naive frequentist prediction, subjects’ judgments of the relative frequency of the two diseases were determined entirely by the diagnosticity of the symptom, with no regard for the base-rate frequencies of the diseases. … Contrary to Gigerenzer’s unqualified claim, the replacement of subjective probability judgments by estimates of relative frequency and the introduction of sequential random sampling do not provide a panacea against base-rate neglect.

Gigerenzer (1996) responds:

Concerning base-rate neglect, Kahneman and Tversky … created the impression that there is little evidence that certain types of frequency formats improve Bayesian reasoning. They do not mention that there is considerable evidence (e.g., Gigerenzer & Hoffrage, 1995) and back their disclaimer principally with a disease-classification study by Gluck and Bower (1988), which they summarized thus: “subjects’ judgments of the relative frequency . . . were determined entirely by the diagnosticity of the symptom, with no regard for the base-rate frequencies of the diseases” … To set the record straight, Gluck and Bower said their results were consistent with the idea that “base-rate information is not ignored, only underused” (p. 235). Furthermore, their study was replicated and elaborated on by Shanks (1991), who concluded that “we have no conclusive evidence for the claim . . . that systematic base-rate neglect occurs in this type of situation” (p. 153). Adding up studies in which base-rate neglect appears or disappears will lead us nowhere.

Gigerenzer is right that Kahneman and Tversky were overly strong in their description of the findings of the Gluck and Bower study, but Gigerenzer’s conclusion seems close to that of Kahneman and Tversky. As Kahneman and Tversky wrote:

[I]t is evident that subjects sometimes use explicitly mentioned base-rate information to a much greater extent than they did in our original engineer-lawyer study [another demonstration of base-rate neglect], though generally less than required by Bayes’ rule.

Argument 2: Can you attribute probabilities to single events?

While I leave the question of frequency representations with a degree of agreement, Gigerenzer has a deeper critique of Kahneman and Tversky’s findings. From his 1993 article:

Is the conjunction fallacy a violation of probability theory? Has a person who chooses T&F violated probability theory? The answer is no, if the person is a frequentist such as Richard von Mises or Jerzy Neyman; yes, if he or she is a subjectivist such as Bruno de Finetti; and open otherwise.

The mathematician Richard von Mises, one of the founders of the frequency interpretation, used the following example to make his point:

We can say nothing about the probability of death of an individual even if we know his condition of life and health in detail. The phrase ‘probability of death’, when it refers to a single person, has no meaning at all for us. This is one of the most important consequences of our definition of probability.

(von Mises, 1957/1928: 11)

In this frequentist view, one cannot speak of a probability unless a reference class has been defined. … Since a person is always a member of many reference classes, no unique relative frequency can be assigned to a single person. … Thus, for a strict frequentist, the laws of probability are about frequencies and not about single events such as whether Linda is a bank teller. Therefore, in this view, no judgement about single events can violate probability theory.

… Seen from the Bayesian point of view, the conjunction fallacy is an error.

Thus, choosing T&F in the Linda problem is not a reasoning error. What has been labelled the ‘conjunction fallacy’ here does not violate the laws of probability. It only looks so from one interpretation of probability.

He writes in his 1991 article somewhat more strongly (here talking in the context of overconfidence):

For a frequentist like the mathematician Richard von Mises, the term “probability”, when it refers to a single event, “has no meaning at all for us” … Probability is about frequencies, not single events. To compare the two means comparing apples with oranges.

Even the major opponents of the frequentists - subjectivists such as Bruno de Finetti - would not generally think of a discrepancy between confidence and relative frequency as a “bias”, albeit for different reasons. For a subjectivist, probability is about single events, but rationality is identified with the internal consistency of subjective probabilities. As de Finetti emphasized, “however an individual evaluates the probability of a particular event, no experience can prove him right, or wrong; nor, in general, could any conceivable criterion give any objective sense to the distinction one would like to draw, here, between right and wrong” …

Kahneman and Tversky address this argument across a few of the biases under debate. First, on conjunction errors:

Whether or not it is meaningful to assign a definite numerical value to the probability of survival of a specific individual, we submit (a) that this individual is less likely to die within a week than to die within a year and (b) that most people regard the preceding statement as true—not as meaningless—and treat its negation as an error or a fallacy.

In response, Gigerenzer makes an interesting point that someone asked that question might make a different inference:

One can easily create a context, such as a patient already on the verge of dying, that would cause a sensible person to answer that this patient is more likely to die within a week (inferring that the question is next week versus the rest of the year, because the question makes little sense otherwise). In the same fashion, the Linda problem creates a context (the description of Linda) that makes it perfectly valid not to conform to the conjunction rule.

I think Gigerenzer is right that if you treat the problem as content-blind you might miss the inference the subjects are drawing from the question (more on content-blind norms below). But conversely, Kahneman and Tversky’s general point appears sound.

Kahneman and Tversky also address this frequentist argument in relation to over-confidence:

Proper use of the probability scale is important because this scale is commonly used for communication. A patient who is informed by his surgeon that she is 99% confident in his complete recovery may be justifiably upset to learn that when the surgeon expresses that level of confidence, she is actually correct only 75% of the time. Furthermore, we suggest that both surgeon and patient are likely to agree that such a calibration failure is undesirable, rather than dismiss the discrepancy between confidence and accuracy on the ground that “to compare the two means comparing apples and oranges”

Gigerenzer’s response here is amusing:

Kahneman and Tversky argued that the reluctance of statisticians to make probability theory the norm of all single events “is not generally shared by the public” (p. 585). If this was meant to shift the burden of justification for their norms from the normative theory of probability to the intuitions of ordinary people, it is exceedingly puzzling. How can people’s intuitions be called upon to substitute for the standards of statisticians, in order to prove that people’s intuitions systematically violate the normative theory of probability?

Kahneman and Tversky did not come back on this particular argument, but several points could be made in their favour. First, and as noted above, there can still be errors under frequentist representations. Even if we discard the results with judgments of probability for single events, there is still a strong case for the use of heuristics leading to the various biases.

Second, if a surgeon states they are confident that someone has a 99% probability of complete recovery when they are right only 75% of the time, they are making one of two errors. Either they are making a probability estimate of a single event, which has no meaning at all according to Gigerenzer and von Mises, or they are poorly calibrated according to Kahneman and Tversky.

Third, whatever the philosophically or statistically correct position, we have a practical problem. We have judgements being made and communicated, with subsequent decisions based on those communications. To the extent there are weaknesses in that chain, we will have sub-optimal outcomes.

Putting this together, I feel this argument leaves us at a philosophical impasse, but Kahneman and Tversky’s angle provides scope for practical application and better outcomes. (Look at the training for the Good Judgment Project and the improvements in forecasting that resulted.)

Argument 3: Are Kahneman and Tversky’s norms content blind?

An interesting question about the norms against which Kahneman and Tversky assess the experimental subjects’ heuristics and biases is whether the norms are blind to the content of the problem. Gigerenzer (1996) writes:

[O]n Kahneman and Tversky’s (1996) view of sound reasoning, the content of the Linda problem is irrelevant; one does not even need to read the description of Linda. All that counts are the terms probable and and, which the conjunction rule interprets in terms of mathematical probability and logical AND, respectively. In contrast, I believe that sound reasoning begins by investigating the content of a problem to infer what terms such as probable mean. The meaning of probable is not reducible to the conjunction rule … For instance, the Oxford English Dictionary … lists “plausible,” “having an appearance of truth,” and “that may in view of present evidence be reasonably expected to happen,” among others. … Similarly, the meaning of and in natural language rarely matches that of logical AND. The phrase T&F can be understood as the conditional “If Linda is a bank teller, then she is active in the feminist movement.” Note that this interpretation would not concern and therefore could not violate the conjunction rule.

This is a case where I believe Gigerenzer makes an interesting point on the specific case but is wrong on the broader point. As a start, in discussing their initial results for their 1983 paper, Kahneman and Tversky asked whether people were interpreting the language in different ways, such as asking whether people are taking “Linda is a bank teller” to mean “Linda is a bank teller and not active in the feminist movement.” They considered the content of their problem and ran different experimental specifications to attempt to understand what was occurring.

But as Kahneman and Tversky state in their postscript, critiquing the Linda problem on this point - and only the within-subjects experimental design at that - is a narrow view of their work. The point of the Linda problem is to test whether the representativeness of the description changes the assessment. As they write in their 1996 paper:

Perhaps the most serious misrepresentation of our position concerns the characterization of judgmental heuristics as “independent of context and content” … and insensitive to problem representation … Gigerenzer also charges that our research “has consistently neglected Feynman’s (1967) insight that mathematically equivalent information formats need not be psychologically equivalent” … Nothing could be further from the truth: The recognition that different framings of the same problem of decision or judgment can give rise to different mental processes has been a hallmark of our approach in both domains.

The peculiar notion of heuristics as insensitive to problem representation was presumably introduced by Gigerenzer because it could be discredited, for example, by demonstrations that some problems are difficult in one representation (probability), but easier in another (frequency). However, the assumption that heuristics are independent of content, task, and representation is alien to our position, as is the idea that different representations of a problem will be approached in the same way.

This is a point where you need to look across the full set of experimental findings, rather than critiquing them one by one. Other experiments have people violating the conjunction rule while betting on sequences generated by a die, where there was no such confusion to be had about the content.
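
To make the conjunction rule concrete, here is a minimal sketch. The die (four green faces, two red) and the two sequences are illustrative assumptions rather than details taken from this post. The only point is that a sequence containing a shorter sequence is a conjunction of events, so it cannot be the more probable of the two, however representative it looks.

```python
from fractions import Fraction

# Hypothetical die for illustration: four green faces, two red faces.
P_FACE = {"G": Fraction(4, 6), "R": Fraction(2, 6)}


def sequence_probability(sequence: str) -> Fraction:
    """Probability that consecutive independent rolls produce exactly `sequence`."""
    prob = Fraction(1)
    for face in sequence:
        prob *= P_FACE[face]
    return prob


short = sequence_probability("RGRRR")
# The same sequence with one extra green roll in front.
longer = sequence_probability("GRGRRR")

# The longer sequence is the conjunction "G first AND then RGRRR",
# so it can never be more probable than RGRRR alone.
assert longer <= short
print(short, longer)
```

The longer sequence has more green rolls and so may look more typical of this die, but its probability is the shorter sequence’s probability multiplied by the chance of one more green roll.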

Much of the issue is also one of focus. Kahneman and Tversky have certainly investigated the question of how representation changes the approach to a problem. However, it is set out in a different way from what Gigerenzer might have liked.

Argument 4: Should more effort be expended in understanding the underlying cognitive processes or mental models behind these various findings?

We now come to an important point: what is the cognitive process behind all of these results? Gigerenzer (1996) writes:

Kahneman and Tversky (1996) reported various results to play down what they believe is at stake, the effect of frequency. In no case was there an attempt to figure out the cognitive processes involved. …

Progress can be made only when we can design precise models that predict when base rates are used, when not, and why.

I can see why Kahneman and Tversky focus on the claims regarding frequency representations when Gigerenzer makes such strong statements about making biases “disappear”. The statement that in no case have they attempted to figure out the cognitive processes involved is also overly strong, as a case could be made that the heuristics are those processes.

However, Gigerenzer believes Kahneman and Tversky’s heuristics are too vague for this purpose. Gigerenzer (1996) writes:

The heuristics in the heuristics-and-biases program are too vague to count as explanations. … The reluctance to specify precise and falsifiable process models, to clarify the antecedent conditions that elicit various heuristics, and to work out the relationship between heuristics have been repeatedly pointed out … The two major surrogates for modeling cognitive processes have been (a) one-word-labels such as representativeness that seem to be traded as explanations and (b) explanation by redescription. Redescription, for instance, is extensively used in Kahneman and Tversky’s (1996) reply. … Why does a frequency representation cause more correct answers? Because “the correct answer is made transparent” (p. 586). Why is that? Because of “a salient cue that makes the correct answer obvious” (p. 586), or because it “sometimes makes available strong extensional cues” (p. 589). Researchers are no closer to understanding which cues are more “salient” than others, much less the underlying process that makes them so.

…

The reader may now understand why Kahneman and Tversky (1996) and I construe this debate at different levels. Kahneman and Tversky centered on norms and were anxious to prove that judgment often deviates from those norms. I am concerned with understanding the processes and do not believe that counting studies in which people do or do not conform to norms leads to much. If one knows the process, one can design any number of studies wherein people will or will not do well.

This passage by Gigerenzer captures the state of the debate well. However, Kahneman and Tversky are relaxed about the lack of full specification, and sceptical that process models are the right way to provide that detail. They write in the 1996 postscript:

Gigerenzer rejects our approach for not fully specifying the conditions under which different heuristics control judgment. Much good psychology would fail this criterion. The Gestalt rules of similarity and good continuation, for example, are valuable although they do not specify grouping for every display. We make a similar claim for judgmental heuristics.

Gigerenzer legislates process models as the primary way to advance psychology. Such legislation is unwise. It is useful to remember that the qualitative principles of Gestalt psychology long outlived premature attempts at modeling. It is also unwise to dismiss 25 years of empirical research, as Gigerenzer does in his conclusion. We believe that progress is more likely to come by building on the notions of representativeness, availability, and anchoring than by denying their reality.

To me, this is the most interesting point of the debate. I have personally struggled to grasp the precise operation of many of Kahneman and Tversky’s heuristics and how their operation would change across various domains. But are more precisely specified models the way forward? Which are best at explaining the available data? We have now had over 20 years of work since this debate to see whether this is an unwise or fruitful pursuit. But that’s a question for another day.