Gary Klein’s Sources of Power: How People Make Decisions

Summary: An important book describing how many experts make decisions, but with a lingering question mark about how good these decisions actually are.

——

Gary Klein’s Sources of Power: How People Make Decisions is somewhat of a classic, with the version I read being a 20th anniversary edition issued by MIT Press. Klein’s work on expert decision making has reached a broad audience through Malcolm Gladwell’s Blink, and Klein’s adversarial collaboration with Daniel Kahneman (pdf) has given his work additional academic credibility.

However, despite the growing application of behavioural science in public policy and the private sphere, I have rarely seen Klein’s work practically applied to improve decision making. The rare occasions where I see it referenced typically involve an argument that the conditions for the development of expertise do not exist in a particular domain.

This lack of application partly reflects the target of Klein’s research. Sources of Power is an exploration of what Klein calls naturalistic decision making. Rather than studying novices performing artificial tasks in the laboratory, naturalistic decision making involves the study of experienced decision makers performing realistic tasks. Klein’s aim is to document the strengths and capabilities of decision makers in natural environments where the stakes are high, such as lost lives or millions of dollars down the drain. These environments often involve uncertainty or missing information, and the goals may be unclear. Klein’s focus is therefore on the field and on the decisions of people such as firefighters, nurses, pilots and military personnel. They are typically people who have had many years of experience. They are “experts”.

The exploration of naturalistic decision making contrasts with the heuristics and biases program, which typically focuses on the limitations of decision makers and is the staple fodder of applied behavioural scientists. Using the experimental findings of the heuristics and biases program to tweak decision environments and measure the response across many decision makers (typically through a randomised controlled trial) is more tractable than exploring, modifying and measuring the effect of interventions on the rare, high-stakes decisions of experts in environments where the goal itself might not even be clear.

Is Klein’s work “science”?

The evidence that shapes Sources of Power was typically obtained through case interviews with decision makers and by observing these decision makers in action. There are no experiments. The interviews are coded for analysis in an attempt to find patterns in the decision makers’ approaches.

Klein is cognisant of the limitations of this approach. He notes that he gives detailed descriptions of each study so that we can judge the weaknesses of his approach ourselves. This brings his approach closer to what he considers to be a scientific piece of research. Klein writes:

What are the criteria for doing a scientific piece of research? Simply, that the data are collected so that others can repeat the study and that the inquiry depends on evidence and data rather than argument. For work such as ours, replication means that others could collect data the way we have and could also analyze and code the results as we have done.

The primary “weakness” of his approach is the reliance on observational data, not experiments. As Klein suggests, there are plenty of other sciences that have this feature. His approach is closer to anthropology than psychology. But obviously, an approach constrained to the laboratory has its own limitations:

Both the laboratory methods and the field studies have to contend with shortcomings in their research programs. People who study naturalistic decision making must worry about their inability to control many of the conditions in their research. People who use well-controlled laboratory paradigms must worry about whether their findings generalize outside the laboratory.

Klein has faith in stories (the subject of one of the chapters) serving as natural experiments that link a network of causes to their effects. It is a fair point that stories can be used to communicate subtle points of expertise, but using them to reliably identify cause-effect relationships seems a step too far.

Recognition-primed decision making

Klein’s “sources of power” for decision-making by experts are intuition, mental simulation, metaphor and storytelling. This is in contrast to what might be considered a more typical decision-making toolkit (the one you are more likely to be taught) of logical thinking, probabilistic analysis and statistics.

Klein’s workhorse model integrating these sources of power is recognition-primed decision making. This is a two-stage process, involving an intuitive recognition of what response is required, followed by mental simulation of the response to see if it will work. Metaphors and storytelling are mental simulation tools. The recognition-primed model involves a blend of intuition and analysis, so it is not sourced from gut feelings alone.

From the perspective of the decision maker, someone using this model might not consider that they are making a decision. They are not generating options and then evaluating them to determine the best choice.

Instead, they would see their situation as a prototype for which they know the typical course of action right away. As their experience allows them to generate a reasonable response in the first instance, they do not need to think of others. They simply evaluate the first option and, if suitable, execute it. A decision was made, in that alternative courses of action were available and could have been chosen, but there was no explicit examination across options.

Klein calls this process singular evaluation, as opposed to comparative evaluation. Singular evaluation may involve moving through multiple options, but each is considered on its own merits sequentially until a suitable option is found, with the search stopping at that point.

The result of this process is “satisficing”, a term coming from Herbert Simon. These experts do not optimise. They pick the first option that works.
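To make the contrast concrete, here is a toy sketch of singular versus comparative evaluation (my own illustration, not Klein’s model). The options, scores and `simulate` function are hypothetical stand-ins for recognition and mental simulation.

```python
# Toy illustration (not Klein's model): singular versus comparative evaluation.
# `options` is assumed to arrive ordered by recognition, i.e. the most typical
# response comes first; `simulate` stands in for mental simulation.

def singular_evaluation(options, simulate, good_enough):
    """Satisficing: accept the first option whose simulation looks workable."""
    for option in options:
        if simulate(option) >= good_enough:
            return option  # stop searching once something workable is found
    return None  # no workable option was recognised


def comparative_evaluation(options, simulate):
    """Rational-choice style: simulate every option and pick the best."""
    return max(options, key=simulate)


# Made-up options and simulation scores, purely for illustration.
scores = {
    "standard interior attack": 0.7,
    "defensive exterior attack": 0.9,
    "evacuate and contain": 0.5,
}
options = list(scores)

print(singular_evaluation(options, scores.get, good_enough=0.6))  # first workable option
print(comparative_evaluation(options, scores.get))                # best option overall
```

The satisficer stops at the first workable option even though a better one sits further down the list; the comparative evaluator pays the cost of simulating everything to find it.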

Klein’s examination of various experts found that the recognition-primed decision model was the dominant mode of decision making, despite his initial expectation of comparative evaluation. For instance, fireground commanders used recognition-primed decision making for around 80% of the decisions that Klein’s team examined. Klein also points to similar evidence of decision making by chess grandmasters, who spend little time comparing the strengths and weaknesses of one move to another. Most of their time involves simulating the consequences and rejecting moves.

Mental simulation

Mental simulation involves the expert imagining the situation and transforming it until they can picture it in a different way than at the start. Mental simulations are typically not overly elaborate, and generally rely on just a few factors (rarely more than three). The expert runs the simulation and assesses: can it pass an internal evaluation? Mental simulations can sometimes be wrong, but Klein considers them to be fairly accurate.

Klein’s examples of mental simulation were not always convincing. For example, he describes an economist who mentally simulated what the Polish economy would do following interventions to reduce inflation. It is hard to take seriously single examples of such mental simulation hitting the mark when I am aware of so many backfires in this space. And how would expertise in such economic simulations develop? (More on developing expertise below.)

One strength of simulations is that they can be used where traditional decision analytic strategies do not apply. You can use simulations (or stories) if you cannot otherwise remember every piece of information. Klein points to evidence that this is how juries absorb evidence.

One direct use of simulation is the premortem strategy. Imagine that, at some point in the future, your plan has failed and you have to understand why. You can also do simulation through decision scenarios.

Novices versus experts

Expertise has many advantages. Klein notes experts can see the world differently, have more procedures to apply, notice problems more quickly, generate richer mental simulations and have more analogies to draw on. Experts can see things that novices can’t. They can see anomalies, violations of expectancies, the big picture, how things work, additional opportunities and improvisations, future events, small differences, and their own limitations.

Interestingly, while experts tend not to carefully deliberate about the merits of different courses of action, novices need to compare different approaches. Novices are effectively thinking through the problem from scratch. The rational choice method helps us when we lack the expertise to assess a situation.

Another contrast is where effort is expended. Experts spend most of their effort on situation assessment, which effectively gives them the answer. Novices spend more of their time determining the course of action.

One interesting thread concerned what happened when time pressure was put on chess players. Time constraints barely degraded the performance of masters, while they destroyed that of novices. The masters often came up with their best move first, so they did not need the extra time to test a lot of options.

Developing good decision making

Given the differences between novices and experts, how should novices develop good decision making? Klein suggests this should not be done through training in formal methods of analysis. In fact, this could get in the way of developing expertise. There is also no need to teach the recognition-primed model as it is descriptive: it shows what good decision makers already do. We shouldn’t teach people to think like experts.

Rather, we should teach people to learn like experts. They should engage in deliberate practice, obtain feedback that is accurate and timely, and enrich learning by reviewing prior experience and examining mistakes. The intuition that drives recognition grows out of experience.

Recognition versus analytical methods

Klein argues that recognition strategies are not a substitute for analytical methods, but an improvement. Analytical methods are the fallback for those without experience.

Klein sees a range of environments where recognition strategies will be the superior option. These include the presence of time pressure, when the decision maker is experienced in the domain, when conditions are dynamic (meaning analytical effort can be rendered useless if conditions shift), and when the goals are ill-defined (making it hard to develop evaluation criteria). Comparative evaluation is more useful where people have to justify their choice, where it is required for conflict resolution, where you are trying to optimise (as opposed to finding a merely workable option), and where the decision is computationally complex (e.g. an investment portfolio).

From this, it is hard to use a rigorous analytical approach in many natural settings. Rational, linear approaches run into problems when the goal is shifting or ill-defined.

Diagnosing poor decisions

I previously posted some of Klein’s views on the heuristics and biases approach to assessing decision quality. Needless to say, Klein is sceptical that poor decisions are largely due to faulty reasoning. More effort should be expended in finding the sources of poor decisions, rather than blaming the operator.

Klein describes a review of a sample of 25 decisions with poor outcomes (from the 600 he had available) to assess what went wrong. Sixteen of the poor outcomes were due to lack of experience, such as someone not realising that the construction of the building on fire was problematic. The second most common issue was lack of information. The third most common involved noticing problems but explaining them away during mental simulation – possibly involving bias.

Conditions for expertise

The conditions for developing the expertise required for effective recognition-primed decision making are explored in depth in Klein’s article with Daniel Kahneman, Conditions for Intuitive Expertise: A Failure to Disagree (pdf). However, Klein does examine this area to some degree in Sources of Power.

Klein notes that it is one thing to gain experience, and another to turn that experience into expertise. It is often difficult to see cause and effect relationships: there is typically a delay between the two, and it is difficult to disentangle luck from skill. Drawing on work by Jim Shanteau, Klein also notes that expertise is hard to develop when the domain is dynamic, when we need to predict human behaviour, when there is less chance for feedback, when there is not enough repetition to get a sense of typicality, or when there are fewer trials. Funnily enough, this description seems to align somewhat with many of the naturalistic decision making environments.

Despite these barriers, Klein believes that it is possible to get expertise in some areas, such as fighting fires, caring for hospitalised infants or flying planes. Less convincingly (given some research in the area), he also references the fine discrimination of wine tasters (e.g.).

Possibly my biggest criticism of Klein’s book relates to this final point, as he provides little evidence for the conversion of experience into expertise beyond the observation that in many of these domains novices are completely lost. Is the best benchmark a comparison with a novice who has no idea, or is it better to look at, say, a simple algorithm, statistical rule, or someone with basic training?

A review of 2018 and some thoughts on 2019

As a record largely for myself, below are some notes in review of 2018 and a few thoughts about 2019.

Writing: I started 2018 intending to post to this blog at least once a week, which I did. I set this objective as I had several long stretches in 2017 where I dropped the writing habit.

I write posts in batches and schedule in advance, so the weekly target did not require a weekly focus. However, at times I wrote shorter posts that I might not have otherwise written to make sure there was a sufficient pipeline. Traffic for the blog was similar to the previous year, with around 100,000 visitors, although unlike previous years there was no runaway post with tens of thousands of views. Three of the 10 most popular posts were written during the year.

In 2019, I am relaxing my intention to post on the blog every week (although that will largely still happen). I will prioritise writing on what I want to think about, rather than achieving a consistent flow of posts.

I wrote three articles for Behavioral Scientist during the year. I plan to increase my output for external forums such as Behavioral Scientist in 2019. My major rationale for blogging is that I think (and learn) about issues better when I write for public consumption, and forums outside of the blog escalate that learning experience.

I also had a paper published in Evolution & Human Behavior (largely written in 2017). For the immediate future, I plan to stop writing academic articles unless I come up with a cracker of an idea. Having another academic publication provides little career value, and the costs of the academic publication process outweigh the limited benefit that comes from the generally limited readership.

For some time I have had two book ideas that I would love to attack, but I did not progress in 2018. One traces back to my earlier interest and writings on the link between economics and evolutionary biology. The other is an attempt to distil the good from the bad in behavioural economics – a sceptical take if you like. Given what else is on my plate (particularly a new job), I’d need a strong external stimulus to progress these in 2019, but I wouldn’t rule out dabbling with one.

Reading: I read 79 books in 2018 (47 non-fiction, 32 fiction). I read fewer books than in a typical year, largely due to having three children aged four and under. My non-fiction selection was less deliberate than I would have liked and included fewer classics than I planned. In 2019 I plan to read more classics and more books that directly relate to what I am doing or want to learn, and to pick up fewer books on a whim.

I’m not sure how many academic articles I read, but I read at least part of an article most days.

Focus: I felt the first half of 2018 was more focused and productive than the second. For various reasons, I got sucked into a few news cycles late in the year, with almost zero benefit. I continued to use most of the productivity hacks described in my post on how I focus (and live) – one of the most popular posts this year – and continue to struggle with the distraction of email.

I am meditating less than when I wrote that post (then daily), but still do meditate a couple of times a week for 10 to 20 minutes when I am able to get out for a walk at lunch. I use 10% Happier for this. I find meditation most useful as a way to refocus, as opposed to silencing or controlling the voices in my head.

Health: I continue to eat well (three parts Paleo, one part early agriculturalist), excepting the Christmas break where I relax all rules (I like to think of myself as a Hadza tribesman discovering a bunch of bee hives, although it’s more a case of me simply eating like the typical Australian for a week or two).

I surf at least once most weeks. My gym attendance waxed and waned based on various injuries (wrist, back), so my strength and fitness are below the average level of the last five years, although not by a great amount.

With all of the chaps generally sleeping through the night, I had the best year of sleep I have had in three years.

Work: I lined up a new role to start in late January this year. For almost three years I have been building the data science capability in my organisation, and have recruited a team that is largely technically stronger than me and can survive (thrive) without me. I’m shifting back into a behavioural science role (although similarly building a capability), which is closer to my interests and skillset. I’m also looking forward to shifting back into the private sector.

I plan to use the change in work environment to reset some work habits, including batching email and entering each day with a better plan on how I will tackle the most important (as opposed to the most urgent) tasks.

Life: Otherwise, I had few major life events. I bought my first house (settlement coming early this year). It meets the major goals of being a five-minute walk from a surfable beach, being next to a good school, and being sufficient to cater for our needs for at least the next ten years.

Another event that had a large effect on me was an attempt to save a drowning swimmer while surfing at my local beach (some news on it here and here). It reinforced something that I broadly knew about myself – that I feel calm and focused in a crisis, but typically dwell heavily on it in the aftermath. My attention was shot for a couple of weeks after. It was also somewhat of a learning experience of how difficult a water rescue is and how different CPR is on a human compared to a training dummy. My thinking about this day has brought a lot of focus onto what I want to do this year.

Carol Dweck’s Mindset: Changing the Way You Think to Fulfil Your Potential

I did not find Carol Dweck’s Mindset: Changing the Way You Think to Fulfil Your Potential to be a compelling translation of academic work into a popular book. To all the interesting debates concerning growth mindset – such as Scott Alexander’s series of growth mindset posts (1, 2, 3 and 4), the recent meta-analysis (with Carol Dweck response), and replication of the effect – the book adds little material that might influence your views. If you want to better understand the case for (or against) growth mindset and its link with ability or performance, skip the book, follow the above links and go to the academic literature.

As a result, I will limit my comments on the book to a few narrow points, and add a dash of personal reflection.

In the second post in his series, Alexander describes two positions on growth mindset. The first is the “bloody obvious position”:

[I]nnate ability might matter, but that even the most innate abilityed person needs effort to fulfill her potential. If someone were to believe that success were 100% due to fixed innate ability and had nothing to do with practice, then they wouldn’t bother practicing, and they would fall behind. Even if their innate ability kept them from falling behind morons, at the very least they would fall behind their equally innate abilityed peers who did practice.

Dweck and Alexander (and I) believe this position.

Then there is the controversial position:

The more important you believe innate ability to be compared to effort, the more likely you are to stop trying, to avoid challenges, to lie and cheat, to hate learning, and to be obsessed with how you appear before others. …

To distinguish the two, Alexander writes:

In the Bloody Obvious Position, someone can believe success is 90% innate ability and 10% effort. They might also be an Olympian who realizes that at her level, pretty much everyone is at a innate ability ceiling, and a 10% difference is the difference between a gold medal and a last-place finish. So she practices very hard and does just as well as anyone else.

According to the Controversial Position, this athlete will still do worse than someone who believes success is 80% ability and 20% effort, who will in turn do worse than someone who believes success is 70% ability and 30% effort, all the way down to the person who believes success is 0% ability and 100% effort, who will do best of all and take the gold medal.

The bloody obvious and controversial positions are often conflated in popular articles, and in Dweck’s book this conflation is taken up another gear. The book is interspersed with stories about people expending some effort to improve or win, with almost no information as to what they believe about growth and ability. The fact that they are expending effort is almost taken to be evidence of the growth mindset. At best the stories are evidence toward the bloody obvious position.

But Dweck’s strong statements about growth mindset throughout the book make it clear that she holds the controversial position. Here are some snippets from the introduction:

Believing that your qualities are carved in stone—the fixed mindset—creates an urgency to prove yourself over and over.

[I]t’s startling to see the degree to which people with the fixed mindset do not believe in effort.

It’s not just that some people happen to recognize the value of challenging themselves and the importance of effort. Our research has shown that this comes directly from the growth mindset.

Although Dweck marshals her stories from business and sports to support these “controversial position” claims, it doesn’t work absent evidence of the protagonists’ beliefs. Add in the survivorship bias in the examples at hand, plus the halo effect in assessing whether the successful people have a “growth mindset”, and there is little compelling evidence that these people held a growth or fixed mindset (as per the “controversial position”) and that the mindset in turn caused the outcomes.

To find evidence in support of Dweck’s statements you need to turn to the academic work, but Dweck covers her research in little detail. From the limited descriptions in the book, it was often hard to know what the experiment involved and how much weight to give it. The book pointed me to interesting papers, but absent that visit to the academic literature I felt lost.

One point that becomes clear through the book is that Dweck sees people with growth mindsets as having a host of other positive traits. At times it feels like the growth mindset is being expanded to encompass all positive behaviours. These include:

  • Embracing challenges and persisting after setbacks
  • Seeking and learning from criticism
  • Understanding the need to invest effort to develop expertise
  • Seeking forgiveness rather than revenge against those who have done them wrong, such as when they are bullied
  • Being happy whatever their outcomes (be it their own, their team’s or their child’s)
  • Compassion and consideration when coaching, rather than through fear and intimidation
  • More accurate estimation of performance and ability
  • Accurately weighting positive and negative information (compared to the extreme reactions of the fixed mindset people)

I need to better understand the literature on how growth mindset correlates with (or causes) these kinds of behaviours, but a lot is being put into the growth mindset basket.

In some of Dweck’s examples of people without a growth mindset, there is a certain boldness. John McEnroe is a recurring example, despite his seven singles and ten doubles grand slam titles. On McEnroe’s note that part of 1982 did not go as well as expected when little things kept him off his game (he still ended the year number one), Dweck asks “Always a victim of outside forces. Why didn’t he take charge and learn how to perform well in spite of them?” McEnroe later recorded the best single-season record of the open era (82-3 in 1984), ending the year at number one for the fourth straight time. McEnroe feels he did not fulfil his potential as he often folded when the going got tough, but would he really have been more successful with a “growth mindset”?

Similarly, Mike Tyson is labelled as someone who “reached the top, but … didn’t stay there”, despite having the third longest unified championship reign in heavyweight history with eight consecutive defences. Tyson obviously had some behavioural issues, but would he have been the same fighter if he didn’t believe in his ability?

——

On a personal angle: Dweck’s picture of someone with a “fixed mindset” is a good description of me. Through primary and high school I was always the “smartest” kid in (my small rural, then regional) school, despite investing close to zero effort outside of the classroom. I spent the evenings before my university entrance exams shooting a basketball.

My results gave me the pick of Australian universities and scholarships, but I then dropped out of my first two attempts at university, and followed that by dropping out of Duntroon (Australia’s army officer training establishment, our equivalent to West Point). For the universities, lecture attendance alone was not enough. I was simply too lazy and immature to make it through Duntroon. (Maybe I lacked “grit”.)

After working in a chicken factory to fund a return to university (not recommended), I finally obtained a law degree, although I did so with a basic philosophy of doing just enough to pass.

Through this stretch, I displayed a lot of Dweck’s archetypical “fixed mindset” behaviours. I loved demonstrating how smart I was in domains where I was sure I would do well, and hated the risk of being shown up as otherwise in any domain where I wasn’t. (My choice of law was somewhat strange in this regard, as my strength is my quantitative ability. I chose law largely because this is what “smart” kids do.) I dealt with failure poorly.

It took five years after graduation before I finally realised that I needed to invest some effort to get anywhere – which happened to be a different direction to where I had previously been heading. I have spent most of my time since then investing in my intellectual capital. I am more than willing to try and fail. I am always looking for new learning opportunities. I am happy asking “dumb” questions. I want to prove myself wrong.

Do I now have a “growth mindset”? I don’t believe that anyone can achieve anything. IQ is malleable but only at the margins, and we have a very poor understanding of how to do this. But I have a strong belief that effort pays off, and that absent effort natural abilities can be wasted. I hold the bloody obvious position but not the controversial position. If I was able to blot from my mind the evidence for, say, the genetic contribution to intelligence, could I do even better?

Despite finding limited value in the book from an intellectual standpoint, I can see its appeal. It was a reminder of the bloody obvious position. It highlighted that many of the so-called growth mindset traits or behaviours can be valuable (whether or not they are accompanied by a growth mindset). There was something in there that suggested I should try a bit harder. Maybe that makes it a useful book after all.

Books I read in 2018

The best books I read in 2018 – generally released in other years – are below. Where I have reviewed a book, the link leads to that review.

  • Nick Bostrom, Superintelligence: Paths, Dangers, Strategies (2014) – Changed my mind, and gave me a framework for thinking about the problem that I didn’t have before.
  • Annie Duke, Thinking in Bets: Making Smarter Decisions When You Don’t Have All the Facts (2018) – While I have many small quibbles with the content, and it could easily have been a long-form article, I liked the overarching approach and framing.
  • Gary Klein, Sources of Power: How People Make Decisions (1998) – Rightfully considered a classic in decision making. Review coming soon.
  • Michael Lewis’s The Undoing Project (2016) – Despite focusing on Kahneman and Tversky’s relationship, it is also one of the better introductions to their work.
  • Robert Sapolsky’s Behave: The Biology of Humans at Our Best and Worst (2017) – A wonderful examination of the “causes” of our actions. Sapolsky zooms out from the almost immediate activity in our brain, to the actions of our hormones over seconds to hours, through our developmental influences, out to our evolutionary past. Review also coming soon.
  • Robert Sapolsky’s Why Zebras Don’t Get Ulcers (3rd ed, 2004) – Great writing and interesting science.
  • Fred Schwed, Where Are the Customers’ Yachts? (1955) – Timeless commentary on the value delivered by the financial services sector.
  • Robert Sugden, The Community of Advantage: A Behavioural Economist’s Defence of the Market (2018) – The most compelling critique of the practical application of behavioural economics that I have read.
  • Joseph Conrad, Lord Jim – I love Conrad. Nostromo is possibly my favourite book.
  • Daphne Du Maurier, Rebecca
  • Henry James, Turn of the Screw

Below is the full list of books that I read in 2018 (with links where reviewed and starred if a reread). Relative to previous years, I read (and reread) fewer books in total, less non-fiction, more fiction. That was largely a consequence of regularly reading my youngest to sleep.

My non-fiction reading through 2018 was less deliberate than I would have liked. There are fewer timeless pieces in the list than usual, with many of the choices based on whim or the particular piece of work I was doing at the time.

Non-Fiction

Fiction

  • Christopher Buckley, Thank You For Smoking*
  • Edgar Rice Burroughs, The Return of Tarzan
  • Edgar Rice Burroughs, Tarzan of the Apes
  • Ray Bradbury, Fahrenheit 451*
  • Joseph Conrad, Lord Jim
  • James Fenimore Cooper, The Last of the Mohicans
  • Charles Dickens, Bleak House
  • Charles Dickens, A Christmas Carol
  • Charles Dickens, A Tale of Two Cities
  • Charles Dickens, Hard Times
  • Wilkie Collins, The Moonstone
  • Arthur Conan Doyle, A Study in Scarlet*
  • Arthur Conan Doyle, The Hound of the Baskervilles*
  • Arthur Conan Doyle, The Lost World
  • Daphne Du Maurier, Rebecca
  • George Eliot, Middlemarch
  • Gustave Flaubert, Madame Bovary
  • Henry James, Turn of the Screw
  • James Joyce, Dubliners
  • Andrew Lang, The Arabian Nights
  • George Bernard Shaw, Pygmalion
  • Upton Sinclair, The Jungle
  • Robert Louis Stevenson, Kidnapped
  • Bram Stoker, Dracula*
  • JRR Tolkien, The Hobbit*
  • Mark Twain, The Adventures of Huckleberry Finn
  • Mark Twain, The Adventures of Tom Sawyer*
  • Jules Verne, Journey to the Center of the Earth
  • Andy Weir, The Martian
  • Oscar Wilde, The Picture of Dorian Gray
  • PG Wodehouse, Carry on, Jeeves
  • PG Wodehouse, Meet Mr Mulliner

Gary Klein on confirmation bias in heuristics and biases research, and explaining everything

Confirmation bias

In Sources of Power: How People Make Decisions (review coming soon), Gary Klein writes:

Kahneman, Slovic, and Tversky (1982) present a range of studies showing that decision makers use a variety of heuristics, simple procedures that usually produce an answer but are not foolproof. … The research strategy was not to demonstrate how poorly we make judgments but to use these findings to uncover the cognitive processes underlying judgments of likelihood.

Lola Lopes (1991) has shown that the original studies did not demonstrate biases, in the common use of the term. For example, Kahneman and Tversky (1973) used questions such as this: “Consider the letter R. Is R more likely to appear in the first position of a word or the third position of a word?” The example taps into our heuristic of availability. We have an easier time recalling words that begin with R than words with R in the third position. Most people answer that R is more likely to occur in the first position. This is incorrect. It shows how we rely on availability.

Lopes points out that examples such as the one using the letter R were carefully chosen. Of the twenty possible consonants, twelve are more common in the first position. Kahneman and Tversky (1973) used the eight that are more common in the third position. They used stimuli only where the availability heuristic would result in a wrong answer. … [I have posted some extracts of Lopes’s article here.]

There is an irony here. One of the primary “biases” is confirmation bias—the search for information that confirms your hypothesis even though you would learn more by searching for evidence that might disconfirm it. The confirmation bias has been shown in many laboratory studies (and has not been found in a number of studies conducted in natural settings). Yet one of the most common strategies of scientific research is to derive a prediction from a favorite theory and test it to show that it is accurate, thereby strengthening the reputation of that theory. Scientists search for confirmation all the time, even though philosophers of science, such as Karl Popper (1959), have urged scientists to try instead to disconfirm their favorite theories. Researchers working in the heuristics and biases paradigm condemn this sort of bias in their subjects, even as those same researchers perform more laboratory studies confirming their theories.

On explaining everything

On 3 July 1988 a missile fired from the USS Vincennes destroyed a commercial Iran Air flight taking off over the Persian Gulf, killing all on board. The crew of the Vincennes had incorrectly identified the aircraft as an attacking F-14.

Klein writes:

The Fogarty report, the official U.S. Navy analysis of the incident, concluded that “stress, task fixation, an unconscious distortion of data may have played a major role in this incident. [Crew members] became convinced that track 4131 was an Iranian F-14 after receiving the … report of a momentary Mode II. After this report of the Mode II, [a crew member] appear[ed] to have distorted data flow in an unconscious attempt to make available evidence fit a preconceived scenario (‘Scenario fulfillment’).” This explanation seems to fit in with the idea that mental simulation can lead you down a garden path to where you try to explain away inconvenient data. Nevertheless, trained crew members are not supposed to distort unambiguous data. According to the Fogarty report, the crew members were not trying to explain away the data, as in a de minimus explanation. They were flat out distorting the numbers. This conclusion does not feel right.

The conclusion of the Fogarty report was echoed by some members of a five-person panel of leading decision researchers, who were invited to review the evidence and report to a congressional subcommittee. Two members of the panel specifically attributed the mistake to faulty decision making. One described how the mistake seemed to be a clear case of expectancy bias, in which a person sees what he is expecting to see, even when it departs from the actual stimulus. He cited a study by Bruner and Postman (1949) in which subjects were shown brief flashes of playing cards and asked to identify each. When cards such as the Jack of Diamonds were printed in black, subjects would still identify it as the Jack of Diamonds without noticing the distortion. The researcher concluded that the mistake about altitude seemed to match these data; subjects cannot be trusted to make accurate identifications because their expectancies get in the way.

I have talked with this decision researcher, who explained how the whole Vincennes incident showed a Combat Information Center riddled with decision biases. That is not how I understand the incident. My reading of the Fogarty report shows a team of men struggling with an unexpected battle, trying to guess whether an F-14 is coming over to blow them out of the water, waiting until the very last moment for fear of making a mistake, hoping the pilot will heed the radio warnings, accepting the risk to their lives in order to buy some more time.

To consider this alleged expectancy bias more carefully, imagine what would have happened if the Vincennes had not fired and in fact had been attacked by an F-14. The Fogarty report stated that in the Persian Gulf, from June 2, 1988, to July 2, 1988, the U.S. Middle East Forces had issued 150 challenges to aircraft. Of these, it was determined that 83 percent were issued to Iranian military aircraft and only 1.3 percent to aircraft that turned out to be commercial. So we can infer that if a challenge is issued in the gulf, the odds are that the airplane is Iranian military. If we continue with our scenario, that the Vincennes had not fired and had been attacked by an F-14, the decision researchers would have still claimed that it was a clear case of bias, except this time the bias would have been to ignore the base rates, to ignore the expectancies. No one can win. If you act on expectancies and you are wrong, you are guilty of expectancy bias. If you ignore expectancies and are wrong, you are guilty of ignoring base rates and expectancies. This means that the decision bias approach explains too much (Klein, 1989). If an appeal to decision bias can explain everything after the fact, no matter what has happened, then there is no credible explanation.

I’m not sure the right base rate is the proportion of aircraft challenged, but it is still an interesting point.

In contrast to less-is-more claims, ignoring information is rarely, if ever, optimal

From the abstract of an interesting paper Heuristics as Bayesian inference under extreme priors by Paula Parpart and colleagues:

Simple heuristics are often regarded as tractable decision strategies because they ignore a great deal of information in the input data. One puzzle is why heuristics can outperform full-information models, such as linear regression, which make full use of the available information. These “less-is-more” effects, in which a relatively simpler model outperforms a more complex model, are prevalent throughout cognitive science, and are frequently argued to demonstrate an inherent advantage of simplifying computation or ignoring information. In contrast, we show at the computational level (where algorithmic restrictions are set aside) that it is never optimal to discard information. Through a formal Bayesian analysis, we prove that popular heuristics, such as tallying and take-the-best, are formally equivalent to Bayesian inference under the limit of infinitely strong priors. Varying the strength of the prior yields a continuum of Bayesian models with the heuristics at one end and ordinary regression at the other. Critically, intermediate models perform better across all our simulations, suggesting that down-weighting information with the appropriate prior is preferable to entirely ignoring it. Rather than because of their simplicity, our analyses suggest heuristics perform well because they implement strong priors that approximate the actual structure of the environment.

The following excerpts from the paper (minus references) help give more context to this argument. First, what is meant by a simple heuristic as opposed to a full-information model?

Many real-world prediction problems involve binary classification based on available information, such as predicting whether Germany or England will win a soccer match based on the teams’ statistics. A relatively simple decision procedure would use a rule to combine available information (i.e., cues), such as the teams’ league position, the result of the last game between Germany and England, which team has scored more goals recently, and which team is home versus away. One such decision procedure, the tallying heuristic, simply checks which team is better on each cue and chooses the team that has more cues in its favor, ignoring any possible differences among cues in magnitude or predictive value. … Another algorithm, take-the-best (TTB), would base the decision on the best single cue that differentiates the two options. TTB works by ranking the cues according to their cue validity (i.e., predictive value), then sequentially proceeding from the most valid to least valid until a cue is found that favors one team over the other. Thus TTB terminates at the first discriminative cue, discarding all remaining cues.

In contrast to these heuristic algorithms, a full-information model such as linear regression would make use of all the cues, their magnitudes, their predictive values, and observed covariation among them. For example, league position and number of goals scored are highly correlated, and this correlation influences the weights obtained from a regression model.
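To make the two heuristics and the full-information baseline concrete, here is a minimal sketch (mine, not from the paper). The cue values, validities and regression weights are made up for illustration.

```python
import numpy as np

def tallying(cues_a, cues_b):
    """Tallying: count which option more cues favour, ignoring cue weights.
    Cues are coded so that higher values are better."""
    votes = np.sign(np.asarray(cues_a) - np.asarray(cues_b)).sum()
    return "A" if votes > 0 else "B" if votes < 0 else "tie"

def take_the_best(cues_a, cues_b, validities):
    """Take-the-best: work through cues from most to least valid and decide
    on the first cue that discriminates between the options."""
    for i in np.argsort(validities)[::-1]:   # most valid cue first
        if cues_a[i] != cues_b[i]:
            return "A" if cues_a[i] > cues_b[i] else "B"
    return "tie"                             # no cue discriminates

def full_information(cues_a, cues_b, weights):
    """Regression-style baseline: weight every cue and compare the totals."""
    diff = float(np.dot(weights, cues_a) - np.dot(weights, cues_b))
    return "A" if diff > 0 else "B" if diff < 0 else "tie"

# Hypothetical Germany (A) vs England (B) example with four binary cues:
# league position, last head-to-head result, recent goals, home advantage.
cues_a, cues_b = [1, 0, 1, 0], [0, 1, 0, 1]
validities = [0.8, 0.7, 0.6, 0.55]   # assumed cue validities
weights = [2.0, 1.5, 1.0, 0.5]       # assumed regression weights

print(tallying(cues_a, cues_b))                   # tie: two cues favour each team
print(take_the_best(cues_a, cues_b, validities))  # A: decided by the most valid cue alone
print(full_information(cues_a, cues_b, weights))  # A: weighted totals
```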

So why might less be more?

Heuristics have a long history of study in cognitive science, where they are often viewed as more psychologically plausible than full-information models, because ignoring data makes the calculation easier and thus may be more compatible with inherent cognitive limitations. This view suggests that heuristics should underperform full-information models, with the loss in performance compensated by reduced computational cost. This prediction is challenged by observations of less-is-more effects, wherein heuristics sometimes outperform full-information models, such as linear regression, in real-world prediction tasks. These findings have been used to argue that ignoring information can actually improve performance, even in the absence of processing limitations. … Gigerenzer and Brighton (2009) conclude, “A less-is-more effect … means that minds would not gain anything from relying on complex strategies, even if direct costs and opportunity costs were zero”.

Less-is-more arguments also arise in other domains of cognitive science, such as in claims that learning is more successful when processing capacity is (at least initially) restricted.

The current explanation for less-is-more effects in the heuristics literature is based on the bias-variance dilemma. … From a statistical perspective, every model, including heuristics, has an inductive bias, which makes it best-suited to certain learning problems. A model’s bias and the training data are responsible for what the model learns. In addition to differing in bias, models can also differ in how sensitive they are to sampling variability in the training data, which is reflected in the variance of the model’s parameters after training (i.e., across different training samples).

A core tool in machine learning and psychology for evaluating the performance of learning models, cross-validation, assesses how well a model can apply what it has learned from past experiences (i.e., the training data) to novel test cases. From a psychological standpoint, a model’s cross-validation performance can be understood as its ability to generalize from past experience to guide future behavior. How well a model classifies test cases in cross-validation is jointly determined by its bias and variance. Higher flexibility can in fact hurt performance because it makes the model more sensitive to the idiosyncrasies of the training sample. This phenomenon, commonly referred to as overfitting, is characterized by high performance on experienced cases from the training sample but poor performance on novel test items. …

Bias and variance tend to trade off with one another such that models with low bias suffer from high variance and vice versa. With small training samples, more flexible (i.e., less biased) models will overfit and can be bested by simpler (i.e., more biased) models such as heuristics. As the size of the training sample increases, variance becomes less influential and the advantage shifts to the complex models.
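The bias-variance point can be seen in a small simulation (a sketch of mine with assumed cue weights and noise, not a simulation from the paper). It compares ordinary least squares against a rigid tallying-style model that simply sums the cues and fits a single scaling factor: with small training samples the rigid model tends to predict new cases better, and the advantage shifts to OLS as the training sample grows.

```python
import numpy as np

rng = np.random.default_rng(0)
N_CUES = 5
TRUE_W = np.array([1.0, 0.8, 0.6, 0.4, 0.2])   # assumed, unequal true cue weights
NOISE = 2.0                                     # assumed noise level

def mean_test_error(n_train, n_test=1000, n_reps=300):
    ols_err, tally_err = [], []
    for _ in range(n_reps):
        X = rng.normal(size=(n_train + n_test, N_CUES))
        y = X @ TRUE_W + rng.normal(scale=NOISE, size=n_train + n_test)
        Xtr, Xte, ytr, yte = X[:n_train], X[n_train:], y[:n_train], y[n_train:]

        # Full-information model: ordinary least squares on all cues.
        w_ols, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)
        ols_err.append(np.mean((Xte @ w_ols - yte) ** 2))

        # Tallying-style model: add the cues up, fit only a single scale factor.
        s_tr, s_te = Xtr.sum(axis=1), Xte.sum(axis=1)
        scale = (s_tr @ ytr) / (s_tr @ s_tr)
        tally_err.append(np.mean((scale * s_te - yte) ** 2))
    return np.mean(ols_err), np.mean(tally_err)

for n_train in (10, 20, 200):
    ols, tally = mean_test_error(n_train)
    print(f"n_train={n_train:3d}  OLS={ols:.2f}  tallying-style={tally:.2f}")
```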

So what is an alternative explanation to the performance of heuristics?

The Bayesian framework offers a different perspective on the bias-variance dilemma. Provided a Bayesian model is correctly specified, it always integrates new data optimally, striking the perfect balance between prior and data. Thus using more information can only improve performance. From the Bayesian standpoint, a less-is-more effect can arise only if a model uses the data incorrectly, for example by weighting it too heavily relative to prior knowledge (e.g., with ordinary linear regression, where there effectively is no prior). In that case, the data might indeed increase estimation variance to the point that ignoring some of the information could improve performance. However, that can never be the best solution. One can always obtain superior predictive performance by using all of the information but tempering it with the appropriate prior.

Heuristics may work well in practice because they correspond to infinitely strong priors that make them oblivious to aspects of the training data, but they will usually be outperformed by a prior of finite strength that leaves room for learning from experience. That is, the strong form of less-is-more, that one can do better with heuristics by throwing out information rather than using it, is false. The optimal solution always uses all relevant information, but it combines that information with the appropriate prior. In contrast, no amount of data can overcome the heuristics’ inductive biases.
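The continuum the authors describe can be illustrated with a penalised regression whose penalty pulls the cue weights toward an equal-weights, tallying-like target. This is a simplification of my own, not the paper’s formal Bayesian derivation: `w_tally` is a hypothetical target vector and the penalty strength `lam` plays the role of prior strength.

```python
import numpy as np

def shrink_towards_tallying(X, y, lam, w_tally):
    """Minimise ||y - Xw||^2 + lam * ||w - w_tally||^2 (closed-form solution).
    lam = 0 gives ordinary least squares; lam -> infinity gives the tallying
    weights; intermediate lam gives the in-between models the paper argues
    usually predict best."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    b = X.T @ y + lam * w_tally
    return np.linalg.solve(A, b)

rng = np.random.default_rng(1)
X = rng.normal(size=(15, 4))                          # small, noisy training sample
y = X @ np.array([1.0, 0.7, 0.4, 0.1]) + rng.normal(scale=1.5, size=15)
w_tally = np.full(4, 0.5)                             # hypothetical equal-weights target

for lam in (0.0, 1.0, 10.0, 1e6):
    print(lam, np.round(shrink_towards_tallying(X, y, lam, w_tally), 2))
# lam = 0 prints the OLS weights; lam = 1e6 prints weights of ~[0.5 0.5 0.5 0.5].
```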

So why have heuristics proven to be so useful? According to this Bayesian argument, it is not because of a “computational advantage of simplicity per se, but rather to the fact that simpler models can approximate strong priors that are well-suited to the true structure of the environment.”

An interesting question from this work is whether our minds use heuristics as a good approximation of complex models, or whether heuristics are good approximations of more complex processes that the mind uses. The authors write:

Although the current contribution is formal in nature, it nevertheless has implications for psychology. In the psychological literature, heuristics have been repeatedly pitted against full-information algorithms that differentially weight the available information or are sensitive to covariation among cues. The current work indicates that the best-performing model will usually lie between the extremes of ordinary linear regression and fast-and-frugal heuristics, i.e., at a prior of intermediate strength. Between these extremes lie a host of models with different sensitivity to cue-outcome correlations in the environment.

One question for future research is whether heuristics give an accurate characterization of psychological processing, or whether actual psychological processing is more akin to these more complex intermediate models. On the one hand, it could be that implementing the intermediate models is computationally intractable, and thus the brain uses heuristics because they efficiently approximate these more optimal models. This case would coincide with the view from the heuristics-and-biases tradition of heuristics as a tradeoff of accuracy for efficiency. On the other hand, it could be that the brain has tractable means for implementing the intermediate models (i.e., for using all available information but down-weighting it appropriately). This case would be congruent with the view from ecological rationality where the brain’s inferential mechanisms are adapted to the statistical structure of the environment. However, this possibility suggests a reinterpretation of the empirical evidence used to support heuristics: heuristics might fit behavioral data well only because they closely mimic a more sophisticated strategy used by the mind.

There have been various recent approaches looking at the compatibility between psychologically plausible processes and probabilistic models of cognition. These investigations are interlinked with our own, and while most of that work has focused on finding algorithms that approximate Bayesian models, we have taken the opposite approach. This contribution reiterates the importance of applying fundamental machine learning concepts to psychological findings. In doing so, we provide a formal understanding of why heuristics can outperform full-information models by placing all models in a common probabilistic inference framework, where heuristics correspond to extreme priors that will usually be outperformed by intermediate models that use all available information.

The (open access) paper contains a lot more detail – and the maths – and I recommend reading it.

My latest in Behavioral Scientist: Simple heuristics that make algorithms smart

My latest contribution at Behavioral Scientist is up. Here’s an excerpt:

Modern discussions of whether humans will be replaced by algorithms typically frame the problem as a choice between humans on one hand or complex statistical and machine learning models on the other. For problems such as image recognition, this is probably the right frame. Yet much of the past success of algorithms relative to human judgment points us to a third option: the mechanical application of simple models and heuristics.

Simple models appear more powerful when removed from the minds of the human and implemented in a consistent way. The chain of evidence that simple heuristics are powerful tools, that humans use these heuristics, and that these heuristics can make us smart does not bring us to a point where these humans are outperforming simple heuristics or models consistently applied by an algorithm.

Humans are inextricably entwined in developing these algorithms, and in many cases provide the expert knowledge of what cues should be used. But when it comes to execution, taking the outputs of the model gives us a better outcome.

You can read the full article here.

A problem in the world or a problem in the model

In reviewing Michael Lewis’s The Undoing Project, John Kay writes:

Since Paul Samuelson’s Foundations of Economic Analysis, published in 1947, mainstream economics has focused on an axiomatic approach to rational behaviour. The overriding requirement is for consistency of choice: if A is chosen when B is available, B will never be selected when A is available. If choices are consistent in this sense, their outcomes can be described as the result of optimisation in the light of a well-defined preference ordering.

In an impressive feat of marketing, economists appropriated the term “rationality” to describe conformity with these axioms. Such consistency is not, however, the everyday meaning of rationality; it is not rational, though it is consistent, to maintain the belief that there are fairies at the bottom of the garden in spite of all evidence to the contrary. …

… In the 1970s, however, Kahneman and Tversky began research that documented extensive inconsistency with those rational choice axioms.

What they did, as is common practice in experimental psychology, was to set puzzles to small groups of students. The students often came up with what the economics of rational choice would describe as the “wrong” answer. These failures of the predictions of the theory clearly demand an explanation. But Lewis—like many others who have written about behavioural economics—does not progress far beyond compiling a list of these so-called “irrationalities.”

This taxonomic approach fails to address crucial issues. Is rational choice theory intended to be positive—a description of how people do in fact behave—or normative—a recommendation as to how they should behave? Since few people would wish to be labelled irrational, the appropriation of the term “rationality” conflates these perspectives from the outset. Do the observations of allegedly persistent irrationality represent a wide-ranging attack on the quality of human decision-making—or a critique of the economist’s concept of rationality? The normal assumption of economists is the former; the failure of observation to correspond with theory identifies a problem in the world, not a problem in the model. Kahneman and Tversky broadly subscribe to that position; their claim is that people—persistently—make stupid mistakes.

I have seen many presentations with an opening line of “economists assume we are rational”, quickly followed by conclusions about poor human decision-making, the two being conflated. More often than not, it’s better to ignore economics as a starting point and to simply examine the evidence for poor decision making. That evidence is, of course, much richer – and debatable – than a simple refutation of the basic economics axioms.

One of those debates concerns the Linda problem. Kay continues:

Take, for example, the famous “Linda Problem.” As Kahneman frames it: “Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Which of the following is more likely? ‘Linda is a bank teller,’ ‘Linda is a bank teller and is active in the feminist movement.’”

The common answer is that the second alternative—that Linda is more likely to be a feminist bank teller than a bank teller—is plainly wrong, because the rules of probability state that a compound probability of two events cannot exceed the probability of either single event. But to the horror of Kahneman and his colleagues, many people continue to assert that the second description is the more likely even after their “error” is pointed out.

But it does not require knowledge of the philosopher Paul Grice’s maxims of conversation—although perhaps it helps—to understand what is going on here. The meaning of discourse depends not just on the words and phrases used, but on their context. The description that begins with Linda’s biography and ends with “Linda is a bank teller” is not, without more information, a satisfactory account. Faced with such a narrative in real life, one would seek further explanation to resolve the apparent incongruity and, absent of such explanation, be reluctant to believe, far less act on, the information presented.

Kahneman and Tversky recognised that we prefer to tell stories than to think in terms of probability. But this should not be assumed to represent a cognitive failure. Storytelling is how we make sense of a complex world of which we often know, and understand, little.

So we should be wary in our interpretation of the findings of behavioural economists. The environment in which these experiments are conducted is highly artificial. A well-defined problem with an identifiable “right” answer is framed in a manner specifically designed to elucidate the “irrationality” of behaviour that the experimenter triumphantly identifies. This is a very different exercise from one which demonstrates that people make persistently bad decisions in real-world situations, where the issues are typically imperfectly defined and where it is often not clear even after the event what the best course of action would have been.

Kay also touches on the more general criticisms:

Lewis’s uncritical adulation of Kahneman and Tversky gives no credit to either of the main strands of criticism of their work. Many mainstream economists would acknowledge that people do sometimes behave irrationally, but contend that even if such irrationalities are common in the basements of psychology labs, they are sufficiently unimportant in practice to matter for the purposes of economic analysis. At worst, a few tweaks to the standard theory can restore its validity.

From another perspective, it may be argued that persistent irrationalities are perhaps not irrational at all. We cope with an uncertain world, not by attempting to describe it with models whose parameters and relevance we do not know, but by employing practical rules and procedures which seem to work well enough most of the time. The most effective writer in this camp has been the German evolutionary psychologist Gerd Gigerenzer, and the title of one of his books, Simple Heuristics That Make Us Smart, conveys the flavour of his argument. The discovery that these practical rules fail in some stylised experiments tells us little, if anything, about the overall utility of Gigerenzer’s “fast and frugal” rules of behaviour.

Perhaps it is significant that I have heard some mainstream economists dismiss the work of Kahneman in terms not very different from those in which Kahneman reportedly dismisses the work of Gigerenzer. An economic mainstream has come into being in which rational choice modelling has become an ideology rather than an empirical claim about the best ways of explaining the world, and those who dissent are considered not just wrong, but ignorant or malign. An outcome in which people shout at each other from inside their own self-referential communities is not conducive to constructive discourse.

The Rhetoric of Irrationality

From the opening of Lola Lopes’s 1991 article The Rhetoric of Irrationality (pdf) on the heuristics and biases literature:

Not long ago, Newsweek ran a feature article describing how researchers at a major midwestern business school are exploring the process of choice in hopes of helping business executives and business students improve their ‘often rudimentary decision-making skills’

[T]he researchers have, in the author’s words, ‘sadly’ concluded that ‘most people’ are ‘woefully muddled information processors who stumble along ill-chosen shortcuts to reach bad conclusions’. Poor ‘saps’ and ‘suckers’ that we are, a list of our typical decision flaws would be so lengthy as to ‘demoralize’ Solomon.

This is a powerful message, sweeping in its generality and heavy in its social and political implications. It is also a strange message, for it concerns something that we might suppose could not be meaningfully studied in the laboratory, that being the fundamental adequacy or inadequacy of people’s capacity to choose and plan wisely in everyday life. Nonetheless, the message did originate in the laboratory, in studies that have no greater claim to relevance than hundreds of others that are published yearly in scholarly journals. My goal in this article is to trace how this message of irrationality has been selected out of the literature and how it has been changed and amplified in passing through the logical and expository layers that exist between experimental conception and popularization.

Below are some of the more interesting passages. First:

Prior to 1970 or so, most researchers in judgment and decision-making believed that people are pretty good decision-makers. In fact, the most frequently cited summary paper of that era was titled ‘Man as an intuitive statistician’ (Peterson & Beach, 1967). Since then, however, opinion has taken a decided turn for the worse, though the decline was not in any sense demanded by experimental results. Subjects did not suddenly become any less adept at experimental tasks nor did experimentalists begin to grade their performance against a tougher standard. Instead, researchers began selectively to emphasize some results at the expense of others.

The Science article [Kahneman and Tversky’s 1974 article (pdf)] is the primary conduit through which the laboratory results made their way out of psychology and into other branches of the social sciences. … About 20 percent of the citations were in sources outside psychology. Of these, all used the citation to support the unqualified claim that people are poor decision-makers.

Acceptance of this sort is not the norm for psychological research. Scholars from other fields in the social sciences such as sociology, political science, law, economics, business and anthropology look with suspicion on the tightly controlled experimental tasks that psychologists study in the laboratories, particularly when the studies are carried out using student volunteers. In the case of the biases and heuristics literature, however, the issue of generalizability is seldom raised and it is rarely so much as mentioned that the cited conclusions are based on laboratory research. Human incompetence is presented as a fact, like gravity.

If you think of it, this is a great trick, for the studies in question have managed to shed their experimental details without sacrificing scientific authority. Somehow the message of irrationality has been sprung free of its factual supports, allowing it to be seen entire, unobstructed by the hopeful assumptions and tedious methodologies that brace up all laboratory research.

One interesting thread concerns the purpose of the experiments and the contrasting conclusions drawn from them. For this discussion, Lopes looks at six of the experiments in four of Kahneman and Tversky’s papers published between 1971 and 1973, plus a summary article in Science from 1974. One example involved this question:

Consider the letter R. Is R more likely to appear in the first position of a word or the third position of a word?

This problem involves the availability heuristic, the tendency to estimate the probability of an event by the ease with which instances of the event can be remembered or constructed in the imagination. Under the availability hypothesis, people judge by how many words they can generate with R in the first or third position. It is easier to think of words with R in the first position than the third, leading them to conclude – in error – that R is more common in the first position.
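The letter question itself is easy to check against a corpus. A rough sketch, assuming a plain-text word list such as /usr/share/dict/words is available (the counts, and even which letters fall on which side, will depend on the word list used):

```python
# Count how often each letter appears in the first versus the third position
# of the words in a word list. The path below is an assumption; substitute
# any plain-text list with one word per line.
from collections import Counter

first, third = Counter(), Counter()
with open("/usr/share/dict/words") as f:
    for word in f:
        word = word.strip().lower()
        if len(word) >= 3 and word.isalpha():
            first[word[0]] += 1
            third[word[2]] += 1

for letter in "rb":
    print(f"{letter}: first position {first[letter]:>6}, third position {third[letter]:>6}")
```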

Lopes writes:

[T]he question is posed so that there are only two possible results. One of these will occur if the subject reasons in accord with probability theory, and the other, if the subject reasons heuristically. …

By this logic, the implications of Figure 1 [a summary of the results] are clear: subjects reason heuristically and not according to probability theory. That is the result, signed, sealed and delivered, courtesy of strong inference. But the main contribution of the research is not this result since few would have supposed that naive people know much about combinations or variances of binomial proportions or how often R appears in the third position of words. Instead, the research commands attention and respect because the various problems function as thought experiments, strengthening our grasp of the task domain by revealing critical psychological variables that do not show up in the normative analysis. …

There is, however, another way to construe this set of studies and that is by considering the predictions of the two processing modes at a higher level of abstraction. If we think about performance in terms of correctness, we see that in every case the probability mode predicts correct answers and the heuristic mode predicts errors. … [T]he sheer weight of all the wrong answers tends to deform the basic conclusion, bending it away from an evaluatively neutral description of the process and toward something more like ‘people use heuristics to judge probabilities and they are wrong’, or even ‘people make mistakes when they judge probabilities because they use heuristics’.

Happily, conclusions like these do not hold up. This is because the tuning that is necessary for constructing problems that allow strong inference on processing questions is systematically misleading when it comes to asking evaluative questions. For example, consider the letter R problem. Why was R chosen for study and not, say, B? … Of the 20 possible consonants, 12 are more common in the first position and 8 are more common in the third position. All of the consonants that Kahneman and Tversky studied were taken from the third-position group even though there are more consonants in the first-position group.

The selection of consonants was not malicious. Their use is dictated by the strong inference logic since only they yield unambiguous answers to the processing question. In other words, when a subject says that R occurs more frequently in the first position, we know that he or she must be basing the judgment on availability, since the actual frequency information would lead to the opposite conclusion. Had we used B, instead, and had the subject also judged it to occur more often in the first position, we would not be able to tell whether the judgment reflects availability or factual knowledge since B is, in fact, more likely to occur in the first position.

We see, then, that the experimental logic constrains the interpretation of the data. We can conclude that people use heuristics instead of probability theory but we cannot conclude that their judgments are generally poor. All the same, it is the latter, unwarranted conclusion that is most often conveyed by this literature, particularly in settings outside psychology.

Lopes then turns her attention to Kahneman and Tversky’s famous Science article.

In the original experimental reports, there is plenty of language to suggest that human judgments are often wrong, but the exposition focuses mostly on the delineation of process. In the Science article, however, Tversky and Kahneman (1974) shift their attention from heuristic processing to biased processing. In the introduction they tell us: ‘This article shows that people rely on a limited number of heuristic principles which reduce the complex tasks of assessing probabilities and predicting values to simpler judgmental operations’ (p. 1124). By the time we get to the discussion, however, the emphasis has changed. Now they say: ‘This article has been concerned with cognitive biases that stem from the reliance on judgmental heuristics’ (p. 1130).

Examination of the body of the paper shows that the retrospective account is the correct one: the paper is more concerned with biases than with heuristics even though the experiments bear more on heuristics than on biases.

There is plenty more of interest in Lopes’s article. I recommend reading the full article (pdf).

Genoeconomics and designer babies: The rise of the polygenic score

When genome-wide association studies (GWAS) were first used to study complex polygenic traits, the results were underwhelming. Few genes with any predictive power were found, and those that were found typically explained only a fraction of the genetic effects that twin studies suggested were there.

This led to divergent responses, ranging from continued resistance to the idea that genes affect anything, to a quiet confidence that once sample sizes became large enough those genetic effects would be found.

Increasingly large samples are now showing that the quiet confidence was justified, with a steady flow of papers finding material genetic effects on traits including educational attainment, intelligence and height.

One source of this work is the “genoeconomists”. From Jacob Ward in the New York Times:

Once a G.W.A.S. shows genetic effects across a group, a “polygenic score” can be assigned to individuals, summarizing the genetic patterns that correlate to outcomes found in the group. Although no one genetic marker might predict anything, this combined score based on the entire genome can be a predictor of all sorts of things. And here’s why it’s so useful: People outside that sample can then have their DNA screened, and are assigned their own polygenic score, and the predictions tend to carry over. This, Benjamin realized, was the sort of statistical tool an economist could use.

As an economist, however, Benjamin wasn’t interested in medical outcomes. He wanted to see if our genes predict social outcomes.

In 2011, with a grant from the National Science Foundation, Benjamin launched the Social Science Genetic Association Consortium, an unprecedented effort to gather unconnected genetic databases into one enormous sample that could be studied by researchers from outside the world of genetic science. In July 2018, Benjamin and four senior co-authors, drawing on that database, published a landmark study in Nature Genetics. More than 80 authors from more than 50 institutions, including the private company 23andMe, gathered and studied the DNA of over 1.1 million people. It was the largest genetics study ever published, and the subject was not height or heart disease, but how far we go in school.

The researchers assigned each participant a polygenic score based on how broad genetic variations correlated with what’s called “educational attainment.” (They chose it because intake forms in medical offices tend to ask patients what education they’ve completed.) The predictive power of the polygenic score was very small — it predicts more accurately than the parents’ income level, but not as accurately as the parents’ own level of educational attainment — and it’s useless for making individual predictions.
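Mechanically, a polygenic score is just a weighted sum across the genome: each variant’s allele count (0, 1 or 2 copies of the effect allele) is multiplied by the effect size estimated in the GWAS, and the products are added up. A minimal sketch, with variant IDs and effect sizes invented for illustration:

```python
# A polygenic score as a weighted sum of allele counts. The variant IDs and
# effect sizes are invented; real scores sum over up to millions of variants
# with weights taken from a GWAS.
gwas_effect_sizes = {"rs0001": 0.021, "rs0002": -0.013, "rs0003": 0.008}

def polygenic_score(genotype):
    """genotype maps variant ID to the number of effect alleles carried (0, 1 or 2)."""
    return sum(beta * genotype.get(snp, 0) for snp, beta in gwas_effect_sizes.items())

print(polygenic_score({"rs0001": 2, "rs0002": 0, "rs0003": 1}))  # 0.05
```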

One of the most interesting possibilities for using polygenic scores is to use them to control for heterogeneity in research subjects. Ward writes:

Several researchers involved in the project mentioned to me the possibility of using polygenic scores to sharpen the results of studies like the ongoing Perry Preschool Project, which, starting in the early 1960s, began tracking 123 preschool students and suggested that early education plays a large role in determining a child’s success in school and life. Benjamin and other co-authors say that perhaps sampling the DNA of the Perry Preschool participants could improve the accuracy of the findings, by controlling for those in the group that were genetically predisposed to go further in school.

In a world with easy access to genetic samples, it could become common to include genetic controls in analysis of interesting societal outcomes, in the same way we now control for parental traits.
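As a sketch of what that might look like, assume a dataset with an outcome (years of schooling, say), an indicator for a randomised programme, parental education and a polygenic score: the score enters the regression as just another control. Everything below is simulated, and the variable names are mine, not the consortium’s:

```python
# Sketch: adding a polygenic score as a control alongside parental traits when
# estimating a programme effect. All data are simulated for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5_000
pgs = rng.normal(size=n)                      # polygenic score (standardised)
parent_edu = rng.normal(size=n)               # parental education (standardised)
treated = rng.integers(0, 2, size=n)          # e.g. attended the programme
schooling = 0.5 * treated + 0.4 * pgs + 0.3 * parent_edu + rng.normal(size=n)

df = pd.DataFrame({"schooling": schooling, "treated": treated,
                   "pgs": pgs, "parent_edu": parent_edu})

# With a randomised treatment the estimate barely moves when the genetic
# control is added, but residual variance (and the standard error) shrinks.
print(smf.ols("schooling ~ treated + parent_edu", df).fit().params["treated"])
print(smf.ols("schooling ~ treated + parent_edu + pgs", df).fit().params["treated"])
```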

A couple of times in the article, Ward notes that “scores aren’t individually predictive”. He writes that “The predictive power of the polygenic score was very small — it predicts more accurately than the parents’ income level, but not as accurately as the parents’ own level of educational attainment — and it’s useless for making individual predictions.”

I’m not sure what Ward’s definition of “predictive” is for an individual, but take this example from the article:

The authors calculated, for instance, that those in the top fifth of polygenic scores had a 57 percent chance of earning a four-year degree, while those in the bottom fifth had a 12 percent chance. And with that degree of correlation, the authors wrote, polygenic scores can improve the accuracy of other studies of education.

That looks like predictive power to me. Take an individual from the sample or an equivalent population, look at their polygenic score, and then assign a probability of whether they will obtain a four-year degree.
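A rough way to put numbers on that, using only the two quintile figures quoted in the article (the overall base rate is not reported there, so this simply contrasts the two ends of the distribution):

```python
# Probability of earning a four-year degree for the top and bottom fifths
# of polygenic scores, as quoted by Ward.
p_top, p_bottom = 0.57, 0.12

risk_ratio = p_top / p_bottom
odds_ratio = (p_top / (1 - p_top)) / (p_bottom / (1 - p_bottom))

print(f"risk ratio: {risk_ratio:.1f}")   # ~4.8 times more likely
print(f"odds ratio: {odds_ratio:.1f}")   # ~9.7
```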

I recommend reading the whole article.

A related story getting ample press is that Genomic Prediction has started to offer intelligence screening for embryos. Polygenic scores have been used with success in livestock breeding for a while now, which is often a better place to look for evidence of the future possibilities than listening to those afraid of the human implications of genetic research. From Philip Ball in The Guardian:

The company says it is only offering such testing to spot embryos with an IQ low enough to be classed as a disability, and won’t conduct analyses for high IQ. But the technology the company is using will permit that in principle, and co-founder Stephen Hsu, who has long advocated for the prediction of traits from genes, is quoted as saying: “If we don’t do it, some other company will.”

The development must be set, too, against what is already possible and permitted in IVF embryo screening. The procedure called pre-implantation genetic diagnosis (PGD) involves extracting cells from embryos at a very early stage and “reading” their genomes before choosing which to implant. It has been enabled by rapid advances in genome-sequencing technology, making the process fast and relatively cheap. In the UK, PGD is strictly regulated by the Human Fertilisation and Embryology Authority (HFEA), which permits its use to identify embryos with several hundred rare genetic diseases of which the parents are known to be carriers. PGD for other purposes is illegal.

In the US it’s a very different picture. Restrictive laws about what can be done in embryo and stem-cell research using federal funding sit alongside a largely unregulated, laissez-faire private sector, including IVF clinics. PGD to select an embryo’s sex for “family balancing” is permitted, for example. There is nothing in US law to prevent PGD for selecting embryos with “high IQ”.

Ball also expresses scepticism about the value of polygenic scores:

These relationships are, however, statistical. If you have a polygenic score that places you in the top 10% of academic achievers, that doesn’t mean you will ace your exams without effort. Even setting aside the substantial proportion of intelligence (typically around 50%) that seems to be due to the environment and not inherited, there are wide variations for a given polygenic score, one reason being that there’s plenty of unpredictability in brain wiring during growth and development.

So the service offered by Genomic Prediction, while it might help to spot extreme low-IQ outliers, is of very limited value for predicting which of several “normal” embryos will be smartest. Imagine, though, the misplaced burden of expectation on a child “selected” to be bright who doesn’t live up to it. If embryo selection for high IQ goes ahead, this will happen.

Despite Ball’s scepticism about comparing “normal” embryos, I expect it won’t be long before Genomic Prediction or a counterpart is doing just that.

Steve Hsu, co-founder of Genomic Prediction, comments on the press here (and provides some links to other articles). He closes by saying:

“Expert” opinion seems to have evolved as follows:

1. Of course babies can’t be “designed” because genes don’t really affect anything — we’re all products of our environment!

2. Gulp, even if genes do affect things it’s much too complicated to ever figure out!

3. Anyone who wants to use this technology (hmm… it works) needs to tread carefully, and to seriously consider the ethical issues.

Only point 3 is actually correct, although there are still plenty of people who believe 1 and 2 :-(