Uncategorized

Last’s What to Expect When No One’s Expecting: America’s Coming Demographic Disaster

I’ve recently read a couple of books on demographic trends, and there don’t seem to be a lot of silver linings in current fertility patterns in the developed world. The demographic boat takes a long time to turn around, so many short-term outcomes are already baked in.

Despite the less-than-uplifting subject, Jonathan Last’s What to Expect When No One’s Expecting: America’s Coming Demographic Disaster is entertaining – in some ways it is a data-filled rant.

Last doesn’t see much upside to the low fertility in most of the developed world. Depopulation is generally associated with economic decline. He sees China’s One Child Policy – rather than saving them – as leading them down the path to demographic disaster. Poland needs a 300% increase in fertility just to hold population stable to 2100. The Russians are driving toward demographic suicide. In Germany they are converting prostitutes into elderly care nurses. Parts of Japan are now depopulated marginal land.

And Last sees little hope of a future increase (I have some views on that). He rightly lampoons the United Nations as having no idea. At the time of writing the book, the United Nations optimistically assumed all developed countries would have their fertility rate increase to the replacement level of 2.1 children per woman (although the United Nations has somewhat – but not completely – tempered this optimism via its latest methodology). There was no basis for this assumption, and the United Nations is effectively forecasting blind.

So why the decline? Last is careful to point out that the world is so complicated that it is not clear what happens if you try to change one factor. But he points to several causes.

First, children used to be an insurance policy. If you wanted care in your old age, your children provided it. With government now doing the caring, having children is consumption. Last points to one estimate that Social Security and Medicare in the United States suppress the fertility rate by 0.5 children per woman (following the citation trail, here’s one source for that claim).

Then there is the pill, which Last classifies as a major backfire for Margaret Sanger. She willed it into existence to stop the middle classes shouldering the burden of the poor, but the middle class have used it more.

Next is government policy. As one example, Last goes on a rant about child car seat requirements (which I feel acutely). It is near impossible to fit more than two car seats across the back seat of most cars, meaning that transporting a family of five requires an upgrade. This is one of many subtle but real barriers to large family size.

Finally (at least of those factors I’ll mention), there is the cost of children today. Last considers that poorer families are poorer because they chose to have more children, or as Last puts it, “Children have gone from being a marker of economic success to a barrier to economic success.” Talk about maladaptation. (In the preface to the version I read, Last asked why feminists were expending so much effort demanding the right to be child-free rather than railing against the free market for failing women who want children.)

The fertility decline isn’t just a case of people wanting fewer children, as – on average – people fall short of their ideal number of kids. In the UK, the ideal is 2.5, expected is 2.3, actual 1.9. If people could just realise their target number of children, fertility would be higher.

But this average hides some skew – less educated people end up with more children than their ideal, while educated people end up with far fewer. Helping the more educated reach their ideal could deliver a large dividend.

So what should government do? Last dedicates a good part of the book to the massive catalogue of failures of government policy to boost birth rates. The Soviet Union’s motherhood medals and lump sum payments didn’t stop the decline. Japan’s monthly per child subsidies, daycare centres and paternal leave (plus another half dozen pro-natalist policies Last lists) had little effect. Singapore initially encouraged the decline, but when they changed their minds and started offering tax breaks and other perks for larger families, fertility kept on declining.

This suggests that you cannot bribe people into having babies. As Last points out, having kids is no fun and people aren’t stupid.

Then there is the impossibility of using migration to fill the gap. To keep the United States support ratio (workers per retiree) where it currently is (assuming you wanted to do this), the US would need to add 45 million immigrants between 2025 and 2035. The US would need 10.8 million a year until 2050 to get the ratio somewhere near what it was in 1960. Immigration is not as good for the demographic profile as baby making and comes with other problems. Plus the sources of immigrants are going through their own demographic transition, so at some point that supply of young immigrants will dry up.

So, if government can’t make people have children they don’t want and can’t simply ship them in, Last asks if they could help people get the children they do want. As children go on to be taxpayers, Last argues government could cut social security taxes for those with more children and make people without children pay for what they’re not supporting. (Although you’d want to make sure there was no net burden of those children across their lives, as they’ll be old people one day too. There are limits to how far you could take that Ponzi scheme.)

Last also suggests eliminating the need for college, one of the major expenses of children. Allowing IQ testing for jobs would be one small step toward this.

Put together, I’m not optimistic much can be done, but Last is right in that there should be some exploration of removing unnecessary barriers (let’s start with those car seat rules).

I’ll close this post where Last closes the book. In a world where the goal is taken to be pleasure, children will never be attractive. So how much of the fertility decline is because modernity has turned us into unserious people?

Baumeister and Tierney’s Willpower: Rediscovering the Greatest Human Strength

After the recent hullabaloo about whether ego depletion was a real phenomenon, I decided to finally read Roy Baumeister and John Tierney’s Willpower cover to cover (I had only flicked through it before).

My hope was that I’d find some interesting additions to my understanding of the debate, but the book tends toward the pop science/self-help genre and rarely has enough depth to add anything to the current debates (see Scott Alexander on that point). That said, it was an easy read and pointed me to a few studies that seem worth checking out.

One area that I have been interested in is the failure of the mathematics around glucose consumption to add up. Baumeister’s argument is that glucose is the scarce resource in the ego depletion equation. Exercising self-control depletes our glucose, making us more likely to succumb to later temptations. Replenishing glucose restores our ego.

As plenty of people have pointed out – Robert Kurzban is the critic I am most familiar with – the maths on glucose simply does not add up. The brain does not burn more calories when making a quick decision. Even if it did (say, doubling while making a decision), the short time in which the decision is made means the additional energy expenditure would be minuscule.
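A back-of-envelope calculation shows the scale of the problem. The figures below are rough, commonly cited values rather than numbers from the book: the brain runs at roughly 20 watts, and glucose yields about 4 kcal per gram.

```python
# Back-of-envelope version of Kurzban's critique. The figures are rough,
# commonly cited values, not numbers from the book: the brain runs at
# about 20 watts, and glucose provides about 4 kcal per gram.

BRAIN_WATTS = 20.0                      # typical resting brain power draw
DECISION_SECONDS = 5.0                  # suppose a hard decision takes 5 seconds
JOULES_PER_GRAM_GLUCOSE = 4.0 * 4184    # 4 kcal/g, 4184 J per kcal

# Even if the brain's entire energy use doubled for those five seconds,
# the extra energy is just 20 W x 5 s:
extra_joules = BRAIN_WATTS * DECISION_SECONDS
extra_glucose_mg = extra_joules / JOULES_PER_GRAM_GLUCOSE * 1000

print(f"extra energy: {extra_joules:.0f} J")        # 100 J
print(f"extra glucose: {extra_glucose_mg:.1f} mg")  # ~6 mg
```

Even under the generous doubling assumption, a hard five-second decision costs about six milligrams of glucose – against roughly four grams of sugar in a single teaspoon.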

Baumeister and Tierney indirectly dealt with the criticism, writing:

Despite all these findings, the growing community of brain researchers still had some reservations about the glucose connection. Some skeptics pointed out that the brain’s overall use of energy remains about the same regardless of what one is doing, which doesn’t square easily with the notion of depleted energy. Among the skeptics was Todd Heatherton….

Heatherton decided on an ambitious test of the theory. He and his colleagues recruited dieters and measured their reactions to pictures of food. Then ego depletion was induced by asking everyone to refrain from laughing while watching a comedy video. After that, the researchers again tested how their brains reacted to pictures of food (as compared with nonfood pictures). Earlier work by Heatherton and Kate Demos had shown that these pictures produce various reactions in key brain sites, such as the nucleus accumbens and a corresponding decrease in the amygdala. The crucial change in the experiment involved a manipulation of glucose. Some people drank lemonade sweetened with sugar, which sent glucose flooding through the bloodstream and presumably into the brain.

Dramatically, Heatherton announced his results during his speech accepting leadership of the Society for Personality and Social Psychology … Heatherton reported that the glucose reversed the brain changes wrought by depletion, a finding he said, that thoroughly surprised him. … Heatherton’s results did much more than provide additional confirmation that glucose is a vital part of willpower. They helped resolve the puzzle over how glucose could work without global changes in the brain’s total energy use. Apparently ego depletion shifts activity from one part of the brain to another. Your brain does not stop working when glucose is low. It stops doing some things and starts doing others.

In an hour of searching, I couldn’t find a publication arising from this particular study – happy for any pointers. (Interestingly, Demos is author of a paper on a failed replication of an ego depletion experiment.) I’m guessing that the initial findings didn’t hold up.

Given the challenges to ego depletion theory, it seems Baumeister is considering tweaking the theory (I found an ungated copy here). If you want a more recent, although not necessarily balanced view on where the theory is at, skip Willpower and start there.

For another perspective on Willpower, see also Steven Pinker’s review.

The Behavioural Economics Guide 2016 (with an intro by Gerd Gigerenzer)

The Behavioural Economics Guide 2016 is out (including a couple of references to yours truly), with the introduction by Gerd Gigerenzer. It’s nice to see some of the debate in the area making an appearance.

Here are a few snippets from Gigerenzer’s piece. First, on heuristics:

To rethink behavioral economics, we need to bury the negative rhetoric about heuristics and the false assumption that complexity is always better. The point I want to make here is not that heuristics are always better than complex methods. Instead, I encourage researchers to help work out the exact conditions under which a heuristic is likely to perform better or worse than some fine-tuned optimization method. First, we need to identify and study in detail the repertoire of heuristics that individuals and institutions rely on, which can be thought of as a box of cognitive tools. This program is called the analysis of the adaptive toolbox and is descriptive in its nature. Second, we need to analyze the environment or conditions under which a given heuristic (or complex model) is likely to succeed and fail. This second program, known as the study of the ecological rationality of heuristics (or complex models), is prescriptive in nature. For instance, relying on one good reason, as the hiatus rule does [If a customer has not made a purchase for nine months or longer, classify him/her as inactive, otherwise as active], is likely to be ecologically rational if the other reasons have comparatively small weights, if the sample size is small, and if customer behavior is unstable.

And the “bias bias”:

The bias bias is the tendency to diagnose biases in others without seriously examining whether a problem actually exists. In decision research, a bias is defined as a systematic deviation from (what is believed to be) rational choice, which typically means that people are expected to add and weigh all information before making a decision. In the absence of an empirical analysis, the managers who rely on the hiatus heuristic would be diagnosed as having committed a number of biases: they pay no attention to customers’ other attributes, let alone to the weight of these attributes and their dependency. Their stubborn refusal to perform extensive calculations might be labeled the “hiatus fallacy” – and provide entry number 176 in the list on Wikipedia. Yet many, including experts, don’t add and weigh most of the time, and their behavior is not inevitably irrational. As the bias–variance dilemma shows, ignoring some information can help to reduce error from variance – the error that arises from fine-tuned estimates that produce mostly noise. Thus, a certain amount of bias can assist in making better decisions.

The bias bias blinds us to the benefits of simplicity and also prevents us from carefully analyzing what the rational behavior in a given situation actually is. I, along with others, have shown that more than a few of the items in the Wikipedia list have been deemed reasoning errors on the basis of a narrow idea of rationality and that they can instead be easily justified as intelligent actions (Gigerenzer et al., 2012).
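The bias–variance point lends itself to a quick demonstration. The sketch below is my own toy example, not Gigerenzer’s: a flexible degree-9 polynomial fitted to a small noisy sample is compared out-of-sample against a crude rule that ignores the predictor entirely.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(3)

def sample(n):
    """Noisy observations of a mildly sloped relationship."""
    x = rng.uniform(0, 1, n)
    y = 2 * x + rng.normal(0, 1, n)
    return x, y

x_train, y_train = sample(10)     # a small training sample
x_test, y_test = sample(10_000)   # fresh data to test predictions on

# Flexible model: a degree-9 polynomial can fit 10 points almost exactly
poly = Polynomial.fit(x_train, y_train, deg=9)
mse_poly = np.mean((poly(x_test) - y_test) ** 2)

# Simple "ignore the detail" rule: always predict the training mean
mse_mean = np.mean((y_train.mean() - y_test) ** 2)

print(f"flexible model test MSE: {mse_poly:.2f}")
print(f"training-mean rule test MSE: {mse_mean:.2f}")
```

The simple rule is biased, but on samples this small its low variance usually wins – the same logic that lets a one-reason heuristic like the hiatus rule beat a fine-tuned model.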

————–

A recent Spectator article on an interview with Richard Thaler – a contributor to the 2016 Guide – opened with the following:

‘For ten years or so, my name was “that jerk”,’ says Professor Richard Thaler, president of the American Economic Association and principal architect of the behavioural economics movement. ‘But that was a promotion. Before, I was “Who’s he?”’

On hearing that Gigerenzer had written the introduction to the Guide, Thaler tweeted his displeasure.

I suppose Thaler is now the establishment and Gigerenzer is “that jerk”.

Re-reading Kahneman’s Thinking, Fast and Slow

A bit over four years ago I wrote a glowing review of Daniel Kahneman’s Thinking, Fast and Slow. I described it as a “magnificent book” and “one of the best books I have read”. I praised the way Kahneman threaded his story around the System 1 / System 2 dichotomy, and the coherence provided by prospect theory.

What a difference four years makes. I will still describe Thinking, Fast and Slow as an excellent book – possibly the best behavioural science book available. But during that time a combination of my learning path and additional research in the behavioural sciences has led me to see Thinking, Fast and Slow as a book with many flaws.

First, there is the list of studies that simply haven’t held up through the “replication crisis” of the last few years. The first substantive chapter of Thinking, Fast and Slow is on priming, so many of these studies are right up the front. These include the Florida effect, money priming, the idea that making a test harder to read can increase test results, and ego depletion (I touch on each of these in my recent talk at the Sydney Behavioural Economics and Behavioural Science Meetup).

It’s understandable that Kahneman was somewhat caught out by the replication crisis that has enveloped this literature. But what does not sit so well is the confidence with which Kahneman made his claims. For example, he wrote:

When I describe priming studies to audiences, the reaction is often disbelief . . . The idea you should focus on, however, is that disbelief is not an option. The results are not made up, nor are they statistical flukes. You have no choice but to accept that the major conclusions of these studies are true.

I am surprised at the blind spot I had when first reading it – Kahneman’s overconfidence didn’t register with me.

Like me, Kahneman was a fan of the hot hand studies. Someone who believes in the hot hand believes that a sportsperson such as a basketball player is more likely to make a shot if they made their previous one. Kahneman wrote:

The hot hand is entirely in the eye of the beholders, who are consistently too quick to perceive order and causality in randomness. The hot hand is a massive and widespread cognitive illusion. [Could the same be said about much of the priming literature?]

The public reaction to this research is part of the story. The finding was picked up by the press because of its surprising conclusion, and the general response was disbelief. When the celebrated coach of the Boston Celtics, Red Auerbach, heard of Gilovich and his study, he responded, “Who is this guy? So he makes a study. I couldn’t care less.” The tendency to see patterns in randomness is overwhelming – certainly more impressive than a guy making a study.

And now it seems there is a hot hand. The finding that there was no hot hand was the consequence of a statistical error (also covered in my recent talk). The disbelief was appropriate, and Auerbach did himself a favour by ignoring the study.

As I’ve picked on Dan Ariely for the way he talks about organ donation rates, here’s Kahneman on that same point:

A directive about organ donation in case of accidental death is noted on an individual’s driver licence in many countries. The formulation of that directive is another case in which one frame is clearly superior to the other. Few people would argue that the decision of whether or not to donate one’s organs is unimportant, but there is strong evidence that most people make their choice thoughtlessly. The evidence comes from a comparison of organ donation rates in European countries, which reveals startling differences between neighbouring and culturally similar countries. An article published in 2003 noted that the organ donation rate was close to 100% in Austria but only 12% in Germany, 86% in Sweden but only 4% in Denmark.

These enormous differences are a framing effect, which is caused by the format of the critical question. The high-donation countries have an opt-out form, where individuals who wish not to donate must check an appropriate box. Unless they take this simple action, they are considered willing donors. The low-contribution countries have an opt-in form: you must check a box to become a donor. That is all. The best single predictor of whether or not people will donate their organs is the designation of the default option that will be adopted without having to check a box. …

When the role of formulation is acknowledged, a policy question arises: Which formulation should be adopted? In this case, the answer is straightforward. If you believe that a large supply of donated organs is good for society, you will not be neutral between a formulation that yields almost 100% donations and another formulation that elicits donations from 4% of drivers.

As Ariely does, Kahneman describes the difference between European countries as being due to differences in form design, when in fact those European countries with high “donor rates” never ask their citizens whether they wish to be donors. The form described does not exist in the high-donation countries. They are simply presumed to consent to donation. (The paper that these numbers come from, Do Defaults Save Lives?, might have been better titled “Does not asking if you can take people’s organs save lives?”. That could have saved some confusion.)

Further, Kahneman talks about the gap between 100% and 4% as donation rates, when these numbers refer to those who are presumed to consent in the high-donation countries. Actual donation rates and the gap between the different types of countries are much lower.

All the above points are minor in themselves. But together the shaky science, overconfidence and lazy storytelling add up to something substantial.

What I also find less satisfying now is the attempt to construct a framework around the disparate findings in behavioural science. I once saw prospect theory as a great framework for thinking about many of the findings, but it is as unrealistic a decision-making model as that for the perfectly rational man – the maths involved is even more complicated. It might be a useful descriptive or predictive model (if you could work out what the reference point actually is) but no one makes decisions in that way. (One day I will write a post on this.)
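To give a sense of the machinery involved, here is a sketch of the prospect theory value and probability weighting functions, using the median parameter estimates from Tversky and Kahneman (1992). For simplicity I apply the gain-side weighting curve to both outcomes and assume the reference point is zero:

```python
# The machinery prospect theory needs before it says anything: the value
# and probability weighting functions from Tversky and Kahneman (1992),
# with their published median parameter estimates. For simplicity the
# gain-side weighting curve is applied to both outcomes, and the
# reference point is assumed to be zero.

ALPHA, LAMBDA, GAMMA = 0.88, 2.25, 0.61

def value(x):
    """Subjective value of a gain or loss x relative to the reference point."""
    if x >= 0:
        return x ** ALPHA                # diminishing sensitivity to gains
    return -LAMBDA * (-x) ** ALPHA       # losses loom about 2.25 times larger

def weight(p):
    """Decision weight on probability p; small probabilities are overweighted."""
    return p ** GAMMA / (p ** GAMMA + (1 - p) ** GAMMA) ** (1 / GAMMA)

# A 50:50 gamble between +$100 and -$100 has zero expected value but a
# negative prospect value, so it is rejected:
print(weight(0.5) * value(100) + weight(0.5) * value(-100))  # about -30
```

Even this stripped-down version needs three estimated parameters and a known reference point before it can say anything about a single 50:50 gamble.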

It will be interesting to see how Thinking, Fast and Slow stands up after another five years.

Levine’s Is Behavioural Economics Doomed?

David Levine’s Is Behavioural Economics Doomed? is a good but slightly frustrating read. I agree with Levine’s central argument that rationality is underweighted in many applications of behavioural economics, and he provides many good examples of the power of traditional economic thinking. For someone unfamiliar with game theory, this book is in some ways a good introduction (or more particularly, to the concept of Nash equilibrium). And for some of the points, Levine shows a richness in the literature that you don’t often hear about if you only consume pop behavioural economics books.

But the book is also littered with straw man arguments. Levine often attributes views to behavioural economists that I am not sure they generally hold, and he often picks strange examples. And when it comes to explaining away behaviour that doesn’t fit so neatly with the rational actor model, Levine is not always convincing.

As an example, Levine provides an overview of the prisoner’s dilemma, a classic game demonstrating why two people might not cooperate, even though cooperation leads to a better outcome than both players defecting. Levine uses it to argue against those who suffer from the fallacy of composition – inferring something is true of the whole because it is true of the parts – and wonder why we can have war, crime and poverty if people are so rational. But who are these people that Levine is arguing against? I presume not the majority of the behavioural economics profession who are more than familiar with the prisoner’s dilemma game.

Levine’s introduction to the prisoner’s dilemma is good when he discusses what happens with different strategies or game designs. But when it comes to the players in experiments who don’t conform to the Nash equilibrium – such as those who don’t defect in every period if there is a defined end to the game – he hand-waves away their play as “rational and altruistic” rather than seriously exploring whether they made systematic errors.
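For readers new to the framework, the one-shot game’s logic is easy to verify in code. The payoff numbers below are my own illustrative choice, not Levine’s:

```python
# The one-shot prisoner's dilemma in code. Payoff numbers are my own
# illustrative choice, not Levine's.

PAYOFFS = {  # (my_move, their_move) -> my payoff; C = cooperate, D = defect
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def best_response(their_move):
    """The move that maximises my payoff given the other player's move."""
    return max("CD", key=lambda my_move: PAYOFFS[(my_move, their_move)])

# Defection is the best response whatever the other player does, so
# (D, D) is the unique Nash equilibrium...
assert best_response("C") == "D" and best_response("D") == "D"
# ...even though both players would prefer (C, C) to (D, D).
assert PAYOFFS[("C", "C")] > PAYOFFS[("D", "D")]
print("Unique equilibrium: (D, D), despite (C, C) being better for both.")
```

Because defection is a best response to anything the other player does, theory predicts mutual defection; rolled back from a known final round, the same logic delivers defection in every period of a finitely repeated game – which is exactly the prediction the experimental subjects defy.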

Similarly, when discussing the ultimatum game, Levine simply describes the failure to maximise income as “modest”. He does make the important point that it is rational for first movers to offer more than the minimum if there is a possibility of rejection (and since they don’t have the opportunity to learn, they will get this wrong sometimes). But he seems less concerned about the behaviour of player 2, who rejects a material sum. Yes, it might be a Nash equilibrium, but the behavioural view might shed some light on why we end up at that particular Nash equilibrium.

Levine is similarly dismissive of the situations where people make errors in markets. “Behavioural economics focuses on the irrationality of a few people or with people faced with extraordinary circumstances. Given time economists expect these same people will rationally adjust their behaviour to account for new understandings of reality and not simply repeat the same mistakes over and over again.” But given how many major decisions are one-shot decisions with major consequences (purchasing cars, retirement decisions etc.), surely they are worth exploring.

One of the more bizarre examples is where Levine addresses the question of why people vote despite having almost no chance of changing the outcome. Levine gives an example of a voting participation game conducted in the lab where he found that participants acted according to the predicted Nash equilibrium, reflecting their costs of voting, the benefits of winning and the probability of their vote swinging the result. But he doesn’t then grapple with the clear problem that this limited experiment doesn’t translate to the real world. Funnily enough, only pages later he cautions “[B]eware also of social scientist [sic] bearing only laboratory results.”

Levine also brings out the now classic question of why economics couldn’t predict the economic crisis. He points out that crises must be inherently unpredictable, as there is an inherent connection between the forecaster and the forecast. If a model that people believed in predicted a 20% collapse in the market next week, the crash would happen today. (Let’s ignore for the moment that there seems to be an economist predicting a crash almost every day.)

In defence of the economists, Levine pulls out a series of (well-cited) papers that he believes already explained the crisis, such as by providing for the possibility of sharp crashes and the effect of fools in the market. Look, the shape of the curve in this paper is the same! But was that actually what happened? Was that the dominant theory? Levine seems to believe the mere existence of literature in which crises are present indicates that the profession is fine, even if that literature wasn’t the dominant or widely believed model.

Having spent most of this post complaining about Levine’s angle of attack, I should say there are many good points. His discussion of learning theory is interesting – people don’t know all information before they undertake an action and learn along the way. Selfish rationality with imperfect learning does a pretty good job of explaining much behaviour. Some of his throwaway lines also make important points. For example, if a task is unpleasant, it can be rational to leave it to the last moment. Uncertainty can make the procrastination even more rational.

Some of Levine’s critiques of the experimental evidence are also interesting. One I was not aware of was whether the appearance of the endowment effect in some experiments was due to people misunderstanding the Becker-DeGroot-Marschak elicitation procedure. (People state their willingness to pay or accept, and a random draw of the price is made. If the randomly drawn price is below their stated willingness to pay, they buy at that drawn price.) Levine points to experiments where, if people are trained to understand the procedure, the endowment effect disappears. As I mentioned in a previous post, Levine also points to some interesting literature on anchoring.
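On the Becker-DeGroot-Marschak procedure mentioned above: its appeal is that stating your true value is optimal, which is why subjects misunderstanding it matters. Here is a minimal simulation of that property – my own sketch, assuming an arbitrary uniform price draw between $0 and $10:

```python
import random

# Minimal simulation of the Becker-DeGroot-Marschak procedure for
# willingness to pay: you state a bid, a price is drawn at random, and you
# buy at the drawn price if your bid is at least that price. The uniform
# $0-$10 price draw and the $6 true value are arbitrary choices of mine.

random.seed(1)
TRUE_VALUE = 6.0

def expected_surplus(bid, trials=100_000):
    total = 0.0
    for _ in range(trials):
        price = random.uniform(0, 10)
        if bid >= price:               # you pay the drawn price, not your bid
            total += TRUE_VALUE - price
    return total / trials

for bid in (3.0, 6.0, 9.0):            # under-bid, truthful, over-bid
    print(f"bid ${bid:.0f}: expected surplus ${expected_surplus(bid):.2f}")
# Truthful bidding does best: under-bidding forgoes profitable purchases,
# over-bidding sometimes forces a purchase above the good's value.
```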

Levine closes with a quote from Loewenstein and Ubel that is worth repeating:

… [behavioral economics] has its limits. As policymakers use it to devise programs, it’s becoming clear that behavioral economics is being asked to solve problems it wasn’t meant to address. Indeed, it seems in some cases that behavioral economics is being used as a political expedient, allowing policymakers to avoid painful but more effective solutions rooted in traditional economics.

Behavioral economics should complement, not substitute for, more substantive economic interventions. If traditional economics suggests that we should have a larger price difference between sugar-free and sugared drinks, behavioral economics could suggest whether consumers would respond better to a subsidy on unsweetened drinks or a tax on sugary drinks.

But that’s the most it can do.

Underneath Levine’s critique you sense this is what is really bugging him. Despite the critiques, traditional economic approaches still have a lot of power. And for some people, that seems to have been forgotten along the way.

Replicating anchoring effects

The classic Ariely, Loewenstein, and Prelec experiment (ungated pdf) ran as follows. Students are asked to think of the last two digits of their social security number – essentially a random number – as a dollar price. They are then asked whether they would be willing to buy certain consumer goods for that price or not. Finally, they are asked what is the most they would be willing to pay for each of these goods.

The result was that those with a higher starting price – that is, a higher last two digits on their social security number – were willing to pay more for the consumer goods. That random number “anchored” how much they were willing to pay.

Reading David Levine’s Is Behavioural Economics Doomed? (review to come soon), Levine mentions the following attempted replication:

On the Robustness of Anchoring Effects in WTP and WTA Experiments (ungated pdf)

Drew Fudenberg, David K. Levine, and Zacharias Maniadis

We reexamine the effects of the anchoring manipulation of Ariely, Loewenstein, and Prelec (2003) on the evaluation of common market goods and find very weak anchoring effects. We perform the same manipulation on the evaluation of binary lotteries, and find no anchoring effects at all. This suggests limits on the robustness of anchoring effects.

And from the body of the article:

Our first finding is that we are unable to replicate the results of ALP [Ariely, Loewenstein, and Prelec]: we find very weak anchoring effects both with WTP [willingness to pay] and with WTA [willingness to accept]. The Pearson correlation coefficients between the anchor and stated valuation are generally much lower than in ALP, and the magnitudes of the anchoring effects (as measured by the ratio of top to bottom quintile) are smaller. Repeating the ALP procedure for lotteries we do not find any anchoring effects at all.

Unlike ALP, we carried out laboratory rather than classroom experiments. This necessitated some minor changes—discussed below—from ALP’s procedures. It is conceivable that these changes are responsible for the differences in our findings; if so the robustness of their results is limited.

Our results do not confirm the very strong anchoring effects found in ALP. They are more in agreement with the results of Simonson and Drolet (2004) and Alevy, Landry, and List (2011). Simonson and Drolet (2004) used the same SSN-based anchor as ALP, and found no anchoring effects on WTA, and moderate anchoring effects on WTP for four common consumer goods. Alevy, Landry, and List (2011) performed a field experiment, eliciting the WTP for peanuts and collectible sports cards, and they found no anchoring effects. Bergman et al. (2010) also used the design of ALP for six common goods, and found anchoring effects, but of smaller magnitude than in ALP.

Tufano (2010) and Maniadis, Tufano, and List (2011) also failed to confirm the robustness of the magnitude of the anchoring effects of ALP, using hedonic experiences, rather than common goods. Tufano (2010) used the anchoring manipulation to increase the variance in subjects’ WTA for a bad-tasting liquid, but the manipulation had no effect. Notice that this liquid offers a simple (negative) hedonic experience, like the “annoying sounds” used in Experiment 2 of ALP. Maniadis, Tufano, and List (2011) replicated Experiment 2 of ALP and found weaker (and nonsignificant) anchoring effects. Overall our results suggest that anchoring is real—it is hard to reconcile otherwise the fact that in the WTA treatment with goods the ratios between highest and lowest quintile is always bigger than one—but that quantitatively the effect is small. Additionally our data supports the idea that anchoring goes away when bidding on objects with greater familiarity, such as lotteries.
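To make the metrics in that passage concrete, here is how the Pearson correlation and the top-to-bottom quintile ratio would be computed. The data are simulated with a deliberately weak anchoring effect; nothing below comes from the papers themselves:

```python
import numpy as np

# How the two summary measures quoted above are constructed: the Pearson
# correlation between anchor and stated valuation, and the ratio of mean
# valuations in the top versus bottom anchor quintile. The data are
# simulated with a deliberately weak anchoring effect.

rng = np.random.default_rng(0)
n = 500
anchor = rng.integers(0, 100, n)                      # last two SSN digits
valuation = 20 + 0.02 * anchor + rng.normal(0, 5, n)  # weak anchor effect

corr = np.corrcoef(anchor, valuation)[0, 1]
top = valuation[anchor >= np.quantile(anchor, 0.8)].mean()
bottom = valuation[anchor <= np.quantile(anchor, 0.2)].mean()

print(f"Pearson r: {corr:.2f}")
print(f"top/bottom quintile ratio: {top / bottom:.2f}")  # just above 1
```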

Saint-Paul’s The Tyranny of Utility: Behavioral Social Science and the Rise of Paternalism

The growth in behavioural science has given a new foundation for paternalistic government interventions. Governments now try to help “biased” humans make better decisions – from nudging them to pay their taxes on time, to constraining the size of the soda they can buy, to making them save for that retirement so far in the future.

There is no shortage of critics of these interventions. Are people actually biased? Do these interventions change behaviour or improve outcomes for the better? Is an also biased government the right agent to fix these problems? Ultimately, do the costs outweigh the benefits of government action?

In The Tyranny of Utility: Behavioral Social Science and the Rise of Paternalism, Gilles Saint-Paul points out the danger in this line of defence. By fighting the utilitarian battle based on costs and benefits, there will almost certainly be circumstances in which the scientific evidence on human behaviour and the effect of the interventions will point in the freedom-reducing direction. Arguing about whether a certain behaviour is rational at best leads to an empirical debate. Similarly, arguments about the irrationality of government can be countered by empirical debate on how particular government interventions change behaviour and outcomes.

As a result, Saint-Paul argues that:

[I]f we want to provide intellectual foundations for limited governments, we cannot do it merely on the basis of instrumental arguments. Instead, we need a system of values that delivers those limits and such a system cannot be utilitarian.

Saint-Paul argues that part of the problem is that the utilitarian approach is the backbone of neoclassical economics – once (and still in some respects) a major source of arguments in favour of freedom. Now that the assumptions about human behaviour underpinning many neoclassical models are seen to no longer hold, you are still left with utility maximisation as the policy objective. As Saint-Paul writes:

It should be emphasized that the drift toward paternalism is entirely consistent with the research program of traditional economics, which supposes that policies should be advocated on the basis of a consequentialist cost-benefit analysis, using some appropriate social welfare function. Paternalism then derives naturally from these premises, by simply adding empirical knowledge about how people actually behave …

When Saint-Paul describes the practical costs of this increased paternalism, his choice of examples often makes it hard to share his anger. One of his prime cases of infringed liberty is a five-time public transport molester who is banned from using the train, as a court determined he lacked the self-control to travel on it. On gun control laws, he suggests authoritarian governments could rise in the absence of an armed citizenry.

Still, some of the other stories (or even these more extreme examples) lead to an important point. Saint-Paul points out that many of these interventions extend beyond the initial cause of the problem and impose responsibility on people for the failings of others. For example, in many countries you need a pool fence even if you don’t have kids. You effectively need to look after other people’s children. Similarly, liquor laws can extend to preventing sales to people who are drunk or likely to drive. Where does the chain of responsibility transfer stop?

One of the more interesting threads in the book concerns what the objective of policy is. Is it consumption? Or happiness? And based on this objective, how far does the utilitarian argument extend? If it is happiness, should we just load everyone up with Prozac? And then what of the flow-on costs if everyone decides to check out and be happy?

What if a cardiologist decides that experts and studies are right, that it’s stupid after all to buy a glossy Lamborghini, and dumps a few of his patients in order to take more time off with his family? How is the well-being of the patients affected? What if that entrepreneur who works seventy hours a week to gain market shares calls it a day and closes his factory? In a market society the pursuit of status and material achievement is obtained through voluntary exchange, and must thus benefit somebody else. Owning a Lamborghini is futile, but curing a heart disease is not. The cardiologist may be selfish and alienated; he makes his neighbors feel bad; and he is tired of the Lamborghini. His foolishness, however, has improved the lives of many people, even by the standards of happiness researchers. Competition to achieve status may be unpleasant to my future incarnations and those of my neighbors, but it increases the welfare of those who buy the goods I am producing to achieve this goal.

Saint-Paul’s response to these problems – presented more as suggestions than a manifesto, and thinly summarised in only two pages at the end of the book – is not to ignore science but to set some limits:

I am not advocating that scientific evidence should be disregarded in the decision-making process. That is obviously a recipe for poor outcomes. Instead, I am pointing out that the increased power and reliability of Science makes it all the more important that strict limits define what is an acceptable government intervention and that it is socially accepted that policies which trespass those limits cannot be implemented regardless of their alleged beneficial outcomes. We are going in the opposite direction from such discipline.

These limits could involve a minimal redistributive state to rule out absolute poverty – allowing some values to supersede freedom – but these values would not include “statistical notions of public health or aggregate happiness”, nor most forms of strong paternalism.

But despite pointing to the dangers of utilitarian arguments against paternalistic interventions, Saint-Paul finds them hard to resist. He regularly refers to the biases of government, noting the irony that “the government could well offset such deficiencies with its own policy tools but soon chose not to by having high public deficits and low interest rates.” And his picture of his preferred world has a utilitarian flavour itself.

Being treated by society as responsible and unitary goes a long way toward eliciting responsible and unitary behavior. The incentives to solve my own behavioral problems are much larger if I expect society to hold me responsible for the consequences of my actions.

Bad Behavioural Science: Failures, bias and fairy tales

Below is the text of my presentation to the Sydney Behavioural Economics and Behavioural Science Meetup on 11 May 2016. The talk is aimed at an intelligent non-specialist audience. I expect the behavioural science knowledge of most attendees is drawn from popular behavioural science books and meetups such as this.

Intro

The typical behavioural science or behavioural economics event is a love-in. We all get together to laugh at people’s irrationality – that is, the irrationality of others – and opine that if only we designed the world more intelligently, people would make better decisions.

We can point to a vast literature – described in books such as Dan Ariely’s Predictably Irrational, Daniel Kahneman’s Thinking, Fast and Slow, and Richard Thaler and Cass Sunstein’s Nudge – all demonstrating the fallibility of humans, the vast array of biases we exhibit in our everyday decision making, and how we can help to overcome these problems.

Today I want to muddy the waters. Not only is the “we can save the world” TED talk angle that tends to accompany behavioural science stories boring, but this angle also ignores the problems and debates in the field.

I am going to tell you four stories – stories that many of you will have heard before. Then I am going to look at the foundations of each of these stories and show that the conclusions you should draw from each are not as clear as you might have been told.

I will say at the outset that the message of this talk is not that all behavioural science is bunk. Rather, you need to critically assess what you hear.

I should also point out that I am only covering one of the possible angles of critique. There are plenty of others.

For those who want to capture what I say, at 7pm tonight (AEST) the script of what I propose to talk about and the important images from my slides will be posted on my blog at jasoncollins.blog. That post will include links to all the studies I refer to.

Story one – the Florida effect

John Bargh and friends asked two groups of 30 psychology students to rearrange scrambled words into a sentence that made sense. Students in each of these groups were randomly assigned into one of two conditions. Some students received scrambled sentences with words that relate to elderly stereotypes, such as worried, Florida, old, lonely, grey, wrinkle, and so on. The other students were given sentences with non-age-specific words.

After completing this exercise the participants were debriefed and thanked. They then exited the laboratory by walking down a corridor.

Now for the punch line. The experimenters timed the participants as they walked down the corridor. Those who had rearranged the sentences with non-age-specific words walked down the corridor in a touch over seven seconds. Those who had rearranged the sentences with the elderly “primes” walked more slowly, taking a bit over eight seconds. A very cool result that has become known as the Florida effect.

Except… the study doesn’t seem to replicate. In 2012 a paper was published in PLOS One where Stephen Doyen and friends used a laser timer to time how long people took to walk down the corridor after rearranging their scrambled sentences. The presence of the elderly words did not change their walking speed (unless the experimenters knew about the treatment – but that’s another story). There’s another failed replication on PsychFileDrawer.

What was most striking about this failed replication – apart from putting a big question mark next to the result – was the way the lead researcher John Bargh attacked the PLOS One paper in a post on his Psychology Today blog (his blog post appears to have been deleted, but you can see a description of the content in Ed Yong’s article). Apart from titling the post “Nothing in their heads” and describing the researchers as incompetent, he desperately tried to differentiate the results – such as by arguing there were differences in methodology (which in some cases did not actually exist) – and by suggesting that the replication team used too many primes.

I don’t want to pick on this particular study alone (although I’m happy to pick on the reaction). After all, failure to replicate is not proof that the effect does not exist. But failure to replicate is a systematic problem in the behavioural sciences (in fact, many sciences). A study by Brian Nosek and friends published in Science examined 100 cognitive and social psychology studies published in several major psychology journals. They subjectively rated 39% of the studies they attempted to replicate as having replicated. Only 25% of social psychology studies in that study met that mark. The size of the effect in these studies was also around half of that in the originals – as shown in this plot of original versus replication effect sizes. The Florida effect is just the tip of the iceberg.

Nosek et al (2015)

Priming studies seem to be particularly problematic. Another priming area in trouble is “money priming”, where exposure to images of money or the concept of money make people less willing to help others or more likely to endorse a free market economy. As an example, one set of replications of the effect of money primes on political views by Rohrer and friends – as shown in these four charts –  found no effect (ungated pdf). Analysis of the broader literature on money priming suggests, among other things, massive publication bias.

As a non-priming example, those of you who have read Daniel Kahneman’s Thinking, Fast and Slow or Malcolm Gladwell’s David and Goliath might recall a study by Adam Alter and friends. In that study, 40 students were exposed to two versions of the cognitive reflection task. One of the typical questions in the cognitive reflection task is the following classic:

A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?

The two versions differed in that one used a small, light grey font that made the questions hard to read. Those exposed to the harder-to-read questions achieved higher scores. (The intuitive answer to the bat-and-ball question is 10 cents; the correct answer is 5 cents, since $0.05 plus $1.05 makes $1.10.)

Cognitive reflection test results, Meyer et al (2015)

It all sounds very cool. Slowing people down made them do better. But while the original study with 40 subjects found a large effect, replications involving thousands of people found nothing (Terry Burnham discusses this paper in more detail here). As you can see in the chart, the positive result is a small sample outlier.

Then there is ego depletion – the idea that we have a fixed stock of willpower that becomes depleted through use. If we have to use our willpower in one setting, we’re more likely to crumble later on as our ego is depleted.

Now, this theory doesn’t rest on one study – a 2010 meta-analysis examined 83 studies with 198 experiments in concluding there was an ego depletion effect. But that meta-analysis had a lot of flaws, including that it covered only published studies.

Soon a pre-registered replication of one ego depletion experiment involving 23 labs and over 2,000 subjects will be published in Psychological Science. The result? If there is any effect of ego depletion – at least as captured in that experiment – it is close to zero.

So what is going on here? Why all these failures? One, there is likely publication bias. Only those studies with positive results make it into print. Small sample sizes in many studies make it likely that any positive results are false positives.

Then there is p-hacking. People play around with their hypotheses and the data until they get the result they want.

Then there is the garden of forking paths, which is the more subtle process whereby people choose their method of analysis or what data to exclude by what often seems to be good reasons after the fact. All of these lead to a higher probability of positive results and these positive results end up being the ones that we read.
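A small simulation makes the publication bias mechanism concrete. The parameters below are made up for illustration: many under-powered studies of a tiny true effect, with only the statistically significant results reaching print.

```python
import random
import statistics

# Publication bias in miniature. The parameters are made up: 2,000 small
# studies of a tiny true effect, with only significant positive results
# reaching print.

random.seed(42)
TRUE_EFFECT, N, STUDIES = 0.1, 20, 2000

published = []
for _ in range(STUDIES):
    treatment = [random.gauss(TRUE_EFFECT, 1) for _ in range(N)]
    control = [random.gauss(0, 1) for _ in range(N)]
    diff = statistics.mean(treatment) - statistics.mean(control)
    se = (statistics.variance(treatment) / N
          + statistics.variance(control) / N) ** 0.5
    if diff / se > 1.96:             # only "significant" results get published
        published.append(diff)

print(f"true effect: {TRUE_EFFECT}")
print(f"mean published effect: {statistics.mean(published):.2f} "
      f"(from {len(published)} of {STUDIES} studies)")
# The published studies are the lucky draws, so the literature overstates
# the effect several times over.
```

The published studies are the lucky draws, so the average published effect lands at several times the true effect – one reason replication effect sizes come in well below the originals.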

Now that these bodies of research are crumbling, some of the obfuscation going on is deplorable. John Bargh’s response concerning the Florida effect is one of the more extreme examples. Many of the original study proponents erect defences, claiming poor replication technique or that the replication failed to capture all the subtleties of the situation. Personally, I’d like to see a lot more admissions of “well, that didn’t turn out”.

But what is also surprising was the level of confidence some people had in these findings. Here’s a passage from Kahneman’s Thinking, Fast and Slow – straight out of the chapter on priming:

When I describe priming studies to audiences, the reaction is often disbelief. This is not a surprise: System 2 believes that it is in charge and that it knows the reasons for its choices. Questions are probably cropping up in your mind as well: How is it possible for such trivial manipulations of the context to have such large effects? …

The idea you should focus on, however, is that disbelief is not an option. The results are not made up, nor are they statistical flukes. You have no choice but to accept that the major conclusions of these studies are true.

Err, no.

So, don’t believe every study you read. Maintain some of that scepticism even for large bodies of published research. Look for pre-registered replications where people clearly stated what they were going to do before they did it.

And I should say, this recommendation doesn’t just apply to academic studies. There are now plenty of governments and consultants running around advertising the results of their behavioural work with approaches also likely to be subject to similar problems.

Story two – the jam study

On two Saturdays in a California supermarket, Mark Lepper and Sheena Iyengar (ungated pdf) set up tasting displays of either six or 24 jars of jam. Consumers could taste as many jams as they wished, and if they approached the tasting table they received a $1 discount coupon to buy the jam.

For attracting initial interest, the large display of 24 jams did a better job, with 60 per cent of people who passed the display stopping, compared with 40 per cent at the six jam display. But only three per cent of those who stopped at the 24 jam display purchased any of the jam, compared with almost 30 per cent of those who stopped at the six jam display. Taken together, roughly 1.8 per cent of all passers-by bought jam at the 24 jam display, against around 12 per cent at the six jam display.

This result has been one of the centrepieces of the argument that more choice is not necessarily good. The larger display seemed to reduce consumer motivation to buy the product. The theories around this concept and the associated idea that more choice does not make us happy are often labelled the choice overload hypothesis or the paradox of choice. Barry Schwartz wrote a whole book on this topic.

Fast-forward 10 years to another paper, this one by Benjamin Scheibehenne and friends (ungated pdf). They surveyed the literature on the choice overload hypothesis – there is plenty. And across the basket of studies – shown in this chart – evidence of choice overload does not emerge so clearly. In some cases, choice increases purchases. In others it reduces them. Scheibehenne and friends determined that the mean effect size of changing the number of choices across the studies was effectively zero.

These reviewed studies included a few attempts to replicate the jam study results. An experiment using jam in an upscale German supermarket found no effect. Other experiments found no effect of choice size using chocolates or jelly beans. There were small differences in study design between these and the original jam study (as original authors are often quick to point out when replications fail), but if studies are so sensitive to study design and hard to replicate, it seems foolhardy to extrapolate the results of the original study too far.

There is a great quote from one of my favourite books, Jim Manzi’s Uncontrolled, which captures this danger.

[P]opularizers telescoped the conclusions derived from one coupon-plus-display promotion in one store on two Saturdays, up through assertions about the impact of product selection for jam for this store, to the impact of product selection for jam for all grocery stores in America, to claims about the impact of product selection for all retail products of any kind in every store, ultimately to fairly grandiose claims about the benefits of choice to society.

While these study results often lead to grandiose extrapolations, the defences of these studies when there is a failure to replicate or ambiguous evidence often undermine the extent of these claims. Claiming that the replication didn’t perfectly copy the original study suggests the original effect applies to a small set of circumstances. This is no longer TED talk material that can be applied across our whole life.

That is not to say that there is not something interesting going on in these choice studies. Scheibehenne and friends suggest that there may be a set of restrictive conditions under which choice overload occurs. These conditions might involve the complexity (and not the size) of the choice, the lack of dominant alternatives, assortment of options, time pressure or the distribution of product quality (as suggested by another meta-analysis). And since the jam study appears tough to replicate, these conditions might be narrow. They suggest more subtle solutions than simply reducing choice. Let’s not recommend supermarkets get rid of 75% of their product lines to boost their sales by 900%.

So, even if a study suggests something interesting is going on, don’t immediately swallow the TED talk and book on how this completely changes our understanding of the world. Even if the result is interesting, the story is likely more subtle than the way it is told.

Story three – organ donation

Organ donation rates are an often-used example of the power of defaults. I’m now going to take a moment to read a passage by Dan Ariely explaining how defaults affect organ donation rates. He refers to this chart from Johnson and Goldstein (2003) (ungated pdf):

One of my favorite graphs in all of social science is the following plot from an inspiring paper by Eric Johnson and Daniel Goldstein. This graph shows the percentage of people, across different European countries, who are willing to donate their organs after they pass away. When people see this plot and try to speculate about the cause for the differences between the countries that donate a lot (in blue) and the countries that donate little (in orange) they usually come up with “big” reasons such as religion, culture, etc.

But you will notice that pairs of similar countries have very different levels of organ donations. For example, take the following pairs of countries: Denmark and Sweden; the Netherlands and Belgium; Austria and Germany (and depending on your individual perspective France and the UK). These are countries that we usually think of as rather similar in terms of culture, religion, etc., yet their levels of organ donations are very different.

So, what could explain these differences? It turns out that it is the design of the form at the DMV. In countries where the form is set as “opt-in” (check this box if you want to participate in the organ donation program) people do not check the box and as a consequence they do not become a part of the program. In countries where the form is set as “opt-out” (check this box if you don’t want to participate in the organ donation program) people also do not check the box and are automatically enrolled in the program. In both cases large proportions of people simply adopt the default option.

Johnson and Goldstein (2003) Organ donation rates in Europe

But does this chart seem right given that story? Only 2 in every 10,000 people fail to opt out in Austria? Only 3 in 10,000 in Hungary? It seems too few. And for Dan Ariely’s story, it is too few, because the process is not as described.

The hint is in the term “presumed consent” in the chart description. There is actually no time where Austrians or Hungarians are presented with a form where they can simply change from the default. Instead, they are presumed to consent to organ donation. To change that presumption, they have to take steps such as contacting government authorities to submit forms stating they don’t want their organs removed. Most people probably don’t even think about it. It’s like calling my Australian citizenship – resulting from my birth in Australia – a default and praising the Australian Government for its fine choice architecture.

And what about the outcomes we care about – actual organ donation rates? Remember, the numbers on the Johnson and Goldstein chart aren’t the proportion of people with organs removed from their bodies. It turns out that the relationship is much weaker there.

Here is a second chart with actual donation rates – the same countries in the same order. The relationship suddenly looks a lot less clear. Germany at 15.3 deceased donors per million people is not far from Austria’s 18.8 and above Sweden’s 15.1. For two countries not on this chart, Spain, which has an opt-out arrangement, is far ahead of most countries at 33.8 deceased donors per million, but the United States, an opt-in country, is also ahead of most opt-out countries with a donation rate of 26.0.

Deceased donors per million people (Wikipedia, 2016)

[To be clear, I am not suggesting that Johnson and Goldstein did not analyse the actual donation rates, nor that no difference exists – there is an estimate of the effect of presumed consent in their paper, and other papers also attempt to do this. Those papers generally find a positive effect. However, the story is almost always told using the first chart. A difference of 16.4 versus 14.1 donors per million (Johnson and Goldstein’s estimate) is not quite as striking as 99.98% for Austria versus 12% for Germany. Even my uncontrolled chart could be seen to be exaggerating the difference – the averages in my chart are 13.1 per million for the opt-in countries and 19.3 per million for the presumed consent countries. See the comments from Johnson and Goldstein at the end of this post.]

So, if you can, read the original papers, not the popularised version – and I should say that although I’ve picked on Dan Ariely’s telling of the story here, he is hardly Robinson Crusoe in telling the organ donation story in that way. I’ve lost count of the number of times reading the original paper has completely derailed what I thought was the paper’s message.

In fact, sometimes you will discover there is no evidence for the story at all – Richard Titmuss’s suggestion that paying for blood donations might reduce supply by crowding out intrinsic motivations was a thought experiment, not an observed effect. Recent evidence suggests that – as per most economic goods – paying for blood could increase supply.

And this organ donation story provides a second more subtle lesson – if you can, look at the outcomes we want to influence, not some proxy that might not lead where you hope.

Story four – the hot hand

This last story is going to be somewhat technical. I actually chose it as a challenge to myself to see if I could communicate this idea to a group of intelligent non-technical people. It’s also a very cool story, based on work by Joshua Miller and Adam Sanjurjo. I don’t expect you to be able to immediately go and give these explanations to someone else at the end of this talk, but I hope you can see something interesting is going on.

So, when people watch sports such as basketball, they often see a hot hand. They will describe players as “hot” and “in form”. Our belief is that the person who has just hit a shot or a series of shots is more likely to hit their next one.

But is this belief in the ‘hot hand’ a rational belief? Or is it the case that people are seeing something that doesn’t exist? Is the ‘hot hand’ an illusion?

To answer this question, Thomas Gilovich, Robert Vallone and Amos Tversky took masses of shot data from a variety of sources, including the Philadelphia 76ers and Boston Celtics, and examined it for evidence of a hot hand. This included shots in games, free throws and a controlled shooting experiment.

What did they find? The hot hand was an illusion.

So, let’s talk a bit about how we might show this. This table shows a set of four shots by a player in each of 10 games. The first column shows the results of their shots: an X is a hit, an O is a miss. This particular player took 40 shots and hit 20 – so they are a 50% shooter.

So what would count as evidence of a hot hand? What we can do is compare 1) the proportion of shots they hit if the shot immediately before was a hit with 2) their normal shooting percentage. If their hit rate after a hit is higher than their normal shot probability, then we might say they get hot.

The second column of the table shows the proportion of shots hit by the player if the shot before was a hit. Looking at the first sequence, the first shot was a hit, and it is followed by a hit. The second shot, a hit, is followed by a miss. So, for that first sequence, the proportion of hits if the shot before was a hit is 50%. The last shot, the third hit, is not followed by any other shots, so it does not affect our calculation. The rest of that column shows the proportion of hits followed by hits for the other sequences. Where there is no hit in the first three shots, those sequences don’t enter our calculations.

Basketball player shot sequences (X=hit, O=miss)

Shots p(X|X)
XXOX 50%
OXOX 0%
OOXX 100%
OXOX 0%
XXXX 100%
XOOX 0%
XXOO 50%
OOOO –
OOOX –
OOXX 100%
AVERAGE 50%

Across these sequences, the average proportion of hits following a hit is 50%. (That average is also the expected value we would get if we randomly picked one of these sequences.) Since the proportion of hits after a hit is the same as their shooting percentage, we could argue that they don’t have a hot hand.
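
If you want to check that arithmetic, here is a minimal sketch in Python – my own illustration, not code from any of the papers – that reproduces the second column and its average from the sequences in the table:

# Shot sequences from the table above (X = hit, O = miss)
games = ["XXOX", "OXOX", "OOXX", "OXOX", "XXXX",
         "XOOX", "XXOO", "OOOO", "OOOX", "OOXX"]

def prop_hits_after_hit(seq):
    # Results of the shots taken immediately after a hit
    follows = [b for a, b in zip(seq, seq[1:]) if a == "X"]
    return follows.count("X") / len(follows) if follows else None

props = [p for p in (prop_hits_after_hit(g) for g in games) if p is not None]
print(sum(props) / len(props))  # 0.5 - the 50% average across sequences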

Now, I am going to take you on a detour, and then we’ll come back to this example. And that detour involves the coin flipping that I got everyone to do before we commenced.

34 people flipped a coin four times, and I asked you to try to flip a heads on each flip. [The numbers obtained for the coin flipping were, obviously, added after the fact. The raw data is here. And as it turned out they did not quite tell the story I expected, so there are some slight amendments below to the original script.] Here are the results of our experiment. The second column shows the proportion of heads that you threw. Across all of you, you flipped heads 49% of the time, pretty close to 50%. Obviously you have no control over your flips. But what is more interesting is the third column. On average, the proportion of heads flipped after an earlier flip of heads is closer to 48%.

Meetup experiment results – flipping a coin four times

Number of players p(H) p(H|H)
34 49% 48%

Now, intuition tells us the probability of a heads after flipping an earlier heads will be 50% (unless you suffer from the gambler’s fallacy). So this seems to be the right result.

But let’s have a closer look at this. This next table shows the 16 possible combinations of heads and tails you could have flipped. Each of these 16 combinations has an equal probability of occurring. What is the average proportion of heads following a previous flip of a heads? It turns out it is 40.5%. That doesn’t seem right. But let’s delve deeper. The third column shows how many heads follow a heads, and the fourth how many tails follow a heads. If we count across all the sequences, we see that we have 12 heads and 12 tails after the 24 earlier flips of heads – spot on the 50% you expect.

16 possible combinations of heads and tails across four flips

Flips p(H|H) n(H|H) n(T|H)
HHHH 100% 3 0
HHHT 67% 2 1
HHTH 50% 1 1
HHTT 50% 1 1
HTHH 50% 1 1
HTHT 0% 0 2
HTTH 0% 0 1
HTTT 0% 0 1
THHH 100% 2 0
THHT 50% 1 1
THTH 0% 0 1
THTT 0% 0 1
TTHH 100% 1 0
TTHT 0% 0 1
TTTH – 0 0
TTTT – 0 0
AVERAGE / TOTAL 40.5% 12 12

So what is going on in that second column? By looking at these short sequences, we are introducing a bias. Most of the cases of heads following heads are clustered together – such as the first sequence, which has three cases of a heads following a heads. Yet it has the same weight in our average as the sequence TTHT, which has only one flip occurring after a heads. A tails appears more likely to follow a heads because of this bias. The actual probability of a heads following a heads is 50%.
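
You can see the bias with a few lines of Python (again, my own sketch rather than anything from the original papers), enumerating all 16 sequences and comparing the average of the per-sequence proportions with the pooled counts across every flip that follows a heads:

from itertools import product

per_seq = []     # per-sequence proportion of heads following a heads
n_hh = n_th = 0  # pooled counts of heads/tails following a heads
for flips in product("HT", repeat=4):
    seq = "".join(flips)
    follows = [b for a, b in zip(seq, seq[1:]) if a == "H"]
    if follows:
        per_seq.append(follows.count("H") / len(follows))
    n_hh += follows.count("H")
    n_th += follows.count("T")

print(sum(per_seq) / len(per_seq))  # ~0.405 - the biased average across sequences
print(n_hh, n_th)                   # 12 12 - pooled, the expected 50%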

And if we do the same exercise for your flips, the result now looks a bit different – you flipped 28 heads and 22 tails for the 50 flips directly after a head: 56% heads, 44% tails. It seems you have a hot hand, although our original analysis clouded that result. (Obviously, you didn’t really have a hot hand – it is a chance result. There was a 24% probability of getting 28 or more heads. Ideally I should have got a larger sample size.)

Meetup experiment results – flipping a coin four times

Number of players p(H) p(H|H) n(H|H) n(T|H)
34 49% 48% 28 22
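
As an aside, the 24% figure is easy to check. Under the null hypothesis of a fair coin, each of the 50 flips that followed a heads is itself a fair, independent flip, so the count of heads among them is binomial – a quick sketch under that assumption:

from math import comb

# Probability of 28 or more heads in 50 fair flips
p = sum(comb(50, k) for k in range(28, 51)) / 2**50
print(p)  # ~0.24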

Turning back to the basketball example I showed you at the beginning, I suggested there was a 50% chance of a hit after a hit for a 50% shooter – the first two columns of the table below. But let’s count the shots that occur after a hit. There are 12 shots that occur after a hit, and it turns out that 7 of these are hits. Our shooter hits 58% of shots immediately following a hit and misses only 42% of them. They have a hot hand (noting the small sample size here… but you get the picture).

Basketball player shot sequences (X=hit, O=miss)

Shots p(X|X) n(X|X) n(O|X)
XXOX 50% 1 1
OXOX 0% 0 1
OOXX 100% 1 0
OXOX 0% 0 1
XXXX 100% 3 0
XOOX 0% 0 1
XXOO 50% 1 1
OOOO – 0 0
OOOX – 0 0
OOXX 100% 1 0
AVERAGE / TOTAL 50% 7 5
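
Pooling rather than averaging is a small change to the earlier sketch (again my own illustration, using the same hypothetical sequences):

games = ["XXOX", "OXOX", "OOXX", "OXOX", "XXXX",
         "XOOX", "XXOO", "OOOO", "OOOX", "OOXX"]

# Pool the counts across all games instead of averaging per-game proportions
n_xx = sum(g[i + 1] == "X" for g in games for i in range(3) if g[i] == "X")
n_ox = sum(g[i + 1] == "O" for g in games for i in range(3) if g[i] == "X")
print(n_xx, n_ox, n_xx / (n_xx + n_ox))  # 7 5 0.583 - a hot hand, not 50%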

So, why have I bothered with this stats lesson? By taking short sequences of shots and measuring the proportion of hits following a hit, I have introduced a bias in the measurement. The reason this is important is that the papers that supposedly show there is no hot hand used a methodology that suffered from this same bias. When you correct for the bias, there is a hot hand.

Take the famous paper by Tom Gilovich and friends that I mentioned at the beginning. They did not average across sequences as I have shown here, but by looking at short sequences of shots, selecting each hit (or sequence of hits) and seeing the result of the following shot, they introduced the same bias. The bias acts in the opposite direction to the hot hand, effectively cancelling it out and leading to the conclusion that each shot is independent of the last.

Miller and Sanjurjo crunched the numbers for one of the studies in the Gilovich and friends paper, and found that the probability of hitting a three pointer following a sequence of three previous hits is 13 percentage points higher than after a sequence of three misses. There truly is a hot hand. To give you a sense of the scale of that difference, Miller and Sanjurjo note that the difference between the median and best three point shooter in the NBA is only 10 percentage points.

Apart from the fact that this statistical bias slipped past everyone’s attention for close to thirty years, I find this result extraordinarily interesting for another reason. We have a body of research that suggests that even slight cues in the environment can change our actions. Words associated with old people can slow us down. Images of money can make us selfish. And so on. Yet why haven’t these same researchers been asking why a basketball player would not be influenced by their earlier shots – surely a more salient part of the environment than the word “Florida”? The desire to show one bias allowed them to overlook another.

So, remember that behavioural scientists are as biased as anyone.

If you are interested in learning more….

Before I close, I’ll leave with a few places you can go if you found tonight’s presentation interesting.

First is Andrew Gelman’s truly wonderful blog, Statistical Modeling, Causal Inference, and Social Science. Please don’t be put off by the name – you will learn something from Gelman even if you know little about statistics. Personally, I’ve learnt more about statistics from this blog than I did from the half a dozen statistics and econometrics units I completed at university. It is the number one place to see crap papers skewered and to find discussion of why we see so much poor research. Google Andrew Gelman and the blog will be at the top of the results.

Second, read Jim Manzi’s Uncontrolled. It will give you a new lens with which to think about causal associations in our world. Manzi’s plea for humility about what we believe we know is important.

Third, read some Gerd Gigerenzer. I only touched on a couple of the critiques of behavioural science tonight. There are many others, such as the question of how irrational we really are. On this angle, Gigerenzer’s work is among the most interesting. I suggest starting with Simple Heuristics That Make Us Smart by Gigerenzer, Peter Todd and the ABC Research Group, and go from there.

Finally, check out my blog at jasoncollins.blog. I’m grumpy about more than the material that I covered tonight. I will point you to one piece – Please Not Another Bias: An Evolutionary Take on Behavioural Economics – where I complain about how behavioural economics needs to be more than a collection of biases, but hopefully you will find more there that’s of interest to you.

And that’s it for tonight.

Evolutionary Biology in Economics: A Review

I’ve just had a new article published in the Economic Record – Evolutionary Biology in Economics: A Review.

Evolutionary Biology in Economics: A Review

Jason Collins, Boris Baer and Ernst Juerg Weber

As human traits and preferences were shaped by natural selection, there is substantial potential for the use of evolutionary biology in economic analysis. In this paper, we review the extent to which evolutionary theory has been incorporated into economic research. We examine work in four areas: the evolution of preferences, the molecular genetic basis of economic traits, the interaction of evolutionary and economic dynamics, and the genetic foundations of economic development. These fields comprise a thriving body of research, but have significant scope for further investigation. In particular, the growing accessibility of low-cost molecular data will create more opportunities for research on the relationship between molecular genetic information and economic traits.

I previously posted about an earlier version of this paper when it was called The Evolutionary Foundations of Economics. You can access an ungated version of that earlier paper here. Drop me a line if you want a copy of the published paper but can’t get access.

It’s not the most exciting article – it was the introductory chapter of my PhD thesis and I wrote it to provide the foundations for the substantive chapters rather than to spark a revolution. However, it will give you a decent snapshot of what is going on.

Ariely’s The Honest Truth About Dishonesty

I rate the third of Dan Ariely’s books, The Honest Truth About Dishonesty: How We Lie to Everyone – Especially Ourselves, somewhere between his first two books.

One of the strengths of Ariely’s books is that he is largely writing about his own experiments, not simply scraping the same barrel as every other pop behavioural science author. The Honest Truth has a smaller back catalogue of experiments to draw from than Predictably Irrational, so it sometimes meanders in the same way as The Upside of Irrationality. But the thread that ties The Honest Truth together – how and why we cheat – and Ariely’s investigations into it give those extended riffs more substance than the storytelling that filled some parts of The Upside.

The basic story of the book is that we like to see ourselves as honest, but are quite willing and able to indulge in a small amount of cheating where we can rationalise it. This amount of cheating is quite flexible based on situational factors, such as what other people are doing, and is not purely the result of a cost-benefit calculation.

The experiment that crops up again and again through the book is a task to find numbers in a series of matrices. People then shred their answers before collecting payment based on how many they claim to have completed. Most people cheat a little, possibly because they can rationalise that they could have solved more, or had almost completed the next one. Few cheat to the maximum, even when it is clear they have the opportunity to do so.

For much of the first part of the book, Ariely frames his research against the Simple Model of Rational Crime (or ‘SMORC’) – where people do a rational cost-benefit analysis as to whether to commit the crime. He shows experiments where people don’t cheat to the maximum amount when they have no chance of being caught – almost no-one says that they solved all the puzzles (amusingly, a few say they solved 20 out of 20, but no-one says 18 or 19). And most people do not increase their level of cheating when the potential gains increase.

As Ariely works through the various experiments attempting to isolate parts of the SMORC and show they don’t hold, I never felt fully satisfied. It is always possible to see how people might rationally respond in a way that thwarts the experimental design.

For example, Ariely found that changes in the stake with no change in enforcement did not result in an increase in cheating. But if I am in an environment with more money, I might assume there is more monitoring and enforcement, even if I can’t see it. However, I believe Ariely is right in arguing that the decision is not a pure cost-benefit analysis.

One of the more interesting parts of the book concerned how increasing the degrees of separation from the monetary outcome increases cheating. Having people collect tokens, which could be later exchanged for cash, increased cheating. In that light, a decision to cheat in an area such as financial services, where the ultimate cost is cash but there are many degrees of separation (e.g. manipulating an interest rate benchmark which changes the price I get on a trade which affects my profit and loss which affects the size of my bonus), might not feel like cheating at all.

As is the case when I read any behavioural science book, the part that leaves me slightly cold is that I’m not sure I can trust some of the results. The recent replication failures involving priming and ego depletion – and both phenomena feature in the book – resulted in me taking some of the results with a grain of salt. How many will stand the test of time?