Author: Jason Collins

Economics. Behavioural and data science. PhD economics and evolutionary biology. Blog at jasoncollins.blog

A week of links

Links this week:

  1. We see skill where none exists and are happy to pay for transparently useless advice.
  2. No evidence of the effect of parenting on criminal behaviour.
  3. Doug Kenrick on testosterone and the rationality of taking risks.
  4. Pulling apart the recent paper on perceptions of ability and the gender gap.
  5. The human guinea pig.
  6. Distrust of vaccines not a left wing issue.

Manzi’s Uncontrolled

In social science, a myriad of factors can affect outcomes. Think of all the factors claimed to affect school achievement – student characteristics such as intelligence, conscientiousness, patience and willingness to work hard, parental characteristics such as income and education, and then there are genetics, socioeconomic status, school peers, teacher quality, class size, local crime and so on.

In assessing the effect of any policy or program, researchers typically attempt to control for these confounding factors. But as James Manzi forcefully argues in Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics, and Society, the high “causal density” in these settings means there is nearly always the possibility that an important factor has been missed or is not understood.

As a result, Manzi advocates the use of randomised field trials (RFTs) to attempt to tease out whether interventions are having the desired effect. If control and treatment groups are randomised, any unidentified factors affecting the outcome should affect each group equally.
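
To make the logic concrete, here is a minimal simulation sketch of my own (it is not from the book): the outcome depends partly on an unobserved factor, but because assignment is random that factor is balanced across the two groups, and a simple difference in means recovers the treatment effect.

```python
# Minimal sketch: randomisation balances an unidentified confounder.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

hidden_factor = rng.normal(size=n)          # unidentified factor affecting the outcome
treatment = rng.integers(0, 2, size=n)      # random assignment to control (0) or treatment (1)
outcome = 2.0 * treatment + 1.5 * hidden_factor + rng.normal(size=n)

# Because assignment is random, the hidden factor has (almost) the same mean
# in both groups, so a simple difference in means recovers the true effect.
balance = hidden_factor[treatment == 1].mean() - hidden_factor[treatment == 0].mean()
effect = outcome[treatment == 1].mean() - outcome[treatment == 0].mean()

print("hidden factor balance:", round(balance, 3))   # close to zero
print("estimated treatment effect:", round(effect, 3))  # close to the true value of 2.0
```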

The ubiquity of uncontrolled factors and the ability of RFTs to do a better job of accounting for them were demonstrated by John Ioannidis in a 2005 paper evaluating the reliability of forty-nine studies. As Manzi reports, 90 per cent of the large randomised experiments had produced results that could be replicated, compared to only 20 per cent of the non-randomised studies.

Manzi notes that RFTs have critics and limitations, and people such as James Heckman have argued that it is possible to achieve the same results as RFTs using non-experimental mathematical techniques. However, as Manzi points out, Heckman and friends’ demonstration that RFT results can be replicated after the fact using improved econometric methods is not the same as defining, in advance, a set of procedures that can reproduce the results of future RFTs.

Although Manzi is a strong advocate of RFTs, he is clear that RFTs will not lead to a new era where we will understand everything. High causal density will always place limits on the ability to generalise experimental results. Manzi writes:

[I]ncreasing complexity has another pernicious effect: it becomes far harder to generalize the results of experiments. We can run a clinical trial in Norfolk, Virginia, and conclude with tolerable reliability that “Vaccine X prevents disease Y.” We can’t conclude that if literacy program X works in Norfolk, then it will work everywhere. The real predictive rule is usually closer to something like “Literacy program X is effective for children in urban areas, and who have the following range of incomes and prior test scores, when the following alternatives are not available in the school district, and the teachers have the following qualifications, and overall economic conditions in the district are within the following range.” And by the way, even this predictive rule stops working ten years from now, when different background conditions obtain in the society.

Manzi’s critique of the famous jam study is indicative. Can you truly generalise from 10 hours in one store with shoppers randomised into one-hour chunks? Taken literally, the result implies that eliminating 75 per cent of products could increase sales by 900 per cent. However, that hasn’t stopped popularisers telescoping “the conclusions derived from one coupon-plus-display promotion in one store on two Saturdays, up through assertions about the impact of product selection for jam for this store, to the impact of product selection for jam for all grocery stores in America, to claims about the impact of product selection for all retail products of any kind in every store, ultimately to fairly grandiose claims about the benefits of choice to society.”

It’s not hard to come up with other studies that are generalised in this manner. The Perry Pre-School project, which found benefits for disadvantaged African American children in public pre-schools in the 1960s, is generalised to promote more intensive early childhood education for everyone, regardless of country, race, socioeconomic status or era. A single Kenyan case study of deworming leads to a plan to deworm the world. And so on.

As a result, succeeding or failing in a single trial doesn’t usually constitute adequate evaluation of a program. Rather, promising ideas need to be subject to iterative evaluation in the relevant contexts.

Manzi’s reluctance to suggest RFTs will lead us to a new era also stems from the results of the few RFTs conducted in social science. Most programs fail replicated, independent, well-designed RFTs, so we should be sceptical of claims about the effectiveness of new programs. As Manzi states, innovative ideas rarely work.

In his review of RFTs in the social sciences, he does suggest one pattern emerges. Programs targeted at improving behaviour or raising skills or consciousness are more likely to fail than changes in incentives or environment. This might be considered a nod to both standard and behavioural economic tools.

At the end of the book, Manzi provides some guidance on how government should consider programs in an environment of high causal density.

First, he recommends that government build strong experimental capability. To keep the foxes out of the henhouse and avoid program advocates influencing results, he recommends a separate organisational entity be established to evaluate programs.

Second, there should be experimentation at the state level, or at the smallest possible competent authority. This might involve state by state deviation from Federal laws or programs on a trial basis.

Manzi recommends a broader scope for experimentation than you might normally hear advocated, with his suggestion that experimentation extend to examining different levels of coercion:

The characteristic error of the contemporary Right and Left in this is enforcing too many social norms on a national basis. All that has varied has been which norms predominate. The characteristic error of liberty-as-goal libertarians has been more subtle but no less severe: the parallel failure to appreciate that a national rule of “no restrictions on non-coercive behavior” (which, admittedly, is something of a cartoon) contravenes a primary rationale for liberty. What if social conservatives are right and the wheels really will come off society in the long run if we don’t legally restrict various sexual behaviors? What if some left-wing economists are right and it is better to have aggressive zoning laws that prohibit big-box retailers? I think both are mistaken, but I might be wrong. What if I’m right for some people at this moment in time but wrong for others, or what if I’m wrong for the same people ten years from now?

The freedom to experiment needs to include freedom to experiment with different governmental (i.e., coercive) rules. So here we have the paradox: a liberty-as-means libertarian ought to support, in many cases, local autonomy to restrict at least some personal freedoms.

To enable experimentation, Manzi uses an evolutionary framing and notes there is a need to encourage variation, cross-pollination of ideas and selection pressure. Encouraging variation requires a willingness to allow failure and deviation from whatever vision of social organisation we believe is best.

Our ignorance demands that we let social evolution operate as the least bad of the alternatives for determining what works. Subsocieties that behave differently on many dimensions are both the raw materials for an evolutionary process that sifts through and hybridizes alternative institutions, and also are analogous to the kind of evolutionary “reserves” of variation that may not be adaptive now but might be in some future changed environment. We want variation in human social arrangements for some of the same reasons that biodiversity can be useful in genetic evolution. This is the standard libertarian insight that the open society is well suited to developing knowledge in the face of a complex and changing environment. As per the first two parts of this book, it remains valid. But if we take our ignorance seriously, the implications of this insight significantly diverge from much of what the modern libertarian movement espouses.

Manzi highlights the importance of selection pressure in his discussion of school vouchers. He considers that “giving choice” to parents does not necessarily provide an environment in which trial-and-error improvement will occur, as there may not be alternatives to the status quo, the right incentives for market participants, or adequate information for parents. Manzi is also wary that taxpayer-funded vouchers will come with so many controls as to render the experiment useless.

Manzi’s proposals to provide selection pressure are not without problems. He suggests a comprehensive national exam for all schools receiving government funding, with those results published. But isn’t the need to do well in this test itself a form of control that kills off much of the experimentation, turning the education system into a group of organisations competing for high test scores?

One of Manzi’s more interesting ideas relates to immigration. Manzi supports programs to attract highly skilled immigrants, such as skills-based immigration programs, or offering entry to foreign students upon completing certain degrees. He proposes testing this idea by using a subset of the visas granted through lotteries to run an RFT. Immigrant outcomes could then be tracked.

Ultimately, however, Manzi’s message is one of humility. No matter what our worldview, we should be prepared to allow experimentation with alternatives, as we may well be wrong. And that favourite program you have been promoting? Feel free to experiment, but don’t expect success. And if it works in that context, test and test again, as it may not work somewhere else.

Manzi on the abortion-crime hypothesis

My recent reading of David Colander and Roland Kupers’s Complexity and the Art of Public Policy prompted me to re-read James Manzi’s Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics, and Society. I see the two books as riffs on a similar theme.

I’ll post a review of Uncontrolled later this week, but in the meantime, Manzi provides an interesting take on the Donohue-Levitt abortion-crime hypothesis. The hypothesis is that abortion reduces crime because unwanted children are more likely to become criminals. As legalisation increased access to abortion and decreased the number of unwanted children, the decline in crime through the 1990s and 2000s could be partly attributed to that legalisation.

Donohue and Levitt’s initial paper triggered a raft of responses, including one demonstrating an analytical error which, once corrected for, resulted in the abortion-crime link disappearing. Donohue and Levitt then redid the work and showed that, by recasting a few assumptions, the error could be corrected and the link re-established. As Manzi states:

The revealing observation is not that there was an analytical error in the paper (which almost certainly happens far more often than we like to think), but that once it was found and corrected, it was feasible to rejigger the regression analysis to get back to the original directional result through various defensible tweaks to assumptions. If one could rule out either the original assumptions or these new assumptions as unreasonable, that would be better news for the technique. Instead we have a recipe for irresolvable debate.
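
To see why “defensible tweaks to assumptions” can carry so much weight, here is a toy simulation of my own – it uses made-up data and has nothing to do with the actual Donohue-Levitt analysis – in which the seemingly reasonable choice of whether to control for a correlated confounder flips the sign of the estimated effect.

```python
# Toy illustration: two "defensible" specifications, opposite conclusions.
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

z = rng.normal(size=n)                        # confounder
x = 0.9 * z + rng.normal(scale=0.5, size=n)   # variable of interest, strongly tied to z
y = -0.2 * x + 1.0 * z + rng.normal(size=n)   # true effect of x on y is negative

# Specification 1: omit z. The slope of y on x comes out positive (spurious).
slope_naive = np.polyfit(x, y, 1)[0]

# Specification 2: control for z. The coefficient on x comes out close to -0.2.
X = np.column_stack([np.ones(n), x, z])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

print("without control:", round(slope_naive, 2))  # roughly +0.65
print("with control:   ", round(beta[1], 2))      # roughly -0.2
```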

Manzi also points out that Levitt, in his book Freakonomics (with Stephen Dubner), indirectly identified one of the reasons why Donohue and Levitt’s claim is so tenuous:

In Freakonomics, Levitt and Dubner write that Roe [the Supreme Court decision in Roe v Wade establishing a right to abortion] is “like the proverbial butterfly that flaps its wings on one continent and eventually creates a hurricane on another.” But this simile cuts both ways. It is presumably meant to evoke the “butterfly effect”: meteorologist Edward Lorenz’s famous description of a global climate system with such a dense web of interconnected pathways of causation that long-term weather forecasting is a fool’s errand. The actual event that inspired this observation was that, one day in 1961, Lorenz entered .506 instead of .506127 for one parameter in a climate-forecasting model and discovered that it produced a wildly different long-term weather forecast. This is, of course, directly analogous to what we see in the abortion-crime debate and Bartels’s model for income inequality: tiny changes in assumptions yield vastly different results. It is a telltale sign that human society is far too complicated to yield to the analytical tools that nonexperimental social science brings to bear. The questions addressed by social science typically have none of the characteristics that made causal attribution in the smoking–lung cancer case practical.

A week of links

Links this week:

  1. Does public policy promote obesity? This month’s Cato Unbound on whether public policy can stop obesity could be interesting when the discussion begins, but the response essays so far have generally talked past each other.
  2. Three links via Tyler Cowen. New cars fake their engine noise. People turn down high-cost, low-value treatments when they can pocket part of the savings. The right won the economics debate; left and right are just haggling over details.
  3. The Dunning-Kruger Peak of Advertising.
  4. Chickens prefer beautiful humans. So much for the subjectivity of beauty.
  5. Default retirement savings in Illinois.

Grade inflation and the Dunning-Kruger effect

The famous Dunning-Kruger effect, in the words of Dunning and Kruger, is a bias where:

People tend to hold overly favorable views of their abilities in many social and intellectual domains

in part, because:

[P]eople who are unskilled in these domains suffer a dual burden: Not only do these people reach erroneous conclusions and make unfortunate choices, but their incompetence robs them of the metacognitive ability to realize it.

There have been plenty of critiques and explanations over the years, including an article by Marian Krajc and Andreas Ortmann, who argue the overestimation of ability is partly a signal extraction problem. In environments where people are not provided with feedback on their relative standing, they will tend to make larger estimation errors.

Krajc and Ortmann point out that the Dunning-Kruger study, as is typical, was done using psychology undergraduates at Cornell. This sample is already a self-selected pool that excludes those unable to gain admission. And once at university, the feedback students receive on their performance is not as useful as it could be. Krajc and Ortmann write [references largely excluded]:

In addition, it is well-known from studies of grade inflation that grades at the undergraduate level have – with the notable exception of the natural sciences – become less and less differentiating over the years: more and more students are awarded top grades. For example, between 1965 and 2000 the number of A’s awarded to Cornell students has more than doubled in percentage while the percentage of grades in the B, C, D and F ranges has consequently dropped (in 1965, 17.5% of grades were A’s, while in 2000, 40% were A’s). These data strongly suggest that Cornell University experiences the same phenomenon of (differential) grade inflation that Harvard experiences and the schools discussed in Sabot and Wakeman-Linn (1991). The dramatic grade inflation documented for the humanities and social-sciences devalues grades as meaningful signals specifically in cohorts of students that are newly constituted and typically draw on the top of high-school classes. Inflated grades complicate the inference problem of student subjects that, quite likely, were students in their first year or in their first semester.

Grade inflation is robbing people of feedback they could use to understand their level of competence.

*A post by Eaon Pritchard on the “Dunning-Kruger peak” reminded me that I was sitting on this passage.

The benefits of cognitive limits

Cleaning up some notes recently, I was reminded of another interesting piece from Gerd Gigerenzer’s Rationality for Mortals:

Is perfect memory desirable, without error? The answer seems to be no. The “sins” of our memory seem to be good errors, that is, by-products (“spandrels”) of a system adapted to the demands of our environments. In this view, forgetting prevents the sheer mass of details stored in an unlimited memory from critically slowing down and inhibiting the retrieval of the few important experiences. Too much memory would impair the mind’s ability to abstract, to infer, and to learn. Moreover, the nature of memory is not simply storing and retrieving. Memory actively “makes up” memories—that is, it makes inferences and reconstructs the past from the present. This is in contrast to perception, which also makes uncertain inferences but reconstructs the present from the past. Memory needs to be functional, not veridical. To build a system that does not forget will not result in human intelligence.

Cognitive limitations both constrain and enable adaptive behavior. There is a point where more information and more cognitive processing can actually do harm, as illustrated in the case of perfect memory. Built-in limitations can in fact be beneficial, enabling new functions that would be absent without them (Hertwig & Todd, 2003). …

Newport (1990) argued that the very constraints of the developing brain of small children enable them to learn their first language fluently. Late language learners, in contrast, tend to experience difficulties when attempting to learn the full range of semantic mappings with their mature mental capacities. In a test of this argument, Elman (1993) tried to get a large neural network with extensive memory to learn the grammatical relationships in a set of several thousand sentences, yet the network faltered. Instead of taking the obvious step of adding more memory to solve the problem, Elman restricted its memory, making the network forget after every three or four words—to mimic the memory restrictions of young children who learn their first language. The network with the restricted memory could not possibly make sense of the long complicated sentences, but its restrictions forced it to focus on the short simple sentences, which it did learn correctly, mastering the small set of grammatical relationships in this subset. Elman then increased the network’s effective memory to five or six words, and so on. By starting small, the network ultimately learned the entire corpus of sentences, which the full network with full memory had never been able to do alone.
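
For readers curious about the shape of the procedure, here is a rough sketch of my own in the spirit of Elman’s “starting small” idea – a modern library and a trivially structured placeholder corpus stand in for his original network and grammar – in which a small recurrent network learns to predict the next token while the length of the sequences it sees grows in stages.

```python
# "Starting small" sketch: train in stages with a growing memory window.
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size = 12
data = torch.arange(20_000) % vocab_size     # placeholder "corpus": a repeating cycle


class NextTokenRNN(nn.Module):
    def __init__(self, vocab, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.rnn = nn.RNN(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, x):
        h, _ = self.rnn(self.embed(x))
        return self.out(h)                   # next-token logits at each position


model = NextTokenRNN(vocab_size)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# The effective memory window grows from three tokens up to twelve over stages,
# mimicking the restricted-then-expanding memory in Elman's experiment.
for window in (3, 5, 8, 12):
    for _ in range(200):
        starts = torch.randint(0, len(data) - window - 1, (32,)).tolist()
        x = torch.stack([data[s:s + window] for s in starts])
        y = torch.stack([data[s + 1:s + window + 1] for s in starts])
        loss = loss_fn(model(x).reshape(-1, vocab_size), y.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"window {window}: final batch loss {loss.item():.3f}")
```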

Gigerenzer also makes the case that most visual illusions are “good errors” necessary in an intelligent animal. Assumptions used to create the illusions, such as “light tends to come from above”, inform what we “see”.

Perceptual illusions are good errors, a necessary consequence of a highly intelligent “betting” machine (Gregory, 1974). Therefore, a perceptual system that does not make any errors would not be an intelligent system. It would report only what the eye can “see.” That would be both too little and too much. Too little because perception must go beyond the information given, since it has to abstract and generalize. Too much because a “veridical” system would overwhelm the mind with a vast amount of irrelevant details. Perceptual errors, therefore, are a necessary part, or by-product, of an intelligent system. They exemplify a second source of good errors: Visual illusions result from “bets” that are virtually incorrigible, whereas the “bets” in trial-and-error learning are made in order to be corrected eventually. Both kinds of gambles are indispensable and complementary tools of an intelligent mind.

The case of visual illusions illustrates the general proposition that every intelligent system makes good errors; otherwise it would not be intelligent. The reason is that the outside world is uncertain, and the system has to make intelligent inferences based on assumed ecological structures. Going beyond the information given by making inferences will produce systematic errors. Not risking errors would destroy intelligence.

In other parts of his work Gigerenzer builds the case that many of the “biases” identified by Kahneman and friends fall into the “good errors” camp.

A week of links

Links this week:

  1. Skip your annual physical.
  2. The phrase “Statistical significance is not the same as practical significance” is leading us astray.
  3. The ineffectiveness of food and soft drink taxes (although not all calories are the same). The latest extension of the nanny state – banning junk food from playgrounds. And a new book worth looking at – Government Paternalism: Nanny State or Helpful Friend? (HT: Diane Coyle)
  4. Marijuana in Colorado – no surprise that the most grandiose claims of both sides haven’t come to fruition.
  5. Even more on lead and crime.

That chart doesn’t match your headline – fertility edition

Under the heading “Japan’s birth rate problem is way worse than anyone imagined“, Ana Swanson at The Washington Post’s Wonkblog shows the following chart:

Japan fertility rate

So, the birth rate problem is worse than forecast in 1976, 1986, 1992 and 1997. However, the birth rate is higher than was forecast in 2002 and 2006 – so has surprised on the upside. It’s only “worse than anyone imagined” if you’ve had your head in the sand for the last 10 or so years. As Noah Smith asks, didn’t any of the people tweeting the graph (it appeared at least half a dozen times in my feed) look at it?

That said, the chart demonstrates the lack of robust conceptual models that might be used to forecast fertility. As another example, the below figure comes from Lee and Tuljapurkar’s Population Forecasting for Fiscal Planning: Issues and Innovations and shows US Census Bureau forecasts through to 1996. As for the Japan forecasts, the tendency is to assume a slight reversion toward replacement fertility followed by constant fertility.

US Census forecasts

The Bureau of the Census produced high and low estimates (as in the figure below), but these don’t make the forecasting look any better. For many forecasts, the fertility rate was outside the range within 3 years. In 1972, fertility fell outside the range before the forecast was even published.

US Census High-Low forecasts

Over the last ten years, upside fertility “surprises” have been typical in developed countries. Japan is not an outlier. Below are three consecutive projections from the Australian Government’s Intergenerational Report. IGR1 was published in 2002, IGR2 in 2007 and IGR 2010 in 2010 (obviously). As you can see, they’ve been chasing an upward trend in fertility. The fertility problem is less severe than once thought: long-term fertility was assumed to be 1.60 in the 2002 report but 1.90 in 2010. A new IGR is due out this year, so it will be interesting to see where that forecast goes.

IGR_2007 fertility chart
IGR_2010 fertility chart

As for building better conceptual models of fertility, I don’t envy anyone attempting that task. But as I argue in a working paper, evolutionary dynamics will tend to drive fertility rates up from recent lows. Is that part of the story behind what we are seeing in Japan and elsewhere?

Bad statistics – cancer edition

Are two-thirds of cancers due to bad luck, as many recent headlines have stated? Well, we don’t really know.

The paper that triggered these headlines found that two-thirds of the variation in the log of cancer risk can be explained by the number of stem cell divisions. More cell divisions – more opportunity for “bad luck”.

But, as pointed out by Bob O’Hara and GrrlScientist, an explanation for variation in incidence is not an explanation for the absolute numbers. As they state, although all variation in the depth of the Marianas trench might be due to tides, tides are not the explanation for the depth of the trench itself. There might be some underlying factor X affecting all cancers.

My reason for drawing attention to this misinterpretation is that a similar confusion occurs in discussions of heritability. Heritability is the proportion of variation in phenotype – an organism’s observable traits – due to genes. If heritability of height is 0.8, 80 per cent of variation in height is due to genetic variation. But your height is not “80 per cent due to genes”.

To make this distinction clear, consider the number of fingers on your hand. Heritability of the number of fingers on your hand is close to zero. This is because most variation is due to accidents where people lose a finger or two. But does this mean that the number of fingers on your hand is almost entirely due to environmental factors? No, it’s almost entirely genetic – those five fingers are an expression of your genes. There is an underlying factor X – our genes – that is responsible for the major pattern.
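
To make the variance-versus-level distinction concrete, here is a small numerical sketch of my own (not from any of the posts linked here): simulated “fingers” whose average is set entirely by genes, but whose heritability is zero because all of the variation comes from accidents.

```python
# Heritability is about variance, not levels.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

genetic_value = np.full(n, 5.0)                  # genes "say" five fingers for everyone
accident = rng.random(n) < 0.01                  # rare environmental accidents
fingers = genetic_value - accident * rng.integers(1, 3, size=n)

# Everyone has the same genetic value, so the genetic share of the variance
# (the heritability) is zero...
heritability = genetic_value.var() / fingers.var()
print("heritability:", heritability)             # 0.0

# ...even though the average number of fingers is overwhelmingly set by genes.
print("mean fingers:", round(fingers.mean(), 3))  # about 4.985
```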

Turning back to the cancer paper, as PZ Myers points out, there may be no underlying factor X affecting cancer, and the two-thirds figure could be correct. Extrapolating from one chart hints that might be the case. But that’s not what the paper states.

As an endnote, a recent study pointed out that most errors in scientific reporting start in the research centre press release. This case looks like no exception. From the initial Johns Hopkins press release (underlining mine):

Scientists from the Johns Hopkins Kimmel Cancer Center have created a statistical model that measures the proportion of cancer incidence, across many tissue types, caused mainly by random mutations that occur when stem cells divide. By their measure, two-thirds of adult cancer incidence across tissues can be explained primarily by “bad luck,” when these random mutations occur in genes that can drive cancer growth, while the remaining third are due to environmental factors and inherited genes.

And from their updated press release:

Scientists from the Johns Hopkins Kimmel Cancer Center have created a statistical model that measures the proportion of cancer risk, across many tissue types, caused mainly by random mutations that occur when stem cells divide. By their measure, two-thirds of the variation in adult cancer risk across tissues can be explained primarily by “bad luck,” when these random mutations occur in genes that can drive cancer growth, while the remaining third are due to environmental factors and inherited genes.

Good on them for updating, but it would have been nice if they had clarified why their first release was problematic.

A week of links

Links this week:

  1. Arnold Kling’s review of Complexity and the Art of Public Policy.
  2. Are some diets mass murder? HT: Eric Crampton
  3. "Social conservatism correlates with lower cognitive ability test scores, but economic conservatism correlates with higher scores."
  4. More on lead and crime.
  5. A risk averse culture. HT: Eric Crampton
  6. Welfare conditional on birth control.
  7. "We may regret the eclipse of a world where 6,000 different languages were spoken as opposed to just 600, but there is a silver lining in the fact that ever more people will be able to communicate in one language that they use alongside their native one." HT: Steve Stewart Williams
  8. If you want to feel older, read this. HT: Rory Sutherland