Philip Tetlock on messing with the algorithm
From an 80,000 Hours podcast episode:
Robert Wiblin: Are you a super forecaster yourself?
Philip Tetlock: No. I could tell you a story about that. I actually thought I could be, I would be. So in the second year of the forecasting tournament, by which time I should’ve known enough to know this was a bad idea, I decided I would enter the forecasting competition and make my own forecasts. If I had simply done what the research literature tells me would’ve been the right thing and looked at the best algorithm that distills the most recent or best forecasts and then extremises as a function of the diversity of the views within, if I had simply followed that, I would’ve been the second best forecaster out of all the super forecasters. I would have been like a super, super forecaster.
However, I insisted … What I did is I struck a kind of compromise. I didn’t have as much time as I needed to research all the questions, so I deferred to the algorithms with moderate frequency. I often tweaked them. I often said they’re not right about that, I’m going to tweak this here, I’m going to tweak this here. The net effect of all my tweaking effort was to move me from second place, which is where I would’ve been if I’d mindlessly adopted the algorithmic prediction, to about 35th place. So that was … I fell 33 positions thanks to the cognitive effort I devoted there.
Tetlock was tweaking an algorithm that is built on human inputs (forecasts), so the lesson is not that we can hand decision-making over to an algorithm entirely. The humans are integral to the process. But it is yet another story of humans taking algorithmic outputs and making them worse.
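For a concrete sense of what “distilling and then extremising” might look like, here is a minimal sketch in Python. To be clear, this is not the Good Judgment team’s actual algorithm: the recency weighting, the diversity measure (the standard deviation of the individual forecasts), and the exponent formula are all illustrative assumptions. Only the general shape comes from Tetlock’s description: pool the most recent forecasts, then push the pooled probability away from 0.5 by an amount that depends on the diversity of views.

```python
import numpy as np

def aggregate_and_extremize(probs, weights=None, base_exponent=1.5):
    """Pool individual probability forecasts, then extremize the result.

    probs: individual forecasts for a binary question, each in (0, 1).
    weights: optional weights favouring the most recent or best forecasters.
    base_exponent: illustrative extremizing strength (> 1 sharpens the pool).
    """
    probs = np.clip(np.asarray(probs, dtype=float), 1e-6, 1 - 1e-6)
    weights = np.ones_like(probs) if weights is None else np.asarray(weights, dtype=float)
    weights = weights / weights.sum()

    # Step 1: "distill" - a weighted mean of the individual forecasts.
    p = float(np.sum(weights * probs))

    # Step 2: choose how hard to extremize. Illustrative assumption:
    # more diverse views suggest less shared information, so push harder.
    diversity = float(np.std(probs))
    a = base_exponent * (1 + diversity)

    # Step 3: extremizing transform p^a / (p^a + (1 - p)^a),
    # which moves the pooled probability away from 0.5 when a > 1.
    return p**a / (p**a + (1 - p) ** a)

# Five forecasters averaging about 0.66; the pooled forecast lands near 0.75.
print(aggregate_and_extremize([0.6, 0.7, 0.65, 0.8, 0.55]))
```

The point of Tetlock’s story is that this kind of pooled, extremized number is already a strong forecast; ad hoc tweaks layered on top of it tend to add noise rather than signal.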
The question of where we should simply hand forecasting decisions over to algorithms is being explored in a new IARPA tournament involving human, machine, and human-machine hybrid forecasters. It should generate some interesting data on where each performs best - although the algorithm described by Tetlock above and used by the Good Judgment team suggests that even a largely human system will likely need statistical combination of forecasts to succeed.
Robert Wiblin: [F]irst, you have a new crowdsourcing tournament going on now, don’t you, called Hybrid Mind?
Philip Tetlock: Well, I wouldn’t claim that it belongs to me. It belongs to IARPA, the Intelligence Advanced Research Projects Activity, which is the same operation in the US intelligence community that ran the earlier forecasting tournament. The new one is called the Hybrid Forecasting Competition, and it, I think, represents a very important new development in forecasting technology. It pits humans against machines against human-machine hybrids, and they’re looking actively for human volunteers.
So hybridforecasting.com is the place to go if you want to volunteer.
…
Well, there are a lot of unknowns. It may seem obvious that machines will have an advantage when you’re dealing with complex quantitative problems. It would be very hard for humans to do better than machines when you’re trying to forecast, say, patterns of economic growth in OECD countries, where you have very rich, pre-quantified time series, cross-sectional data sets, correlation matrices, lots of macro models. It’s hard to imagine people doing much better than that, but it’s not impossible because the models often overfit.
Insofar as the better forecasters are aware of turbulence on the horizon and appropriately adjust their forecasts, they could even have an advantage on turf where we might assume machines would be able to do better.
So there’s a domain, I think, of questions where there’s kind of a presumption among many people who observe these things that the machines have an advantage. Then there are questions where people sort of scratch their heads and say, how could the machines possibly do questions like this? Here, they have in mind the sorts of questions that were posed, many of the questions that were posed anyway, in the earlier IARPA forecasting tournament, the one that led to the discovery of super forecasters.
These are really hard questions, like: how long is the Syrian civil war going to last in 2012? Is the war going to last another six months or another 12 months? When the Swiss and French medical authorities do an autopsy on Yasser Arafat, will they discover polonium? It’s hard to imagine machines getting a lot of traction on many of these quite idiosyncratic, context-specific questions, where it’s very difficult to conjure any kind of meaningful statistical model.
Although, when I say it’s hard to construct those things, it doesn’t mean it’s impossible.
Finally, Robert Wiblin is a great interviewer. I recommend subscribing to the 80,000 Hours podcast.