An email from Philip Tetlock

Posted May 9, 2012 at 3:42 am

Dear Tyler,

I wanted to thank you for encouraging your readers last year to volunteer for the Intelligence Advanced Research Projects Agency (IARPA) forecasting tournament. The Year 1 results are in, and they contained more than a few surprises. Most surprising was how well our forecasters performed. They collectively blew the lid off the performance expectations that IARPA had for the first year. IARPA’s original hope was that in Year 1 the best forecasting submissions might be able to outperform the unweighted average forecasts of the control group by 20%. When we created weighted-averaging algorithms that gave more weight to our most insightful and engaged forecasters, these algorithms beat that baseline by roughly 60% (exceeding IARPA’s expectations for Year 4).

Our forecasters did so well that some thoughtful observers now doubt it is possible to do much better, which is why we have taken the unusual step of skimming the best forecasters from our Year 1 experimental conditions to create teams of “super forecasters.” These teams will be functioning more as research collaborators than as research participants (they will have access to our algorithms but the discretion to override the algorithms with their own judgment). In my view, these “super forecasters” are distinguished by three characteristics: (1) an intense curiosity about the workings of the political-economic world; (2) an intense curiosity about the workings of the human mind; (3) cognitive crunching power (“fluid intelligence” and a capacity for “timely self correction”).

Of course, the decision to skim off our best forecasters into elite teams — coupled with the inevitable attrition rate in a time-consuming exercise of this sort — means that we are in the market for new forecasters, ideally potential future “super-forecasters.” So we are launching a new recruiting drive.

My hope is that you might mention this project again to your readers and encourage them to visit the registration page at www.goodjudgmentproject.com/register.

If you would like more details about our project, please just ask. I didn’t want to inundate you with details, and the Year 1 results are, of course, preliminary: distinguishing skill from luck is a perpetual challenge.
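As a rough illustration of the comparison the email describes, the sketch below scores an unweighted average forecast against a weighted one. The email does not say how accuracy was measured or how the weights were chosen, so the Brier score, the three forecasters, their probabilities, and the weights are all assumptions made up for this example, not the project's actual algorithm or data.

```python
def brier(prob, outcome):
    """Squared error between a probability forecast and a 0/1 outcome (lower is better)."""
    return (prob - outcome) ** 2


def weighted_mean(probs, weights):
    """Average of probs where each forecaster counts in proportion to their weight."""
    return sum(p * w for p, w in zip(probs, weights)) / sum(weights)


# Hypothetical data: three forecasters, two yes/no questions.
forecasts = [
    [0.9, 0.2],  # forecaster A
    [0.6, 0.5],  # forecaster B
    [0.3, 0.7],  # forecaster C
]
outcomes = [1, 0]  # question 1 happened, question 2 did not


def mean_brier(weights):
    """Mean Brier score of the pooled forecast under the given weights."""
    scores = []
    for q, outcome in enumerate(outcomes):
        probs = [f[q] for f in forecasts]
        scores.append(brier(weighted_mean(probs, weights), outcome))
    return sum(scores) / len(scores)


unweighted = mean_brier([1.0, 1.0, 1.0])  # control-style simple average
weighted = mean_brier([3.0, 1.0, 1.0])    # assumed weights favouring forecaster A
improvement = (unweighted - weighted) / unweighted

print(f"unweighted={unweighted:.4f} weighted={weighted:.4f} improvement={improvement:.0%}")
```

The weighted pool only wins here because the assumed weights happen to favour the forecaster who was closest to the truth. Choosing such weights before the outcomes are known is the hard part.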

Rahul May 9, 2012 at 4:02 am

Is it possible to benchmark this against predictions on something like InTrade? Are they doing better or worse?

A random control group is probably a low target.

gwern May 9, 2012 at 11:08 am

Not Intrade, since they declined to implement all the IARPA predictions and overlap only somewhat. The Inklings prediction markets apparently did copy the predictions, although I don’t know how liquid or accurate they might be…

Tom (UK) May 9, 2012 at 4:34 am

I can’t find details on whether you need to be a US citizen to take part in this – I would’ve thought so due to the funder. Does anyone know definitively?

Wilte May 9, 2012 at 4:52 am

You don’t need to be a US citizen. I’m Dutch & living in the Netherlands and I participated in year 1 (no super forecaster though)

Tom (UK) May 9, 2012 at 5:03 am

Thanks, although I suppose it’s not a promising sign that I incorrectly forecasted a different answer to this question…

Rahul May 9, 2012 at 7:56 am

You aren’t alone. I suspected it was US only too.

Andrew F May 9, 2012 at 3:05 pm

I participated, too. I was in the top 15% of my group, but I haven’t been told anything about ‘super-forecaster’ status, so I assume I’m also out. To be fair, I don’t think I dedicated enough time to getting up to speed on all the obscure African and S. American elections (they felt like throwing darts at a board anyway).

DK May 9, 2012 at 10:05 pm

My understanding is that there were 12 groups (~ 240 participants in each) and that the “super-forecasters” are top 5 from each group, divided into 5 groups. I got in (4 out of 234) and was told that the new group is going to be 12 people. I felt that the key to being good at this stuff was simply to keep closely up to date with the current events. Which at times felt weird (why the F do I care about the fate of the deposed president of Maldives?) and at times enlightening (wow, I had no idea how rotten to the core Pakistani politics is).

Wil W May 9, 2012 at 6:18 am

I participated this past year and will again. I think I was quite close to the bottom of the pack, but in a ranked system someone has to be there.

Anyone thinking about it should try for it. It makes you think (always a good thing), and the questions make you more aware of the rest of the world.

Rahul May 9, 2012 at 7:58 am

Any idea what their selection strategy is?

Andrew F May 9, 2012 at 3:07 pm

Just to participate, they had a questionnaire about your educational background, interest in world affairs, level of current knowledge, etc. I’m not exactly a geopolitics wonk and I was selected, so I don’t think they are expecting that level of elite forecaster.

James May 9, 2012 at 8:08 am

There is a certain irony to the comment “They collectively blew the lid off the performance expectations that IARPA had for the first year.”

Tim May 9, 2012 at 8:11 am

Why would someone do this instead of inTrade?

Woj May 9, 2012 at 9:00 am

For participating you get paid $150, so there is no risk of losing money (inTrade). Also it helps some professors with their research, similar to undergrads participating in on-campus experiments. I participated in year 1 and finished in the top 20 of my group. It was an interesting challenge that I’m more than happy to devote some time to again for the 2nd year.

bluto May 9, 2012 at 10:23 am

I was one of the forecasters. I can’t say I cared all that much about the cash; I did it mostly to see where I’d end up, and because I thought it was a worthy project. One nice thing about this vs. Intrade is that mistakes aren’t as costly as they would be on Intrade. The most punishing questions were things like one with a strict definition of hostilities in the South China Sea, which would likely have cost an Intrade participant most of their gains but didn’t really affect the ratings, because everyone made the same costly mistake.

wiki May 9, 2012 at 11:17 am

How different is this from those investing games where you can’t lose money for wrong bets so part of the strategy is making high variance bets on highly risky industries? Is this really about making “good” forecasts?

Bernard Guerrero May 9, 2012 at 12:02 pm

That South China Sea question nearly killed me. Recovery was a long march.

DK May 9, 2012 at 10:09 pm

That was a classic Black Swan. Everyone correctly predicted the normal course of events and then…

enrique May 10, 2012 at 9:26 pm

the fact that no money is on the line makes the whole experiment worthless cheap talk — a glorified survey, like “happiness” studies

Jens Fiederer May 18, 2012 at 4:40 pm

Same question got quite a few people, including me, I suspect.

#21 out of 236, so not a bad result — but no super-forecaster either (was in the top 20 for a while), which is a good thing since I was pretty clueless about the vast majority of the questions, and mostly bet on “more of the same”.

Law Of Averages May 9, 2012 at 8:52 am

I will be interested to see how many of the super forecasters regress to the mean. Perhaps it would be an idea to include some of the average forecasters in the new privileged group and see how well they compare in round 2. After all, the race is not always to the swift, and round 1 selected the lucky as well as the able.
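The regression-to-the-mean worry can be made concrete with a toy simulation. Everything in it is an assumption for illustration: each forecaster has a fixed skill, each round adds independent luck of the same size, and the numbers of forecasters and "elite" picks are arbitrary rather than taken from the tournament.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

N = 1000  # hypothetical number of forecasters
TOP = 50  # hypothetical size of the elite group picked after round 1

# Stable skill per forecaster (standard normal, an assumption).
skill = [random.gauss(0, 1) for _ in range(N)]


def play_round(skill):
    """Observed score = stable skill + independent luck for that round."""
    return [s + random.gauss(0, 1) for s in skill]


def mean(xs):
    return sum(xs) / len(xs)


round1 = play_round(skill)
round2 = play_round(skill)

# Pick the elite group on round 1 scores only.
top = sorted(range(N), key=lambda i: round1[i], reverse=True)[:TOP]

top_r1 = mean([round1[i] for i in top])
top_r2 = mean([round2[i] for i in top])
overall_r2 = mean(round2)

print(f"elite, round 1: {top_r1:.2f}")
print(f"elite, round 2: {top_r2:.2f}")
print(f"everyone, round 2: {overall_r2:.2f}")
```

In this model the elite group scores lower in round 2 than in round 1 but still above the overall average: it regresses toward the mean without going all the way. How far it falls depends on the assumed ratio of skill to luck.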

MD2 May 9, 2012 at 9:04 am

Agreed, I would bet telling the best performers that they are “super-forecasters” is just to watch how their predictions are affected by boosted confidence, like a stock trader on a hot streak who has convinced himself he’s seeing real, explainable patterns in the market.

clayton May 9, 2012 at 10:00 am

I think you are overestimating the effect. This is not a job; it is a hobby. I would compare this more to golf or bowling than the stock market. Some people are very interested in it and put a lot of time into it and some people are only slightly interested and don’t spend as much time. So perhaps the “super forecasters” are those who more or less self select into that status.

Finally, you get paid $150 regardless of how well you perform. There is no competition to do better other than self motivation. Because of the flat “salary”, it might make some people more likely to not spend the necessary time to improve their ability and monitoring.

dead serious May 9, 2012 at 11:26 pm

And yet there are anonymous people here more or less bragging to a likewise anonymous group of strangers that they fall into the select set of super-forecasters.

People can be competitive over even the most pointless things. I think wiki makes an excellent point.

clayton May 10, 2012 at 1:30 am

Yikes, reading comprehension is not one of your strong suits

Andrew Edwards May 9, 2012 at 9:04 am

This.

Remains to be seen whether they selected “super forecasters” or “the luckiest forecasters”.

Also I would be very interested to see whether access to the algorithms improves or harms their forecasting capabilities.

DK May 9, 2012 at 10:16 pm

Since I did well for no obvious reason, I pondered the issue of luck and it does not seem to be a good explanation: 240 people, 72 questions and yet, the Top20 remained relatively stable throughout, with movement up and down typically restricted to 5. It is of course possible that the top 10%-20% was simply a group that put in more effort while the finer rankings within that group reflect merely dumb luck.

Rahul May 9, 2012 at 10:20 am

Maybe it is useful to think of it as a regression towards the mean, not all the way to. Successive inclusion of super-performers is likely to minimize chance effects after a couple of “generations”.

Dangerman May 9, 2012 at 10:28 am

Is it more fun or less fun than DAGGRE?

Jonah May 9, 2012 at 10:49 am

I participated last year and have signed up to do it again this year. It’s a lot of work to keep up with all the news, but I learned a ton and am glad I did it.

Mark Thorson May 9, 2012 at 10:52 am

Would psychic ability be considered skill or luck?

Andrew' May 9, 2012 at 10:56 am

(1) an intense curiosity about the workings of the political-economic world;

Check

(2) an intense curiosity about the workings of the human mind;

Double check!

(3) cognitive crunching power (“fluid intelligence” and a capacity for “timely self correction”).

Meh.

dead serious May 9, 2012 at 11:30 pm

But you do have a wealth of spare cycles to post comments here that you can divert to crowdsourcing instead, so there’s that.

Rashad May 9, 2012 at 11:02 am

As one of the top 20 finishers (in my group?), I think a lot of the prediction success came down to betting strategy rather than being right. Because there were tons of available questions, it made sense to be somewhat picky and spend large amounts on questions you were fairly certain about. Also, because many bettors weren’t very active, there were usually several questions that although basically decided, still had something like a 20% chance of not happening, so betting big on these questions late only got you a bit of money, but was a quick way to utilize extra cash lying around. Since there were no transaction costs, this was perfectly viable. Also, if you were on your game, it was literally impossible to ever lose any money, because you could pull out of questions at any time. Seems like an intrade system where you would have to basically bet against your previous bets to get money back might make more sense?

Forecaster May 9, 2012 at 11:20 am

I was a member of the forecasting team during the last year. I was disappointed and will not be coming back.

The reason is that yes, this is an experiment, but *you* are the white mouse, not someone in a lab coat :-) The questions required you to forecast mostly political events and to make a decent forecast you had to absorb a rather large amount of information about the circumstances of the event you are forecasting. Now, I’m an information junkie but even I couldn’t be bothered to spend hours researching the political situation in some small and far-off country. It’s just not all that interesting given that my motivation is a rather small carrot in the form of a rank in a table. Given that I’m neither retired nor a student my time isn’t unlimited either.

I ended up posting basically 15-second guesses based on general erudition and my general feeling of what was likely and what wasn’t. Evidently it was enough to keep me in the top 10 for a while before I got bored of this rather pointless activity and stopped forecasting altogether.

Basically there wasn’t anything interesting enough in the process to make it more than a waste of time. If I wanted to play with political and economic forecasting, there are *much* more engrossing and exciting questions that I could concern myself with.

JThomas May 9, 2012 at 12:10 pm

You realize that you didn’t have to answer every single question, right? The point was to add to the discussion and make a prediction where you could add insight beyond the uninformed masses. There were over 90 questions…surely some must have appealed to you. You could have spent much time predicting those ones and let your group take over the questions you didn’t know much about.

Forecaster May 9, 2012 at 12:28 pm

My point is motivation. There is zero incentive to be selective in answering questions since that would drive your rank down and your rank is the *only* motivation you have. Adding insight beyond the “uninformed masses” is a pretty low threshold, too :-D

Not to teach the UPenn professors their business :-) but I expect this study to have severe problems with selection bias.

hamilton May 9, 2012 at 11:30 am

I’m with Forecaster. It was interesting enough being a subject, but there were far more questions than I could ever spend my time thinking about for more than a few seconds. Also, within my group, much of the discussion was based on the first five things you could find by Googling the prediction question. I dropped out after the first round of predictions.

Anybody know how Robin Hanson’s group did?

Rahul May 9, 2012 at 12:44 pm

I wish there were something similar for bloggers. Imagine a central clearinghouse where, say, Tyler or Krugman took up concrete positions on issues. Something on the 1-3 year horizon, preferably quantitative and falsifiable one way or another.

Say, stagnation or austerity or recessions. Pick some metric and let’s see how well your prediction did in 3 years.

There’s too much vague talk on blogs, and it might be fun to see who gets to eat the most crow. Instead of InTrade cash, let bloggers put their reputations at stake?

Neil B May 9, 2012 at 3:21 pm

You are on to something there. Very interesting.

gwern May 9, 2012 at 6:23 pm

Why would any of them agree to it? Especially when the current system works so well for them…

DK May 9, 2012 at 10:22 pm

Pundits putting their money/reputation where their mouths are? When Hell freezes over! Tyler is much more comfortable putting up all those vague “Greece is going down the drain – some day, in some form” posts.

Rahul May 10, 2012 at 12:57 pm

Not everyone wants to be vague. My vote is for Steve Sailer signing on as the first forecaster. Interfluidity, Levitt and Robin Hanson seem less vague too.

CPV May 11, 2012 at 4:12 pm

The most popular bloggers will be best served by being vague. Up and comers are best served by being specific, then touting a track record if it materializes. It’s not too different from the strategy of Wall Street analysts.

zmil May 9, 2012 at 12:45 pm

Tried it, will not be participating again- 1) Not enough questions that I was interested in delving into deeply. 2) Kept forgetting about it, missing deadlines, and thus losing bets that I didn’t need to lose. 3) Doesn’t mix well with 1st year grad school…just not enough time to devote serious mental resources to it.

I’m glad it seems to be a success, though, and I look forward to seeing if it holds up in a second year.

Seth May 9, 2012 at 2:00 pm

My prediction: The forecasts will get worse. I’d bet on that.

DK May 9, 2012 at 11:07 pm

Can you be more specific? There are a lot of ways your prediction can be interpreted. Depending on exactly what you meant, I might be very interested to take up your bet.

Seth May 10, 2012 at 12:23 pm

Present some terms and I will let you know if I’ll take that bet. If I don’t, I’ll let you know why.

DK May 10, 2012 at 12:31 pm

So as not to make it complicated and involve lawyers, simply a gentlemen’s agreement for just enough to make it interesting, around $200. But you haven’t yet indicated what your exact prediction is, so I don’t know if I will bet against it.

Seth May 11, 2012 at 1:10 pm

$200 is too rich for me. I do realize that a) I could be wrong, b) the super forecasters could get lucky, c) Tetlock could change the weighting-algorithm without telling us to rig his results and/or d) the control group contains some crazy guesses (and may even be selected, unknowingly, to contain crazy guesses).

To my best understanding, I think they are going to compare the algorithmically-weighted super forecasters to an un-weighted average of a new batch of forecasters for the control group.

I’d bet $20 that, using the same weighting algorithm, the super-forecasters do not outperform the control group by 60% or more. If they outperform by less than 60% or do not outperform, I win. If they outperform by 60% or more, you win. If we find out the algorithm changed, either of us can use that to call off the bet.

Seth May 11, 2012 at 1:34 pm

I would also like to see them pick things to forecast where Intrade could more easily be used as the control group. I think the success of the super forecasters could simply be that the control group is bad.

DK May 11, 2012 at 11:46 pm

Seth,

The bet is accepted on your conditions. I doubt it will ever be resolved properly, but in case it is, I am leaving a message on your blog with my real email. I have no idea what the IARPA control group was, but I do suspect that it was bad. One thing I know is that there had to be more than just luck involved: the top ~10% of the group, formed after the first batch of ~20 questions, did not change appreciably after the subsequent 50+ questions.

TGGP May 9, 2012 at 4:57 pm

I signed up for Good Judgment last year because it was the first I heard about. It was much later that I learned of DAGGRE, which I really wanted to join, but I had already signed up. When I heard there was going to be a new season, rather than renewing with Tetlock I tried to sign up with GMU. But I can’t sign in, even after generating a new password for DAGGRE. I’ve emailed them, but gotten no response. So if somebody from there is reading this, try to fix that. Or continue losing to Tetlock.

johnnyo May 9, 2012 at 7:44 pm

I participated in season 1. Finished 2nd out of 200+ in my group, upgraded to “super-forecaster.”

My strategy:
1) answer all questions, even those you know nothing about (score was weighted by number of questions answered, I believe).
2) do at least basic research (a few minutes’ googling usually gets you to average-information status).
3) update frequently (especially Poisson-process type questions with a time horizon… e.g. will event X occur between t=1 and t=2, if your initial estimate is p at t=1 you should cut to roughly p/2 at t=1.5 if nothing has happened yet).
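The updating rule in point 3 can be worked out explicitly under one assumption: the event arrives at a constant rate over the window (a Poisson process) and has not happened yet. If the initial estimate for the whole window is p, the probability left for the remainder is 1 - (1 - p)^f, where f is the fraction of the window still to run. At the midpoint that is 1 - sqrt(1 - p), which is close to p/2 when p is small and somewhat above p/2 when p is large. The function name and arguments below are illustrative, not anything from the tournament.

```python
def remaining_probability(p_initial, t_start, t_end, t_now):
    """Probability the event still occurs in [t_now, t_end], given it has not
    occurred in [t_start, t_now], assuming a constant event rate (Poisson process).

    p_initial is the forecast made at t_start for the whole window.
    """
    if not 0.0 <= p_initial < 1.0:
        raise ValueError("p_initial must be in [0, 1)")
    if not t_start <= t_now <= t_end or t_start == t_end:
        raise ValueError("need t_start <= t_now <= t_end and a non-empty window")
    fraction_left = (t_end - t_now) / (t_end - t_start)
    # Survival over the whole window is (1 - p); over a fraction f of it, (1 - p) ** f.
    return 1.0 - (1.0 - p_initial) ** fraction_left


# Halfway through the window from the comment (t=1 to t=2), with no event so far:
for p in (0.05, 0.2, 0.8):
    updated = remaining_probability(p, t_start=1.0, t_end=2.0, t_now=1.5)
    print(f"initial p={p:.2f} -> updated {updated:.3f} (p/2 would be {p / 2:.3f})")
```

Halving is therefore a fair rule of thumb for unlikely events, but it cuts too deeply when the initial estimate was high.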

All this does take a good bit of time; I checked in every other day or so.

I found researching those questions about which I really knew nothing (e.g. why France would consider recalling its ambassador to Turkey, whether the Yemeni government will re-gain control of 2 towns from AQAP) to be the most fun.

I may have been lucky, more dedicated than the average, or actually good.

On to season 2!

DK May 9, 2012 at 10:28 pm

(score was weighted by number of questions answered, I believe)

I don’t think so. They quite explicitly told us: if you did not answer a question, it’s OK but you will be assigned your group’s average when it is resolved. So if one only answered 15 (minimum) out of 72 while others answer most, chances are very high that one can only do so-so with regard to the intra-group rankings.
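A small sketch of the rule as DK describes it, read here as "an unanswered question earns you the average score of the group members who did answer". That reading, the use of the Brier score, and the names and numbers are all assumptions for illustration. The comment does not spell out the exact scoring formula.

```python
def brier(prob, outcome):
    """Squared error between a probability forecast and a 0/1 outcome (lower is better)."""
    return (prob - outcome) ** 2


def question_scores(forecasts, outcome):
    """Score one resolved question for a group.

    forecasts maps forecaster name -> probability, or None if they skipped it.
    Skippers receive the mean score of those who answered (assumed rule).
    """
    answered = {name: brier(p, outcome)
                for name, p in forecasts.items() if p is not None}
    if not answered:
        raise ValueError("nobody in the group answered this question")
    group_average = sum(answered.values()) / len(answered)
    return {name: answered.get(name, group_average) for name in forecasts}


# Hypothetical group of three; "cat" skipped the question, which resolved "yes".
scores = question_scores({"ann": 0.9, "bob": 0.5, "cat": None}, outcome=1)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Under this reading a skipped question can neither help nor hurt relative to the group, so someone who answers only the minimum drifts toward the middle of the rankings, which is consistent with the "so-so" outcome described above.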

Steve Sailer May 9, 2012 at 8:27 pm

Prof. Tetlock says:

“The Year 1 results are in”

Then, where are they? Are they posted somewhere on the WWW? The link in the post only goes to a recruitment site.

Steve Sailer May 9, 2012 at 8:30 pm

“Intelligence Advanced Research Projects Agency”

This is some kind of Joss Whedon spin-off promotion for Avengers 2, right? Like Strategic Homeland Intervention, Enforcement and Logistics Division.

Whedon is such a card…

Steve Sailer May 9, 2012 at 8:32 pm

If you really are a Super Forecaster, why would you bother with a penny ante contest like this rather than take your talents to the hedge fund industry? And, if this is a real test of an important talent, then why doesn’t the Efficient Markets Theory come into play?

DK May 9, 2012 at 10:43 pm

If you really are a Super Forecaster, why would you bother with a penny ante contest like this rather than take your talents to the hedge fund industry?

A good point here in that there does not seem to be a baseline against which the forecasting “success” is measured. Would “real” prediction markets perform better? Or a bunch of NYT/FT/WSJ columnists? I would really like to find out if I am a “super forecaster”, for it seems quite certain that no one around me thinks so :-)

Rahul May 10, 2012 at 1:06 pm

Are all types of predictions monetizable? Isn’t that where the “genius” of financial engineering steps in? There’s also the difficulty of finding a counter-party to your prediction.

Steve Sailer May 10, 2012 at 12:28 am

If you recruited 1,000 forecasters, it wouldn’t be too surprising if 50 of them turned out to be SuperForecasters at the 5% level of statistical significance and 10 of them turned out to be SuperDuperForecasters at the 1% level of statistical significance.

Just sayin’ …
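The arithmetic behind this point is just head count times significance level, and a quick simulation shows the same thing. The sketch assumes every forecaster has no skill at all, so each one's p-value in a test for skill is uniform on [0, 1]. The seed and variable names are arbitrary.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

N_FORECASTERS = 1000


def expected_false_positives(n, alpha):
    """Expected number of no-skill forecasters who pass a test at level alpha."""
    return n * alpha


# Under the no-skill assumption, p-values are uniform on [0, 1].
p_values = [random.random() for _ in range(N_FORECASTERS)]
lucky_5pct = sum(p < 0.05 for p in p_values)
lucky_1pct = sum(p < 0.01 for p in p_values)

print("expected at 5%:", expected_false_positives(N_FORECASTERS, 0.05))
print("expected at 1%:", expected_false_positives(N_FORECASTERS, 0.01))
print("simulated at 5%:", lucky_5pct)
print("simulated at 1%:", lucky_1pct)
```

The expected counts are 50 and 10, matching the figures in the comment, with the 10 being a subset of the 50. Telling such lucky forecasters apart from skilled ones takes fresh questions, which is what a second year provides.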

Mark Thorson May 10, 2012 at 10:37 am

I say take all these superforecasters to Gitmo. Find out what’s their secret. Dissect their brains, if necessary.

johnnyo May 10, 2012 at 1:46 pm

You’re right. Checked the scoring page.

Answering more questions and being right gives you a higher score.

Guess I was semi-intentionally pursuing an “all or nothing” strategy. Came up “all”.
