Allegedly Unique Events

One common response to yesterday’s post, What is the Probability of a Nuclear War?, was to claim that probability cannot be assigned to “unique” events. That’s an odd response. Do such respondents really believe that the probability of a nuclear war was not higher during the Cuban Missile Crisis than immediately afterwards when a hotline was established and the Partial Nuclear Test Ban Treaty signed?

Claiming that probability cannot be assigned to unique events seems more like an excuse to ignore best estimates than a credible epistemic position. Moreover, the claim that probability cannot be assigned to “unique” events is testable, as Philip Tetlock points out in an excellent 80,000 Hours Podcast with Robert Wiblin.

I mean, you take that objection, which you hear repeatedly from extremely smart people that these events are unique and you can’t put probabilities on them, you take that objection and you say, “Okay, let’s take all the events that the smart people say are unique and let’s put them in a set and let’s call that set allegedly unique events. Now let’s see if people can make forecasts within that set of allegedly unique events and if they can, if they can make meaningful probability judgments of these allegedly unique events, maybe the allegedly unique events aren’t so unique after all, maybe there is some recurrence component.” And that is indeed the finding that when you take the set of allegedly unique events, hundreds of allegedly unique events, you find that the best forecasters make pretty well calibrated forecasts fairly reliably over time and don’t regress too much toward the mean.

In other words, since an allegedly unique event either happens or it doesn’t, it is difficult to claim that any single probability estimate was better than another. But when we look at many forecasts, each of an allegedly unique event, we find that some people get more of them right than others. Moreover, the individuals who get more events right approach these questions using a set of techniques and tools that can be replicated and used to improve other forecasters. Here’s a summary from Mellers, Tetlock, Baker, Friedman and Zeckhauser:

In recent years, IARPA (the Intelligence Advanced Research Projects Activity), the research wing of the U.S. Intelligence Community, has attempted to learn how to better predict the likelihoods of unique events. From 2011 to 2015, IARPA sponsored a project called ACE, comprising four massive geopolitical forecasting tournaments conducted over the span of four years. The goal of ACE was to discover the best possible ways of eliciting beliefs from crowds and optimally aggregating them. Questions ranged from pandemics and global leadership changes to international negotiations and economic shifts. An example question, released on September 9, 2011, asked, “Who will be inaugurated as President of Russia in 2012?”…The Good Judgment Project studied over a million forecasts provided by thousands of volunteers who attached numerical probabilities to such events (Mellers, Ungar, Baron, Ramos, Gurcay, et al., 2014; Tetlock, Mellers, Rohrbaugh, & Chen, 2014).

In the ACE tournaments, IARPA defined predictive success using a metric called the Brier scoring rule (the squared deviation between forecasts and outcomes, where outcomes are 0 and 1 for the non-occurrence and occurrence of events, respectively; Brier, 1950). Consider the question, “Will Bashar al-Assad be ousted from Syria’s presidency by the end of 2016?” Outcomes were binary; Assad either stays or he is ousted. Suppose a forecaster predicts that Assad has a 60% chance of staying and a 40% chance of being ousted. If, at the end of 2016, Assad remains in power, the participant’s Brier score would be [(1-.60)^2 + (0-.40)^2] = 0.32. If Assad is ousted, the forecaster’s score is [(0 -.60)^2 + (1 -.40)^2] = 0.72. With Brier scores, lower values are better, and zero is a perfect score.

…The Good Judgment Project won the ACE tournaments by a wide margin each year by being faster than the competition at finding ways to push probabilities toward 0 for things that did not happen and toward 1 for things that did happen. Five drivers of accuracy accounted for Good Judgment’s success. They were identifying, training, teaming, and tracking good forecasters, as well as optimally aggregating predictions. (Mellers, et al., 2014; Mellers, Stone, Atanasov, Rohrbaugh, Metz, et al., 2015a; Mellers, Stone, Murray, Minster, Rohrbaugh, et al., 2015b).
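To make the quoted Brier arithmetic concrete, here is a minimal sketch in Python (an illustration of the formula above, not code from the paper):

    # Two-component Brier score: squared deviation between the forecast
    # probabilities and the 0/1 outcomes, summed over the possible outcomes.
    def brier(forecast, outcome):
        return sum((f - o) ** 2 for f, o in zip(forecast, outcome))

    assad_forecast = [0.60, 0.40]          # 60% stays, 40% ousted
    print(brier(assad_forecast, [1, 0]))   # stays:  (1-.60)^2 + (0-.40)^2 = 0.32
    print(brier(assad_forecast, [0, 1]))   # ousted: (0-.60)^2 + (1-.40)^2 = 0.72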

Comments

This is peak AT.

This is one of my top-five favorite Tabarrok posts of all time. Also, here is another example of the use of Brier scores to measure one's forecasting accuracy when predicting a unique event: https://priorprobability.com/2018/12/31/forecasting-the-forecasts/

It begs the question. The probability of a nuclear war is 100%. We don't know when, but we do know why. When the dominant superpower appears to have made itself vulnerable to a nuclear war, then it will happen. Simple as that. It is a kind of nuclear king of the mountain.

if something can happen it will happen...that is the lesson of history...what we don't know is when...

Classicist: Probability of event = 1 or 0 (depending on if it occurs or not)
Frequentist: Probability of event = p, defined in a long run, repeated expectation sense.

The frequentists have won the day.

That's a ridiculous argument. It presupposes determinism. An event of nuclear war consists of thousands of decisions made from free will. While pushing the red button is a 1, 0 choice, this choice is not equally likely for a given set of facts and circumstances.

You're correct that the issue is one of definitions. I want one that is useful, and while yours have some uses, they happen to be unhelpful here. Your traditional frequentist approach is to define probability as long run rate of occurrence of that event in a reference class of events. The problem is that a different approach, promoted by Tetlock, which defines something that is different than what the traditionalists call probability, yields something useful - not valid in their traditional frequentist frame, but useful nonetheless.

The question is whether you'd like to dismiss an empirically validated approach as formally meaningless because you won't consider other definitions. The alternatives are to decide that what they are talking about is called something other than probability, perhaps "credence", and get into stupid definitional arguments, or to shut up and accept that there's something that is meaningfully equivalent to probability that can be assigned to new events.

Don't forget the Bayesians, which is where this question should reside.

Using the same word, 'probability', to express both outcomes of random trials and degrees of belief yields no end of confusion.

I don't think this conflation is as large as you appear to think. There is not much difference between the number of jelly beans in a particular jar and the average of how many jelly beans a large group of people believe are in the jar. Simple averages though may be a biased estimator if our sample of estimates includes the beliefs of people who have never seen the jar. But even in that case, beliefs about the size of jelly beans (standardized fact) and the size of the jar (nonstandard but with a narrow distribution) can make even uninformed guesses useful.

People make decisions every day from subjective probabilities. The species has survived hundreds of thousands of years doing a fairly good job of it.

I'm fairly sure that a moderately educated person could identify the moments in history we were close to a nuclear war and similarly predict those moments in the future from unfolding evidence. Even in the "Red October" fictional scenario, international tensions between superpowers were sufficiently heightened to evoke a relatively high probability without knowing of the existence of a first strike weapon. Those same people would know that even in possession of a first strike weapon, the USSR would be very unlikely to use it.

The Bayesians already failed with their answers to the unique event: "What is the probability that Donald Trump will be President of the United States?", in both respects that you are using. I don't think we're ready to trust them with the question of the probability of nuclear war.

Really doubling down on the imaginary numbers and fake precision by picking one criticism and then burying people in the details of calculating the Brier score.

The continued presidency of a specific dictator is a unique event. The continued presidency of a dictator in general is not. You can lump in all the dictators' fates together and create a pool of unique events. This is not the same thing as nuclear war. What are the other similar unique events that can be lumped into this bucket?

Next, I will provide a detailed derivation of the 3-component decomposition of the Brier score, followed by the formula for the Brier skill score and how the latter is not a proper scoring rule. That should impress people who also bought the unique event lumping.

Feel free to call it unique - that seems irrelevant, though. Why does it matter if an event is unique if we have a method of assigning a probability that performs far better than chance at predicting the event? I'm happier to accept the fact that this assigned probability is different than what you learned to call probability in graduate school, i.e. long run frequency in a reference class, and move on towards doing useful things with this number. If you want, you can even assign the non-probability number a name - say, credence - and then use the number exactly as Tetlock does to define Brier scores. Or you can argue about definitions.

I don't think you, Alex or Tetlock understand the limitations of the Brier score when applied to these types of situations. It's like the novice statistician who ignores the normal distribution requirements for a statistical test, and is just happy to get a p-value from the software.

https://www.sas.upenn.edu/~baron/journal/17/17614c/jdm17614c.html

Very interesting points, and thanks for the reference.

I had a much more fundamental issue, which is that Brier scores (AND the decompositions of scores discussed in the paper) are not incentive compatible for tournaments. I laid out the argument a bit more on twitter here: https://twitter.com/davidmanheim/status/1080458380806893568

And regarding “unique” event predictions: “you find that the *best* forecasters make *pretty well* calibrated forecasts *fairly* reliably over time and don’t regress *too much* toward the mean.” How many hedges are allowed before a sentence descends into nonsense?

The Nassim Taleb approach would be to measure the probability of a rare event using an estimation of our gap of knowledge in said probability.

The last step to getting that Taleb magic is to make the weak case in the most obnoxious way possible. To get true authentic Taleb, you need a nice nasal voice with a Balki Bartakamous accent, and not many of us can sustain that for too long

My understanding is that most statisticians would disagree. Consider suicide, something that happens far more often than nuclear war. However, it is still so rare that we cannot predict it better than a coin flip in any given individual. And yet here Alex is glibly saying things like nuclear war might occur next year at 1.12% probability! Look at all those significant digits!

It is, of course, quite false that we cannot predict suicide. Elderly, white males, living by themselves in rural areas, for example, have higher predicted suicide rates than people not in this category and these predictions can be quite useful for reasons of policy.

Measuring that an outcome occurs, on average, at a higher rate among a population is not the same as predicting the outcome of an individual event.

But I digress...we all care too much about nuclear war because, marginally, we cannot affect the outcome. Nor are we likely to be a casualty of such war. As a matter of fact, experts say there is a 73.3% chance that nuclear war contracts the labor supply curve more than demand for finished goods and that, all else equal, my real wage will increase as a result of any such confrontation.

He might have you there. A risk of nuclear war is a risk that somebody somewhere will launch one. Not a specific risk that a short redhead, on a Tuesday, will launch one.

Straw man nuked from orbit. Val talks about predicting individual suicides. You're talking about suicide *rates*, Alex. Predicted suicide rates can be compared to actual suicide rates, which can be measured regularly - X people committed suicide per 100,000 over a given period of time. The actual nuclear war rate has been 0.00% for decades. Is this what you will measure predicted nuclear war rates against?

Predicting suicide rates is telling you about individuals! Who do you think make up the rates?

Moreover, the more information that is available the more you can predict. Just looking at a person can improve predictions of suicide above base rates. People with tattoos, for example, are more likely to commit suicide. (One theory is that people who have tattoos have shown an ability to inflict pain upon themselves. But there are other theories.)

I'm wrong? No, I'm not wrong. What I meant was [bs].

It's a game many children play.

I think you're making an Ecological fallacy there Alex.

I agree that we can estimate the probability of rare events. I think you just chose a poor example of it.

I’m a psychiatrist at an academic center. Alex, you missed the point. We cannot predict whether an individual will commit suicide. Population rates DO NOT tell you about the *individual* sitting in front of you. Despite knowing 200 demographic factors and a complete medical history...I am no better than a coin flip when evaluating an individual sitting in front of me for probability of suicide. The analogy between an individual rate and a population rate, and how it compares to the described approach to predicting a nuclear attack, is persuasive as to that approach's lack of validity.

No, you missed the point if you think that there is no predictive value to evaluating the probability of suicide of the individual sitting in front of you based on the characteristics he shares with other people who have committed suicide. In fact, if that's what a psychiatrist at an academic center thinks, that's kind of scary.

Val, we really must be talking about different things because what you are saying is utterly baffling. I am confident that you as a psychiatrist don't give everyone the same diagnosis or prescribe the same treatment. Surely, some people come to you and you think: this person is in danger of suicide, I need to prescribe an SSRI. I need to discuss guns in the house. This person needs more counseling. And some people you don't. So you have made a probabilistic judgment.

Also, you cannot seriously mean that you are no better than a coin flip when evaluating an individual for possible suicide--a coin flip is 50/50 and you don't think it's a 50% probability that every individual is going to commit suicide. So you must mean relative to some base rate. But where is the base rate coming from? Obviously you must adjust the base rate for gender, age, race. etc. Moreover, if the individual is in front of you you must know a lot about them. How about previous history of suicide? How about history of depression? How about risk taking behaviors?

All of this gives you a tremendous amount of information about suicide which every good psychiatrist uses to adjust treatment.

I think what you mean is that your judgment is subjective and uncertain, which is certainly true, but subjective, uncertain judgment can still be quantified with a probability.

I'm still boggled that so many people think the "risk of nuclear war" is the "risk of a specific thing."

Nuclear war (or to use a better specific phrase "nuclear exchange") has happened. It's a credit to our species that it hasn't happened since, but our risk is not about one event.

We don't only have to worry about the "drunk Kim" scenario and then try to calculate that. We don't only have to worry about the "irate Ayatollah" scenario.

We have to worry generally, about all nuclear holders, terrorists, or James Bond villains, making the jump.

This is much more like "suicides per 100,000" than "Joe Smith, 32 Elm Street, commits suicide."

That "nuclear war" can mean very different things is no critique. The probability of anything that can be labelled "nuclear war" is the sum of the probabilities that each of them happen minus the joint probabilities.

Your critique concerns the exactness of specification of the model, not whether it can be modelled.

I'm on board with you, just not this "that's the same as predicting outcomes for one patient" stuff.

For the reasons Tetlock provides, I think you can take a stab at a "category event" happening. IOW, I found Tetlock convincing.

I agree with you on the single patient question. Alex is making an Ecological fallacy.

I'd listen to what he says, because he is actually living in the real world. Statistics cannot capture the complexity of individuals. There are too many confounding factors. There is no perfect information.

In the risk of nuclear exchange, there is no "one individual" you have to model. There is an aggregate risk of nuclear players.

I think you guys have gone totally into the weeds on this.

People are grasping at poor analogies.

@Val You don't seem to understand what is being measured.

> Population rates DO NOT tell you about the *individual* sitting in front of you.

Let's say I have a bag of M&Ms that I know is 90% red and 10% blue. I stick my hand in the bag and blindly pick one M&M.

Under your philosophy the population rate cannot affect the color of the M&M I took. The M&M is either red or blue, and the bag has no effect on what I am holding now that I have already removed one from the bag.

Of course that argument is pure obfuscation. The M&M is going to (most likely) be red in 9 out of 10 trials.

You don't quite understand what is being measured. Knowing the aggregate batting average for a baseball team provides no information whatsoever about the batting average of any one player. And if you are selecting a pinch hitter with bases loaded, two outs, the tying run on third and winning run on second, the team's batting average is useless but the individual batting averages aren't.

Knowing the probability that a depressed white male, blue collar worker from Tucumcari will commit suicide tells you nothing about the probability that a particular white male blue collar worker from Tucumcari will commit suicide.

Indeed, being white might be informative to the extent that the traits of being white and committing suicide are correlated. Tucumcari might have a high rate of suicide for reasons that either are or aren't identifiable in the person in front of you.

You're making the Ecological Fallacy. This is like looking at a Black person and believing they are more likely to be a criminal than a white person simply because Blacks have a much higher crime rate. While that is fallacious, you could certainly judge with better than a coin toss that that particular Black person is a criminal from his appearance, demeanor, place, time of day, etc. You can employ different objective functions such as Utility Maximization or Minimizing Maximum Regret. This is valid because it doesn't involve the ecological fallacy.

What exactly makes the difference here between those properties of an individual that are useful for drawing inferences about likelihoods involving them and those which are not?

The fact that one percent of people commit murder does not imply that there is a one percent chance I will become a murderer. Now, given enough information about me personally, you could come up with a pretty good guess.

A guess that is close to 1 or 0.

Do you actually think that mental illness and suicide is akin to a bag of candy with two colors? Are you serious?

> I’m a psychiatrist at an academic center. We cannot predict whether an individual will commit suicide.

Your knowledge is out of date.

https://journals.sagepub.com/doi/abs/10.1177/2167702617691560?journalCode=cpxa

Participants were 5,167 adult patients with a claim code for self-injury (i.e., ICD-9, E95x); expert review of records determined that 3,250 patients made a suicide attempt (i.e., cases), and 1,917 patients engaged in self-injury that was nonsuicidal, accidental, or nonverifiable (i.e., controls). We developed machine learning algorithms that accurately predicted future suicide attempts (AUC = 0.84, precision = 0.79, recall = 0.95, Brier score = 0.14). Moreover, accuracy improved from 720 days to 7 days before the (specific individual) suicide attempt, and predictor importance shifted across time. These findings represent a step toward accurate and scalable risk detection and provide insight into how suicide attempt risk shifts over time.

Table 2. Discriminative and Calibration Performance of Models by Time Period Before Suicide Attempts

Prediction window | AUC [95% CI] | Precision* | Recall* | Brier score
7 days | 0.84 [0.83, 0.85] | 0.79 | 0.95 | 0.14
14 days | 0.83 [0.82, 0.84] | 0.79 | 0.95 | 0.15
30 days | 0.82 [0.82, 0.83] | 0.78 | 0.95 | 0.15
60 days | 0.82 [0.81, 0.82] | 0.77 | 0.95 | 0.15
90 days | 0.81 [0.81, 0.82] | 0.77 | 0.95 | 0.15
180 days | 0.81 [0.80, 0.82] | 0.76 | 0.94 | 0.16
365 days | 0.83 [0.82, 0.84] | 0.75 | 0.96 | 0.15
720 days | 0.80 [0.80, 0.81] | 0.74 | 0.95 | 0.16

*Precision ~ positive predictive value = the ratio of true positives divided by the sum of true positives and false positives.
*Recall ~ sensitivity = the number of true positives divided by the sum of true positives and false negatives.
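For readers unfamiliar with those metrics, here is a minimal sketch of how precision, recall, and the (single-probability) Brier score are computed; the numbers are toy values, not the study's data:

    # Toy predicted probabilities of a suicide attempt and actual outcomes
    # (1 = attempt occurred, 0 = did not). Illustrative values only.
    probs  = [0.9, 0.8, 0.3, 0.7, 0.2, 0.6]
    actual = [1,   1,   0,   1,   0,   0]
    labels = [1 if p >= 0.5 else 0 for p in probs]   # classify at a 0.5 threshold

    tp = sum(1 for l, a in zip(labels, actual) if l == 1 and a == 1)
    fp = sum(1 for l, a in zip(labels, actual) if l == 1 and a == 0)
    fn = sum(1 for l, a in zip(labels, actual) if l == 0 and a == 1)

    precision = tp / (tp + fp)   # true positives / all positive predictions
    recall    = tp / (tp + fn)   # true positives / all actual positives
    # Binary (single-probability) Brier score, as typically reported for classifiers;
    # for a two-outcome event it is half the two-component score used earlier.
    brier = sum((p - a) ** 2 for p, a in zip(probs, actual)) / len(probs)
    print(precision, recall, brier)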

Thank you for the reply.

I am probably not communicating my point well, so let's try https://journals.ametsoc.org/doi/full/10.1175/2009MWR2945.1 Quote: "Also for very rare (or very frequent) events BS becomes inadequate as a skill score. In fact, assigning by default p = 0 to rare events whose actual occurrence frequency is, say, f = 10^−3, we get BS = 2 × 10^−3. If, maybe after great research efforts, we make the correct prediction p = 10^−3, then BS = 2 × 10^−3 (1 − 10^−3), a gain of only 0.1% in score. Thus, BS is very unfair for evaluating rare (or common) events forecasts."
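To see the quoted point numerically, here is a small sketch using the same two-component Brier score as in the post (an illustration, not the paper's code):

    # Expected two-component Brier score for a binary event with true
    # occurrence frequency f, when the forecaster always predicts probability p.
    # If it occurs:       (1-p)^2 + (0-(1-p))^2 = 2*(1-p)^2
    # If it doesn't:      (0-p)^2 + (1-(1-p))^2 = 2*p^2
    def expected_brier(p, f):
        return f * 2 * (1 - p) ** 2 + (1 - f) * 2 * p ** 2

    f = 1e-3
    print(expected_brier(0.0, f))  # always say "won't happen": 0.002
    print(expected_brier(f, f))    # predict the true frequency: 0.002 * (1 - 0.001)
    # The "correct" forecast improves the score by only about 0.1 percent,
    # which is the quoted complaint about rare events.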

Haven't read the whole thing, but a relevant assumption in that paper seems to be: "In fact two sequences of forecasts that have assigned exactly the same probabilities to a series of observed events cannot gain different scores on the basis of probabilities assigned to never occurred events. Common sense suggests that both deserve the same score until some event treated as different is observed."

The paper is about evaluating which forecaster is best and this seems to be about fairness? The idea seems to be that we shouldn't rate forecasters based on their predictions of things that haven't happened yet? We shouldn't decide one forecaster is better than another until they make different predictions about some event and it actually happens, so we find out who is right. At least, for a rating system that pretends to be objective. Since nuclear war hasn't happened yet, this seems kind of limiting.

But informally, we do discuss all sorts of things. There's no preventing someone assigning a number to a future event and others judging them for it. The question is whether it's useful and deserves to be considered "objective".

Yes, Brier scores are an example of proper scoring rules, and there are better ones for intuitively matching our ideas about accuracy for extreme events - but there's a relationship between proper scoring rules, and they all correctly identify "better" forecasts within a single question.

I agree suicide is predictable within some set of constraints; but these predictions are based on a very large number of observations.

You can *assign* a probability to anything based on any of millions of rationales. But there's nothin' sayin' it's accurate, which is where the probability of nuclear war and the probability of suicide part ways.

Whether or not one can narrow in on the probability of an uncommon event probably depends on the number of combinations of possible causes of the event. Is the event rare because it occurs as an unusual extreme of a common cyclical phenomenon (e.g., large asteroid impact on earth)? Then you can probably start guessing the probability. Is the event rare because a highly extraordinary combination of circumstances is required to drive it, even though it's in a dynamic environment? Then you might have a more difficult time converging to a realistic probability.

The probability of an event is a non-negative value assigned by a set function on events under certain axioms which permit algebraic manipulation of probabilities.

Many of you are stuck in a frequentist mentality where you restrict probability to mean only a long run frequency... which inhibits consideration of unique events.

One can (as Bayesians have for centuries) conceive of probability as a measure of belief (based on prior information and new information). In this manner, probabilities can be derived from various pieces of information or scenarios and manipulated mathematically to express the strength of one’s “belief” in a numerical manner.

Sure, it’s possible that one’s computation isn’t exact truth, but rather expresses one’s beliefs consistently and comparably in numerical terms, using structured thinking and axioms that ground one’s views (their derived probabilities) mathematically.

One can quite easily use a Bayesian type approach to formulate a “belief” via a belief function that assigns numbers to statements such that the larger the number the higher the confidence. These “belief” functions allow one to combine evidence (information or data) from various sources in order to arrive at a degree of belief (represented by the belief function and expressed numerically) that takes into account all the available evidence.

This is a valid use of mathematics and probability. Frequentists balk at this due to their narrow view of probability as a long-run frequency rather than a degree of belief given the available information.

Alex is right in his approach (he might be wrong in his conclusion or his assigned probability); you frequentists are just too narrowly focused to understand this. That’s on you, not him.

The comment and situation in this case is unique, and it's impossible to define long-run probabilities of counterfactuals, but I would still assign a high subjective probability that the insults that you end the post with make your argument less likely to be accepted.

Fair point... but frequentists don’t seem to want to understand probability as anything other than a long run fixed frequency. Could have avoided that dig but rigid frequentists frustrate me.

"it is still so rare that we cannot predict it better than a coin flip" Have there been statistics done on this? For example, are there actual people out there predicting ___ person is going to commit suicide?

Are statistics really saying there's no correlation whatsoever between people talking about suicide and actually doing it? Find this hard to believe.

What is the probability of the apocalypse? This year? Within 100 years? Within 1,000 years?

Define that better; what is "the apocalypse"? Coppola reported filming "Apocalypse Now 2"? Or news reports verified by three independent sources of a huge old bearded man floating above San Francisco, seen nuking everybody with supercharged lightning bolts and laughing sardonically?

This idea that we can assign meaningful probabilities to unique events strikes me as rather similar to the claim that one's estimate is x% likely to be correct because it falls within the x% confidence interval.

That said, I have serious doubts that we can say nuclear war is a unique event but perhaps that is all about definitions.

In a way, literally every prediction about the future can be defined as being a "unique" event if you define the class small enough. Betting on who wins the 2nd horse race this afternoon is a unique event (how many times will this afternoon's horse race feature the same horses, same weather, same track, same age, same jockeys, same strategies, etc., etc.), but the betting markets are obviously pretty accurate for those events.

Of course people who do not understand probability or gambling are unlikely to understand this, but what does it matter? There are a lot of stupid/ignorant people who are stupid/ignorant about a lot of things.

Good point. Great example.

The only thing I question is your claim about accuracy. Is there a Brier Score or some other score for horse racing odds? I'd love to see this.

I'd guess that one could use such a score to test whether gambling events are rigged.

"I mean, you take that objection, which you hear repeatedly from extremely smart people that these events are unique and you can’t put probabilities on them, you take that objection and you say, “Okay, let’s take all the events that the smart people say are unique and let’s put them in a set and let’s call that set allegedly unique events. Now let’s see if people can make forecasts within that set of allegedly unique events and if they can, if they can make meaningful probability judgments of these allegedly unique events, maybe the allegedly unique events aren’t so unique after all, maybe there is some recurrence component.”

This sounds like bullshit to me. You can predict unique events because some events are called unique but they really aren't? Aren't you just playing tricks on the term "unique"?

All events are unique. Even your next coin flip will be, uniquely across the entire universe, the only coin flip to take place at that exact time and location. Yet I still feel confident judging it's gonna be 50/50.

So you make accurate predictions about unique events by analogizing to other unique events that have some degree of similarity.

I'd also suggest Mr. Tabarrok is making an elision here. Just because we can say that nuclear war was MORE probable during the Cuban Missile Crisis than today does NOT mean we can say what its probability was at any given time. There is a difference between the RELATIVE probability of a thing and its ABSOLUTE probability. It is one thing to say that it is more likely to rain in the afternoon today in Florida than it is to rain in any given afternoon in October, because we know the propensities of the seasons, but that is not at all the same thing as saying "it is x% likely to rain at a given time." The two statements are completely different and do not substantiate the claim made here. Perhaps the problem is not just "uniqueness" (though I would also argue that is itself a problem; also, what do we mean by nuclear war, an exchange of devices or a single bombing? That needs clarification), but also whether we refer to an event's discreteness or its relation to other events, especially when dealing with social phenomena.

Likewise, just because someone (esp. the gov't) is quantifying something and conducting a mathematical analysis, doesn't mean they are measuring anything at all even if the mathematics are internally robust.

Jeffrey Bristol

Indeed, yet despite having decades of weather observations, radar and satellite data, computer models, etc, weather forecasters in Florida do not assign 0.000x levels of precision to their predictions, even if they see clouds forming on the horizon.

I agree with Tabarrok's point about "unique events" and I think Tetlock also addresses the "excessive precision" criticism in his book Superforecasting.

What I'm still struggling with is connecting the Superforecasting-style probability of nuclear war with Tabarrok's claim that "the risk of nuclear war remains the world’s No. 1 problem".

Since reading the original post, I keep trying to compare Nuclear War with Total War between leading industrial nation-states. I can't decide which is potentially the bigger problem and which is more likely.

Perhaps like nuclear reactor accidents, we are underestimating the likelihood by vastly overestimating the impact of nuclear war and the superforecasting-style probabilities need time to reflect the new reality.

Every sports event is unique, yet betting markets predict outcomes.

Not really. The same teams play multiple times. The same players are generally playing on the field. The same coaches drawing up the plans.

Yes, we can put a lot of meaningless details into the equation to then make things completely unique but it's not clear they matter.

The whole point is whether there is any actual population of events against which some probability function can be defined, or whether we have a singular observation.

I would suggest we model the event of nuclear war more as uncertainty than risk. In which case we should not waste time suggesting probabilities and focus on activities that will reduce the level of uncertainty.

That approach addresses the concern without any pretense of knowledge (the distribution shape and set of observations/events that drive it) that really doesn't exist.

And when a black swan "breaks" the odds, the house refunds the bets.

It became known many years later, post-USSR, at a conference of retired military officers, that a submarine captain had specific orders that under certain circumstances he was to launch, and during the Cuban Missile Crisis those circumstances transpired. Yet he didn't. So for a period of time the odds were extremely high, yet it didn't happen. I wonder if there were Soviet Naval officers who knew the orders, watching the situation develop, expecting the worst to happen, and then were surprised when it didn't.

So ultimately a prediction like this is meaningless. Much more useful to figure out why it hasn't happened.

I think this is a potentially useful idea, grouping AUEs, but someone should warn you, if Taleb finds out you were disagreeing with him, he'll start tweeting and name calling. You'll get the Steven Pinker Treatment and wish you were never born.

Did any of these IARPA wiseguys see Trump coming?

Who do you think put him there? The electorate? BWAHAHAHAHA

I think the point of calling something a unique event is not that you can't assign a probability. You can assign anything a probability. It's just a way of saying we have a limited basis on which to calculate such a number as anything more than a gut feeling and/or it has some "strange" features.

It's not hard to calculate apparently plausible probabilities of nuclear war that cumulate into something very frightening over time. Look at the way someone like Graham Allison got nuclear terrorism so totally wrong in exactly this way.

But if these probabilities are at all right, how are we still here? The truly laughable example being the doomsday clock forever perched "minutes" from catastrophe.

Probably the answer is that it's either very hard to calibrate these forecasts or there's some path dependence or something such that the probability was genuinely something as high as 1 or 2% in some year but each passing year lowers that.
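One way to make that last thought concrete (a sketch with an illustrative flat prior, not anything the commenter specified): treat the annual probability as unknown and update it on each year that passes without a war.

    # Beta-binomial updating: prior Beta(a, b) on the annual probability of
    # nuclear war, updated after n consecutive years with no war.
    def posterior_mean(a, b, n_years_without_war):
        return a / (a + b + n_years_without_war)

    a, b = 1, 1   # a flat prior, purely illustrative
    for n in (0, 10, 30, 75):
        print(n, round(posterior_mean(a, b, n), 3))
    # 0 -> 0.5, 10 -> 0.083, 30 -> 0.031, 75 -> 0.013: each passing year
    # without a war pulls the estimated annual probability down.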

The discussion should center around the question of how knowledge of the “probability” influences decisions. It is useful, then, to start with the frequentist interpretation of probability—the one that claims probability is a good forecast as to the frequency of some event in a large number of trials.

Batting probability (aka, batting average) is very helpful. A big league starter will have around 500 at bats a season and so the difference between a .200 hitter and a .300 hitter is about 50 hits over a season—the difference between an All Star and a minor leaguer.

For unique events, like nuclear war, probability is more complex. Tetlock claims that for unique events probability is still a useful number because it predicts what would happen if you aggregated a large number of distinct unique events. I’m not sure. Let’s imagine that a large number of unique events—nuclear war, pandemic, Trump's election, whatever…—happened to each have a probability of 2% (let’s assume independence too, just to simplify the story). Then a frequentist would say that if the probability is true, about 2% of these things will actually happen. But so what? We need to know how to prepare for each of these events.

When we think about probabilities of unique catastrophic events, it makes more sense to translate those probabilities into a survival rate—e.g., there is a 50% probability that the earth will not collide with a large asteroid sometime in the next 200,000 years. Nothing wrong with thinking that way but it still doesn’t solve the problem of unique events. A frequentist interpretation of the asteroid survival rate predicts that if we had 1000 identical parallel universes, in about 500 of these the earth would be unchanged over a period of 200,000 years. Again, so what? I want to know about my universe.

As I think someone else mentioned, a completely different meaning of probability is as a subjective statement about one’s certainty. If I say there is a 99% chance that the Blues will win the Cup again next year (or if I say there’s a 1% chance), I’m expressing confidence in my opinion.

>>If I say there is a 99% chance that the Blues will win the Cup again next year (or if I say there’s a 1% chance), I’m expressing confidence in my opinion.

You aren't expressing confidence unless you are willing to bet money on the outcome.

Perhaps the chance of anything happening is either 100 percent or zero percent -- we just don't know it yet.

At the dawn of the cable-TV era I used to work until midnight so I'd watch the Atlanta Braves replay at 1 AM or so, having carefully avoided hearing the score at work from other baseball fans.

One day I mentioned to a friend that it seemed a little weird to be watching with suspense something that had already been decided hours ago. He insightfully suggested it was "that eerie air of predestination."

My issue with the original post (and unaffected by this post) is that Dr. Tabarrok treats the annual probability as independent by year over 75 years. Frankly, I expect someone with a PhD to at least acknowledge the statistical shortcomings of the approach.

Life expectancy was also used incorrectly, but I'm putting that down to simplification for the sake of a quick discussion.

Maybe I should be more generous with the independence argument as well, but I see too many crappy papers based on similarly bad assumptions to let that one ride.
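To see what the criticized independence assumption implies, a minimal sketch (the 1% annual figure here is purely illustrative, not a claim about the original post):

    # Probability of at least one event in n years, *assuming* the same
    # independent annual probability p every year (the assumption being criticized).
    def cumulative_risk(p, n):
        return 1 - (1 - p) ** n

    print(round(cumulative_risk(0.01, 75), 2))   # ~0.53 for a 1% annual probability
    # If annual probabilities vary or are correlated across years, this
    # simple formula can badly over- or understate the true cumulative risk.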

There is a common misconception that many people have about how to think about probabilities and what they mean. It goes something like this: "This is a one-time unique event that either happens or it doesn't. It's nonsense to say that it's 5%, or 40%, or any other number; none of those probabilities can be correct or sensical because you can't repeat everything and see how often it happens. The only right answer is what actually does happen."

The misconception is that there has to be a correct answer, that probability is always something objective and determined by the world. This misconception is fueled by the way probability is taught in school, because in school problems, there is always a "right" answer - the probability of a coin landing heads is 1/2, the probability of a die roll getting >= 5 is 1/3, and so on. It is also fueled by the language people use to talk about probability - the "probability of X _is_ Y", "the chance of this happening _is_ Z", as if there were an objective answer.

But there is a second distinct way to use probabilities that is meaningful, even for single unrepeatable events. Probability can simply be a way to express the strength of **a particular person's expectation/belief/anticipation** of something, in terms of betting odds. Take a coin flipping in the air. Is the probability of heads 1/2? As far as you're concerned, that might be the best prediction you can make. You'd be a fool to take 2:1 odds against it, or to take 1:2 odds for it. So 50% is a reasonable probability for *you* to give. But what if someone else was taking a live video of the coin with a sophisticated physics model on a computer? Maybe they would give a probability like 80%, even if their model wasn't perfect, it might let them predict the coin much better than pure chance.

Different people can have different knowledge and beliefs and models, and therefore give different probabilities, and that's fine! The mistake is not assigning the number, it's to say that the probability "is" 50%, or that it "is" 80% as if it were something objective, or that one probability can be "right" and another "wrong". The number simply means what odds you'd give, *in your opinion*.

Maybe you have an aversion to betting, you wouldn't actually personally bet at any odds. No problem - then just imagine a big impersonal corporation whose goal was to maximize their long-term expected profits, and *who only knew what you knew and had the same beliefs as you did*, the probability is about what odds you think *they* would give, such that they would eagerly take any odds more favorable for it, and eagerly take odds against more favorable the other way.

So, if you see a bunch of different people giving probabilities like 1% or 2% or 0.5% for something, those numbers are definitely still meaningful, even for isolated single events. It's not that there is some "objective" probability of the event. Yes, ultimately it will happen or not and that's it. The meaning is simply that those are the betting odds that each of those different people think would be fair, in *their opinion*, given what *that particular person currently knows and believes*. It is still up to you to decide overall how much you trust each person - you might consider their track record of making good predictions in other things, or the specific analysis they did to come up with their opinion in this particular situation. You might consider how many different people independently came up with similar odds, and so on.
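To make the betting-odds reading above concrete, a toy sketch (illustrative numbers only): a subjective probability p corresponds to fair odds of (1-p):p against the event, and an offered bet is attractive only when it pays better than that.

    # Expected profit of a 1-unit stake on an event, given subjective
    # probability p and decimal odds (total payout per unit staked).
    def expected_profit(p, decimal_odds):
        return p * (decimal_odds - 1) - (1 - p)

    p = 0.5                          # your belief about the coin in the air
    print(expected_profit(p, 2.0))   # fair (even) odds: 0.0
    print(expected_profit(p, 3.0))   # 2:1 against offered to you: +0.5, take it
    print(expected_profit(p, 1.5))   # 1:2 against: -0.25, the fool's side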

This is a very insightful and helpful comment. In my opinion!

A benefit (or detriment?) of aging is that it gives one the ability to look back over a long period of one's life and observe the unique events that made a turning point in one's life. The unique events in my life have come unexpectedly, but not improbably. Does one even assess the probability of unique events in one's life? Probably not.

In predicting the probability of rain tomorrow, meteorologists look at current measurable and expected atmospheric conditions and compare to previous similar conditions. A 70% chance of rain means that in 70% of similar conditions, rain occurred. So this is not an exercise in rolling cosmic dice.

The problem with estimating the probability of a nuclear war in this fashion is that we have never seen a nuclear war unfold. Our information set is incomplete. But this does not mean it can't be estimated.

A nuclear war would necessarily have some precursors, and these precursors likely HAVE occurred in the past. While "Nuclear War" has not happened, "Not Nuclear War" has happened many times. The last step in an estimate is the most precarious: opining on the conditions that would have tipped the scale toward nuclear war. We do have two events of using atomic weapons in war, so we are not completely uninformed. What led to the decision to drop those bombs?

The temperament of the decision makers is an important factor. Geopolitical situation matters. Relative strength matters. I can say with certainty that many people believed Ronald Reagan was more likely to get us into a nuclear war than Barack Obama. How much more is a matter of opinion, but as long as beliefs have idiosyncratic errors, the average guess is probably golden.

The estimated probability is likely to have a large forecast error, but it is flat out wrong to say it can't be done.

It is more complicated than that. They have models that they run, models built to be as close as possible to the actual weather systems, run with all the current temperatures and pressures etc. As the field has progressed the models have become quite good. Except that a model will produce different answers due to the inaccuracy of the measurements. A temperature reading, for example, is 10.2C: is it 10.20 or 10.29? So they run the models with the measurements at different points of the error range. Then they plot the results.

This is chaos theory. The plot could be a tight donut or no pattern. They can calculate the probability from the pattern.

Economics and social science come nowhere near meteorology when it comes to prediction science.

I didn't mean to suggest that social science could predict with the accuracy of physical phenomena. I was framing the forecast methodology in a way that is familiar to everyone.

We could certainly calculate the probability of a blizzard in San Francisco. It would only require understanding the confluence of conditions necessary for that to happen, even if it has never happened.

I generally look to the comments for *the* answer, but this thread is triggering a memory of that terrible Sunday that "Ask Marilyn" broke America's brain with her response to the Monty Hall problem in Parade magazine.

The Breaking of America's Brain was a great Sunday!

One should also think about survivorship bias, as if there was a nuclear war, the chance of you being born is a lot less. So it is possible the probability of nuclear war is actually quite high, but the universe you can be born into probably is one that didn't result in nuclear war.

This issue is discussed extensively by Nick Bostrom and others in terms of anthropic biases. A good place to start for looking at the issues with reasoning about existential risks and probability is Cirkovic, Sandberg and Bostrom's paper here: https://nickbostrom.com/papers/anthropicshadow.pdf

There is a 100.0000% chance that if we DO have a nuclear war, Alex never mentions this thread again.

This is just the ordinary frequentist-versus-Bayesian dispute in how to use probability. If someone insists that frequentism is the only possible approach to statistics, he'll endlessly bitch when someone uses them in a Bayesian fashion.

Yes indeed, academics debating how many angels can fit on the head of a pin.

There is, undoubtedly, a method to come up with a number to the nth degree of certainty that is intellectually defensible and also ludicrous.

One interesting thing about probability is that the real world is deterministic. We use probability as a way of dealing with our imperfect knowledge of the world but it is not part of the world itself.

By the way, yes, I know that quantum physics claims reality is uncertain at the quantum level but at the macro level reality is deterministic. We just do not have enough information about reality to assign 1 or 0 to specific events.

Deterministic or stochastic process has nothing to do with it. Predicting the probability of rain tomorrow has nothing to do with modeling a cosmic dice roll. It is the examination of the frequency of rain given current weather conditions.

As I said earlier, I don't believe social science predictions can be as accurate as weather predictions, but the framing of the problem is similar. We are looking at past incidents similar to what we have now and perhaps extrapolating from new information. Clearly the risk of nuclear war was higher during the Cuban Missile Crisis than on some random day the prior year. You wouldn't have been rolling dice. You'd be intelligently judging from the information available. The wisdom of the crowds would produce a fairly good estimate of the probability of nuclear war P by merely randomly asking many people 1, 0 questions.

One good way to help people conceptualize rare or even one-off events is to teach them a couple of multiple universe solutions. Example: You are in a cube with six doors. One of them leads to escape while the other five put you in another cube with the same conditions and spin you until you are dizzy so that you don't know which door brought you in. What is the expected number of doors you must walk through to escape?

You could solve this with an infinite series, but the simple solution is to pick the six different doors in six different universes, so that you know you walk through six doors for every one escape.
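A quick check of that answer (a sketch, not the commenter's code): the recursion E = 1 + (5/6)E gives E = 6, and a simulation agrees.

    import random

    # Walk through doors until the 1-in-6 escape door is picked;
    # the other 5 doors put you back in an identical cube.
    def doors_until_escape():
        doors = 1
        while random.randrange(6) != 0:
            doors += 1
        return doors

    trials = 100_000
    print(sum(doors_until_escape() for _ in range(trials)) / trials)  # ~6.0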

some math zen re: the idea of "unique"

https://en.wikipedia.org/wiki/Seven_states_of_randomness#cite_note-9

Take it one step back. What might cause a nuclear war? A regime with nuclear capabilities feels that it will be annihilated? An apocalyptic terrorist group gets its hands on a nuclear weapon?...Why does it seem absurd to predict the likelihood of these events? We know how many nukes there are, who owns them, how often regimes collapse, how many apocalyptic terrorist organizations there are and how hard nukes are to manufacture and deliver.

Moreover, classifications of events as "unique" depends on somewhat arbitrarily chosen abstraction levels since all events above quantum level are unique if every detail is included.

The fact that one can determine that in the aggregate that one's estimates for the 'probability' of various events are systematically biased up or down doesn't entail anything about any one particular event.

To give a super formal example suppose that I put a probability distribution over the space of natural numbers (such that 99% of the measure is on numbers above a billion) and for each number I ask you to tell me the probability that this number is prime. It might be that the probability that a given number drawn from this distribution is prime is 50% but for each actual number (on most useful interpretations of probability) the probability that it's prime is either 0 or 1.

Also note that I can take a single event E (e.g. there will be a nuclear war) and depending on what aspect of it I focus on put it in different aggregations of events which occur at different rates. So suppose experts assign a probability of 1% to a nuclear war in the next 10 years. However, when queried on the space of major disasters experts turn out to, in aggregate, underpredict the rate they occur. When queried about the likelihood of wars those same experts, on aggregate, turn out to overestimate the rate they occur. So do we conclude the guess at the probability of nuclear war is under or over the true probability?

Ultimately, I grant there are various notions of probability that make the probability of unique events meaningful. But it's not clear any of these notions are ones we can get a good grip on. In particular, the need for a true probability measure to reflect logical omniscience raises big problems for ever knowing the probability of these kinds of events.

Yes, if you need justified true knowledge in a super formal sense, then until you iron out the details and "get a grip on" these question, you can't ever make decisions based on your expectations in that same super formal sense.

Is the deep philosophical problem here that you don't understand how the formal specification should work, or is it that you think there is a real problem? And if the latter, how do you justify any decision?

I'd add that the above is merely a super-formal response about the difficulties in philosophy of probability.

It does nothing to undermine the fact that it's epistemically warranted to take the probabilities given in these models as if it was the probability of the event in deciding how to act.

The sense in which that value might not really be the true probability relates to issues stemming from the need for probabilities to obey the probability axioms and the fact that this implicitly forces probability assignments to reflect a kind of logical omniscience, not a practical epistemic barrier.
