> Your two paragraphs have no relationship. How it could possibly "turn out" tha...

AlbertCory · on May 2, 2023

no, you can only "calibrate" his prediction abilities in general:

"given 100 predictions by Nate Silver, how many will be correct?"

that's what you just said.

In other words, if we forgot what actually happened, picked one past prediction out of a hat, asked "so was he right?" and then checked the results, he would be correct 95% of the time. That's what you're saying.

edmundsauto · on May 3, 2023

Correctness is actually a pretty bad evaluation because it reduces the outcome to a binary. There are better measures like a brier score that IIRC, they have pretty rigorously evaluated on their own predictions.

dragonwriter · on May 2, 2023

No, its saying...

Well, here:

https://projects.fivethirtyeight.com/checking-our-work/

AlbertCory · on May 3, 2023

OK, I read that.

It's telling that he uses MLB games, rather than Presidential elections, which are not covered at all in that article. Why?

There are many thousands of MLB games. There are a few thousand US House elections. There are only a handful of Presidential elections. It is not possible to measure their accuracy on those with any statistical rigor.

dragonwriter · on May 3, 2023

> OK, I read that.

Not very well.

> It’s telling that he uses MLB games, rather than Presidential elections, which are not covered at all in that article.

All the types of predictions 538 does are covered in the Brier skills chart on the overall summary; there’s detailed analysis covering each type of prediction (with further available breakdowns for subtypes available once you’ve gone into the type) available from the dropdown at the top of the page.

AlbertCory · on May 3, 2023

OK, I missed the dropdown at the top. There is not "detailed analysis" for Presidential elections, although I guess you mean that each state is a data point.

All that graph shows is that their "probability" is roughly in the ballpark with huge errors in the middle of the range, which is probably the swing states we care about most.

I'll suggest one concrete case where their probability of a candidate winning a 2-person race has some exact meaning:

  *You'll pick a number randomly from 1 to 100. If it's less than or equal to candidate A's "probability" of winning, then you bet $10,000 on A. Otherwise you bet $10,000 on B* (we need to put real skin in the game)

Are you willing to abide by that rule? If not, you don't really believe the probability.

dragonwriter · on May 3, 2023

> There is not “detailed analysis” for Presidential elections

There is, in fact, detailed analysis for Presidential elections.

> although I guess you mean that each state is a data point.

No, each individual prediction of each of the 56 (enumerated upthread) distinct elections for one or more Presidential electors is a data point.

> Are you willing to abide by that rule? If not, you don’t really believe the probability.

Well, no, not being willing to abide by that rule means either not being rich enough that winning and losing $10,000 are roughly symmetric in utility or not being an SBF-style nutball and demanding a non-zero risk premium. As it turns out, I am in both categories.

AlbertCory · on May 3, 2023

what does this mean, "each individual prediction of each of the 56 (enumerated upthread) distinct elections"?

how many predictions are there for each election? does 538 issue more than one per election? Or how many data points are there?

as for the last paragraph: I guess you don't really believe in their probabilities, then, which was my whole point. I have no idea what you're talking about, re "non-zero risk premium."

dragonwriter · on May 3, 2023

> how many predictions are there for each election?

One for each batch (which may be as small as one) of data (polls specifically in the polls-only model) added to the data; each takes into account current and past polls, with time.-based decay in weighting, distance from the election, and other factors. This includes polls for other contests in the same cycle, because the models accounts for correlation between them.

> does 538 issue more than one per election?

Sometimes multiple in a day for an election (especially the Presidential general election.)

> Or how many data points are there?

Well the scale for the bubbles on the Presidential election night calibration plot runs from 30-300 per bucket, and there’s 21 buckets.

> I have no idea what you're talking about, re "non-zero risk premium."

If you don't understand risk premiums and the asymmetric utility of nontrivial quantities of money, you have really no business talking about when a bet is reasonable.

astrange · on May 2, 2023

> no, you can only "calibrate" his prediction abilities in general:

Yeah, but that's what we care about - the past has already happened so we only care about future events.

AlbertCory · on May 2, 2023

OK, so maybe we're agreed: you can bet on his abilities in one election. Let's say he's been right 95% of the time (I don't know if that's true) and we believe that's likely to continue, knowing nothing except "this is an election and Silver's predicted the result."

Then if he says "Hillary is likely to win" we can have 95% confidence he's right.

If he says "Hillary has an 80% chance of winning" we ignore the 80, and just observe that it's more than 50.

astrange · on May 3, 2023

It's a bit more flexible than that, though - rather than ignore 80% vs 70% and make it all up or down, you can let him predict a few different events in a row, add up the errors, and see how off they are.

Or if you do want to review the past, you can look at the error for a category of elections or that entire year rather than his whole prediction career.

AlbertCory · on May 3, 2023

> predict a few different events in a row, add up the errors, and see how off they are.

I'm not seeing a formula there.

dllthomas · on May 3, 2023

How about here? https://en.wikipedia.org/wiki/Scoring_rule

AlbertCory · on May 3, 2023

that's the definition of a rule.

> One could note the number of times that a 25% probability was quoted, over a long period, and compare this with the actual proportion of times that rain fell.

it still depends on many samples, or "over a long period" in your doc.

You can't escape the fact that there are only one or two samples, no matter how much math you throw around.

dllthomas · on May 3, 2023

> that's the definition of a rule.

And there are several example rules on the page.

> You can't escape the fact that there are only one or two samples, no matter how much math you throw around.

That depends on what question you're asking. "How well calibrated are the electoral predictions that FiveThirtyEight makes?" is a sensible question with a lot of data points, seems to speak directly to the crowing about the one call being bad, and seems well suited to the application of a scoring rule for comparison between people making predictions about the same things.