We use a simple machine learning model, logistically-weighted regularized linear least squares regression, in order to predict baseball, basketball, football, and hockey games. We do so using only the thirty-year record of which visiting teams played which home teams, on what date, and what the final score was. No real “statistics” are used. The method works best in basketball, likely because it is high-scoring and has long seasons. It works better in football and hockey than in baseball, but in baseball the predictions are closer to a theoretical optimum. The football predictions, while good, can in principle be made much better, and the hockey predictions can be made somewhat better. These findings tell us that in basketball, most statistics are subsumed by the scores of the games, whereas in baseball, football, and hockey, further study of game and player statistics is necessary to predict games as well as can be done.

That is an almost Hayekian result, and I wonder what the people at 538 will think of it.

For the pointer I thank Agustin Lebron.

I thought that was bizarre too on first reading, but the rest makes it clear–they don’t mean “statistics” the branch of mathematics, they mean statistics as a sports fan would think of them. All they have is the final scores, who’s the home team, and the time series order, but they don’t have any details about who played what game or who had the ball for how long and so on.

Having the final score sure sounds like a statistic as a sports fan would think of it.

I was thinking that they somehow were getting good predictions just from scheduling information, and not from any results of play.

Can somebody explain what this means? Thanks

I think what they mean by ‘statistics’ is the kind of detailed statistics that are available, for example, for baseball, where you have runs per player and so on, basically creating a complex multi-dimensional predictive model.

Technically the method they used is just predictive statistics (logistic regression), but using only scores and home vs. visitor data.
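To make the idea concrete, here is a minimal sketch of a rating model in this spirit: ridge-regularized least squares on nothing but home team, visiting team, and final margin. This is not the paper’s exact model (which is logistically weighted), and the teams and scores below are invented for illustration.

```python
import numpy as np

# Each game is one row: +1 for the home team, -1 for the visitor, plus a
# constant column for home advantage. The target is the home margin of
# victory. The fitted weights act as team "ratings".
teams = ["A", "B", "C"]
games = [("A", "B", 7), ("B", "C", -3), ("C", "A", 2), ("A", "C", 5)]  # (home, away, home margin)

idx = {t: i for i, t in enumerate(teams)}
X = np.zeros((len(games), len(teams) + 1))
y = np.zeros(len(games))
for r, (home, away, margin) in enumerate(games):
    X[r, idx[home]] = 1.0
    X[r, idx[away]] = -1.0
    X[r, -1] = 1.0          # home-advantage column
    y[r] = margin

lam = 1.0                    # ridge penalty keeps the ratings identifiable
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

ratings = dict(zip(teams, w[:-1]))
home_adv = w[-1]

def predict_home_win(home, away):
    # Home team is favored iff rating gap plus home advantage is positive.
    return ratings[home] - ratings[away] + home_adv > 0
```

With thirty years of real scores in place of the toy data, this is the whole input the paper allows itself: dates, teams, venue, and margins.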

I don’t find it very surprising that you can make such predictions with a certain level of confidence, simply because “rich teams” (as in “Moneyball”) will win more often than poor teams, and therefore past wins are predictive of future wins. It all depends on what margin of error you are willing to accept.

Moneyball is about a poor team that uses a different set of statistics to identify undervalued players and win vs. “rich teams”. Not what you are implying.

Their use of different stats made them an outlier. So perhaps the model would have more variance when predicting outcomes of Oakland games, but better with non-Oakland games.

Though nowadays more teams use Moneyball tactics so it might not work as well.

Also, in basketball one of the reasons Golden State is reputed to be so successful is their obsession with player health. So much so that they sit some of their stars when playing in Denver because they found that the altitude can affect a player’s performance for up to two weeks afterwards. I wonder how good the model is at predicting outcomes in Denver?

The bigger problem for baseball is that it’s the only sport where the fundamentals change dramatically game by game. The starting pitcher isn’t determined by home/away status; it’s determined by “is the best pitcher available and has he had 4+ days since his last start?” If not, the second-best pitcher, and so on…

Also, Moneyball is about constructing a team given monetary constraints, so I don’t think it’s even relevant to this. “Which visiting teams played which home teams, on what date, and what the final score was” measures only what happened after rosters were locked in.

But the main reason for their success, and nothing could be more obvious, is that Steph Curry is the best basketball player in the world.

Football, hockey, and basketball all have relevant salary caps and don’t vary all that much in what teams spend.

30 years of hockey data includes a lot of non-salary-capped years.

It seems the core of this method is:

“it (the model) measures the ability of teams over games instead of players over possessions, and it takes into account home-field advantage. We intentionally limit our data use to the date, home and visiting teams, and score of each game, and we compare our predictions to a theoretically near-optimal indicator. Doing so tells us what statistical information is contained just in the scores, and whether what are commonly referred to as “statistics” have real predictive power.”

In short, historical scores and home-visitor data tell a lot about basketball outcomes. In a certain way, this makes the game boring.

Just guessing, reading the abstract, I think what they did was ‘back-fit’ data going back 30 years to see which ‘dummy variables’ are most likely to predict the outcome.

Simple example (making this up): suppose the Green Bay Packers always seem to win at home (90%) when playing their conference teams, but occasionally (25%) lose when playing non-conference teams at home. Using ‘machine learning’ you would input this into a linear equation having various coefficients, and if the team was the Chicago Bears playing the Packers in Green Bay, WI, you would know the GB Packers would win 90% of the time at home, but if it was the New Orleans Saints playing in Lambeau Field the Pack would only win 75% of the time. This is done for all teams, and tested against data going back 30 years, and finding the best fit (‘least squares’) for the linear equation for each team.

In short, much ado about nothing.

Of course the “least squares” would find the coefficient values for each dummy variable. In the example above, for Green Bay, the dummy of “HOME_FIELD_CONFERENCE_OPPONENT” would be 0.90 for Green Bay, and the dummy of “HOME_FIELD_NON_CONFERENCE_OPPONENT” would be 0.75 for GB. Likewise for other dummy variables, as many as you want, all back-tested with “machine learning” to see if they are significant dummies or not.

I’m making this up but it’s plausible.
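The made-up Green Bay example above can be sketched directly (all data below are simulated, and the 0.90/0.75 win rates are the commenter’s invented numbers, not real Packers figures). With mutually exclusive 0/1 dummies and no intercept, plain least squares recovers exactly the conditional win rates:

```python
import numpy as np

# Simulate 1000 home games: conference opponents are beaten 90% of the
# time, non-conference opponents 75% of the time.
rng = np.random.default_rng(0)
n = 1000
is_conf = rng.random(n) < 0.5                       # conference opponent?
win = np.where(is_conf,
               rng.random(n) < 0.90,                # win 90% vs conference
               rng.random(n) < 0.75)                # win 75% otherwise

# Two mutually exclusive dummy columns; least squares on the 0/1 win
# outcome then yields the group win rates as coefficients.
X = np.column_stack([is_conf, ~is_conf]).astype(float)
coef, *_ = np.linalg.lstsq(X, win.astype(float), rcond=None)
# coef[0] ~ 0.90 (conference dummy), coef[1] ~ 0.75 (non-conference dummy)
```

This is why “least squares finds the coefficient values for each dummy variable” amounts, in the simplest case, to computing conditional frequencies from the historical record.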

Predicting winners is not that hard. Might be useful to compare to a naive predictor as well as the “optimal” one. How about no regressions, just predict based on each team’s prior season’s win percentage?
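The naive baseline suggested here takes one line to state: no regression, just pick whichever team had the higher win percentage last season. A minimal sketch, with made-up team names and records:

```python
# Hypothetical prior-season win percentages, for illustration only.
last_season_win_pct = {"Warriors": 0.817, "Lakers": 0.256, "Spurs": 0.817}

def naive_pick(team_a, team_b):
    """Predict the winner from last season's records; ties go to team_a."""
    if last_season_win_pct[team_a] >= last_season_win_pct[team_b]:
        return team_a
    return team_b
```

Any model that cannot beat this kind of predictor out of sample is not adding much.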

The hard part is predicting net of Vegas lines. That would be interesting. Do they win by more than expected?

This x1000. Much as nobody would care about a stock model that underperforms the S&P 500, a prediction model that underperforms the Vegas line is essentially valueless.

This strikes me as back to the future. Surely least squares on team wins was the sort of thing done with earliest home computers, before approaches grew in complexity.

Yes, but not logistically-weighted least squares!

Yeah, but how does it do against the spread?

You can bet odds rather than spread. In which case, being right 60% of the time might not be enough.

Basketball and football seem to load more heavily on skill, which should make them easier to predict. I.e., you don’t have baseball teams winning 90% of their games…

What would football team records look like if they had a 32-game schedule?

Perhaps I am misunderstanding the model a bit, but relying on a thirty-year record (a period long enough for rosters to turn over 10-20 times) seems likely to end up capturing a lot of franchise effect. That is, it will be easier to predict matchups in sports where the same franchises are more regularly at the top of the standings. Considering basketball seems to have some of the least parity of the Big Four sports (I don’t follow closely, but that’s my understanding), it’s not surprising that 30 years of box scores would have more predictive power in basketball than in other sports.

“The Patriots are consistently good at football”

bbl, going to get this sentence published in an academic journal

Jeff Sagarin has been doing this for years using linear least squares regression. You can see all of his “power ratings” at USAToday.com.

What is less clear from the paper is if the method can predict outcomes better than Vegas odds.

I used Sagarin’s college basketball ratings to fill out my bracket (the “RECENT” method) and I cruised to a commanding lead in the first round of my family NCAA pool. It didn’t tell me Michigan State was going to lose in the first round, but it did tell me not to advance them past the Elite Eight.

Tyler –

What does “Hayekian” mean? If I referred to something as “Cowenian” what would that mean?

The analogy is with Hayek’s view of the price system. Prices summarise relevant information (e.g. I don’t need to know whether rainfall was high or low this year; I just need to look at the price of wheat) in the same way that all the relevant sports ‘statistics’ are summarised by the game scores.

Great description/summary of “Hayekian” theory, but what would “Cowenian” mean?

Thank you Ricardo. Strange that the life and work of a Nobel-prize winning economist can be synecdochized into four words of common sense.

Most economic research, indeed most social science research, can be summarized in concise common sense terms. Search theory, rational expectations, public choice, externalities, etc. There are only a few models or theories that are counter-intuitive: comparative advantage, Coase Theorem, maybe a few more.

This doesn’t seem surprising. Basketball is a series of a large number of essentially independent plays with almost no room for strategy. It should basically come down to the athleticism of the top couple of players who have the ball most of the time.

Football and hockey are strategically deep and have large teams so it’s hard for an individual to dominate. Today, it’s basically quarterbacks and goalies who have out-sized influence.

I have to admit I’m a little surprised by the result for baseball, which I would have put in between (football and hockey) and basketball. I wonder if it’s an artifact of pitcher rotations. Maybe if you treated team,pitcher combinations as the thing you were trying to predict you’d do better.

Baseball conclusion also surprised me. Is there too much noise in the single game result? I am fairly confident that underlying statistics will do the most to improve prediction for baseball games since they are individually very powerful predictors, but I don’t see why this result would come out of the study as described.

My guess for baseball is that PedroMartinez,RedSox is effectively a different team from TimWakefield,RedSox.

Agreed on baseball statistics: it’s relatively easy to measure very important aspects of individual performance, like batting.

Let’s quantify this in a snapshot for a random year with both players: 1998.

http://www.baseball-reference.com/teams/BOS/1998.shtml

Going to round a decent amount.

The top three Red Sox pitchers:

Pedro Martinez leaves the game in 1998 at the start of the 8th inning, having given up 2.25 runs.

Tim Wakefield leaves halfway through the 7th inning, having given up 3.3 runs [which undercounts, since I’m using earned runs and presumably a knuckleballer will have more unearned runs].

Bret Saberhagen leaves with 2 outs in the 6th inning, having given up 2.48 runs.

That’s a huge difference between Pedro and the others: your #1 pitcher essentially gives you 1 1/3 innings of “guaranteed” scoreless work, and going from Pedro to Wakefield costs you a net run or more. Or put it this way: the team as a whole averaged 4.5 runs allowed per game (the figures above used earned runs), so “generic Boston” allows 4.5 runs, and having Pedro instead of Wakefield saves at least 22% of the runs allowed per game.

I too sort of “object” to the title. Maybe “Team-” or “player-statistics-free”? Awkward, but much more accurate.

The type of machine learning described is essentially statistical, but the only statistic being used is the score. It would be surprising if more information (individual player statistics, etc.) did not make for better predictions.

That’s the exact same info that 538 uses for their NFL Elo rankings and predictions; they beat the spread by a bit, but not enough to cover the vigorish.
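For reference, an Elo system of the kind 538 describes can be sketched in a few lines. The K factor and home-edge values below are round-number approximations, not 538’s exact published parameters:

```python
def expected_home_win_prob(home_elo, away_elo, home_edge=65.0):
    # Standard Elo logistic curve: a 400-point gap means ~10:1 odds.
    # home_edge shifts the curve in the home team's favor.
    return 1.0 / (1.0 + 10.0 ** (-(home_elo - away_elo + home_edge) / 400.0))

def update_elo(home_elo, away_elo, home_won, k=20.0):
    # Move each rating by K times (actual result - expected result);
    # the update is zero-sum between the two teams.
    exp = expected_home_win_prob(home_elo, away_elo)
    delta = k * ((1.0 if home_won else 0.0) - exp)
    return home_elo + delta, away_elo - delta
```

Note that the only inputs are past results and venue, which is why the comparison to the paper’s scores-only setup is apt.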

Yup, I would have written “minimal model” and written more about what I left out (e.g., individual player statistics/data).

So yes, poorly worded.

And yes, a model that captures 95% of the variance is often far, far less useful than one that captures even just a little more (like 98%).

Now to sarcasm, a bit unfair:

At the end of last year, I made the bold prediction that Golden State would be a top team this year and that the Cavs would be a top team in the East. And my model just uses winning percentage…

(Where naive models like this one get most interesting is when used as a comparator to live humans. A few years ago, a Canucks fan posited that the hockey team’s management was doing “worse than a potato,” the potato representing a naive drafting model (always draft the available forward with the most points from the most important Canadian junior league). The case was compelling that the team, in fact, did a worse job of drafting than the potato.)

The baseball result is fairly obvious given the inputs. The single most important player in any particular game is the starting pitcher, but that player rotates on a schedule. Composition of “the team” changes substantially day-to-day.

As a Broncos fan I was disappointed when 538 picked the Panthers in the Super Bowl. But having watched all the Broncos games since, oh, 1999, I felt like this year’s edition was a tougher team that had overcome more adversity to get to the Super Bowl, and that would prove to be the X-factor. Plus, GM John Elway had said that lack of team toughness was the reason he fired the head coach last year.

Predictions close to the “theoretical optimum”? Does the theory have a name?

I also wonder what this means.

Meh. Using scores of past games to predict future games. The only innovative thing that I see is that they are using a relatively new machine learning technique, RLSC (regularized least squares classification). Which may indeed yield better predictions than older techniques, but I’d be astounded if it was more than a marginal step forward. And most new techniques are clearly superior in only a fraction of their possible applications; other techniques will give superior results when used in other applications. In other words, there’s no guaranteed best crystal ball.
