NFL Elo F.A.Q.

NFL Elo Home

What are Elo ratings?

Elo ratings are numeric strength ratings for a set of competitors (see Wikipedia) where higher scores are better. For example, competitor "A" with an Elo rating of 1,650 is considered stronger than "B" with a rating of 1,600. This allows us to judge the strength of two competitors that haven't recently, or ever, had a head-to-head match.

The expected outcome of one or more contests between any two contestants can be calculated using their ratings. After a match, the contestant that exceeds expectations takes some amount of Elo rating points from their opponent. For the example ratings, say the weaker contestant "B" beats "A" and is awarded 30 Elo rating points. "B" would then be rated 1,630 and "A" would then be rated 1,620.
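The standard Elo math behind this exchange can be sketched in a few lines. This is generic Elo, not this site's tuned model; the K-factor of 32 is a common chess default used purely for illustration, so the point swing it produces differs from the 30 points in the example above.

```python
# Generic Elo expected score and rating update (illustrative only;
# K = 32 is a common chess default, not this site's tuned value).

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability-like expected score for A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (A, B) ratings.

    score_a is 1 for an A win, 0 for an A loss, 0.5 for a draw.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses -- the exchange is zero-sum.
    return rating_a + delta, rating_b - delta

# The FAQ's example matchup: "B" (1600) upsets "A" (1650).
a, b = update(1650.0, 1600.0, score_a=0.0)
```

Note that the two expected scores always sum to 1, and the total rating points in the system are conserved by every update.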

While originally invented for rating chess players, Elo ratings and similar systems are used in many areas.

How do Elo ratings work for the NFL?

In applying Elo ratings to American football, we cannot treat an NFL game like a single chess game. A chess game can be won or lost in only a few moves, and the endgame is determined by which pieces remain and where they are physically positioned. This is more analogous to a football drive or even a single play than to an entire football game.

Another way of looking at it involves the approach taken by the players/teams involved. In a multi-game chess match, players do not treat each individual chess game as a "must win." Depending on whether they are playing the white or black pieces, or how the match or game progresses, they may play for a draw or play conservatively. A chess match cannot be won or lost in a single game, and the goal is to win the match, not each game. Similarly, football teams do not treat each individual drive or play as a "must win." A football game cannot be won with a couple of scores in the first quarter. The goal is to win the football game by "winning" more drives/plays than the other team over the course of the game.

For these reasons, it doesn't make sense to treat a football game like a single chess game with respect to Elo calculations. Instead, we'll treat a football game like a multi-game chess match. This works nicely because after a scoring drive, a football team kicks to their opponent and a new drive begins — to some degree the game is reset. Nothing like this happens in a single chess game.

For a chess match, player ratings are used to compute the expected winning percentage of games for both players. An expected performance of 0.5 means both players are expected to win as many games as they lose and draw the rest. For football, we can consider each game as a series of drives (or plays). If one team "wins" most of the drives within a game, they'll likely end up with a large margin of victory. If both teams "win" the same number of drives, the margin of victory will likely be much more narrow. Therefore, team ratings can be used to calculate an expected margin of victory for the favored team. For example, if both teams have an expected performance near 0.5, a close game is expected, with a tie being exactly 0.5. If one team is expected to score near a perfect 1.0, they are expected to win by a "blowout" margin. After a game, the two teams' Elo ratings are adjusted based on the difference between the actual and expected margin of victory.

Why can teams lose Elo rating points after a win?

For Elo calculations, it makes more sense to treat an NFL game like a series of contests (drives or plays), where the goal is to win the football game, not every drive or play. This is more analogous to a multi-game chess match both for how the games play out and in how the players approach each game or football drive/play. If a team wins a football game, but by a smaller than expected margin, they will give Elo rating points to their opponent. See How do Elo ratings work for the NFL? for more background.

Also see Why use margin of victory?.

Why are you publishing these?

FiveThirtyEight had NFL Elo ratings, and they were fun to watch over the course of a season. Their Elo ratings stopped being updated after the 2022 season. When the 2023 season began, I missed checking the Elo ratings enough to make my own. I published the first weekly update to reddit following week 5 of the 2023 season.

What are the goals of these NFL Elo ratings?

These NFL Elo ratings are intended to be simple, objective, and as accurate as possible.

Why be objective?

Human-written power rankings have value, but their authors may have some bias, unconscious or otherwise, that affects their power rankings. Elo-based ratings and rankings can reduce or eliminate biases for or against individual teams.

I believe that if I do any subjective model tuning to make the ratings "look more correct" to my eye, then I am nearly 100% likely to be making the model less accurate, or more biased in some way, or both. I want to avoid this altogether.

There are plenty of power rankings designed by their authors to look correct to their eye. (Reddit user mikebiox has been posting weekly averaged power rankings for years if you're interested in those.) There's no point in replicating any of that here, so these Elo ratings are intended to be objective.

How can the "accuracy" of these Elo ratings be judged?

Simply put: these Elo ratings have an expected winner for each game. Over one or more seasons, we can see how many game winners are correctly picked.

These Elo ratings take a few factors into account, including home field advantage and rest days, to calculate expected winners for each game. Models that pick more game winners correctly are considered more accurate. To avoid overfitting to individual seasons, the standard deviation of the pick rate across seasons is also used: models that pick game winners more consistently from season to season are considered more accurate.
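The two-part metric described above can be sketched as follows. The game results here are made up, standing in for real seasons of picks:

```python
# Sketch of the accuracy metric described above. Each inner list is one
# season of games; True means the model's expected winner actually won.
from statistics import mean, pstdev

def model_accuracy(seasons: list[list[bool]]) -> tuple[float, float]:
    """Return (average per-season pick rate, std dev across seasons)."""
    rates = [sum(season) / len(season) for season in seasons]
    return mean(rates), pstdev(rates)

# Two toy 4-game "seasons", each with 3 of 4 winners picked correctly.
overall, spread = model_accuracy([[True, True, False, True],
                                  [True, False, True, True]])
```

A model with a slightly lower overall pick rate but a much smaller season-to-season spread may be preferable, since it is less likely to be overfit to any one season.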

As a benchmark, the model pick rates can be compared to Vegas's straight-up pick rates.

How are these Elo ratings "objective"?

Simply put: the model is tuned to match the average case over thousands of NFL games over 31 seasons.

The ratings are calculated with regular Elo rating math, but things like the "K-factor," home field advantage, how to define a close win vs. a blowout win, etc., are parameters that must be set to something.

To be "objective," we set the model parameters and then measure how the resulting model's accuracy compares to the best known models. Then, of course, we simply go with whichever model is most accurate.

Initially during the 2023 season, I tried tuning these parameters by hand to find the most accurate model I could. I quickly found this to be an impossible task, and by the 2024 offseason I was doing automated testing of thousands of models to find the most accurate one from 2012-2023. Over the 2025 offseason I experimented with optuna hyperparameter optimization, and ended up testing tens of millions of models against the 1994-2024 seasons, and using optuna to fine-tune the best candidate models.
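The brute-force part of that search can be sketched as a toy grid search. Everything here is hypothetical: backtest() stands in for replaying every 1994-2024 game under one candidate parameter set and returning its pick rate, and its peak is planted at arbitrary values just so the search has something to find.

```python
# Toy grid search over two hypothetical parameters (K-factor and home
# field advantage). backtest() is a stand-in for replaying every game
# in the backtest window; its peak at (20, 48) is arbitrary.
from itertools import product

def backtest(k_factor: float, home_advantage: float) -> float:
    # Placeholder objective: the real system would return the model's
    # straight-up pick rate over 8,200+ games.
    return 0.66 - 0.0001 * ((k_factor - 20) ** 2
                            + (home_advantage - 48) ** 2 / 100)

best = max(
    product(range(5, 41, 5), range(0, 101, 4)),  # candidate (K, HFA) pairs
    key=lambda p: backtest(*p),
)
```

A grid like this explodes combinatorially as parameters are added, which is why a hyperparameter optimizer such as optuna (which samples the space adaptively) becomes useful for fine-tuning once good candidate regions are found.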

What are the model parameters?

There are more than 20 parameters for the model including home field advantage, rest days advantage, what constitutes a "blowout" victory, and so on. The value I've created is in the details here, so I will not be publishing them.

I will continue to make the Elo ratings themselves freely available here, on my website, with no trackers, no cookies, and no ads (and not even any JavaScript!).

What data is used to calculate these Elo ratings?

The ratings are calculated from:

How accurate are these ratings?

This Elo rating model was tuned by backtesting against 8,200+ NFL games from 1994 through 2024. Over that span it correctly picks winners in 65.84% of games, trailing the Vegas straight-up pick rate of 66.63% (according to sportsoddshistory.com), though it did meet or exceed Vegas's pick rate in 8 of the past 14 seasons. This model also does better than the average game picker at NFL Picks Page. That said, these ratings and rankings are not intended to inform sports betting decisions. This is a simple Elo model derived from a minimal data set, and it does not account for many relevant factors that may impact game outcomes. The accuracy of this model for past seasons is not necessarily indicative of its accuracy for the current or future seasons.

This chart shows the pick rates by season between Vegas and the newest (at the time of writing) Elo models for the 2025 season.

The NFL only has 17 games in a season, and every year there is a lot of roster and coach turnover. How can Elo ratings be effective for the NFL?

I guess it's a bit surprising, but the "slow to update" nature of these Elo ratings was found to give the best accuracy. This indicates that NFL teams do carry over much of their identity from one season to the next.

The model's accuracy, compared to Vegas and to the average person, demonstrates this.

Why don't these Elo ratings use player injuries?

I don't think it's a good idea to directly change team Elo ratings as a result of player injuries. Practically, it is extremely challenging to quantify the impact of a single player. This would involve play-by-play analysis, and even that is of limited usefulness because we don't know every player's assignment on each play, and players are rotated in and out of games all the time. On top of that, injured players are sometimes replaced by quality backups, by trades, or by signing guys off the couch. And we do want each team's depth and coaching staff reflected in their ratings; playing through injuries is exactly what reveals those strengths on the field.

But ultimately, I don't want Elo ratings to be impacted by anything other than what happens on the field. If we change team ratings outside of game results I think we're straying too far from true "Elo" ratings.

I think this approach works fine for the most part. Vegas does take injuries into account, and these injury-unaware Elo ratings track the Vegas straight-up picks closely (even exceeding them in some years). Also see Why don't these Elo ratings use stats?.

Why don't these Elo ratings use QB stats? Or offense or defense EPA stats? Or other stats?

Ultimately, I want to keep these ratings simple and traditional. I believe that incorporating more stats and variables makes the model more susceptible to bias, and makes the model harder to tune for accuracy. The model's accuracy is, in my opinion, already very good as-is.

Additionally, it appears that stats, roster changes, etc., don't seem to matter as much as we'd think. Again, I believe the accuracy of these ratings is proof of that.

Why use margin of victory? The object of the game is to win, not to win by a wide margin.

I am very confident that teams attempt to score on most drives on offense, and try to prevent scores while playing defense. There are some situations where it's optimal to focus on burning clock or to force the opponent to use timeouts, but otherwise I believe teams try to score on every possession. At the end of the day, the best way to increase your chance of winning is to have a larger lead.

It also makes more sense, Elo-wise, to treat a football game like a multi-game chess match, in which margin of victory determines each side's performance, than like a single chess game. Additionally, using margin of victory instead of "expected chance to win" makes calculating model accuracy more straightforward.

How can margin of victory be used when a 30-point win may be no better than a 20-point win?

Margin of victory is awarded on a curve. A 70-point win is not awarded 10 times as many Elo rating points as a 7-point win, and a 30-point win is not valued much more than a 20-point win. See more background about this curve here.
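To illustrate the general idea of a diminishing-returns curve, here is the ln(margin + 1) shape that FiveThirtyEight's published NFL Elo model used as its margin-of-victory multiplier. This is not the (unpublished) curve used by this model; it only shows how such a curve flattens out:

```python
# FiveThirtyEight's published NFL margin-of-victory shape, shown only
# to illustrate a diminishing-returns curve (this model's curve differs
# and is unpublished).
import math

def mov_multiplier(margin: int) -> float:
    return math.log(abs(margin) + 1)

# Under this curve a 30-point win is worth only ~13% more than a
# 20-point win, and a 70-point win is worth nowhere near 10x a
# 7-point win.
ratio_30_vs_20 = mov_multiplier(30) / mov_multiplier(20)
ratio_70_vs_7 = mov_multiplier(70) / mov_multiplier(7)
```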

Why are the ratings so slow to update? Team XYZ has won several games recently, and they're not highly rated.

Like all aspects of this system, the model appears to be slow to update because that results in the most correctly-picked games, on average.

One interpretation of this is that games (especially close games) are often won or lost due in part to factors not related to the underlying "true strength" of a team: sometimes they play at a stronger or weaker level depending on injuries, opponent, game plan, individual matchups between players, play calling, and even weather and field conditions, and there are also simple flukes and odd bounces of the ball.

If we were to change an improving or collapsing team's Elo rating too quickly, we might be more accurate for that team but at the expense of accuracy for other teams that simply had an outlier good or bad game. The best the model can do is shoot for the average case and try not to over-react to individual games.

Team XYZ gained too many rating points for beating an injured team. How is this accounted for?

If a team is actually overrated (which does happen), they will by definition underperform, as far as the model is concerned, in subsequent weeks (barring strange circumstances where a team benefits like this in successive weeks). This will drag their rating back down.

Why is Team XYZ in the top 5? They've played terribly the last few weeks.

See above.

Why is Team XYZ in the top 5? They have a losing record.

Elo ratings are intended to be informative about any hypothetical matchup today, not predict which teams will make the playoffs or win the Super Bowl.

A team's Elo rating is the predictor for how strong they are, not their record. Strong teams often have good records, but not always.

Why is Team XYZ ranked so low? They have a great record.

See above.

Why do last season's ratings matter so much? This is a new season.

Like other parameters in this system, the impact of the offseason parity reset and how much of the previous season's ratings to carry over has been adjusted to maximize the number of correctly-picked games, on average.
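As an illustration of what a parity reset looks like (the actual carry-over parameter here is unpublished), FiveThirtyEight's published approach reverted each team one-third of the way toward a league mean of 1505 every offseason:

```python
# Offseason parity reset: revert each rating part-way toward the league
# mean. The 1/3 reversion toward 1505 is FiveThirtyEight's published
# choice, used here only as an illustration -- this model's values differ.
LEAGUE_MEAN = 1505.0
REVERT = 1.0 / 3.0

def offseason_reset(rating: float) -> float:
    return rating + REVERT * (LEAGUE_MEAN - rating)

# A 1700-rated champion starts the next season at 1635; an average
# 1505 team is unchanged.
champ = offseason_reset(1700.0)
```

The larger the reversion fraction, the less last season matters; tuning that single parameter against historical pick rates is what decides how much carries over.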

I do publish a separate "blank slate" model that does start fresh every season. The 2025 "blank slate" ratings are available here.

Team A played Team B. Team A appears to have gained more Elo rating points than Team B lost. Shouldn't they be the same?

They should be, and they are. Team ratings and the rating points exchanged after games are decimal values behind the scenes. They are only rounded for display on the website, which can make them appear to differ.
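A small worked example shows how display rounding can make an equal exchange look off by a point. The ratings and exchange amount here are made up:

```python
# How rounding for display can make a zero-sum exchange look unequal.
# All values here are hypothetical.
exchange = 12.5                           # decimal points actually transferred
team_a_before, team_b_before = 1500.4, 1600.1

team_a_after = team_a_before + exchange   # 1512.9 -> displays as 1513
team_b_after = team_b_before - exchange   # 1587.6 -> displays as 1588

gained = round(team_a_after) - round(team_a_before)  # appears to gain 13
lost = round(team_b_before) - round(team_b_after)    # appears to lose 12
```

Both teams moved exactly 12.5 points, but the displayed (rounded) values suggest a 13-point gain against a 12-point loss.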

Are these Elo ratings biased toward Team XYZ? Or against Team XYZ?

No. These Elo ratings are intended to be as accurate and objective as possible.

Where can I see discussion of these ratings each week?

On reddit.

Why was the model changed?

These Elo ratings are intended to be as accurate and objective as possible. Along those lines: if I find a more accurate model, I will switch to it.

As of 2024 and 2025, the models appear to have mostly plateaued, so I expect model updates to be annual (every offseason) at most.

Previous years' pages are still available as "frozen" pages. See the top of this post for links to some old versions.

I have a comment, feedback, or suggestion. How can I get in touch?

I look forward to getting feedback in weekly reddit threads, and I can also be reached via email at my first name at this domain.

NFL Elo Home