How we reverse-engineered Chess.com's accuracy score
When you finish a game on Chess.com, you get a single number out of 100. Turning a messy game into one fair grade is a genuinely hard problem, and every site solves it a little differently. Here is how we built Backrank's accuracy score, start to finish, and how closely it lands to the number you already know.
Key takeaways
- An accuracy score is built in three steps: turn the engine's evaluation into winning chances, grade each move by how much it gave up, then combine those move grades into one number.
- The hard part is the last step. A plain average is far too forgiving, so a single blunder barely dents the score. We use a "power mean" so bad moves hurt the right amount.
- It lines up closely with Chess.com: across hundreds of games it agreed within 3 points three quarters of the time, with no systematic bias.
- The same number powers Backrank's free game review, which turns every mistake it finds into spaced repetition practice.
An accuracy score feels obvious: one number, 0 to 100, that says how well you played. But squeezing a long, messy game down to a single fair grade is harder than it looks, and every site does it a little differently.
We wanted the accuracy number in Backrank's free game review to be one you can trust and compare, so we built our own accuracy score to line up with the one most players already know, the one from Chess.com. Here is exactly how it works.
Backrank free game review
Backrank offers a free game review for Chess.com and Lichess.org players. Paste a game or add your username and it grades every move, no account needed. When you find a mistake, add it to your deck and it comes back as a spaced repetition flashcard until the pattern sticks. Try the free game review.
The engine speaks in centipawns, not grades
A chess engine never says "good move." It looks at a position and reports a number, say +0.8, meaning the side to move is about four fifths of a pawn ahead. The unit is the centipawn, one hundredth of a pawn, so +0.8 is 80 centipawns. It is wonderfully precise. It is also a terrible grade on its own.
The reason is that the same material means different things in different positions. A pawn up in a level middlegame is a real edge; one more pawn when you are already crushing barely changes anything, because you were winning before and you are winning now. A grade has to capture that, and raw centipawns do not. So the first step is to translate the evaluation into something human: your chances of actually winning the game.
From evaluation to winning chances
The conversion uses an S-shaped curve that maps the evaluation onto a winning percentage from 0 to 100.
Through the middle, where the game is roughly level, the curve is steep: a single pawn swings your winning chances a lot. At the edges it flattens, because once you are completely winning or lost, more material hardly matters. Going from level to a pawn ahead adds about six points of winning chances; going from five pawns to six adds only about four, even though it is the very same pawn. That matches how the game feels, and everything else builds on it.
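The exact curve is not spelled out here, so the following is only a minimal sketch, assuming a plain logistic function. The steepness constant `K` and the name `win_chance` are illustrative; `K` was picked only so the output roughly reproduces the two figures quoted above.

```python
import math

# Assumed steepness, chosen only to roughly match the figures quoted above.
# It is not a published Backrank or Chess.com constant.
K = 0.0024

def win_chance(centipawns: float) -> float:
    """Map an engine evaluation in centipawns to a winning percentage (0-100)."""
    return 100.0 / (1.0 + math.exp(-K * centipawns))

# The same extra pawn, measured in two different positions.
first_pawn = win_chance(100) - win_chance(0)     # from a level game
sixth_pawn = win_chance(600) - win_chance(500)   # when already far ahead
print(f"level to +1 pawn: +{first_pawn:.1f} points")
print(f"+5 to +6 pawns:   +{sixth_pawn:.1f} points")
```

Under these assumptions the first pawn is worth roughly half again as much as the sixth, which is the flattening the curve is meant to capture.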
Grading a single move
Once positions are measured in winning chances, a move's quality becomes easy to define. It is not the evaluation after your move; it is how much of your winning chances you gave up compared to the best move available. Play the best move and you lose nothing, so you score around 100. Give up a little and you stay high. Give up a lot and your score falls off a cliff.
That fall-off is deliberate. A small slip worth a couple of points still scores in the high 70s, a mistake costing nine points drops to the low 30s, and a blunder of twenty points or more lands near zero. Small imperfections barely register; genuine errors are punished hard, which is exactly how a grade should behave.
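One way to picture that fall-off is an exponential decay in the winning chances lost. This is a sketch under that assumption, not the actual grading formula; the function name `move_grade` and the rate `DECAY` are hypothetical, with `DECAY` chosen only to land near the grades quoted above.

```python
import math

# Assumed decay rate, chosen only to land near the grades quoted above.
DECAY = 0.125

def move_grade(best_win: float, played_win: float) -> float:
    """Grade one move from 0 to 100 by the winning chances it gave up.

    Both arguments are winning percentages (0-100) for the player who moved:
    one after the engine's best move, one after the move actually played.
    """
    loss = max(0.0, best_win - played_win)   # never count a negative loss
    return 100.0 * math.exp(-DECAY * loss)

# Grade a best move, a small slip, a mistake and a blunder.
for loss in (0, 2, 9, 20):
    print(f"gave up {loss:>2} points -> grade {move_grade(60.0, 60.0 - loss):.0f}")
```

With this shape, the first few points lost cost the most grade per point, and large losses all crowd toward the bottom of the scale.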
Adding up a whole game, the hard part
Now you have an accuracy score for every move. How do you combine twenty or forty of them into one number? The obvious answer, a plain average, is far too forgiving. Play nineteen good moves and one game-losing blunder, and a plain average still leaves you in the low 90s, because the good moves drown the disaster out. That is not how the game went, nor how a fair grade should treat it. A blunder should hurt. The fix is not something exotic; it is choosing a different kind of average.
The power mean: a dial between averages
There is more than one way to average a set of numbers, and they treat a single low score very differently.
- The arithmetic mean is the everyday average: add the scores and divide. It is generous, because a few high numbers easily hide a low one.
- The geometric mean multiplies the scores together and takes the n-th root of the product, instead of adding and dividing. A single small number pulls the result down noticeably.
- The harmonic mean is the most extreme of the three. It is dominated by the smallest values, so one terrible score drags everything down with it. It is the same average you use for a round trip, where the slow leg sets the pace.
Take that same game, nineteen strong moves and one blunder, and score it each way. The plain average shrugs the blunder off and says 91. The geometric mean comes in around 81. The harmonic mean overreacts and says 49, treating one bad move as if the whole game fell apart. Neither extreme feels right, and the truth sits somewhere in between.
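The comparison is easy to try with Python's standard `statistics` module. The individual move grades behind the example are not given, so the list below is invented: nineteen moves at 95.5 and one blunder at 5. The results therefore land close to the figures above, not exactly on them.

```python
from statistics import mean, geometric_mean, harmonic_mean

# Invented move grades: nineteen strong moves and one blunder.
scores = [95.5] * 19 + [5.0]

print(f"arithmetic: {mean(scores):.1f}")
print(f"geometric:  {geometric_mean(scores):.1f}")
print(f"harmonic:   {harmonic_mean(scores):.1f}")
```

Whatever grades you plug in, the ordering is always the same: the harmonic mean is never above the geometric mean, which is never above the arithmetic mean.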
That in-between is exactly what a power mean gives you. Picture a single formula with a dial on it: turn it one way and it is the plain arithmetic average, turn it the other and it becomes the geometric mean, then the harmonic mean, with every setting in between available too. The dial is one number, an exponent: 1 gives the arithmetic mean, values approaching 0 give the geometric mean, and -1 gives the harmonic mean. It sets precisely how much a bad move is allowed to hurt.
So we did not invent a new average; we took the general one and chose where to set its dial. We also put a small floor under each move, so one catastrophic move cannot drag a whole game to zero. With the dial set, that same game scores 85: well below the forgiving 91, nowhere near the panicked 49. A blunder costs real points without erasing an otherwise good game.
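A power mean with a floor can be sketched in a few lines. The exponent and floor actually used are not stated, so both are left as parameters here; the default floor of 1.0, the sample exponent 0.25, and the move grades are placeholders for illustration only.

```python
import math

def power_mean(scores, exponent, floor=1.0):
    """Power mean of move grades, after raising each grade to at least `floor`.

    `floor` must be positive so that logarithms and negative exponents are defined.
    """
    floored = [max(s, floor) for s in scores]
    n = len(floored)
    if exponent == 0:
        # The limit of the power mean as the exponent approaches 0
        # is the geometric mean.
        return math.exp(sum(math.log(s) for s in floored) / n)
    return (sum(s ** exponent for s in floored) / n) ** (1.0 / exponent)

# Invented grades: nineteen strong moves and one blunder.
scores = [95.5] * 19 + [5.0]
for p in (1, 0.25, 0, -1):
    print(f"exponent {p:>5}: {power_mean(scores, p):.1f}")
```

Lowering the exponent always lowers the result, or leaves it unchanged when every grade is equal. That monotonic behaviour is what makes the exponent usable as a dial. The floor matters most at negative exponents, where a grade near zero would otherwise dominate the whole sum.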
A plain average forgives a blunder. The harmonic mean panics over it. The right score sits between, and a power mean lets you dial in exactly where.
How well it matches Chess.com
So how close does it land? We compared our score to Chess.com's own game-review number across hundreds of games from strong public players (Hikaru Nakamura, GothamChess, and Daniel Naroditsky), and the two line up tightly. The scatter plot at the top of this article is that comparison: each dot is one game, hugging the diagonal where the two scores agree.
Three quarters of scores land within three points of Chess.com's, the typical gap is about two points, and there is no systematic bias: not consistently high or low, just scattered tightly around the right answer. Plot the differences and they form a narrow bell curve centered on zero.
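If you want to run the same kind of check on your own games, the three summary figures are simple to compute from paired scores. This is a sketch only: the function name `agreement`, its return keys, and the five sample games are made up to show the call, and are not the data behind the numbers above.

```python
from statistics import mean

def agreement(ours, theirs, tolerance=3.0):
    """Summarise how closely two lists of per-game accuracy scores agree."""
    diffs = [a - b for a, b in zip(ours, theirs)]
    return {
        # Share of games where the two scores differ by at most `tolerance`.
        "within_tolerance": sum(abs(d) <= tolerance for d in diffs) / len(diffs),
        # Mean absolute difference between the two scores.
        "typical_gap": mean(abs(d) for d in diffs),
        # Mean signed difference: above zero means `ours` runs higher.
        "bias": mean(diffs),
    }

# Made-up scores for five games, purely to show the call.
ours = [88.0, 91.5, 76.0, 95.0, 82.0]
theirs = [90.0, 90.5, 80.0, 94.0, 82.0]
print(agreement(ours, theirs))
```

Keeping the signed bias separate from the absolute gap is the useful part: a score can sit two points away on average and still be unbiased, as long as the misses fall on both sides.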
The small gap that remains is most likely not a flaw in the formula. Both Chess.com and Backrank analyze with Stockfish, but the exact version and search depth differ, and those matter: evaluate the same position a little deeper, or with a different build of the engine, and the number can shift by a point or two. That alone is likely enough to account for the scatter you see.
What this means for you
The point of all this work is simple. The accuracy number in Backrank's free game review is one you can actually trust and compare, because it is built to agree with the score most players already know. And unlike a paid report, ours is free, works for both Chess.com and Lichess.org games, and does something with what it finds: every blunder and missed tactic it spots can become a spaced repetition flashcard, so the mistakes behind a low score turn into practice that raises it. If you want to see where a game really turned, reviewing it well is the next step.
Want to see your own number? The free game review grades any game move by move, no account needed.
Frequently asked questions
How does a chess accuracy score work?
It converts the engine's evaluation of each position into your winning chances from 0 to 100 percent, measures how much you gave up on each move compared to the best one, and combines those move scores into a single number for the game.
What is a good accuracy in chess?
Casual games often land in the 70s to low 80s, while strong players regularly post high 80s and 90s. The number depends on how sharp the position was, so it is most useful compared against your own games over time rather than against a fixed target.
Why is my accuracy different on different chess sites?
Each site chooses its own engine settings and its own formula for turning evaluations into a score, so the same game can grade slightly differently. Backrank's score is built to line up with Chess.com, so the same game usually scores within a few points of what Chess.com reports.
Is Backrank's accuracy the same as Chess.com's?
Not identical, but very close. Across about 928 analyzed games it matched Chess.com within 3 points 75% of the time, with an average gap of about 2 points and no systematic bias in either direction. A difference that small is roughly what you would expect from two analyses run at different search depths and Stockfish versions, rather than a real disagreement about how well you played.
Can I check my chess accuracy for free?
Yes. Backrank's free game review grades any Chess.com or Lichess.org game move by move and reports an accuracy score, with no account required.