This week's episode is about the current rating system.
So basically I will go over the formula that calculates rating changes, as well as rating decay, and talk a bit about what I like about it, what I don't, and how those things could be fixed.
Last week I got curious about how the rating changes after each ranked game are calculated, so I tried to figure out the formula behind them.
I wrote down some numbers from my own ranked games and tried to find a pattern (ratings before the game, then my rating change):
My rating | Opponent's rating | Rating change
1813 | 1737 | +13
1823 | 1561 | +6
1829 | 1844 | +14
1845 | 1777 | +13
1858 | 1771 | +13
1871 | 1414 | +2
1873 | 1784 | +12
1885 | 1753 | +10
A few things were pretty obvious: rating changes depend on the difference between the two players' ratings before the game. Unfortunately, after some guessing I came to the conclusion that the formula is not something you can easily guess. I had something like K/(Ra-Rb), but that didn't work out. So the next step was to look at other rating systems. I remembered something about the Elo system used in chess, and I knew that League of Legends and Magic: The Gathering used something similar as well. So I looked up that formula on Wikipedia, and it was a hit: my numbers matched it.
In chess, rating changes are based on the expected number of points you score (1 for a win, 1/2 for a draw and 0 for a loss). That of course depends on the skill difference between you and your opponent. So the idea is to have a number that tells you how likely you are to win or lose (or draw) against a player with a given rating. For example, your expected score against an equally rated opponent should always be 0.5: your chances should be even, since your ratings say that you have the same skill level. Before I ramble on about this forever, here is the formula:
$$E_a = \frac{1}{1 + 10^{(R_b - R_a)/400}}$$
$$R_a' = R_a + K \cdot (S_a - E_a)$$
Where Ra and Rb are the ratings of the two players before the game (let's call them A and B ;) ), E_a is A's expected score and S_a is the score A actually got (1, 1/2 or 0). I will talk about what K means in a second.
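For anyone who wants to play with the numbers, here is a small Python sketch of the Elo formula as I understand it from Wikipedia: A's expected score is 1/(1 + 10^((Rb - Ra)/400)), and A's rating moves by K times (actual score minus expected score). The function names are my own and not taken from Scrolls, and I don't know how the game rounds internally.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of player A against player B, between 0 and 1."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def rating_change(r_a: float, r_b: float, score_a: float, k: float) -> float:
    """Change of A's rating: K times (actual score minus expected score).

    score_a is 1 for a win, 0.5 for a draw and 0 for a loss.
    """
    return k * (score_a - expected_score(r_a, r_b))


if __name__ == "__main__":
    print(expected_score(1500, 1500))            # equal ratings -> 0.5
    print(rating_change(1500, 1500, 1.0, k=32))  # win vs equal opponent -> 16.0
```

Note that the expected scores of A and B always add up to 1, so with the same K on both sides the winner gains exactly what the loser loses.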
First of all, a few things that can be seen relatively quickly: It is indeed true that two opponents with the same rating number will have the same expected win percentage:
Let Ra = Rb. Then Ra - Rb = 0 and 10^0 = 1. Ignoring K for another moment, we get an expected score of 1/(1 + 10^0) = 1/2, which is exactly the 0.5 we wanted. Great.
To be fair, it is hard to see how exactly the expected win rate relates to the skill level measured by the ratings, but at least the formula behaves the right way: the bigger the difference, the fewer points the higher-rated player gets for a win, and the more points the lower-rated player gets if he wins. So a 1900-rated player, who is very likely to beat a 1500-rated player, can only gain a few points for a win but can lose a lot if he loses, because he is not expected to lose. I hope this makes clear that the formula at least does what we would expect a rating formula to do. I tried to find out more about the origin of the formula and learned that it is based on the assumption that a player's performance varies from game to game roughly like a Gaussian (normal) distribution (the version with 10 and 400 actually uses the closely related logistic curve). That makes sense, since a lot of things are distributed that way, although it might not be true all the time. Another thing worth mentioning is that the formula corrects itself: a player who had a lucky winning streak (and therefore has too many rating points) will lose them again if the players at that rating level are actually better than him. Take the special case where he only plays people with his own rating: he is expected to win half of those games to stay where he is, but he will lose more than half if his opponents are more skilled than him.
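To put numbers on the 1900 vs 1500 example, here is a sketch that assumes K = 32 (the value I estimate for Scrolls further down); the helper `win_and_loss` is just my own name for it.

```python
K = 32  # assumption: my estimate of the Scrolls K-value (see further down)


def expected_score(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def win_and_loss(r_a: float, r_b: float, k: float = K) -> tuple[float, float]:
    """Points A would gain for a win and lose for a loss against B."""
    e = expected_score(r_a, r_b)
    return k * (1 - e), k * e


gain, loss = win_and_loss(1900, 1500)
print(f"1900 vs 1500: +{gain:.1f} for a win, -{loss:.1f} for a loss")
```

The gain and the loss always add up to K, so the more likely a win is, the less it is worth.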
Of course, one question you might ask at this point is: what about luck? While it is true that Scrolls involves both luck and skill, I claim that over a large enough number of games luck cancels out, since you and your opponents have the same odds when it comes to drawing scrolls or going first. I will come back to that point later on.
Now, finally, let's talk about K, and I hope this part is interesting for people who don't care much about formulas as well. K weighs the relevance of a single game. Let's say K is set to 1. Then the most you could gain or lose in a single game would be 1 point. Why? Take a very large rating difference, something absurd like 800. Then the difference divided by 400 is 2, 10^2 is already 100, and so the expected score of the weaker player is 1/101 according to our beautiful formula. That means the weaker player can only lose about 0.0099 points, so he has basically nothing to lose because his expected win rate is close to 0.
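Here is the same K = 1 example as a quick Python check; the ratings 1200 and 2000 are made up, only the 800-point gap matters.

```python
def expected_score(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


k = 1
weak, strong = 1200, 2000  # made-up ratings; only the 800-point gap matters

e_weak = expected_score(weak, strong)  # 1 / (1 + 10**2) = 1/101
print(f"weaker player's expected score: {e_weak:.4f}")
print(f"weaker player loses:   {k * (0 - e_weak):+.4f}")
print(f"weaker player wins:    {k * (1 - e_weak):+.4f}")
print(f"stronger player loses: {k * (0 - expected_score(strong, weak)):+.4f}")
```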
The same goes the other way around: the better player can lose at most 1 point (0.99 to be exact), because K was set to 1. Now let's consider a K-value of 15 (according to Wikipedia once more, chess uses values in that range). Then the most you can win or lose in one game is 15 points, and beating an opponent with the same rating gains you 7.5 points. I think, although I am not sure, that Scrolls does use decimals internally and only displays your real rating rounded to the nearest integer; someone could check that with the formula and a few games if he wanted. What all of this means is: the higher you choose K, the higher the variance. With a high K, winning streaks can shoot you up and losing streaks can drag you down a lot.
Now the question on your mind is probably: what is the K-value in Scrolls? (At least I hope that's your question after I talked about it for so long without telling you.) If I didn't miscalculate, the K-value in Scrolls is 32. In my opinion this is quite high compared to other games, and while a high K does not change where your rating ends up on average, I do think it could use a small change. I have a few more thoughts on that, but first I want to point out something special about our rating system.
I didn't take the time to find out exactly what happens to new players, but if I remember correctly, rating below 1500 doesn't work the same way. Before you reach 1500 you hardly lose any rating for a loss, but you gain about 100 points for a win until you hit the 1500 mark for the first time. I think this is meant to let you play a few rounds before the real system kicks in. I haven't talked about rating decay yet, but again, if I am not mistaken, below 1500 it gives you rating points instead of taking them away. So 1500 is the base value of the whole system.
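Here is roughly how I checked K = 32 against my table from the beginning, written as a Python sketch. Every change in the table is positive, so I treat all eight games as wins; K = 32 is my own estimate, not an official number. Most rows come out within about a point of what the game showed, but the third game (1829 vs 1844) predicts almost 17 where I noted +14, so either I wrote something down wrong or something else is going on there.

```python
K = 32  # my estimate for Scrolls, not an official number


def expected_score(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


# (my rating, opponent's rating, change shown by the game) from the table above.
# Every change is positive, so all eight games are treated as wins.
GAMES = [
    (1813, 1737, 13), (1823, 1561, 6), (1829, 1844, 14), (1845, 1777, 13),
    (1858, 1771, 13), (1871, 1414, 2), (1873, 1784, 12), (1885, 1753, 10),
]

for mine, theirs, observed in GAMES:
    predicted = K * (1 - expected_score(mine, theirs))
    print(f"{mine} vs {theirs}: observed +{observed}, predicted +{predicted:.1f}")
```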
Another thing I want to mention briefly: in chess, any rating difference greater than 400 is counted as 400, but that doesn't seem to be true for Scrolls. Look at the 6th game in my table from the beginning: the rating difference is bigger than 400 (1871 vs 1414), yet the change was only 2 points, although with a 400 cap it would be 32*1/11, about 2.9. To be fair, that's not proof, and I might well be mistaken here. It's not important anyway, so let's forget about it.
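A quick sketch comparing the 6th game with and without a chess-style 400-point cap; the `cap` parameter is my own addition for this comparison, and K = 32 is my estimate.

```python
from typing import Optional

K = 32  # my estimate for Scrolls


def expected_score(r_a: float, r_b: float, cap: Optional[float] = None) -> float:
    diff = r_b - r_a
    if cap is not None:
        # chess-style rule: a gap larger than `cap` is counted as exactly `cap`
        diff = max(-cap, min(cap, diff))
    return 1.0 / (1.0 + 10 ** (diff / 400))


mine, theirs = 1871, 1414  # game 6 from the table, shown in the game as +2
without_cap = K * (1 - expected_score(mine, theirs))
with_cap = K * (1 - expected_score(mine, theirs, cap=400))
print(f"no cap: +{without_cap:.2f}, 400 cap: +{with_cap:.2f}")
```

Rounded, the uncapped version gives the +2 I saw, while the capped one would give +3.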
Before I make some suggestions for small changes to the current system, let's talk about rating decay. Rating decay was introduced in version 0.112, and the patch notes say: "Added rating decay. Once a week, ratings will decay towards 1500." This means that on Sundays players rated above 1500 (the base value of the system) lose rating, and the higher they are, the more they lose, while players rated below 1500 gain rating. I think, and again I'm not entirely sure, that the developers wanted to keep ratings from getting too high. This problem (rating inflation) has occurred in other games, for example in chess: an Elo rating from 30 years ago can't be compared to one today, or at least it doesn't tell you much about how strong players were 30 years ago compared to players now. But how strong should rating decay be? That of course depends on the number of new players joining the game: many new players would mean more rating inflation, and fewer new players would mean the opposite. I have to use other games for comparison again, as I don't have much data on Scrolls yet. League of Legends, which is often called the most played game in the world, used the same system in Season 2, and the very highest rated players were between 2800 and 3000, while 2500 would still be top 100. So even with an enormous number of people playing, it is incredibly hard to get a high rating because of the way the formula works. This is even more true in Scrolls, as you might well get paired against people who are 300 points away from you in either direction. That means that against opponents 300 points below you, even a win rate of 70-80% could make you lose points instead of gaining them. What I want to say is that I hope Mojang monitors these numbers and checks whether rating decay at its current strength is really needed.
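To show what I mean with the 300-point pairings, here is a sketch of the break-even win rate. With win rate p, your average change per game is K(p - E), so you only gain rating on average if you win more often than your expected score E. This is a simplification that assumes every opponent is exactly `gap` points below you; real pairings are mixed.

```python
def expected_score(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def break_even_win_rate(gap: float) -> float:
    """Win rate needed to gain rating on average against opponents `gap` points lower.

    Average change per game is K * (p - E), so the break-even win rate is E itself.
    """
    return expected_score(gap, 0)


for gap in (0, 100, 200, 300):
    print(f"opponents {gap} points lower: need more than {break_even_win_rate(gap):.0%} wins")
```

At a 300-point gap the break-even is about 85%, which is why a 70-80% win rate can still cost you points.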
Finally, I want to suggest that the K-value gets a closer look. It is common to use different K-values at different levels (once more, chess is an example). In my opinion K shouldn't be as high as it currently is for players above, let's say, 1800. There is just no need for it to be that high. For example, being at 1950 and losing 3 games (which can happen to the best, as there is still a bit of luck in this game) can suddenly put you at 1875 or even slightly below. That is a discouraging system to play in, because it is very hard to get those points back, and I do believe a ranking system should encourage players at all levels to play. So what I mean is: above a certain rating, the variance should be decreased (let's say down to a K-value of 16, which is still a lot). That would mean two things. First, the maximum rating gained or lost per game would be 16 instead of 32, so the ratings of high rated players would fluctuate less, resulting in a more stable top 100. This does not change the expected win/loss ratio or the win rate of those players; on average it would change nothing, except reducing the number of people who are above or below their "real skill level". Second, rating decay would make even more sense, because the total number of rating points would no longer be fixed. Normally the total doesn't change after a game, as the winner gets exactly what the loser loses. That would still be true when both players are above or both are below the threshold, but not in games with one player above and the other below it: if the lower rated player wins, he gains more than the higher rated player loses, so the total goes up; if the higher rated player wins, he gains less than the lower rated player loses, so the total goes down. So the total number of points can drift, and any inflation that builds up could be undone by rating decay (maybe even as strong as it is right now).
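Here is a sketch of the threshold idea. The function `k_for`, the threshold of 1800 and the two K-values are just the numbers from my suggestion, not anything Scrolls actually does, and the 1750 opponents are made up. It shows how much three losses from 1950 cost with K = 32 versus K = 16, and how the total number of points changes in a game between a player above the threshold and one below it.

```python
def expected_score(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def k_for(rating: float, threshold: float = 1800,
          k_below: float = 32, k_above: float = 16) -> float:
    """Hypothetical rule from this post: a smaller K at or above the threshold."""
    return k_above if rating >= threshold else k_below


def after_losses(rating: float, opponent: float, n: int, k: float) -> float:
    """Rating after losing n games in a row against opponents with the same rating."""
    for _ in range(n):
        rating -= k * expected_score(rating, opponent)
    return rating


def total_change(r_a: float, r_b: float, a_wins: bool) -> float:
    """Sum of both players' rating changes when each player uses his own K."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_wins else 0.0
    change_a = k_for(r_a) * (s_a - e_a)
    change_b = k_for(r_b) * ((1 - s_a) - (1 - e_a))
    return change_a + change_b


print(f"1950 after 3 losses vs 1750, K=32: {after_losses(1950, 1750, 3, 32):.0f}")
print(f"1950 after 3 losses vs 1750, K=16: {after_losses(1950, 1750, 3, 16):.0f}")
print(f"1850 vs 1750, higher rated wins: total {total_change(1850, 1750, True):+.1f}")
print(f"1850 vs 1750, lower rated wins:  total {total_change(1850, 1750, False):+.1f}")
```

Weighted by the expected scores, the two outcomes of such a mixed game cancel out exactly, so the total only drifts when players win more or less often than the formula expects.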
OK, that's all for today. I hope this has been interesting not only for me but for a lot of you guys. I wish you all the best in your ranked matches :)
Have fun and good luck! GuidoFubini