I am curious about the other assumptions. I would prefer decay to be much higher or much lower. I don't think people's skill decays much with time, so I don't see why the earned confidence would go down with time. At the other end, a high decay gives you a leaderboard, which can be useful in its own right. The explanation of those assumptions is why I eagerly await the article.
There is an inherent flaw in any deviation from the default starting values: the average is no longer the average. I understand this and find the loss acceptable compared to the benefit of a better starting mu. Elo, and I think Glicko, are normally zero-sum, so when everyone starts in the middle, the average of all ratings always stays at that middle value.
At first I thought anything other than 350 initial phi was an absolute error, but I started considering the extremes. As mentioned before, if a coach plays one team to a high confidence, EVERY new team will start with a high confidence. To me, that is a problem. If a coach has played 25 teams to similar mus, all to a high confidence, and the last team starts at phi=350, that is also a problem.
First:
rp = number of previous races played
max = the current phi value of the least confident of the previous races played
tr = total other races (currently 26-1=25, -1 for the team being created)
New phi = ((rp*max)+((tr-rp)*350))/tr
It is the average phi over all other races, with unplayed races counted as 350 and every played race assumed to have the least confident (highest) phi reached so far.
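To make the first idea concrete, here is a minimal Python sketch of that formula. The function name, the list-of-phis input, and the handling of a coach with no previous races are my own assumptions for illustration; only the formula itself comes from the definitions above.

```python
INITIAL_PHI = 350.0   # phi of a race the coach has never played
TOTAL_RACES = 26      # current number of races

def new_phi_first_idea(played_phis, total_races=TOTAL_RACES, initial_phi=INITIAL_PHI):
    """Starting phi for a new team under the first idea.

    played_phis: current phi of every race the coach has already played
    (hypothetical input format, one value per race).
    """
    tr = total_races - 1          # other races, excluding the team being created
    rp = len(played_phis)         # number of previous races played
    if rp == 0:
        # Assumption: no history at all means the plain default.
        return initial_phi
    worst = max(played_phis)      # "max": phi of the least confident played race
    return (rp * worst + (tr - rp) * initial_phi) / tr

# One race played to phi=100: 24 unplayed races still dominate the average.
print(new_phi_first_idea([100.0]))        # 340.0
# All 25 other races played to phi=100: the new team starts at 100.
print(new_phi_first_idea([100.0] * 25))   # 100.0
```

With a single played race the new phi barely moves off 350, which is the behaviour I was after for the "one team to a high confidence" case.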
Second:
New phi = the average phi of all other races, with unplayed races being 350, then raised to the current highest phi of the played races if the average would otherwise fall below it.
In both ideas, the more races a coach has played the more confident we are that the new mu is reliable, but never more reliable than a previous mu.
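The second idea can be sketched the same way. Again, the function name and input format are hypothetical, and I am assuming the "cap" works as a floor: the new phi is never allowed to drop below the highest phi among the races already played, which is what keeps the new mu from being treated as more reliable than a previous one.

```python
def new_phi_second_idea(played_phis, total_races=26, initial_phi=350.0):
    """Starting phi for a new team under the second idea.

    played_phis: current phi of every race the coach has already played
    (hypothetical input format, one value per race).
    """
    tr = total_races - 1          # other races, excluding the team being created
    rp = len(played_phis)         # number of previous races played
    if rp == 0:
        # Assumption: no history at all means the plain default.
        return initial_phi
    # Average over all other races, counting each unplayed race as initial_phi.
    average = (sum(played_phis) + (tr - rp) * initial_phi) / tr
    # Assumed reading of the cap: never more confident than the least
    # confident race already played.
    return max(average, max(played_phis))

# Two races played: the unplayed races keep the average above both phis.
print(new_phi_second_idea([100.0, 300.0]))          # 338.0
# All 25 played, one of them less certain: the floor kicks in.
print(new_phi_second_idea([100.0] * 24 + [200.0]))  # 200.0
```

The floor only matters once most races have been played; before that the unplayed 350s keep the average high on their own.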
I wrote a few other ideas, but then realized they were inherently flawed. They only worked reasonably for coaches that performed above average, not for coaches that performed below average.
Let's continue to use Straume as the example. If he takes Khemri to the next tournament, it would start with a mu of 1662.36 and a phi of 164.7, giving him a rating of 1250.61. This is higher than his current Orc rating of 1214.80. Back to Glicko theory: phi is a measure of confidence, and mu-2.5*phi is the lower bound of the confidence interval we have for the estimate of the coach's true skill. That is why that value is used for the rating. Glicko predicts that the player is at least that skilled. I think it is unreasonable to say that a newly initialized coach/race would have a higher lower bound of skill than a previous race. Straume's new Khemri rating should not start better than his Orcs, no matter how good he is with Dark Elves. Maybe Orcs are Straume's worst team, but maybe Khemri should be. I think this can be best fixed by adjusting the initialization of phi.
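The arithmetic behind that example, as a small Python sketch. The numbers are the ones quoted above; the function names are made up, and the "smallest phi" calculation is only there to show the size of the gap, not a proposed rule.

```python
def rating(mu, phi):
    """Displayed rating: the lower bound mu - 2.5*phi."""
    return mu - 2.5 * phi

def smallest_phi_not_exceeding(mu, existing_rating):
    """Smallest starting phi that keeps rating(mu, phi) <= existing_rating."""
    return (mu - existing_rating) / 2.5

khemri_mu, khemri_phi = 1662.36, 164.7   # proposed starting values for Khemri
orc_rating = 1214.80                     # Straume's current Orc rating

new_rating = rating(khemri_mu, khemri_phi)
print(round(new_rating, 2))              # 1250.61
print(new_rating > orc_rating)           # True: the new team starts above the Orcs

# Illustration only: how large phi would need to be to avoid that.
print(round(smallest_phi_not_exceeding(khemri_mu, orc_rating), 3))  # 179.024
```

So with the same starting mu, a starting phi of roughly 179 or more would keep the new Khemri rating at or below the Orc rating.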
mubo wrote: 350 (totally naive player)
Maybe I don't understand you or maybe this is where our disconnect is. Instead of me trying to guess what you mean and continuing to argue off on a tangent, would you please explain the quote? What does phi tell us about the player?
Straume: if you would prefer not to be used as an example please let me know and I will edit.