plasmoid wrote:That said, for the purpose of measuring "power", I still don't see how it does any good to include stats that very clearly do not relate to power (e.g. a 1 snotling team and an 11 griff team would still get a 50% win percentage in their respective mirror matches).
Relative power, no, but only if you're drilling down to a specific roster vs roster win rate... which isn't the case. You're trying to use aggregate win rate to detect relative power, which is confounded by the fact that you aren't controlling for composition. If you want to use aggregate win rates you should probably retain mirror matches... if you use roster vs roster rates instead, it doesn't matter, since mirror matches won't affect your measure of specific win rates between different rosters.
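To put rough numbers on the composition problem, here's a minimal Python sketch. The three rosters, their pairwise win probabilities and the field shares are all made up for illustration (not taken from any dataset), and draws are ignored to keep it short:

```python
# Hypothetical pairwise win probabilities: P[x][y] is the chance x beats y.
# These numbers are invented for illustration, not real tournament data.
P = {
    "A": {"A": 0.5, "B": 0.7, "C": 0.4},
    "B": {"A": 0.3, "B": 0.5, "C": 0.6},
    "C": {"A": 0.6, "B": 0.4, "C": 0.5},
}


def aggregate_win_rate(roster, field, include_mirror=True):
    """Expected win rate of `roster` against opponents drawn from `field`.

    `field` maps roster name -> share of opponents (assumed composition).
    """
    opponents = {
        name: share
        for name, share in field.items()
        if include_mirror or name != roster
    }
    total_share = sum(opponents.values())
    weighted = sum(P[roster][name] * share for name, share in opponents.items())
    return weighted / total_share


# Same roster, same matchup strengths, two different tournament fields.
field_b_heavy = {"A": 0.2, "B": 0.6, "C": 0.2}
field_c_heavy = {"A": 0.2, "B": 0.2, "C": 0.6}

for label, field in (("B-heavy", field_b_heavy), ("C-heavy", field_c_heavy)):
    with_mirror = aggregate_win_rate("A", field)
    without_mirror = aggregate_win_rate("A", field, include_mirror=False)
    print(f"{label}: with mirrors {with_mirror:.3f}, without {without_mirror:.3f}")
```

Roster A's matchups never change, but its aggregate win rate comes out around 60% in one field and 48% in the other. Dropping mirror matches only changes how far that aggregate sits from 50%; it does nothing about the composition effect.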
plasmoid wrote:I don't think I understand what you're saying here. I did try to take composition of the global meta into account...
You're not taking composition into account by simply kicking back and saying "obviously these numbers represent the types of games found in tournaments because the data comes from tournaments, so the composition is already appropriate". You aren't going to find a uniform distribution of rosters in any given tournament (unless making it happen is part of the tournament's design), nor can you assume that a uniform distribution emerges across a large enough number of tournaments.
The changing of advantage and disadvantage part, if you don't understand that, I'll explain:
Consider the "tiers" of teams created by the aforementioned cluster analysis. Each "tier" is a grouping of teams which itself has a midpoint and an upper and lower bound (the latter two being defined by the edges of the win rate distributions of the top and bottom teams of a given tier). The "balancing" action for these tiers would be to try to apply a correction to each such that their centroid (the midpoint of the cluster/tier) roughly, or precisely, overlaps the midpoint of each other tier, bringing the tiers together, right?
...except what you've got then is a new hierarchy. The widest tier now defines the team with the biggest advantage over others, and the biggest DISadvantage relative to the others based on the top and bottom teams of the widest tier. The "balance" is then based on how similar the widths of those tiers are rather than the general success of the individual rosters. If, prior to balancing, you had tiers 0 through 4, say... and tier 2 was the largest, covering the largest span of win%s, then your new hierarchy makes the best team in tier 2 the best team overall and the worst team in tier 2 the worst team overall in your "balanced" new system.
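Here's a small Python sketch of that re-ranking. The tier names, roster names and win rates are hypothetical, and I'm assuming the "correction" is one flat shift applied to every roster in a tier so that the tier's mean lands on 50%. A real handicap wouldn't act that cleanly, but the ordering problem is the same:

```python
from statistics import mean

# Hypothetical win rates, grouped into tiers. tier2 is deliberately the widest.
tiers = {
    "tier0": {"a": 0.60, "b": 0.58},
    "tier2": {"c": 0.56, "d": 0.50, "e": 0.44},
    "tier4": {"f": 0.40, "g": 0.37},
}


def center_tiers(tiers, target=0.5):
    """Shift every roster in a tier by the same amount so the tier mean hits `target`."""
    balanced = {}
    for members in tiers.values():
        shift = target - mean(members.values())
        for name, rate in members.items():
            balanced[name] = rate + shift
    return balanced


before = {name: rate for members in tiers.values() for name, rate in members.items()}
after = center_tiers(tiers)

print("best/worst before:", max(before, key=before.get), min(before, key=before.get))
print("best/worst after: ", max(after, key=after.get), min(after, key=after.get))
```

Before the correction the best and worst rosters are "a" and "g". Afterwards they are "c" and "e", the top and bottom of the widest tier, even though every tier centroid now sits at 50%.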
Now, I saw el-dingus say the centroid of each cluster was within the 95% CI of each roster contained in the cluster... that's going to be deeply misleading and based on very small sample sizes. We know from looking at large scale data from online play that the individual roster distributions will very quickly fail to do this. You cannot rely on a lack of information to justify a system's utility, and that's what using wide CIs from small samples represents.
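To show how much of that "within the CI" result is just sample size, here's a rough Python sketch using the normal approximation for a 95% interval on a win proportion. The 55% win rate, the 50% centroid and the game counts are assumed values for illustration, and draws are ignored:

```python
from math import sqrt


def win_rate_ci95(win_rate, games):
    """Normal-approximation 95% confidence interval for an observed win rate."""
    half_width = 1.96 * sqrt(win_rate * (1 - win_rate) / games)
    return win_rate - half_width, win_rate + half_width


def ci_contains(centroid, win_rate, games):
    """True if the tier centroid falls inside the roster's 95% interval."""
    low, high = win_rate_ci95(win_rate, games)
    return low <= centroid <= high


# Hypothetical roster at a 55% win rate, sitting in a tier whose centroid is 50%.
for games in (30, 300, 3000):
    low, high = win_rate_ci95(0.55, games)
    print(f"{games:>5} games: [{low:.3f}, {high:.3f}] contains 0.50? "
          f"{ci_contains(0.50, 0.55, games)}")
```

With 30 games the interval is so wide that it swallows the centroid. With 3000 games the same roster's interval no longer contains it. Nothing about the roster changed, only how much you know about it.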
plasmoid wrote:I think you're overstating here.
Sure, it will never be perfect. But I think it can be improved.
It may improve balance depending on how similar in width the roster groupings are, but the final balance depends on the intra-tier balance of the largest/widest tier. If you know there's a flaw in the particular design of the balancing system, why continue to use it when you can replace it with something that lacks that flaw?
plasmoid wrote:Things could be improved with tiers.
Could be, but the stated flaws will exist in every single tier-based system - they are systemic in nature.
Prior to simply accepting that the Earth revolved around the Sun, astronomers built more and more complex models to explain new observations they had made with improved methods. Geocentric models were how things had always been done, so they kept layering on new and more complex methods to force a square peg into a round hole.
plasmoid wrote:But you can still strive for all factions being equally likely to win prior to choosing your side. That way no faction becomes the go-to choice. And that diversifies the meta. Which I think is a great thing. And as I've tried to show, creating this pre-selection balance does not have to wreck race-vs-race balance overall. It can improve it more than it damages it.
Are you using the term "factions" to refer to rosters, or tiers? It won't be the case either way in a tier-based balance system for the reasons I laid out above, but sure, in the large scale you can make them closer... but at the tournament level variance is likely to be very, very high... and since this is meant to be a tournament balancing system, not a system for balancing aggregate results across tournaments, that variance is important.
plasmoid wrote:I don't think that assigning tiers based on attending teams would make much sense, as you'd still only be facing 6 of them.
You wouldn't know which 6, however, and your balancing system would have been applied based on the teams that are present, not teams that might have been present on a different day.
plasmoid wrote:But I do think that match level balancing could be interesting - basically something like the inducement system, but amounts granted would be based on race-vs-race win percentages instead. There'd still be the problem though, that lots of those race vs race stats have very wide CI95 ranges. But it sounds like it could work.
The wide CIs are related to your dataset and the smaller sample sizes. I think it's pretty much guaranteed that the distribution of any given roster's win rates will be quite a bit narrower than the distribution of the aggregate win rates for all the rosters within a tier, making any objection to those CIs something that should be amplified when dealing with a tier system instead.
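A quick way to see why the tier-level spread has to be at least as wide: if you pool the observed win rates of several rosters, the pooled variance is the average within-roster variance plus the variance between the rosters' underlying rates. A minimal Python sketch, assuming equal game counts per roster, no draws, and made-up win rates:

```python
from statistics import mean, pvariance


def roster_variance(true_rate, games):
    """Sampling variance of one roster's observed win rate over `games` games."""
    return true_rate * (1 - true_rate) / games


def tier_variance(true_rates, games):
    """Variance of observed win rates pooled across a tier (law of total variance)."""
    within = mean(roster_variance(p, games) for p in true_rates)
    between = pvariance(true_rates)
    return within + between


# Hypothetical tier of three rosters, 200 games each.
rates = [0.44, 0.50, 0.56]
games = 200

for p in rates:
    print(f"roster at {p:.2f}: variance {roster_variance(p, games):.5f}")
print(f"pooled tier:    variance {tier_variance(rates, games):.5f}")
```

The between-roster term never goes away with more games, so the tier's spread stays wider than any single roster's no matter how much data you collect.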