Examining the NAF meta

VoodooMike · Post by **VoodooMike** » Mon Dec 11, 2017 9:42 pm

plasmoid wrote:We're blessed with 24 "factions" in BB, which somewhat disguised the problem. But I think any PvP game with factions like, say, Overwatch (25?), Hearthstone (9) or Starcraft (3) ought to strive for a good balance between the factions. In this respect I find the information that "faction x never really beats itself" completely devoid of meaning, and this only gets worse if any faction gets played significantly more than average.

If you're using win rates that are not broken down into race1 vs race2 then no - you're pretending a race will never face itself. You're pretending it didn't face itself during the tournaments you're taking data from when calculating the win rates that you're going to use to "balance" things.

plasmoid wrote:But I think that the meta-game, the game of team selection, can be balanced - i.e. that no team comes in with a substantial advantage or disadvantage already before anyone has chosen their team.

You also thought orcs were underpriced, and that the analysis you launched this thread with was something other than gobbledygook. You're aiming to create a system that is overly simplistic (for mentioned reasons, though I'll end up elaborating if you don't understand them) while supporting it with bad math... or even legitimate stats misapplied (the latter being the fault of other people - plural).

plasmoid wrote:EuroBowl rules (and lots of Danish tournaments) have showed me that teams can be pushed into different performance tiers.

Your belief in the legitimacy of performance tiers is the real issue here. You can absolutely change which teams have an advantage and which have a disadvantage, but what you cannot do is use the sort of simplistic "tournament tier" system you're aiming for to create legitimate balance between the rosters without accounting for composition before placing rosters in different categories.

plasmoid wrote:The problem, as you say, is that we may well just push teams around and create a new top/bottom. I tried to address this by creating the excel sheet that would (to some extent) show the effect on other teams with each performance tweak so that for example, in theory, massively boosting High Elves wouldn't just send Lizardmen into a tailspin.

You will just push the teams around, yes. It's not a potential outcome, it is an inevitable outcome if you try to use the sort of system you're trying to develop... or any of the tournament tier systems currently used.

plasmoid wrote:Is it possible to project win rates based on global composition, so in essence, given that we know which races have shown up to tournaments for the past 6 years, what the most likely outcome would be?

No, and that should be obvious given what you know (and very clearly do NOT know) about those tournaments. You seem to think the tournament data will be better data than, say, FUMBBL data, because those tournaments are (usually) rez, and are the style of play you're trying to affect... yet, you don't know much about the sort of houseruling that was involved, and a whole lot of them already pushed the mess around using just the sort of tournament tier system you're trying to build. The external validity of anything made with that data is going to be wretched, and the confidence intervals wiiiiide.

plasmoid wrote:That's what I tried to do with what I think you refer to as normalized distribution.

I'm pretty sure you didn't do anything with what I refer to as the normal (not "normalized") distribution. You didn't work with distributions.

plasmoid wrote:I get that no 10 (or 24) team tournament will indeed be normal in distribution, but trying to figure out which team/faction is likely to win (i.e. a stronger or weaker team choice) is still a relevant consideration... In that most attending coaches will consider it.

The word you're aiming for is not "normal", it's "uniform". Because you don't know what the composition will be, you can't predict the outcome or even the advantage ahead of time. You seem convinced you can... but it's like trying to predict whether a racecar or a horse will win in a race without knowing the terrain. You can certainly declare that racecars are faster than horses in general, but if the race ends up being in a hilly forest, it isn't the horse that needs a heap of additional benefits to be competitive.

plasmoid wrote:I do think that strong tier rules can make a World of difference.

So does exposure to high levels of radiation... those differences just aren't likely to be an objective improvement (except in superhero comics).

plasmoid wrote:Even so, I'm curious to hear your thoughts on how to fix/de-bias tournaments. Would you care to share your thoughts on that?

There isn't any way to do it using the sort of tournament tier system that is popular with tournaments. To legitimately eliminate, or at least significantly reduce, bias you'd need a more complicated system... either one that assigns membership to "tier" groups only after rosters are fully selected, or one that applies the balancing not at team creation but at the match level, etc.

TheVoiceofJericho · Post by **TheVoiceofJericho** » Fri Dec 15, 2017 7:55 pm

Haha wow, calling people pussies over a statistics discussion. I've seen it all now.

Everybody else, a very interesting read. Thanks.

kyrre · Post by **kyrre** » Sat Dec 16, 2017 11:27 am

plasmoid wrote:Hi Kyrre,
nice work!
Now if someone could set up a way to sort all of these (or some of these, if one wanted to start at a specific date) into win percentages for each race-race matchup, then that would be super helpful!
Anyone?

Thank you. As it turns out I am not all that great with spreadsheets and Libre Calc. Nevertheless, after hours and hours of agony, I came up with a slow atrocity that might answer your question. You have the option to edit the dates and whether mirror matches should be included. Look for the yellow background. There are 3 useful sheets: All matches, All races and versus race. The last one allows you to change the race you want statistics for. Mind the spelling and date format.
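For anyone who would rather script this than use a spreadsheet, here is a minimal Python sketch of the same idea: race-vs-race win percentages with an optional date range and a switch for mirror matches. The record layout, the sample rows and the choice to count a draw as half a win are assumptions made for illustration; they are not taken from the NAF data or from the sheets described above.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows: (date, race_a, race_b, result for race_a).
MATCHES = [
    (date(2016, 3, 5), "Orc", "Dwarf", "W"),
    (date(2016, 3, 5), "Orc", "Orc", "D"),
    (date(2017, 6, 10), "Dwarf", "Orc", "W"),
    (date(2017, 6, 11), "Orc", "Dwarf", "W"),
]

# Assumption: a draw counts as half a win.
POINTS = {"W": 1.0, "D": 0.5, "L": 0.0}


def matchup_win_pct(matches, start=None, end=None, include_mirrors=True):
    """Return {(race, opponent): win percentage} for the filtered matches."""
    score = defaultdict(float)
    games = defaultdict(int)
    for played, race_a, race_b, result in matches:
        if start is not None and played < start:
            continue
        if end is not None and played > end:
            continue
        if race_a == race_b and not include_mirrors:
            continue
        points_a = POINTS[result]
        # Credit both sides, so a mirror match always nets out to 50%.
        score[(race_a, race_b)] += points_a
        games[(race_a, race_b)] += 1
        score[(race_b, race_a)] += 1.0 - points_a
        games[(race_b, race_a)] += 1
    return {pair: 100.0 * score[pair] / games[pair] for pair in games}


table = matchup_win_pct(MATCHES)
for (race, opponent), pct in sorted(table.items()):
    print(f"{race} vs {opponent}: {pct:.1f}%")
```

Passing `start=date(2017, 1, 1)` or `include_mirrors=False` to `matchup_win_pct` narrows the table in the same way as editing the yellow cells.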

I have used LibreOffice Calc rather than Excel as I do not have access to the latter. The original ODF has been converted to XLSX. While it looks to be fine, things might have been lost in the translation.

Edit February 16 2018:
Links moved here: viewtopic.php?f=81&t=40863&p=792162#p792162

plasmoid · Post by **plasmoid** » Mon Dec 18, 2017 11:29 am

Hi Kyrre,
that is a super awesome tool. Thanks a lot for your hard work.
I hope I'm not the only one who will find this interesting.
Cheers
Martin

plasmoid · Post by **plasmoid** » Mon Dec 18, 2017 12:51 pm

Hi Mike,
sorry for the delay. I did some changes to my original write-up. Most of those ought to be clear from my replies below. As part of my look into "balance" I discovered a decimal error in my stats for Undead vs Vamps, which explains the mysterious result I previously had that Undead didn't seem to be bothered by Swiss pairing. That's fixed now.

Anyway,

If you're using win rates that are not broken down into race1 vs race2 then no - you're pretending a race will never face itself. You're pretending it didn't face itself during the tournaments you're taking data from when calculating the win rates that you're going to use to "balance" things.

I (think I) get what you're saying, and while I do still calculate the no-mirrors math just out of sheer interest, I do not continue down that path, but instead I proceed with mirror matches included.
That said, for the purpose of measuring "power", I still don't see how it does any good to include stats that very clearly do not relate to power (e.g. a 1 snotling team and an 11 griff team would still get a 50% win percentage in their respective mirror matches).

Your belief in the legitimacy of performance tiers is the real issue here. You can absolutely change which teams have an advantage and which have a disadvantage, but what you cannot do is use the sort of simplistic "tournament tier" system you're aiming for to create legitimate balance between the rosters without accounting for composition before placing rosters in different categories.

I don't think I understand what you're saying here. I did try to take composition of the global meta into account...

You will just push the teams around, yes. It's not a potential outcome, it is an inevitable outcome if you try to use the sort of system you're trying to develop... or any of the tournament tier systems currently used.

I think you're overstating here.
Sure, it will never be perfect. But I think it can be improved.
I've added a section (the bottom one) to my first post, trying to look at this claim of inevitability. I think the numbers show that things can improve quite a bit using this method - and even if none of the numbers used are reliable, the fact remains that things could be improved with tiers.

You seem to think the tournament data will be better data than, say, FUMBBL data, because those tournaments are (usually) rez, and are the style of play you're trying to affect... yet, you don't know much about the sort of houseruling that was involved, and a whole lot of them already pushed the mess around using just the sort of tournament tier system you're trying to build. The external validity of anything made with that data is going to be wretched, and the confidence intervals wiiiiide.

I wouldn't be averse to looking at FUMBBL data for TV1200-1300 teams (having played no more than 15? games - in order to avoid high TV teams with lots of injuries). I just don't have those.
The reason I prefer the data for NAF tournaments is that the vast majority of tournaments only make minor balance tweaks. A skill here or 2 skills there. Which can only do so much. Meanwhile we have plenty of FUMBBL data that shows that win percentage varies quite a bit with changing team values (not to mention the impact of being a substantial underdog or overdog). So we do know that the majority of FUMBBL data will be more misleading than tournament data, when it comes to determining performance around the TV 1220-mark.

I'm pretty sure you didn't do anything with what I refer to as the normal (not "normalized") distribution. You didn't work with distributions.

Then clearly I don't know what you mean by that.
I started with the NAF data for all games played, in which each team is represented in different quantities, and then changed the meta into one where all (non tier 3) teams were represented equally, based on the assumption that they would be if all teams were roughly equally powerful.

The word you're aiming for is not "normal", it's "uniform". Because you don't know what the composition will be, you can't predict the outcome or even the advantage ahead of time. You seem convinced you can... but it's like trying to predict whether a racecar or a horse will win in a race without knowing the terrain. You can certainly declare that racecars are faster than horses in general, but if the race ends up being in a hilly forest, it isn't the horse that needs a heap of additional benefits to be competitive.

True. But I think this is true of most games with factions. It would be wonderful if all factions could be truly equal with all individual factions. But you certainly won't get that in BB. But you can still strive for all factions being equally likely to win prior to choosing your side. That way no faction becomes the go-to choice. And that diversifies the meta. Which I think is a great thing. And as I've tried to show, creating this pre-selection balance does not have to wreck race-vs-race balance overall. It can improve it more than it damages it.

There isn't any way to do it using the sort of tournament tier system that is popular with tournaments. To legitimately eliminate, or at least significantly reduce, bias you'd need a more complicated system... either one that assigns membership to "tier" groups only after rosters are fully selected, or one that applies the balancing not at team creation but at the match level, etc.

Interesting.
I don't think that assigning tiers based on attending teams would make much sense, as you'd still only be facing 6 of them.
But I do think that match level balancing could be interesting - basically something like the inducement system, but amounts granted would be based on race-vs-race win percentages instead. There'd still be the problem though, that lots of those race vs race stats have very wide CI95 ranges. But it sounds like it could work.

Cheers
Martin

plasmoid · Post by **plasmoid** » Mon Dec 18, 2017 12:52 pm

In case anyone is interested, this is my simplistic application of the numbers. I'm fortunate enough that a tournament will be hosted using these rules at the beginning of March in Copenhagen:
http://www.plasmoids.dk/Copenhagen%20Parity%20Rules.pdf

Cheers
Martin

Vanguard · Post by **Vanguard** » Tue Dec 19, 2017 10:27 am

plasmoid wrote:In case anyone is interested, this is my simplistic application of the numbers. I'm fortunate enough that a tournament will be hosted using these rules at the beginning of March in Copenhagen:
http://www.plasmoids.dk/Copenhagen%20Parity%20Rules.pdf

To save me cross-referencing everything, are the rosters included for reference or have you made changes other than the Tier bonuses?
Also, the Tier info for Tier 0 is practically unreadable against the navy background and there's a typo in the heading in Parity.
Also also, I don't see any restrictions on skill stacking. Is that deliberate or is it mentioned elsewhere?

VoodooMike · Post by **VoodooMike** » Tue Dec 19, 2017 11:54 am

plasmoid wrote:That said, for the purpose of measuring "power", I still don't see how it does any good to include stats that very clearly do not relate to power (e.g. a 1 snotling team and an 11 griff team would still get a 50% win percentage in their respective mirror matches).

Relative power, no, but only if you're drilling down to a specific roster vs roster win rate... which isn't the case. You're trying to use aggregate win rate to detect relative power, which is confounded by the fact that you aren't controlling for composition. If you want to use aggregate win rates you should probably retain mirror matches... if you don't, it doesn't matter since mirror matches won't affect your measure of specific win rates between different rosters.

plasmoid wrote:I don't think I understand what you're saying here. I did try to take composition of the global meta into account...

You're not taking composition into account by simply kicking back and saying "obviously these numbers represent the types of games found in tournaments because the data comes from tournaments, so the composition is already appropriate". You aren't going to find a uniform distribution of rosters in any given tournament (unless making it happen is part of the tournament's design), nor can you assume that the distribution becomes uniform across a large enough number of tournaments.
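To make the composition point concrete, here is a small Python sketch. It holds the roster-vs-roster win probabilities fixed and only changes who shows up. The three rosters, their matchup numbers and the field shares are invented for illustration, and opponents are assumed to be drawn at random from the field (no Swiss pairing).

```python
# Hypothetical matchup win probabilities: P[a][b] is the chance that a beats b.
# Mirror matches sit at 0.5 by construction.
P = {
    "A": {"A": 0.50, "B": 0.65, "C": 0.40},
    "B": {"A": 0.35, "B": 0.50, "C": 0.60},
    "C": {"A": 0.60, "B": 0.40, "C": 0.50},
}


def aggregate_win_rate(roster, field):
    """Expected win rate of `roster` against opponents drawn from `field`.

    `field` maps roster name -> share of the field (shares sum to 1).
    """
    return sum(share * P[roster][opponent] for opponent, share in field.items())


# Three hypothetical tournament compositions.
uniform = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
b_heavy = {"A": 0.2, "B": 0.7, "C": 0.1}
c_heavy = {"A": 0.2, "B": 0.1, "C": 0.7}

for label, field in [("uniform", uniform), ("B-heavy", b_heavy), ("C-heavy", c_heavy)]:
    print(f"{label}: roster A aggregate win rate = {aggregate_win_rate('A', field):.3f}")
```

Roster A's matchups never change, yet its aggregate win rate lands above 50% in the B-heavy field and below 50% in the C-heavy one. An aggregate figure therefore describes the roster and the field together, not the roster alone.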

The changing of advantage and disadvantage part, if you don't understand that, I'll explain:

Consider the "tiers" of teams created by the aforementioned cluster analysis. Each "tier" is a grouping of teams which itself has a midpoint and an upper and lower bound (the latter two being defined by the edges of the win rate distributions of the top and bottom teams of a given tier). The "balancing" action for these tiers would be to try to apply a correction to each such that their centroid (the midpoint of the cluster/tier) roughly, or precisely, overlaps the midpoint of each other tier, bringing the tiers together, right?

...except what you've got then is a new hierarchy. The widest tier now defines the team with the biggest advantage over others, and the biggest DISadvantage relative to the others based on the top and bottom teams of the widest tier. The "balance" is then based on how similar the widths of those tiers are rather than the general success of the individual rosters. If, prior to balancing, you had tiers 0 through 4, say... and tier 2 was the largest, covering the largest span of win%s, then your new hierarchy makes the best team in tier 2 the best team overall and the worst team in tier 2 the worst team overall in your "balanced" new system.
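The argument above can be sketched in a few lines of Python. The win percentages and tier memberships are made up, and the correction is modelled as a flat additive shift that moves every roster in a tier by the same amount so the tier's centroid lands on 50%. Real tier bonuses will not act that cleanly, so treat this as a simplified illustration rather than a model of any actual ruleset.

```python
from statistics import mean

# Hypothetical win percentages grouped into tiers (not real NAF numbers).
TIERS = {
    "tier 1": {"R1": 56.0, "R2": 54.0},
    "tier 2": {"R3": 53.0, "R4": 48.0, "R5": 43.0},  # the widest tier
    "tier 3": {"R6": 40.0, "R7": 38.0},
}


def align_centroids(tiers, target=50.0):
    """Shift every roster by its tier's offset so each tier centroid hits `target`."""
    adjusted = {}
    for rosters in tiers.values():
        shift = target - mean(rosters.values())
        for name, rate in rosters.items():
            adjusted[name] = rate + shift
    return adjusted


adjusted = align_centroids(TIERS)
best = max(adjusted, key=adjusted.get)
worst = min(adjusted, key=adjusted.get)

for name in sorted(adjusted, key=adjusted.get, reverse=True):
    print(f"{name}: {adjusted[name]:.1f}%")
print(f"best after alignment: {best}, worst after alignment: {worst}")
```

In this toy setup the best and worst rosters after alignment both come from tier 2, the tier with the largest spread, even though neither was the best or worst roster before the correction.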

Now, I saw el-dingus say the centroid of each cluster was within the 95CI of each roster contained in the cluster... that's going to be deeply misleading and based on very small sample sizes. We know from looking at large scale data from online play that the individual roster distributions will very quickly fail to do this. You cannot rely on a lack of information to justify a system's utility, and that's what using wide CIs from small samples represents.
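A quick way to see how much sample size drives this is to compute a 95% interval for the same observed win rate at two sample sizes. The sketch below uses the Wilson score interval, which is my choice for illustration and not necessarily the interval used elsewhere in this thread. It treats every game as a plain win or loss (draws ignored), and the game counts and the 45% centroid are invented.

```python
from math import sqrt


def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a win proportion (z=1.96 gives roughly 95%)."""
    p = wins / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return centre - half, centre + half


# Same observed 55% win rate, very different amounts of data (hypothetical counts).
small = wilson_interval(11, 20)
large = wilson_interval(1100, 2000)

centroid = 0.45  # hypothetical tier centroid
for label, (low, high) in [("20 games", small), ("2000 games", large)]:
    inside = low < centroid < high
    print(f"{label}: [{low:.3f}, {high:.3f}] contains centroid {centroid}: {inside}")
```

With 20 games the interval is wide enough to contain a centroid ten points away from the observed rate. With 2000 games it no longer does, which is the sense in which a centroid sitting inside every roster's interval can reflect a lack of data rather than real similarity.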

plasmoid wrote:I think you're overstating here.
Sure, it will never be perfect. But I think it can be improved.

It may improve balance depending on how similar in width the roster groupings are, but the final balance depends on the intra-tier balance of the largest/widest tier. If you know there's a flaw in the particular design of the balancing system, why continue to use it when you can replace it with something that lacks that flaw?

plasmoid wrote:Things could be improved with tiers.

Could be, but the stated flaws will exist in every single tier-based system - they are systemic in nature.

Prior to simply accepting that the Earth revolved around the Sun, astronomers built more and more complex models to explain new observations they had made with improved methods. Geocentric models were how things had always been done, so they kept layering on new and more complex methods to force a square peg into a round hole.

plasmoid wrote:But you can still strive for all factions being equally likely to win prior to choosing your side. That way no faction becomes the go-to choice. And that diversifies the meta. Which I think is a great thing. And as I've tried to show, creating this pre-selection balance does not have to wreck race-vs-race balance overall. It can improve it more than it damages it.

Are you using the term "factions" to refer to rosters, or tiers? It won't be the case either way in a tier-based balance system for the reasons I laid out above, but sure, in the large scale you can make them closer... but at the tournament level variance is likely to be very, very high... and since this is meant to be a tournament balancing system, not an aggregate tournament results balancing system, that variance is important.

plasmoid wrote:I don't think that assigning tiers based on attending teams would make much sense, as you'd still only be facing 6 of them.

You wouldn't know which 6, however, and your balancing system would have been applied based on the teams that are present, not teams that might have been present on a different day.

plasmoid wrote:But I do think that match level balancing could be interesting - basically something like the inducement system, but amounts granted would be based on race-vs-race win percentages instead. There'd still be the problem though, that lots of those race vs race stats have very wide CI95 ranges. But it sounds like it could work.

The wide CIs are related to your dataset and the smaller sample sizes. I think it's pretty much guaranteed that the distribution of any given roster's win rates will be quite a bit narrower than the distribution of the aggregate win rates for all the rosters within a tier, making any objection to those CIs something that should be amplified when dealing with a tier system instead.

plasmoid · Post by **plasmoid** » Tue Dec 19, 2017 2:55 pm

Hi Vanguard,
thanks for the feedback. Quick reply on the daily commute:
*The blue looks fine on my screen, but I can change it all the same.
*No changes done to any rosters. Purely for reference.
*Stacking not allowed. I’ll have to add that.
Cheers
Martin
