Model Madness 2019 - The EDM Bracket (Part 1)

in #model-madness, 5 years ago

This week, I'm going to analyze the bracket from the perspective of my best model, the Estimated Difference Model (EDM). We'll go region by region, round by round, picking some teams, performing some analysis and looking at which upsets are smart versus which upsets might be costly to your bracket.



Here are some things to note before we begin with the analysis. The team with the highest EDM rating progresses and teams in the First Four are represented by the better team (by EDM) in this bracket. This means that if the lesser team wins, the final EDM bracket may change slightly from this one. Here are the EDMs of teams not featured in the 64-team bracket for reference. All teams will have their EDM rating next to their name as a reference.

11 Temple (1994)
11 St. John's (1688)
16 Prairie View A&M (1575)
16 NC Central (1177)

We'll be going over the East and West Regions today, then the South and Midwest Regions (and Final Four) tomorrow. I would love to dedicate time to showing predictions for all my models, but unfortunately I have a full-time job that prevents me from posting all that data in neatly formatted posts. But I'm giving you the results of the top-performing model, so for those filling out brackets, hopefully this can give you additional insight into how the tournament is structured this year.

The East Region


Opening Round

1 Duke (3068) beats 16 North Dakota State (1378)
8 VCU (2499) beats 9 UCF (2312)
5 Miss State (2366) beats 12 Liberty (2060)
4 Virginia Tech (2608) beats 13 Saint Louis (1963)
11 Belmont (2464) beats 6 Maryland (2222)
3 LSU (2400) beats 14 Yale (2031)
7 Louisville (2412) beats 10 Minnesota (1924)
2 Michigan State (2883) beats 15 Bradley (1611)

Duke is a pretty safe pick to move to the next round with them having an estimated 17-point advantage (1700 points) over North Dakota State. VCU edges UCF. Neither pick seems like a bad one, although VCU was a trendy name in past tournaments so picking UCF might have some upside potential.
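To make the rating scale concrete, here is a minimal sketch of the conversion used in the paragraph above. It assumes 100 EDM points correspond to roughly one point on the scoreboard, which is inferred from the Duke example rather than given as a formula; the name `edm_spread` is made up for illustration.

```python
# Assumption inferred from the Duke example: 100 EDM points ~ 1 game point.
EDM_POINTS_PER_GAME_POINT = 100


def edm_spread(rating_a, rating_b):
    """Estimated scoring margin of team A over team B, in game points."""
    return (rating_a - rating_b) / EDM_POINTS_PER_GAME_POINT


# Ratings copied from the East Region opening round list.
duke, ndsu = 3068, 1378
vcu, ucf = 2499, 2312

print(f"Duke over North Dakota State: {edm_spread(duke, ndsu):.1f}")
print(f"VCU over UCF: {edm_spread(vcu, ucf):.1f}")
```

Under that assumption, Duke comes out about 17 points ahead, while VCU's edge over UCF is just under 2 points.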

Most folks look for upsets in the 5-12 matchup, since this is often where mediocre large-conference teams lose to underrated smaller schools from weaker conferences. For those unfamiliar with the tournament, the committee that seeds it tends to underrate these smaller teams because they don't play a lot of games against elite competition. That's why 5-12 and 4-13 are trendy picks for upsets: the best small schools tend to get 11, 12 and 13 seeds.

Miss State is slightly favored in their game against Liberty, and Virginia Tech is a comfortable pick against Saint Louis, who are kind of overrated for their seed. I wouldn't pick upsets here as there are better options in other regions of the bracket. Belmont is the only upset here as they have the better score, but this is contingent on them winning their play-in game against Temple (1994).

LSU is projected to beat Yale, but LSU is a weak 3 seed, so a bold 14 seed pick isn't out of the question, but is still risky. Louisville is projected to defeat Minnesota who is a very weak 10 seed by this model. It should be noted that a lot of 10 and 11 seeds are weak this year due to a very weak bubble. This means there were a lot more mediocre teams to select from than in prior years of the tournament. Michigan State is a comfortable pick against the weakest 15-seed Bradley.

Round of 32

1 Duke (3068) beats 8 VCU (2499)
4 Virginia Tech (2608) beats 5 Miss State (2366)
11 Belmont (2464) beats 3 LSU (2400)
2 Michigan State (2883) beats 7 Louisville (2412)

It's no surprise that we see Duke moving to the next round. Given that the 8-9 game is usually a toss-up, it is usually good practice to select the 1 seed in the Round of 32. Virginia Tech beats Miss State in a close matchup.

Belmont upsets LSU to become a double-digit seed in the Sweet 16. For those who think Yale might upset LSU, a good way to hedge that bet is to pick LSU to win that game but lose in this matchup. You sacrifice perfection to reduce risk in the early rounds. Michigan State holds a fairly large advantage, so they should proceed to the Sweet 16 pretty comfortably.

Sweet 16

1 Duke (3068) beats 4 Virginia Tech (2608)
2 Michigan State (2883) beats 11 Belmont (2464)

Duke continues their run and Michigan State ends Belmont's streak. This is usually a good place to end runs for lower seeds, since the odds of selecting the correct low seed decrease drastically as the field is narrowed down to only elite competition.

Elite 8

1 Duke (3068) beats 2 Michigan State (2883)

It should be noted that Duke's score might be a little lower than their true score, since they played a large chunk of the end of their season without their best player, Zion Williamson. Now that he is back, Duke is probably the favorite in this game. Michigan State provides a good safe value pick for those who doubt Duke.
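The East picks follow mechanically from the rule stated at the top of the post: the team with the higher EDM rating advances. Here is a minimal sketch of that rule using the East ratings listed in this post. The function names are made up for illustration, and Belmont is assumed to win its play-in game.

```python
# EDM ratings from this post, in bracket order
# (1 v 16, 8 v 9, 5 v 12, 4 v 13, 6 v 11, 3 v 14, 7 v 10, 2 v 15).
# Assumption: Belmont wins its First Four game and takes the 11 slot.
EAST = [
    ("1 Duke", 3068), ("16 North Dakota State", 1378),
    ("8 VCU", 2499), ("9 UCF", 2312),
    ("5 Miss State", 2366), ("12 Liberty", 2060),
    ("4 Virginia Tech", 2608), ("13 Saint Louis", 1963),
    ("6 Maryland", 2222), ("11 Belmont", 2464),
    ("3 LSU", 2400), ("14 Yale", 2031),
    ("7 Louisville", 2412), ("10 Minnesota", 1924),
    ("2 Michigan State", 2883), ("15 Bradley", 1611),
]


def play_round(teams):
    """Pair adjacent teams and keep the higher-rated one from each pair."""
    return [
        max(teams[i], teams[i + 1], key=lambda team: team[1])
        for i in range(0, len(teams), 2)
    ]


def run_region(teams):
    """Return the survivors after each round, down to a single team."""
    rounds = []
    while len(teams) > 1:
        teams = play_round(teams)
        rounds.append(teams)
    return rounds


for survivors in run_region(EAST):
    print(", ".join(name for name, _ in survivors))
```

Each printed line is the set of teams left after a round, ending with the regional champion. Swapping a single rating, such as putting Temple's in Belmont's slot, shows how much one First Four result can change the rest of the region.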

The West Region


Opening Round

1 Gonzaga (3608) beats 16 Fairleigh Dickinson (1664)
8 Syracuse (2174) beats 9 Baylor (2026)
12 Murray State (2449) beats 5 Marquette (2402)
4 Florida State (2561) beats 13 Vermont (2241)
6 Buffalo (2787) beats 11 Arizona State (1966)
3 Texas Tech (2777) beats 14 Northern Kentucky (1973)
7 Nevada (2695) beats 10 Florida (2222)
2 Michigan (2769) beats 15 Montana (2005)

Gonzaga lost their last game to Saint Mary's, dropping their conference championship in a shocking upset. I suspect they'll be ready and should beat Fairleigh Dickinson or Prairie View A&M with relative ease. Syracuse edges Baylor, although Baylor has more upside because Syracuse is the more popular team and has a history of tournament runs as a low seed. That means your bracket gains ground if you pick Baylor, everyone else picks Syracuse, and Baylor wins.

Murray State is a popular upset pick here at 12, and for a very good reason. They are a very good team that beat Belmont to make it to the tournament. And since they are a popular upset pick, that reduces the potential downside risk of selecting them. Florida State is projected to beat Vermont, but Vermont is fairly solid for a 13 seed if one felt like taking that risk.

Next up is probably the best mid-major team in the country: Buffalo. They have 30 wins to their name against decent competition. Since their competition looks weak (Arizona State or St. John's) they seem like a good pick to move to the next round. Texas Tech beats Northern Kentucky, Nevada should beat Florida who lost 15 games, and Michigan should beat Montana who they beat in the tournament comfortably last year.

Round of 32

1 Gonzaga (3608) beats 8 Syracuse (2174)
4 Florida State (2561) beats 12 Murray State (2449)
6 Buffalo (2787) beats 3 Texas Tech (2777)
2 Michigan (2769) beats 7 Nevada (2695)

Gonzaga comes into the second round as heavy favorites. Although Gonzaga is perceived as the weakest one seed, they did beat Duke and have only lost to other tournament teams. They have dominated their competition. Yet people will doubt Gonzaga here. This frankly isn't a smart place to do it. Florida State has a narrow edge against Murray State and is definitely the safer pick of the two. But if you have Murray State in the Sweet 16 and get it right, that has a lot of potential to move your bracket up the ranks.

This region is probably the most chaotic of the bunch. You could easily pick any one of Michigan, Texas Tech, Buffalo, or Nevada to make it to the Elite 8 or Final Four. They're all within 100 EDM points of each other. That means their expected difference in their scores is less than one point. That's crazy. We select Buffalo and Michigan to move to the next round. It could easily be the opposite which makes this part of the bracket scary.
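To show how tight this group is, here is a quick sketch that computes every pairwise gap among the four teams from the ratings listed above. It assumes, as elsewhere in this post, that 100 EDM points are worth about one game point; the variable names are illustrative only.

```python
from itertools import combinations

# EDM ratings for the bottom half of the West Region, as listed in this post.
ratings = {"Buffalo": 2787, "Texas Tech": 2777, "Michigan": 2769, "Nevada": 2695}

# Assumption: 100 EDM points ~ 1 game point.
EDM_POINTS_PER_GAME_POINT = 100

# Absolute rating gap for every pairing, converted to game points.
gaps = {
    (a, b): abs(ratings[a] - ratings[b]) / EDM_POINTS_PER_GAME_POINT
    for a, b in combinations(ratings, 2)
}

# Print the pairings from closest to widest.
for (a, b), gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{a} vs {b}: {gap:.2f} points")

largest_gap = max(gaps.values())
print(f"Largest gap: {largest_gap:.2f} points")
```

Even the widest pairing, Buffalo against Nevada, stays under a full point, which is why any of the four is a defensible pick.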

Sweet 16

1 Gonzaga (3608) beats 4 Florida State (2561)
6 Buffalo (2787) beats 2 Michigan (2769)

Gonzaga keeps rolling with our model. Florida State beat them last year, but Gonzaga looks like a different team and Florida State might have to contend with Marquette or a dangerous Murray State team. And Vermont isn't bad either. Gonzaga is the safe pick here.

But for those looking for chaos, we move to the other game. We have Buffalo shocking everyone to get to the Elite 8. By EDM, this should be expected. By common basketball consensus, this is bold. Nevada would also be a reasonable bold pick to move on to the Regional Final.

Regional Final

1 Gonzaga (3608) beats 6 Buffalo (2787)

For those who think Gonzaga is a weak one seed, that may be true. EDM has a slight bias toward teams that play in weaker conferences. That's why Buffalo is here. Buffalo also only lost 3 times. But you should pick Gonzaga to go to the Final Four. Not because you like them, but because the other side of the West Region is a chaotic mess where four teams each have a reasonable shot of making the Final Four. Gonzaga has a pretty straightforward path to the Regional Final. Math makes a compelling argument here. Also, by the ratings listed above, Gonzaga is at least an 8-point EDM favorite against all competition they might face in their region, which adds to the reasons in their favor.


This concludes Part 1 of 2 of the EDM bracket. Tomorrow, we'll go over the other side of the bracket and sum everything up. See you then.


You know, I know nothing about sports (except for the small bits that I've gleaned from my roommate's obsession with football and a general obsession of my own with games and game design) and I care even less, but what I'm interested in is math and matching systems. Which is effectively what we have here, except in the case of a matching system you're looking to find the match ups which are the most uncertain and thus produce the most memorable games for the players and not trying to discover the most likely winner. Still, you're looking for a series of exchanges which allow you to decide an abstract spatial difference between two players.

It occurs to me, and I'm sure that this is not a new idea, that there's nothing about these algorithms that says they have to be run on current data. Knowing the general impulses of sports fans, I'm betting that the last few decades of scores from pretty much everything you ever might want to try and predict brackets for are stored online somewhere, along with a pile of other metadata and other potentially informative signifiers. It would be interesting to see the results of the algorithms run on historical data to see if they can predict how things actually turned out compared to the ground truth. It's probably not terribly interesting for people who have been immersed in algorithmic bracket creation for a while, but as someone from outside the field, it would be interesting. I would compare the predictive potential of different historical eras in whatever sport is in question; for example, is predicting the bracket of football teams in the 70s significantly different in algorithmic result than feeding the data from the 80s in? As a random example more than anything.

Sure, granted, data gets more sparse the further back you go in time. Advanced statistics in the sports realm have only really gotten popular in the last twenty years. But given large amounts of time to waste on this problem, it would be interesting to run these models on prior data and against prior events to see how effective they are over time. College basketball has been tending recently toward being more random, given the higher percentage of three-point shots being taken nowadays versus twenty years ago. It would be interesting to see if this increased randomness makes games harder to predict nowadays versus in earlier periods.

See, that's the sort of thing that I find fascinating.

When the dynamics of the meta-game change, how does that affect our ability to predict the outcome of games going forward based on all the data we have versus smaller and more specific slices of the data for training? There's probably an entire PhD thesis waiting for somebody with a pile of sports knowledge, computing power, and the patience for sorting through a lot of tedious, scattered information – but I would definitely read that thesis.

It would also be interesting to compare the predictive power of the various models run against one sport versus another. Is the bracket density of college football more amenable to statistical analysis than college basketball? I have no idea; I barely know enough to even ask the question. But it's interesting!

I am all for more analysis of algorithms and looking at their applicability across novel regimes. That's cool stuff. Which is why I'm following this series of posts even though I really have no grounding or interest in the sports side of things. It's fascinating in and of itself.
