Tuesday, March 29, 2011

March Madness II

If you're into sports, I'm sure you've been hearing/talking about how crazy this year's NCAA Men's Basketball tournament has been.  Even if you're not into sports, you've probably been hearing about this. 11th seeded Virginia Commonwealth University (from Richmond, VA) has won 5 games as the underdog and leads the final field in terms of surprises.  Many experts thought that VCU shouldn't have even made the tournament.  Butler University (from Indianapolis, IN) is also in the Final Four as an #8 seed. Yes, they were in the Final Four (and championship) last year, but they were a 5 seed then and still considered a 'Cinderella team.' I'm sure this Cinderella status was helped by the fact that they came from a "mid-major" conference, a conference outside the traditional powerhouses.  The two other teams: Kentucky (a 4 seed) and University of Connecticut (a 3 seed).  While less surprising, these two teams were far from the "obvious" choices.

So, how crazy is this year's tournament?

Out of more than 5.9 million ESPN brackets, 2 people correctly picked the Final Four. Fewer than 7% of people picked any of these four teams to win. Those picks were crazy.

Yeah, well most people don't know anything about basketball and just pick a bunch of high seeds or teams with mascots with funny names.  Doesn't make this crazy.

This is only the third year (after 1980 and 2006) that no #1 seeds are in the Final Four. That's crazy.

Who says having all 4 #1 seeds would be normal?  It's only happened once before (2008). That would be crazy.

This is the first year ever (since seeding began in 1979) that there aren't any #1 or #2 seeds in the Final Four. It's crazy that none of the eight best teams are in the finals.

Yeah, yeah. Well on the day those eight teams played they weren't the best eight teams.  This is also the first year ever that there was a team that began with the letter V and a team that began with the letter B.  Doesn't make that special...or crazy.

According to the Vegas odds at the beginning of the tournament, Ohio State, the overall #1 seed, had a 62% chance of making it to the Elite Eight and they did not.  I could have made crazy money on that game [1].

According to the Vegas odds at the beginning of the tournament, there was a 9.6% chance of no #1 seed making it to the Final Four.  Small, but an expectation that this will happen once a decade is not crazy small.

If this had been last year, VCU wouldn't have even made the tournament.  They only got a chance because the field expanded from 65 teams (with one "play in" game) to 68 teams (with four play-in games). It's crazy that the last team in is in the Final Four.

Well then I guess it was crazy of the selection committee to only include 65 teams.  The only reason to have VCU in the tournament is if they have a shot.  See, not crazy.

There's an 11 seed in the Final Four!  This team was, at best, ranked below 40 other teams. How crazy is that?

An 11 seed made the Final Four just 5 years ago (George Mason University).  And they were also from Virginia.  Not crazy at all.

The odds of VCU making the Final Four were 3 in 10,000.  That's crazy small.

Yes, but people would have been saying the same thing if any of the teams seeded 8 through 16 were in the Final Four.  In fact, the odds of an 8 through 16 seed making the Final Four are not actually that small: 38.6% (actually, the math on this is not quite right, because adding up the individual odds counts a Final Four with several of these teams once for each of them, and so this percentage includes the possibility of having more than 4 teams seeded 8 or lower in the Final Four which, since there are only 4 slots, is not possible.  Any thoughts on how to resolve this issue without doing a lot of computations are welcomed). As a comparison, this is about the same percentage as Duke (40.3%) [2].  Not so crazy small after all.
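One possible way around the overcounting, sketched here under the assumption that the four regions are independent of each other: within a region only one team can advance, so the odds for seeds 8 through 16 in that region can safely be added, and the chance that at least one such team reaches the Final Four is one minus the chance that every region sends a higher seed. The lists are copied from the code footnote (Southeast has only eight values there and is used as printed), so the totals need not reproduce the 38.6% quoted above. The function name is made up for illustration.

```python
# Odds of reaching the Final Four for seeds 8 through 16 in each region,
# copied from the lists in the code footnote. Southeast has only eight
# values in the post and is used exactly as printed.
EAST = [.004, .012, .004, .017, .001, .0004, .0002, .0001, .000001]
WEST = [.009, .004, .012, .016, .0005, .002, .0006, .0001, .000004]
SOUTHWEST = [.024, .029, .014, .0003, .01, .0004, .0002, .0002, .00002]
SOUTHEAST = [.01, .012, .039, .064, .046, .001, .0003, .00009]


def p_at_least_one(regions):
    """Chance that at least one region is won by one of the listed teams.

    Assumes the regions are independent (an assumption of this sketch).
    """
    p_none = 1.0
    for odds in regions:
        # Teams in one region are mutually exclusive winners, so their
        # odds add up to the chance that the region sends a low seed.
        p_none *= 1.0 - sum(odds)
    return 1.0 - p_none


regions = [EAST, WEST, SOUTHWEST, SOUTHEAST]
naive = sum(sum(odds) for odds in regions)  # plain sum, overcounts overlaps
exact = p_at_least_one(regions)

print("plain sum of odds: %.4f" % naive)
print("at least one team: %.4f" % exact)
```

The plain sum is really the expected number of such teams in the Final Four, which is why it runs a little higher than the chance of seeing at least one.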

Yeah, but there are two teams seeded 8 or lower in the Final Four.  That's only happened once before (2000) and they were both 8 seeds. An 8 and an 11 seed?  One of which is guaranteed to be in the final?  Crazy.

I spent some time thinking about the best way to compute the Vegas odds of having 2 teams seeded 8 or lower in the Final Four. You could find the probability of two specific teams making it by multiplying their individual probabilities and then doing this for every possible legal combination of 8-16 seeds (remembering that teams from the same bracket can't both make it). This is what I ended up doing (by writing a Python program). I have yet to come up with a more clever way of thinking about this. Anyway, I got 3.59% [3]. And since I'm sure you're wondering, there's a 3.75% chance of having at least 2 teams seeded 8 or lower in the Final Four.  A small percentage, yes, but not absurd...about the same odds as Gonzaga had of making the Final Four. Considering their success in recent years, I'm not sure people would have called this crazy.
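A possible shortcut for the pair-by-pair bookkeeping, offered as a sketch rather than as the method used for the numbers above: treat each region as a single yes/no event ("did a team seeded 8 or lower win this region?") and build up the distribution of how many regions say yes. This assumes the regions are independent. The lists are the ones printed with the code footnote, and the function name is hypothetical. Because each pair's product is also multiplied by the chance that the other two regions do not send a low seed, the figures it prints can come out a bit lower than a plain sum of pairwise products.

```python
# Odds for seeds 8 through 16 in each region, as printed in the code footnote
# (Southeast has only eight values there).
EAST = [.004, .012, .004, .017, .001, .0004, .0002, .0001, .000001]
WEST = [.009, .004, .012, .016, .0005, .002, .0006, .0001, .000004]
SOUTHWEST = [.024, .029, .014, .0003, .01, .0004, .0002, .0002, .00002]
SOUTHEAST = [.01, .012, .039, .064, .046, .001, .0003, .00009]


def low_seed_count_distribution(region_probs):
    """Return dist where dist[k] is the chance that exactly k regions
    are won by a low seed, assuming the regions are independent."""
    dist = [1.0]  # before looking at any region: zero low seeds for sure
    for p in region_probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1.0 - p)  # this region sends a higher seed
            new[k + 1] += q * p      # this region sends a low seed
        dist = new
    return dist


# Within a region the teams are mutually exclusive, so their odds add.
region_probs = [sum(r) for r in (EAST, WEST, SOUTHWEST, SOUTHEAST)]
dist = low_seed_count_distribution(region_probs)
exactly_two = dist[2]
at_least_two = sum(dist[2:])

print("exactly two low seeds:  %.4f" % exactly_two)
print("at least two low seeds: %.4f" % at_least_two)
```

The same function works for any cutoff: swap in the odds for, say, seeds 5 through 16 and the distribution updates without touching the loops.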

So, what do you think?  Crazy? 

[1] Vegas odds are based on the underlying idea that bookies want to take the same number of bets for both teams...well, unless they get greedy. So if they determine that the chances of team A winning are 15% (based on whatever metric they devise), the line might be something like +850 for that team.  This means that if you bet $100 and the underdog wins, you'd win $850 (a good enough return to incentivize betting for the underdog).  Whereas the line for team B, the favorite, might be -870, which means that you need to bet $870 to win $100. So consider if 100 people bet on each team. The casino takes in $10,000 from the people who bet on the underdog and $87,000 from the people who bet on the favorite for a total of $97,000. If team A, the underdog, wins, the casino pays out 100*(850+100)=$95,000 and banks $2,000.  If team B wins, the casino pays out 100*(870+100)=$97,000 and breaks even. Wash. Rinse. Optimize for maximum profit. Repeat.
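The bookkeeping in this footnote can be checked with a few lines of Python. This is a minimal sketch that assumes 100 bettors on each side (the head count that the $10,000 and $87,000 totals imply), a $100 stake on the underdog, and an $870 stake on the favorite; the function and parameter names are made up for illustration. With these particular lines, the favorite-wins case comes out to exactly zero.

```python
def casino_profit(n_bettors, underdog_line, favorite_line, underdog_wins):
    """Casino profit when n_bettors back each side.

    underdog_line: dollars won per $100 staked (850 for a +850 line)
    favorite_line: dollars staked to win $100 (870 for a -870 line)
    """
    underdog_stake = 100
    favorite_stake = favorite_line
    taken_in = n_bettors * (underdog_stake + favorite_stake)
    if underdog_wins:
        # Underdog bettors get their stake back plus their winnings.
        paid_out = n_bettors * (underdog_stake + underdog_line)
    else:
        # Favorite bettors get their stake back plus $100 each.
        paid_out = n_bettors * (favorite_stake + 100)
    return taken_in - paid_out


print(casino_profit(100, 850, 870, underdog_wins=True))
print(casino_profit(100, 850, 870, underdog_wins=False))
```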

[2] This brings up a common misconception in probability theory.  If you flip a coin ten times, which is more likely:


In fact, both are equally likely even though the first looks much sketchier than the second. The problem is, the second "looks more random" than the first.
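To see why, here is a short sketch that multiplies out the chance of each individual flip for a fair coin. The two sequences are made-up stand-ins chosen for illustration, one streaky and one that "looks random"; the helper name is hypothetical.

```python
from fractions import Fraction


def sequence_probability(flips, p_heads=Fraction(1, 2)):
    """Chance of seeing exactly this sequence of 'H'/'T' flips, in order."""
    prob = Fraction(1)
    for flip in flips:
        prob *= p_heads if flip == "H" else 1 - p_heads
    return prob


streaky = "HHHHHHHHHH"  # hypothetical sequence that looks suspicious
mixed = "HTTHTHHTHT"    # hypothetical sequence that "looks more random"

print(sequence_probability(streaky))
print(sequence_probability(mixed))
```

Every specific sequence of ten fair flips comes out to the same fraction, 1/2 multiplied by itself ten times. What differs is how many sequences share a given head count, which is a separate question from how likely any one sequence is.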

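[3] Here's the code I used for this computation.

#Vegas odds for seeds 8 through 16 in each region

East=[.004, .012, .004, .017, .001, .0004, .0002, .0001, .000001]
West=[.009, .004, .012, .016, .0005, .002, .0006, .0001, .000004]
Southwest=[.024, .029, .014, .0003, .01, .0004, .0002, .0002, .00002]
Southeast=[.01, .012, .039, .064, .046, .001, .0003, .00009]

regions=[East, West, Southwest, Southeast]
psum=0

#step through each of the possible pairs of teams from different regions that
#could be in the final 4 (loop body follows the description in the post:
#multiply the two teams' odds and add up the products)
for a in range(0,4):
     for b in range(a+1,4):
          #range(0,8) covers the first eight odds in each list
          for i in range(0,8):
               for j in range(0,8):
                    psum+=regions[a][i]*regions[b][j]

print(psum)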


  1. The sum of the seeds is 26... freakishly high.

  2. And the square root of the mean of the squares is over 7... a record unlikely to be matched in our lifetimes.

    8 and 11 vs 8 and 8, I think, is a big difference.


  3. @Jonathan: I should preface this by saying that I am definitely nowhere close to being a statistics expert. So feel free to take this with a grain of salt. I agree that these statistical representations, in comparison to previous years, help further the argument that this year's tourney is an anomaly. That said, I think we have to be careful not to give meaning to the seed numbers that isn't there. The seeding isn't a ratio-scale measure (or, I would argue, even an interval scale), meaning all we can say is that a 1 seed is higher than a 2 seed. They are ordinal measurements. We can't say that it's twice as high (or anything else). So I'm not sure how informative things like the standard deviation will be.

    As for an 11 seed being more of a surprise than an 8 seed, I also don't see this as a foregone conclusion. Nate Silver at 538.com said it better than I, though.

  4. And as a less statistical argument for an 11 seed potentially having a better shot than an 8 seed, consider the case where there are just a few great teams and then lots of good teams (the Women's tournament might be a good example of this). For simplicity's sake, let's say the #1 seeds are the great teams. An 8 seed will have to play this great team in the 2nd round of the tournament (unless the #1 seed loses its first game, which has never happened). An 11 seed wouldn't have to play this team until the regional finals (the 4th game). That leaves more opportunities for the #1 seed to get unlucky and get knocked off by another team (although it's worth noting that VCU did beat the #1 seed Kansas this year).

  5. Oh, I'm not taking myself too seriously. And I hesitate to argue with NS (though he does miss, occasionally)

    And I'm caught - my favorite team wins, I'm happy. My favorite team loses, I win a pool. (true)

    There's a more serious lesson in that! Maybe not.


  6. Well, what I am taking from this blog is that no matter what people are betting, the casinos are still making money regardless of the outcome of the game. However, I am only taking a high school statistics class and am nowhere near being able to say what is wrong or what is right, but I will say that I heavily agree with all of the computations made based on the information given.

  7. ... but what are the chances Butler misses that many shots in a row?

  8. @Jonathan: How'd you do? Did your team win or did your pool win?
    @John: Thanks for visiting and commenting. I'd agree that, unless they get particularly greedy and stray from trying to get an equal # of bets on both sides, you're right that the Casino always wins.
    @Bowen: It's as if Butler had a bag filled with luck matter that they efficiently used over the course of the tournament (check out the last 30 seconds of the Pitt game) and that they then got to the final game and the bag was not only empty of luck matter, but filled with luck anti-matter. Or something like that...all I know is that was a painful game to watch.

  9. Following up on John's comment, there have been numerous times where the "Vegas Line" on a game is different from what they actually think in terms of the teams' abilities. Gambling lines are set with the intent of getting 50% of the people to bet on each side.

    If the gambling line isn't set well enough, it "shifts" which can create weird situations where gamblers are hedging bets. In one famous example, a betting line went from 1 point to 4 points, and a LOT of money was bet for one team to win the 1-point wager, then lose the 4-point wager -- a tiny loss if it doesn't happen, and a huge win if it does -- and the team won by 3.

    Go Butler! Has any team in the NCAA final shot worse from 2-point land than from 3-point land before?? That's CRAZY.