What is this place?
Hi, my name is Ken Roberts (firstname.lastname@example.org). I live in Raleigh, North Carolina, USA. I created this place to shed light on 3 things:
- Your team’s odds of making the playoffs, or winning the title for sports without playoffs.
- How today’s games impact those odds, so you know who to pull for.
- How well they need to finish the season to have a shot.
It knows the season schedule and the scores of past games. As games are played it grabs the new scores from the internet (or gets scores sent in from fans) and simulates the rest of the season by randomly picking scores for each remaining game. The weighted method takes the opponents’ records and home field advantage into account when randomly picking scores, so the better team is more likely to win. The 50/50 method gives each opponent an equal chance of winning (or tying, if the sport allows it) each game. When it’s finished "playing" all the remaining games, it applies the league’s tie-breaking rules to see where everyone finished. It repeats this random playing out of the season millions of times, keeping track of how many "seasons" each team finishes where. Finally, it updates the site with the new results for you to read with your morning coffee.
For example, this is how the tally progressed for the NHL Eastern Conference (partway through the 06-07 season). The Conference has 15 teams, 8 of which make the playoffs. The 3 division winners get the top 3 seeds.
Number of times teams finish at each position after 1 simulation run.
There are an infinite number of ways the season could play out, but by randomly picking a bunch we get a good approximation. This way of estimating is called the Monte Carlo method.
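Here is a minimal Python sketch of that tallying loop, using the 50/50 method. It is an illustration, not the site's actual code (which is written in C#). The four-team league, the 2-points-per-win rule, the random tie-break, and the function names `simulate_season`, `tally_finishes`, and `playoff_odds` are all assumptions made up for the example.

```python
import random

def simulate_season(points_so_far, remaining_games, rng):
    """Play out the remaining games once with a fair coin; return teams in finishing order."""
    points = dict(points_so_far)  # copy, so every run starts from the real standings
    for home, away in remaining_games:
        winner = home if rng.random() < 0.5 else away
        points[winner] += 2  # assumption: 2 points for a win, no ties or OT
    # Assumption: teams level on points are separated at random,
    # standing in for the league's real tie-breaking rules.
    return sorted(points, key=lambda team: (points[team], rng.random()), reverse=True)

def tally_finishes(points_so_far, remaining_games, runs, seed=0):
    """Count how many simulated seasons each team finishes in each position."""
    rng = random.Random(seed)
    tally = {team: [0] * len(points_so_far) for team in points_so_far}
    for _ in range(runs):
        order = simulate_season(points_so_far, remaining_games, rng)
        for position, team in enumerate(order):
            tally[team][position] += 1
    return tally

def playoff_odds(tally, playoff_spots, runs):
    """Share of simulated seasons in which each team finished in a playoff spot."""
    return {team: sum(counts[:playoff_spots]) / runs for team, counts in tally.items()}

# Hypothetical league: current points and the games still to be played (home, away).
standings = {"A": 10, "B": 8, "C": 8, "D": 2}
schedule = [("A", "B"), ("C", "D"), ("A", "D"), ("B", "C")]

runs = 10_000
tally = tally_finishes(standings, schedule, runs)
for team, odds in playoff_odds(tally, playoff_spots=2, runs=runs).items():
    print(team, f"{odds:.1%}", tally[team])
```

In this toy league team D can reach at most 6 points, so it finishes last in every run and would show as "Out". More runs give a smoother estimate for the other teams, but the approach stays the same: play the season out at random, tally where everyone lands, repeat.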
- The algorithm does not know about things like trades, injuries, and matchups. It does not know that a team has started believing in themselves.
- "Out" does not necessarily mean mathematically eliminated. It just means in the millions of times I played out the season the team never made the playoffs. Likewise for "In".
- The code could have bugs. The nice thing is the numbers are broken down such that bugs don’t stay hidden long. If you find a new one I’ll buy you a beer. Domestic. Milwaukee’s Best, Schaefer, something like that.
Is Sports Club Stats accurate?
Can you track your overall success rate year to year?
I have not done that, but I could with help from some statistician friends. I imagine it would be the same kind of analysis that Bill James used to test his Pythagorean formula, which underlies how I estimate a team's true strength. The formula was fine-tuned by David Smyth for baseball and Alan Ryder for hockey.
OK, but how accurate is it?
It has 2 limitations:
The first is mostly a technicality. On the NHL Playoff Chances page I show odds for getting each seed. If I could run the simulation for an infinite number of seasons you likely would see a few extra 0's show up under seeds that previously were blank. You usually don't care, since those new 0's are 1 in a million long shots, but you can't use the site to say a team is mathematically in or out. When the page shows "In" or "Out" it just means the simulation never found a case where they missed/made the playoffs.
The second limitation is important. I play each remaining game by flipping a coin.* The 50/50 version uses a true coin. Not very accurate. The weighted version uses a loaded coin. THIS IS THE KEY: If I magically knew the precise value to load each game's coin then every number on the site (the odds, the big games, the what ifs) would be perfectly accurate. But I can't know that, so for now I load each game's coin based on the 2 opponents' records.** Errors creep in when a team's true current strength is different from what their record implies. You have to read the numbers with this in mind. To take an extreme example, the Blues "chance will make playoffs" graph probably dips too low mid-season, because the system does not factor in injuries. If I were a Blues fan mid-season I might say "these players are coming back, I bet they will give us X extra points in the second half of the season. I'm going to adjust Sports Club Stats numbers by looking X points higher up the 'What If' table."
OK, but, again, how accurate is it?
I don't know yet. I find it accurate enough to be a useful, interesting tool. The more I use it the more it surprises me with unexpected results that turn out to make sense.
* For hockey I also give each game a 22% chance of going into OT (22ish percent of games went into OT in 2007-2008). And I randomly give the winner between 1 and 5 goals and the loser something less. I care about goals because they are one of the tiebreakers. Finally, for the weighted version I give the home team a 4% boost in their chance of winning, to match the league average. (Other leagues have their own values for OT/tie probability, typical points scored per game, and home field advantage.)
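As a rough Python sketch of the hockey footnote above (not the site's actual C# code): the 22% OT chance, the 1 to 5 winner goals, and the 4% home boost come from the text. How the boost is applied (added as 4 percentage points), the one-goal margin in OT games, and the name `play_game` are assumptions for illustration.

```python
import random

OT_CHANCE = 0.22   # from the text: about 22% of games went to OT in 2007-2008
HOME_BOOST = 0.04  # from the text: home edge matching the league average

def play_game(p_home_neutral, rng):
    """Simulate one game; returns (home_goals, away_goals, went_to_ot).

    p_home_neutral is the home team's chance of winning before home ice
    is considered (0.5 for the 50/50 method without the boost).
    """
    # Assumption: the boost is added as 4 percentage points, clamped to [0, 1].
    p_home = min(1.0, max(0.0, p_home_neutral + HOME_BOOST))
    home_wins = rng.random() < p_home
    went_to_ot = rng.random() < OT_CHANCE

    winner_goals = rng.randint(1, 5)  # inclusive on both ends
    if went_to_ot:
        loser_goals = winner_goals - 1  # assumption: OT games end with a one-goal margin
    else:
        loser_goals = rng.randint(0, winner_goals - 1)  # "something less"

    if home_wins:
        return winner_goals, loser_goals, went_to_ot
    return loser_goals, winner_goals, went_to_ot

rng = random.Random(2008)
for _ in range(5):
    print(play_game(0.5, rng))
```

The goals only matter here because they feed the tiebreakers; the win or loss itself is decided by the coin flip.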
** For many sports it is actually based on each opponent's goals delta because, on average, that is a more accurate reflection of a team's true strength. And it applies a mathematical trick called "regression towards the mean" to give less credence to records early in the season when few games have been played. To "load the coin" it uses a mathematical trick called log5 to turn the 2 teams' "records" into the chance that the home team will win the game.
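The three ideas in that footnote can be sketched in Python as follows. The Pythagorean formula (with Bill James's original exponent of 2) and log5 are the standard published forms. The site's tuned exponents are not given here, and the way it regresses towards the mean is not described, so the "phantom .500 games" approach, the `phantom_games=20` default, and all function names are assumptions for illustration only.

```python
def pythagorean_strength(goals_for, goals_against, exponent=2.0):
    """Estimate a team's true winning share from goals scored and allowed."""
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    if gf + ga == 0:
        return 0.5  # no goals either way yet: treat the team as average
    return gf / (gf + ga)

def regress_to_mean(strength, games_played, phantom_games=20):
    """Pull an estimate towards .500; the pull fades as more games are played.

    Assumption: this blends in a fixed number of imaginary .500 games.
    """
    return (strength * games_played + 0.5 * phantom_games) / (games_played + phantom_games)

def log5(p_a, p_b):
    """Chance that a team of strength p_a beats a team of strength p_b."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)

# Hypothetical example: home team has scored 60 and allowed 40 after 20 games,
# away team has scored 45 and allowed 55 after 20 games.
home = regress_to_mean(pythagorean_strength(60, 40), games_played=20)
away = regress_to_mean(pythagorean_strength(45, 55), games_played=20)
print(f"home strength {home:.3f}, away strength {away:.3f}")
print(f"chance home team wins (before home ice): {log5(home, away):.3f}")
```

Two sanity checks on log5: a team facing an exactly average (.500) opponent keeps its own winning share, and two equal teams come out at 50/50.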
- I use C# to write the program that grabs the latest scores, computes the probabilities, creates the static web pages, and FTPs the pages to my webhost, A Small Orange.
- The blog parts, including the previous run-on sentence, are generated with WordPress.
- Flot makes the graphs pretty.
- Chilkat Software makes the HTML parsing, email handling, and FTPing easy.
- The tabs along the top are from Sliding Doors of CSS by Douglas Bowman. I stole the CSS for the side tabs from GMail.
- The team logos are from Chris Creamer's wonderful SportsLogos.Net.
- The translation flags are from IconDrawer.