Edit: Any way to enable embedding of images in this forum? It's kind of clunky to have to click on all of the photobucket links in this post.

Over the last few months, a colleague and I worked on coming up with a “points probability model” (similar to the win probability models that have been developed for baseball; for more info, see http://www.tangotiger.net/wiki/index.php?title=Win_Probability_Added).

To construct the model, in a nutshell, we used historical data from MLS (2004-2008) to calculate, for each “game state” that Team X could be in, how many points Team X would expect to earn from a game on average if they were in that game state. A game state was defined by what minute the game was in, whether Team X was the home team or the away team, and how many goals Team X was leading or trailing by. (The “on average” part is sort of, but not exactly, what we did; if you’re interested in the details of the mathematical geekage behind this, PM me, and I’ll tell you more.)

Just to give a couple of illustrative examples, according to the model...

...an away team that’s ahead by 1 goal in the 26th minute will earn 1.55 points from that game on average (or, more intuitively, if they’re in that exact situation in 4 different games, you’d expect them to earn about 6 points from those 4 games)

...an away team that’s ahead by 2 goals in the 58th minute will earn 2.66 points from that game on average (or, more intuitively, if they’re in that exact situation in 5 different games, you’d expect them to earn about 13 points from those 5 games)

So then, using the model, we looked at every goal that was scored in MLS in 2009 and calculated how much it was worth in terms of expected points, given the state of the game when it was scored. For instance...
...a goal scored by the home team in a tie game in the 80th minute raises the home team’s expected point total from 1.30 to 2.63, so it’s worth (2.63 - 1.30) = 1.33 points

...a goal scored by the home team when it’s ahead by 2 goals in the 75th minute raises the home team’s expected point total from 2.92 to 2.99, so it’s worth (2.99 - 2.92) = 0.07 points

We also made one additional adjustment: since about 80% of penalty kicks were converted in MLS over the time frame covered by our data, we decided to work under the assumption that winning a penalty actually gets you 80% of the way to a goal, while converting a penalty only gets you the remaining 20%. In light of that, we discounted PK goals by 80% when we calculated their value.

So after doing all of this, we determined, for each player in the league, the aggregate value of the goals they scored in 2009. Some notable ones are listed below.

In reading through, keep in mind that this doesn’t necessarily tell us anything about the inherent quality of these players, since 1) not every goal requires the same amount of skill/quality/whatever to score and 2) it’s debatable whether it takes more skill/quality/whatever to score goals in “high leverage” situations (i.e., situations where a goal would be more valuable). Also, it’s obvious that virtually every goal that’s scored requires a contribution from multiple players, and not just the goal scorer. We’re not taking that into account here. Regardless, though, we think it’s an interesting exercise.

The top 10...

Pretty much what you’d expect, although Cunningham had a few more garbage-time goals than Casey did, so those two are flip-flopped with respect to goals and EPA.
Also, Cummings and Shalrie (8 goals) come in at 9 and 10, ahead of guys like Ryan Johnson (11 goals), Luciano Emilio (10 goals), and Robbie Findley (12 goals).

Also interesting: Robbie Findley’s 12 goals were worth only about two-tenths of a point more than Patrick Nyarko’s 4 goals, and the same goes for Chris Pontius’s 4 goals.

Other guys who scored a decent number of goals that didn’t count for as much as Nyarko’s and Pontius’s...

Dasan Robinson’s 1 goal this season (scored in stoppage time to give the Fire a 3-2 lead over Colorado on 8/23) moved the needle on his team’s expected point count by almost 2 full points, about 1.4 points more than the aggregate value of Cuauhtemoc Blanco’s 5 goals for Chicago (3 of which were from the penalty spot).

And then there was the least consequential goal of the year, the penalty kick converted by Mathew Mbuta last Sunday to up RBNY’s lead over Toronto from 4-0 to 5-0. That one was worth about three ten-billionths of an expected point for the Bulls.

So that’s an overview. Anyone have any thoughts, criticisms, or suggestions for further analysis along these lines?
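
For anyone who wants to play with the arithmetic, here is a minimal Python sketch of the goal-value calculation described in this post. It is only an illustration, not the actual code behind the model: the table holds just the six expected-point values quoted above, the names (EXPECTED_POINTS, goal_value, total_value) are made up for the example, and it assumes a goal moves a team to the state one goal better in the same minute.

```python
# Expected points by game state, keyed by (minute, venue, goal_diff).
# Only the six values quoted in the post are included; a full table would
# cover every minute, venue, and goal difference.
EXPECTED_POINTS = {
    (26, "away", 1): 1.55,
    (58, "away", 2): 2.66,
    (80, "home", 0): 1.30,
    (80, "home", 1): 2.63,
    (75, "home", 2): 2.92,
    (75, "home", 3): 2.99,
}

# Share of penalties converted in MLS over the data period, per the post.
PK_CONVERSION_RATE = 0.80


def goal_value(minute, venue, goal_diff_before, is_penalty=False,
               table=EXPECTED_POINTS):
    """Expected points added by a goal scored in the given game state."""
    before = table[(minute, venue, goal_diff_before)]
    # Assumption: the post-goal state is the same minute, one goal better.
    after = table[(minute, venue, goal_diff_before + 1)]
    value = after - before
    if is_penalty:
        # Converting a penalty only earns the remaining 20% of the goal.
        value *= 1 - PK_CONVERSION_RATE
    return value


def total_value(goals, table=EXPECTED_POINTS):
    """Sum goal values for an iterable of
    (minute, venue, goal_diff_before[, is_penalty]) tuples."""
    return sum(goal_value(*goal, table=table) for goal in goals)


if __name__ == "__main__":
    print(round(goal_value(80, "home", 0), 2))  # tie game, 80th minute
    print(round(goal_value(75, "home", 2), 2))  # already up 2, 75th minute
```

A player's season total is then just total_value over the list of that player's goals, which is the aggregate figure reported for each player above.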

Huh, interesting. I guess Elninho had the same idea, although he used a different model. (Shockingly) I like my model better, since it's based on recent MLS data rather than older data from England. Does Elninho still keep up with tabulating this stuff? I'd like to see how well his numbers agree with mine.

The problem as I see it with a model based on real data is that, for each minute and game state, you only know what happens for an average team playing another average team. To take an extreme example, you couldn't use such a model to give you accurate information as to how San Marino's points expectancy changed when they mugged England for a goal after 17 seconds. In virtually every scenario you're going to have to apply a fudge factor if you use average data, because in most cases teams and their opponents won't be exactly average.

If you take a Poisson-type approach, you can account for different initial team strengths, account for how they decay over the course of the game, and even allow for in-game events like dismissals.

At the basic level, all models designed to allocate points expectancies to individual players are simple "in running" probability models for the game in question. If you want to see the differing ways of doing this picked apart in minute detail, check out the UK-based soccer sites. Many of them are gambling-based sites (in-running betting on soccer has been mainstream for decades in Britain, and getting the maths wrong can be very costly), but the methodology is no less stringent for that.
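
To make the Poisson-type idea concrete, here is a rough Python sketch of an in-running points expectancy. It is a simplification under stated assumptions, not any particular site's method: each team scores at a constant rate over 90 minutes (no decay, no stoppage time, no dismissals), the two teams' goals are independent, and the rates passed in are hypothetical team-strength inputs you would have to estimate yourself.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k events for a Poisson with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)


def expected_points(minute, goal_diff, rate_for, rate_against, max_goals=15):
    """Expected points (3 for a win, 1 for a draw) for a team that is
    goal_diff goals ahead (negative if behind) at the given minute.

    rate_for and rate_against are assumed full-match scoring rates
    (goals per 90 minutes) for the team and its opponent.
    """
    # Assumption: scoring rate is constant, so the remaining mean is
    # proportional to the time left.
    remaining = max(0.0, (90 - minute) / 90)
    lam_for = rate_for * remaining
    lam_against = rate_against * remaining

    p_win = p_draw = 0.0
    # Sum over every combination of goals still to come (truncated).
    for scored in range(max_goals + 1):
        p_scored = poisson_pmf(scored, lam_for)
        for conceded in range(max_goals + 1):
            p = p_scored * poisson_pmf(conceded, lam_against)
            final_diff = goal_diff + scored - conceded
            if final_diff > 0:
                p_win += p
            elif final_diff == 0:
                p_draw += p
    return 3 * p_win + p_draw


def goal_value(minute, goal_diff, rate_for, rate_against):
    """Change in expected points from scoring a goal in this state."""
    return (expected_points(minute, goal_diff + 1, rate_for, rate_against)
            - expected_points(minute, goal_diff, rate_for, rate_against))
```

With different rates for the two sides, the same goal in the same minute is worth more or less depending on who scores it, which is the adjustment an average-based table cannot make.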

I think all this stuff's great. Can't you use the Z score as the fudge factor? I'm using it as I try to work out a measure of team production that weights individual events/stats in terms of their relative worth in goals. Another question I have is where you guys are getting your comprehensive data. I have to go through two to three sites each week and compile the EPL data I use by hand.