What big data can learn from total football, and vice versa: part one

I was lucky enough to have a presentation using the title above accepted for Strata London in October. Unfortunately, due to other commitments, I will no longer be able to attend the event. Having already done some background research into the topic, it seemed a shame for it to go to waste. To celebrate this weekend’s return of the Premier League, I thought I’d publish the results here instead.

Ever since the 2003 publication of Moneyball, and its account of the use of sabermetrics by Billy Beane and the 2002 Oakland Athletics to gain competitive advantage, questions have been asked about the potential applicability of statistics to other sports.

Interest in the UK naturally spiked following the release of the film of the book, which brought Moneyball to a wider audience and prompted questions about whether football was missing out by ignoring the potential of statistical analysis for competitive advantage.

I previously noted how, almost 30 years before the 2002 Oakland Athletics, Dynamo Kyiv manager Valeriy Lobanovskyi instituted a scientific, data-led approach to tactics that enabled Dynamo Kyiv to win the Soviet League eight times, the Ukrainian league five times, and the European Cup Winners’ Cup twice.

As much as he was a visionary, Lobanovskyi is also atypical of football managers, but there is other evidence that football has been ahead of the game in realising the potential for statistical analysis. After all, the three big names in football-related statistics – Amisco, Opta and Prozone – were all founded prior to the 2002 baseball season: in 1995, 1996 and 1998 respectively.

Each of these organisations, and many more besides, produces enormous amounts of data related to football matches, which is sold to football clubs with a view to improving performance through statistical analysis.

As an example of the amount of data that can be generated by football, the BBC recently reported that GPS monitors, routinely used by clubs in training if not in actual competitive games, “can collect up to 100 pieces of player data per second on speed, distance, heart rate, dynamic stress load, accelerations and decelerations.”

Having access to gobs of data is one thing; making sense of it is quite another. This is particularly the case in football, which is much more fluid than baseball and other sports, such as cricket, that are essentially a series of repeated set-plays. This has led sceptics to claim that, because of its unpredictability, statistics will never have the same impact in football as in baseball.

Control the controllables
Our first lesson that data management can learn from football is to not worry about what statistics can’t tell you, and focus on what they can. Or in the words of Southampton FC manager Nigel Adkins: “control the controllables.” This is precisely what Bolton Wanderers, one of the first football teams credited with adopting statistical analysis, did. Bolton did so by focusing on the aspects of the game that are set-plays.

Writing in the Financial Times, Simon Kuper quotes Gavin Fleig, head of performance analysis at current Premier League champions Manchester City and former performance analyst at Bolton:

“We would be looking at, ‘If a defender cleared the ball from a long throw, where would the ball land? Well, this is the area it most commonly lands. Right, well that’s where we’ll put our man.’”

As a result, Bolton scored 45-50% of their goals from set-plays, compared to a league average of nearer 33%.
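
The landing-zone reasoning Fleig describes is, at heart, a frequency count. As a minimal sketch in Python, assuming a hypothetical list of pitch zones where cleared balls came down (the zone labels and the sample data are invented purely for illustration and are not Bolton's figures), it might look something like this:

```python
from collections import Counter


def most_common_landing_zone(landing_zones):
    """Return the zone where cleared balls land most often, and its share of all clearances."""
    if not landing_zones:
        raise ValueError("no clearances recorded")
    counts = Counter(landing_zones)
    zone, hits = counts.most_common(1)[0]
    return zone, hits / len(landing_zones)


# Hypothetical sample: one label per clearance from a long throw (invented data).
clearances = [
    "edge_of_box", "left_wing", "edge_of_box",
    "centre_circle", "edge_of_box", "right_wing",
]

zone, share = most_common_landing_zone(clearances)
print(f"Station a player at: {zone} ({share:.0%} of clearances)")
```

A real analysis would presumably work from pitch coordinates rather than coarse labels, but the principle is the same: count where the ball ends up, then put a player there.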

Perhaps the most significant set-play in football, certainly in terms of deciding between winners and losers, is the penalty shoot-out. Routinely dismissed by the losing team (England) as a lottery, the penalty shoot-out is anything but, according to statistics analysed by Prozone.

According to Prozone’s analysis:

  • the team taking the first kick wins 75% of the time
  • 81.2% of penalties taken to win shootouts were scored compared with 14.2% of those needed to keep the game alive
  • 71.4% of all penalty saves are in the lower third of the goal
  • None of the penalties aimed at the upper third of the net were saved (although they are more likely to miss)

Source: Prozone

    “Everything that can be counted doesn’t necessarily count”
    While statistics such as these suggest that the penalty shoot-out is less a lottery than a mathematical puzzle, our second lesson that data management can learn from football relates to the above quote from Albert Einstein, and the danger of assuming the relevance of statistics.

    Along with Stefan Szymanski, Kuper is the author of Soccernomics (so good I bought it twice) – a treasure trove of stories and information on statistics and football.

    In Soccernomics, Kuper and Szymanski note that in the early days of statistical analysis in football, players were judged on statistics that were easily counted: number of passes, number of tackles, number of shots, kilometres run etc.

    That last statistic turned out to be particularly meaningless. Kuper quotes Chelsea’s performance director, Mike Forde:

    “Can we find a correlation between total distance covered and winning? And the answer was invariably no.”
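
Forde's question can be framed as a simple correlation check. The sketch below is one way to do that in Python, using a hand-written Pearson correlation. The per-match distances and points are invented for illustration only; they are not Chelsea's data and say nothing about the real relationship.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical figures (invented): team distance covered per match, in km,
# and league points earned in the same match (3 = win, 1 = draw, 0 = loss).
distance_km = [108.2, 112.5, 105.9, 110.1, 113.4, 107.0]
points = [3, 0, 1, 3, 1, 0]

r = pearson(distance_km, points)
print(f"Correlation between distance covered and points: {r:.2f}")
```

A coefficient near zero would be consistent with Forde's "invariably no", although with only a handful of matches any such figure should be treated with caution.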

    Perhaps the greatest example of over-reliance on statistics comes from the surprise sale of Jaap Stam from Manchester United to Lazio in 2001.

    While it was widely reported that Manchester United manager Sir Alex Ferguson sold Stam due to controversial comments in his autobiography, Kuper maintains that it was a decision based on statistics: specifically, the fact that Stam, approaching 30, was tackling less often than he previously had. According to Kuper, Ferguson sold Stam based on that statistic alone.

    Whether it was the statistic or the autobiography, selling Stam was a decision Ferguson would later regret. Either way, it turns out that tackles made per game is about as useful a measure of a defender’s ability as the number of kilometres run.

    The proof of that comes in the shape of Paolo Maldini – arguably one of the greatest defenders the world has ever seen. As Kuper notes, statistically Maldini only made one tackle every two games. “Maldini positioned himself so well that he didn’t need to tackle.”

    All of which raises the question: if someone with the domain expertise of Sir Alex Ferguson, one of the greatest managers in the history of British football, armed with statistical evidence, can make an incorrect decision, is there really a role for statistics in football?

    In part two we will explore some of the other examples of statistical analysis influencing the beautiful game, including graph analysis and network theory, the great Liverpool Moneyball experiment, and the lessons learned from Total Football.

