The Dangers of Football Statistics

Billy Macfarlane returns to TFN with his take on the use of statistics in football…

Football is a sport with no place for statistics. It is a sport which is organised chaos. Moments which decide results are so few and far between that there is little use in trying to predict them. The emotional legacy of results on fans is the most important aspect of football to understand. From this point we can begin to qualitatively argue about the best players and sides but no conclusive answer can ever be reached.

Football is a sport inherently based on statistics. Quantitative measures of the number of times each side makes the ball legally cross the goal line result in teams receiving either 0, 1 or 3 points or progressing in a cup competition. From this point we can begin to quantify, analyse and predict the impacts of players on the causation of goals and securing victory. These measurements can be produced objectively regardless of who compiles them.

This, of course, is a false dichotomy. Those favouring qualitative appreciation of football do not view goals and points as irrelevant to making judgements. Likewise, few of those who argue for the importance of statistics would actually claim that there is no room for subjectivity and that you can form your judgements based purely on graphs and numbers.

The problem, however, is that the football statistics school has ballooned in prominence and proclaimed importance in recent years. The mainstream presentation of statistical information has risen unchecked, with little real challenge to what is presented.

Take the example of ‘Chances Created’. Chances Created are defined by Opta as assists plus key passes, i.e. a pass which results in another player having either a shot or scoring a goal. The issue with this definition is that it has no real relation to the common lexicon of football fans when referring to a chance. What most football fans would describe as a chance is something akin to ‘a situation where a player has a reasonable opportunity to score a goal’.

There are two major flaws with Opta’s definition. First, the way the term chance is used by the average football fan is subjective. You can see this in most post-match conversations, where fans argue that a player should have scored or opposing fans argue that their side had more chances than the other. Second, not everything that would be called a chance leads to a shot. A player who whips an inch-perfect ball across the six-yard box which is miraculously missed would not have created a chance. In contrast, the defender who passes to a midfielder whose speculative effort from 40 yards nestles into Row Z is classed as creating a chance.
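The gap between the two meanings is easy to see if you write the counting rule down. What follows is only a rough sketch in Python, assuming the definition quoted above; the `Pass` record, its fields and the two example passes are all hypothetical and are not Opta's data format. It also assumes a key pass is one that leads to a shot without a goal, so that assists are not counted twice.

```python
from dataclasses import dataclass


@dataclass
class Pass:
    """A single pass. Every field here is hypothetical, not Opta's format."""
    passer: str
    description: str
    led_to_shot: bool  # the receiver took a shot
    led_to_goal: bool  # the receiver scored


def chances_created(passes, player):
    """Count assists plus key passes for one player."""
    assists = sum(1 for p in passes if p.passer == player and p.led_to_goal)
    # Assumption: a key pass leads to a shot that is not scored,
    # which keeps assists from being counted twice.
    key_passes = sum(
        1 for p in passes
        if p.passer == player and p.led_to_shot and not p.led_to_goal
    )
    return assists + key_passes


passes = [
    Pass("winger", "inch-perfect cross, missed by everyone", False, False),
    Pass("defender", "simple ball before a 40-yard shot into Row Z", True, False),
]

winger_total = chances_created(passes, "winger")
defender_total = chances_created(passes, "defender")
print(winger_total, defender_total)  # prints "0 1"
```

Under this rule the winger's cross counts for nothing and the defender's simple pass counts as a chance created, which is the reverse of how most fans would describe the two moments.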

These statistics measuring simplistic, descriptive events are often flawed, either through their lack of context or poor definition. They are used, however, to inform a vast swathe of poor sports writing which often takes statistics at face value. Squawka and WhoScored publish pieces which are heavily stats-driven, often failing to consider context or arriving at false conclusions. For example, WhoScored’s statistically generated strengths and weaknesses show that Michael Dawson “has no significant weaknesses”, a stance which will amuse anybody who has ever seen Dawson face a striker with pace. Meanwhile, on these numbers, Mile Jedinak had a better 2014 than Yaya Toure, Luka Modric, Nemanja Matic and Toni Kroos. The poor use of descriptive statistics is increasingly bleeding over into more conventional football media. They are frequently used by Sky Sports, and Gary Neville and Jamie Carragher use them partly to inform their debate on who should be in the team of the season so far.

It is not only the media which has seen a growth in the use of football statistics and analytics; clubs are increasingly using them to analyse performances. Brendan Rodgers is fond of a post-loss soundbite pointing to a victory on possession percentage to detract from a poor result. Former Arsenal player and current Arsenal scout Gilles Grimandi, however, has argued that this may be bad for the game and for player development. Grimandi believes that the increased importance placed on winning duels means players may be less willing to commit to battles they won’t win. It is possible that this translates into other statistics too: is the young player who relies on safe passes doing so in order to keep their post-match pass completion statistics high?

Beyond these simplistic statistics there is a school of football writers who seek to delve deeper into the various factors which cause goals to be scored and conceded. This school of thought, operating under the term analytics, often attempts to apply methods which have been established in predominantly American sports such as baseball, ice hockey and basketball. Ted Knutson, for example, uses football data to produce radars which aim to describe the key measurable aspects of a player’s output relative to the position they play. Using the data in this way produces an easy graphic for eyeballing a player and seeing how well they are performing. Another, related method, developed by Michael Caley, aims to utilise a variety of factors to produce predicted goals and assists. This modelling is far more complex than simple descriptive football statistics and allows you to estimate the likelihood of future events.

There are still issues with analytics, however, and these are not necessarily ones intentionally caused by the analysts. One is that, at heart, the methodology is reductionist. A great, fluid counter-attacking goal such as Hazard’s vs. Newcastle deserves an appreciation on a qualitative level rather than merely being counted as another tick in a box for attack speed. A key factor which these models will always lack is players’ movement. Take Harry Kane’s first against Chelsea last week: this strike is created by Nacer Chadli and Christian Eriksen pulling Nemanja Matic and Branislav Ivanovic out of position. They are the two players who created the goal; Danny Rose, who receives the assist, has comparatively little to do with it. Movement, anticipation and intelligence are central to what separates a great footballer from a good one, and so these models are missing an entire skillset.

Analytics methods in football are still in their infancy. There is a vast difference between Caley’s current configuration, which relies on eight predictors of whether a goal is probable, and analytics writing from only a year earlier which simply correlated key passes and goals. As these analytics models are developed it is probable that variables will be added and others discarded as their variance is explained by a combination of other, more accurate factors. This process is recognised by the authors when they write about their methodologies but is often left out when it comes to shorter pieces or tweets. This is necessary, as even those with a penchant for statistical writing would tire of reading a 1,000-word methodology each time they read a piece. The danger, however, is that these sophisticated models are then seized upon by other writers and TV show producers who do not grasp that they are still in development, and are then used poorly.

The final point, and this applies to most quantitatively centred work, is that this operates within a positivist framework which assumes that football can be described in an objective way. It suggests that debates about who is the better side or the better player can be settled definitively through the correct application of numbers. This would therefore eliminate an entire aspect of football culture – the argument down the pub. The idea that the future version of ‘Who is better, Gerrard or Lampard?’ will be settled with certainty by statistical models producing a single number, such as those used in FIFA computer games, is uncomfortable. Models may be able to genuinely separate the poor from the good in terms of player quality, but true greatness is an intangible concept related to clutch moments that cannot simply be noted down as a tick in the goal, assist, tackle or save column.

This may come across as blaming all bad opinions and poor writing on statistics. In a world where Adrian Durham has a regular column in a national newspaper this is clearly not the case. People arrive at nonsensical opinions without statistics all the time, but these opinions can easily be mocked or countered; we’ve all got that friend who thinks that Mark Noble would be a good signing for Arsenal or that Jack Cork should have gone to the World Cup. The problem caused by statistics is that they are derived from a perspective which holds that there are definitive truths that can be presented without challenge, in a way which may be detrimental to debates in football. Whenever statistics are used it should be because they add to the debate, whether in analysis, description or prediction. They can be used to support assumptions or challenge widely held viewpoints, but reducing a beautifully chaotic sport to a series of numbers shouldn’t be the first port of call.

You can follow Billy on Twitter at @BillyMacfarlane and The False Nine at @The_False_Nine.

2 thoughts on “The Dangers of Football Statistics”

  1. Good stuff. I felt it was a bit even-handed though, would have benefited from a more focused thrust.

    For me, if you’re trying to get at some ‘truth’ about a game, then fans’ recollections are just so riddled with memory failures and confirmation bias as to be basically useless. The question is whether a) such a truth exists and b) it would be fun to find it. For what’s the point of a sport that isn’t fun?
