Predicting Test Matches
Can you predict Test match outcomes using player-based stats?
As cricket fans, we know that stats can help us judge which team should win, but with so many stats available it can be hard to know which ones actually matter. That is what we set out to find. To do so, we collected data for every Test match played post-1990, and for every player who appeared in at least one of those games we collected their entire playing history (in some cases stretching back into the 1970s). We used such a large data set to make sure that a method which works today would also have worked 3, 5, 10 or even more years ago.
The factors that were consistently useful in predicting the outcome (i.e. who won) were player rankings (ICC rankings), bowling average, batting average and a home-ground advantage factor. Dozens of other factors were trialled, but in the end a combination of these simple statistics proved the most significant.
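The post does not say exactly how the four factors are combined. As a minimal sketch, assuming a logistic model on the differences between the two sides, a combined win probability could look like the following. The `TeamStats` fields, the `win_probability` function and every weight in `WEIGHTS` are hypothetical placeholders, not fitted values from the model described here.

```python
import math
from dataclasses import dataclass


@dataclass
class TeamStats:
    """Hypothetical per-team summary of the four factors named above."""
    rating_points: float    # mean ICC player rating points of the XI (higher is better)
    batting_average: float  # combined career batting average (higher is better)
    bowling_average: float  # combined career bowling average (lower is better)
    is_home: bool           # True if this side has home-ground advantage


# Placeholder weights for illustration only; a real model would fit these
# to historical matches rather than hard-code them.
WEIGHTS = {"rating": 0.01, "batting": 0.08, "bowling": 0.10, "home": 0.6}


def win_probability(a: TeamStats, b: TeamStats) -> float:
    """Probability that side `a` beats side `b`, given the match has a result."""
    score = (
        WEIGHTS["rating"] * (a.rating_points - b.rating_points)
        + WEIGHTS["batting"] * (a.batting_average - b.batting_average)
        # Lower bowling averages are better, so this difference is reversed.
        + WEIGHTS["bowling"] * (b.bowling_average - a.bowling_average)
        + WEIGHTS["home"] * (int(a.is_home) - int(b.is_home))
    )
    return 1.0 / (1.0 + math.exp(-score))  # logistic link
```

Because the score is built entirely from A-minus-B differences, `win_probability(a, b)` and `win_probability(b, a)` always sum to 1, so the two sides' forecasts stay consistent with each other.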
Back-testing was important (i.e. using only historical data to choose the factors, and how to combine them, and then testing the result on later games). As a benchmark we used the bookies' favourites. Over the last 5 years, in games that had a result, the bookies' favourites won ~70% of the time, whereas our model picked the winner ~80% of the time.
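In a walk-forward back-test, the model is fitted only on matches played before each test period and then asked to predict that period. The sketch below covers only the final scoring step, assuming each back-tested match already carries the model's probability, the bookies' favourite and the actual winner; the `Match` record and `hit_rates` function are hypothetical names. The three example matches are the ones discussed later in this post and were picked as illustrations, so the resulting scores say nothing about overall accuracy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Match:
    """Hypothetical record of one back-tested match."""
    team_a: str
    team_b: str
    model_prob_a: float      # model's pre-match probability that team_a wins
    bookies_favourite: str   # side the market favoured before the match
    winner: Optional[str]    # None if the match was drawn


def hit_rates(matches: list[Match]) -> tuple[float, float]:
    """(model accuracy, bookies' accuracy) over matches that had a result."""
    decided = [m for m in matches if m.winner is not None]  # draws excluded
    if not decided:
        raise ValueError("no decided matches to score")
    model_hits = sum(
        (m.team_a if m.model_prob_a >= 0.5 else m.team_b) == m.winner
        for m in decided
    )
    bookie_hits = sum(m.bookies_favourite == m.winner for m in decided)
    return model_hits / len(decided), bookie_hits / len(decided)


# The three matches discussed in this post, plus one hypothetical draw.
history = [
    Match("England", "South Africa", 0.78, "South Africa", "England"),
    Match("England", "South Africa", 0.52, "England", "South Africa"),
    Match("Pakistan", "Australia", 0.60, "Australia", "Pakistan"),
    Match("India", "New Zealand", 0.55, "India", None),  # hypothetical draw
]
model_acc, bookie_acc = hit_rates(history)  # model 2 of 3, bookies 0 of 3
```

The draw is dropped before scoring, matching the post's rule that only games with a result are counted.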
The chart below shows the win percentage for each factor considered in isolation over the last 10 years, alongside the bookies' favourites and our model, which combines these factors (note: draws are excluded; only games with a result are included in the data set):
(For example, the team whose bowlers had the better combined bowling ranking won just over 70% of the games that had a result, which was actually better than the bookies' favourites.)
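Each single-factor figure in the chart is the share of decided matches won by the side that was better on that one factor. A minimal sketch of that calculation follows; the function name is hypothetical, and skipping matches where both sides are level on the factor is an assumption, since the post does not say how ties were handled.

```python
def single_factor_hit_rate(rows, lower_is_better=False):
    """Share of decided matches won by the side that is better on one factor.

    rows: iterable of (value_a, value_b, a_won) tuples, one per match with a result.
    """
    hits = total = 0
    for value_a, value_b, a_won in rows:
        if value_a == value_b:
            continue  # assumption: no side is "better" on a tie, so skip the match
        a_better = (value_a < value_b) if lower_is_better else (value_a > value_b)
        hits += a_better == a_won
        total += 1
    if total == 0:
        raise ValueError("no matches where the factor separates the sides")
    return hits / total
```

For a combined bowling ranking, where a lower rank number is better, call it with `lower_is_better=True`; for batting averages, leave the default.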
The South Africa vs England match starting 14 January earlier this year is a good example of a game in which the model correctly picked a non-favourite to win. Despite the match being played in Johannesburg, the model gave England, the non-favourite, a 78% chance of winning because of their statistically superior batting and bowling line-ups.
The next match in the series, however, saw a large number of changes in personnel, with a lot of experience returning to the South African line-up. This led the model to see the game as much closer, giving England only a 52% chance of victory (in agreement with the market). In this case both the model and the market were wrong, and South Africa won. No model is correct 100% of the time!
To find one of the bigger upsets the model predicted, however, we need to go back to 22 October 2014. Pakistan were playing Australia, and the market gave Pakistan just a 25% chance of victory. The model, however, on the back of Pakistan's “home” ground advantage (the match was played in Dubai) and a superior batting line-up (Australia had a couple of new batsmen in Doolan and Mitch Marsh), had Pakistan as favourites with a 60% chance of victory. They won by 221 runs.
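The post does not say how the market percentages were derived. One common approach, assumed here, is to take 1 / decimal odds as the raw implied probability and then scale the prices proportionally to remove the bookmaker's margin (other de-margining methods, such as Shin's method, exist). The function name and the odds below are hypothetical, not the actual prices from October 2014.

```python
def implied_win_probs(odds_a: float, odds_b: float) -> tuple[float, float]:
    """Win probabilities for sides A and B implied by decimal odds, given a result.

    Raw implied probabilities (1 / decimal odds) include the bookmaker's margin,
    so they sum to more than 1. Dividing each by their sum removes the margin
    proportionally and, for a three-way Test market, conditions on the match
    not being drawn (the draw price cancels out under proportional scaling).
    """
    raw_a, raw_b = 1.0 / odds_a, 1.0 / odds_b
    total = raw_a + raw_b
    return raw_a / total, raw_b / total


# Hypothetical prices, not the actual market from October 2014.
p_pak, p_aus = implied_win_probs(3.60, 1.20)  # roughly 0.25 and 0.75
```

Putting the market on the same "given a result" footing as the model makes the two probabilities directly comparable, since draws are excluded from the model's evaluation.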
In doing this analysis, we came across a few interesting cricket myths, the first being “is the coin toss important?”
This topic was discussed in a recent Cricinfo blog; however, our modelling found no evidence that, once all the other stats had been taken into account, the toss was a factor in who won Test matches. We ran the analysis both including and excluding the coin toss, and found that for some periods the model's accuracy actually deteriorated when the toss was included. So I guess you could say “Myth Busted”.
Another common cricketing perception is that subcontinent teams struggle outside the subcontinent, and that other countries struggle when travelling to the subcontinent. We found both of these to be true. Although it varies over time, the subcontinent teams have a home-ground advantage roughly four times greater than other countries' (when playing a team from outside the subcontinent). They do, however, also have a greater away-ground disadvantage, roughly double that of other countries, when they travel outside the subcontinent. So in this case, “Myths confirmed”.
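The post reports these effects only as rough ratios that vary over time. As a minimal sketch, assuming separate home-advantage and away-disadvantage terms on the logit scale, the ratios could enter a model as multipliers. The base values, the `SUBCONTINENT` set and the `venue_shift` function are hypothetical; treating subcontinent-vs-subcontinent games at the base rate is also an assumption.

```python
SUBCONTINENT = {"India", "Pakistan", "Sri Lanka", "Bangladesh"}

# Hypothetical base effects on the logit scale; the post reports only ratios.
BASE_HOME_ADVANTAGE = 0.2
BASE_AWAY_DISADVANTAGE = 0.1


def venue_shift(home: str, away: str) -> float:
    """Logit shift in favour of the home side, combining both venue effects."""
    home_sub, away_sub = home in SUBCONTINENT, away in SUBCONTINENT
    # Subcontinent hosts against non-subcontinent visitors: roughly 4x advantage.
    home_bonus = BASE_HOME_ADVANTAGE * (4 if home_sub and not away_sub else 1)
    # Subcontinent visitors outside the subcontinent: roughly 2x disadvantage.
    away_penalty = BASE_AWAY_DISADVANTAGE * (2 if away_sub and not home_sub else 1)
    return home_bonus + away_penalty
```

A “home” series at a neutral venue, such as Pakistan in Dubai, would be handled by passing the designated host as `home`.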
Apart from the ICC player rankings, no “form” statistic was found to be reliable for predicting a winner. A lot of time was spent looking for the right “form” indicator (e.g. recent averages), but in the end career averages were more useful for our purposes. As they say, form is temporary, class is permanent.
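The difference between a career average and a “form” indicator such as a recent average is only how much history is used. As an illustration (the function names and the `last_n` window are hypothetical), both can be computed from the same innings list using the standard definition of a batting average, runs per dismissal.

```python
import math


def batting_average(innings: list[tuple[int, bool]]) -> float:
    """Runs per dismissal; innings are (runs, was_dismissed) pairs.

    Not-out innings add runs but not dismissals, as in the standard definition.
    """
    runs = sum(r for r, _ in innings)
    outs = sum(1 for _, out in innings if out)
    return runs / outs if outs else math.nan


def career_vs_form(innings, last_n=10):
    """Career average alongside a simple 'form' average over the last N innings."""
    return batting_average(innings), batting_average(innings[-last_n:])


# Hypothetical innings, oldest first.
career, form = career_vs_form([(100, True), (0, True), (50, False), (20, True)], last_n=2)
# career = 170 / 3 (about 56.7); form = 70 / 1 = 70.0
```

A short window rests on very few dismissals, so a single not-out or one big score can swing it sharply, which is one plausible reason the career figure proved more stable.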
Home-ground advantage does play a part in forecasting the result, but if we remove it and use the players named in each country's most recent match, we get the following team ranking:
(As the scale I have used the average team, which is actually very close to the last New Zealand team.)
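The post does not describe how this scale was built. Assuming each side's strength is a logit-scale score computed from its named XI with the home term left out, one way to express an “average team” scale is each side's probability of beating that average side at a neutral venue. The function name and the strength scores are hypothetical.

```python
import math


def rank_against_average(strengths: dict[str, float]) -> list[tuple[str, float]]:
    """Rank sides by their chance of beating an 'average' side at a neutral venue.

    strengths: hypothetical per-team logit-scale scores built from the named XI
    with no home term. The average team's strength is the mean of all scores.
    """
    mean = sum(strengths.values()) / len(strengths)
    chances = {team: 1.0 / (1.0 + math.exp(-(s - mean))) for team, s in strengths.items()}
    return sorted(chances.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank_against_average({"Team A": 1.0, "Team B": 0.0, "Team C": -1.0})
```

On this kind of scale a side that sits right at the average, as the post says the last New Zealand team nearly does, would come out close to 50%.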
So that’s all well and good looking into the past, but what about the future?
Well, our plan is to forecast the result of every upcoming Test match and publish our predictions on this website for anyone who is interested. However, because the predictions are based on the actual teams named, we will have to wait until closer to each game to make them.
Looking even further into the future, we hope to produce similar models for both this year’s BBL and next year’s IPL.
Back to Test match cricket: if we look ahead to the next match and assume that both England and Sri Lanka name teams similar to those in their last Test matches, the model currently gives Sri Lanka less than a 5% chance of winning, which should come as no surprise given the rankings above.
England should also win the following series against Pakistan, although this is predicted to be closer. Pakistan have the superior team (based on recently named teams), but England will have a significant home advantage.
We look forward to testing the model, and you are welcome to follow our progress.