Using world ranking to predict the results of the 2019 Rugby World Cup pool stages
Data visualisation for the pool stages of the 2019 Rugby World Cup
This year (2019) is a Rugby World Cup year. I like data visualisation, and I like rugby. So here's my primitive attempt to calculate the results of the pool stages of the 2019 World Cup. My simple heuristic is that world ranking (as of Aug 22nd 2019) is a predictor of a team's success. In effect, my simple algorithm states that a team with a higher ranking will always beat a team with a lower ranking.
Just here for the predictions? Skip to the end.
Note: I'm not trying to predict who will win a match, just who is expected to win. When the inevitable "upsets" occur (like Japan vs. South Africa in 2015), I want to be able to say "wow! They only had a {N}% chance of winning, but they did it!"
So let's translate that calculation into code:
const chanceOfWinning = (teamRank, oppositionRank) => {
  const combinedRanks = teamRank + oppositionRank;
  const invertedRanking = combinedRanks - teamRank; // Because a rank of 1 is the best
  const percentage = (invertedRanking / combinedRanks) * 100;
  return percentage.toFixed(1); // Round to 1 decimal place (returns a string)
};
chanceOfWinning(2, 13); // 86.7
// NZL are ranked `2`, and ITA are ranked `13`
// Therefore NZL have an 86.7% chance of beating ITA
With this primitive algorithm, I can produce a "likelihood of winning" percentage for any pairing of teams. And on first inspection it looks pretty good (based on my own subjective opinion of who should win a given match). New Zealand are currently ranked #2 in the world, and should be expected to crush a #13 side like Italy. Samoa and Russia (#16 and #20 respectively) should be a much closer match, but you'd expect Samoa to emerge victorious.
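As a quick check of the Samoa and Russia pairing, here is a minimal sketch that applies the same rank-based calculation. The helper name `rankChance` is my own label for illustration, and it returns a number rather than a string so that the two sides of the match can be added together.

```javascript
// Rank-based chance, equivalent to the function above: rank 1 is best, so
// the opposition's rank is the team's share of the combined total.
// `rankChance` is an illustrative name, not part of the original code.
const rankChance = (teamRank, oppositionRank) =>
  (oppositionRank / (teamRank + oppositionRank)) * 100;

const samoa = rankChance(16, 20); // Samoa (#16) vs. Russia (#20)
const russia = rankChance(20, 16); // the same match from Russia's side

console.log(samoa.toFixed(1)); // 55.6
console.log(russia.toFixed(1)); // 44.4
console.log((samoa + russia).toFixed(1)); // 100.0
```

The two percentages are complementary, so each match only needs to be calculated once.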
There are problems with using rank
But this method starts to look a bit shaky when we include the #1 ranked team in the world (a crown recently claimed by Wales at the time of writing). Not because Wales are particularly special, but because this algorithm massively favours teams at the very top of the rankings. I'd expect Wales to crush Uruguay (#19 in the world), but would not expect them to have such an easy time against Australia (ranked #6). The ranking-based algorithm predicts both matches would be walkovers.
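To put numbers on that, the sketch below runs the rank-based calculation for Wales at #1. The function is a condensed copy of the one above under the illustrative name `chanceByRank`; the ranks are the ones quoted in this post.

```javascript
// Condensed copy of the rank-based calculation (illustrative name).
const chanceByRank = (teamRank, oppositionRank) => {
  const combinedRanks = teamRank + oppositionRank;
  return (((combinedRanks - teamRank) / combinedRanks) * 100).toFixed(1);
};

const walVsUru = chanceByRank(1, 19); // Wales (#1) vs. Uruguay (#19)
const walVsAus = chanceByRank(1, 6); // Wales (#1) vs. Australia (#6)
const walVsNzl = chanceByRank(1, 2); // Wales (#1) vs. New Zealand (#2)

console.log(walVsUru); // 95.0
console.log(walVsAus); // 85.7
console.log(walVsNzl); // 66.7
```

Even the #2 side is given only a one-in-three chance against the #1 side, which is a good sign that rank alone overstates the gap at the top.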
And there's another problem with using world rankings. Rankings, by their very nature, are ordinal. By ranking alone, the difference between #1 and #2 is the same as the difference between #2 and #3, and so on... Whereas in reality, some teams are much closer than their mere ranking would suggest.
Using points rather than ranking
A better metric to use as the base for our calculation would be points. World Rugby, the sport's governing body, uses a points system to determine the world rankings. These points are based on match performance, and range from zero to one hundred (the top side generally has a rating of somewhere near 90 points). In late August 2019, Wales have 89.43 points and New Zealand have 89.40 - it's tight at the top! Australia are on 84.05 and Uruguay have 65.18 points.
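A rough way to see the difference between the two scales is to put the rank gap and the points gap side by side for the four teams mentioned so far. This is only a sketch: the field names (`rankGap`, `pointsGap`, `pointsPerPlace`) are made up for illustration, and the figures are the ones quoted in this post.

```javascript
// Rank and points figures quoted in the text (late August 2019).
const teams = [
  { name: "Wales", rank: 1, points: 89.43 },
  { name: "New Zealand", rank: 2, points: 89.4 },
  { name: "Australia", rank: 6, points: 84.05 },
  { name: "Uruguay", rank: 19, points: 65.18 },
];

// Compare each team with the one listed before it.
const gaps = teams.slice(1).map((team, i) => {
  const rankGap = team.rank - teams[i].rank;
  const pointsGap = teams[i].points - team.points;
  return {
    pair: `${teams[i].name} -> ${team.name}`,
    rankGap,
    pointsGap: Number(pointsGap.toFixed(2)),
    // Average points separating each ranking place within this gap.
    pointsPerPlace: Number((pointsGap / rankGap).toFixed(2)),
  };
});

console.log(gaps);
```

One ranking place between Wales and New Zealand is worth 0.03 points, while further down the list a single place is worth well over a point on average.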
Using points rather than rank changes our algorithm slightly (we no longer need to invert the team's value, as higher points are better).
const chanceOfWinning = (teamPoints, oppositionPoints) => {
  const combinedPoints = teamPoints + oppositionPoints;
  const percentage = (teamPoints / combinedPoints) * 100;
  return percentage.toFixed(1); // Round to 1 decimal place
};
Plumbing our examples into this calculation produces a much tighter set of matches. The end results are still the same (in this system, a team with higher points will always beat a team with lower points, in just the same way as the team with the better ranking always wins).
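For reference, this is roughly what the Wales examples look like when run through the points-based calculation. It is a condensed copy of the function above under the illustrative name `chanceByPoints`, using the points quoted earlier.

```javascript
// Condensed copy of the points-based calculation (illustrative name).
const chanceByPoints = (teamPoints, oppositionPoints) =>
  ((teamPoints / (teamPoints + oppositionPoints)) * 100).toFixed(1);

const walVsUruguay = chanceByPoints(89.43, 65.18);
const walVsAustralia = chanceByPoints(89.43, 84.05);
const walVsNewZealand = chanceByPoints(89.43, 89.4);

console.log(walVsUruguay); // 57.8
console.log(walVsAustralia); // 51.6
console.log(walVsNewZealand); // 50.0
```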
These results look a little better than the ranking-only method. The delta between WAL/URU and WAL/AUS looks more realistic, and whoever is in the #1 spot has less of an unfair advantage. But now the amounts look wrong. Any theory that gives Italy a 44.6% chance of beating New Zealand must be inaccurate.
Increasing the weighting
The points-based system is a better reflection of the team's relative chance of winning, but to my eyes the results aren't extreme enough. It gives too much credit to the lower-tier teams, and not enough to the top-tier ones. For the calculation to better match my expectations, it needs to favour the teams at the top of the rankings. Not only that, but it needs to do it progressively - so a team in the middle gets a bit of a boost, but not as much as those at the top get.
I need to write a function that will adjust the points value of each team. The easiest way to get the result I'm after is to raise each team's points to a power.
const adjustment = num => Math.pow(num, 5);
const chanceOfWinning = (teamPoints, oppositionPoints) => {
  const adjustedTeam = adjustment(teamPoints);
  const combinedPoints = adjustedTeam + adjustment(oppositionPoints);
  const percentage = (adjustedTeam / combinedPoints) * 100;
  return percentage.toFixed(1);
};
I started with 2 as the exponent, and that was better than nothing, but still not enough. 10 was too extreme, and in the end I settled on 5. Raising each team's points to the power of 5 gave me a set of probabilities that looked about right. That formula added just enough of a notch in the middle of the graph, thereby increasing the likelihood of a top-tier team beating a lower-tier one.
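To compare the exponents side by side, here is a small sketch that takes the exponent as a parameter instead of hard-coding it. That parameter and the name `weightedChance` are my additions for illustration; the calculation itself is the same as above, and the points are the ones quoted earlier.

```javascript
// Points quoted in the text (late August 2019).
const WALES = 89.43;
const AUSTRALIA = 84.05;
const URUGUAY = 65.18;

// Same calculation as above, with the exponent passed in (an illustrative
// change) so that different values can be compared. Returns a number.
const weightedChance = (teamPoints, oppositionPoints, exponent) => {
  const team = Math.pow(teamPoints, exponent);
  const opposition = Math.pow(oppositionPoints, exponent);
  return (team / (team + opposition)) * 100;
};

for (const exponent of [1, 2, 5, 10]) {
  const vsAustralia = weightedChance(WALES, AUSTRALIA, exponent).toFixed(1);
  const vsUruguay = weightedChance(WALES, URUGUAY, exponent).toFixed(1);
  console.log(`exponent ${exponent}: WAL/AUS ${vsAustralia}%, WAL/URU ${vsUruguay}%`);
}
```

With an exponent of 5, Wales come out at roughly 58% against Australia and 83% against Uruguay. The exponent never changes who is favoured, only by how much, because raising to a positive power preserves the order of the points.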
Results for all the pools
This is of course only based on my experience of rugby and my own highly subjective opinions. But it is still anchored in reality because I'm using the points as a starting point, and treating each team equally (as much as I want to give England a boost, the algorithm doesn't support it).
Ironically, this calculation shows that the draw for this World Cup does give England a slight boost. The top 8 teams make it through to the quarter-finals as you would expect. But when it comes to the semis, 4th-ranked South Africa miss out, while 5th-ranked England manage to sneak in: a side effect of the pools being drawn years before the event. On the other hand, it probably shows that the draw process works fairly well, given that all of the top 8 make it into the quarters (or at least shows that the rankings have been comparatively static).
I'm not expecting these predictions to come true - there's a lot more to success in rugby than simple rankings. But I do find this kind of objective analysis useful for setting expectations. Looking at these predictions, I'll make more of an effort to see matches I might otherwise have passed on. Tonga vs. USA, for instance, looks like it'll be a close one. As do Scotland vs. Japan and New Zealand vs. South Africa (although after this year's Championship you don't need an algorithm to tell you that'll be a real grudge match!).
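The pool tables can be derived by playing every team against every other team once and counting the wins. The sketch below shows the idea with a hypothetical helper, `poolStandings`. Only Wales, Australia and Uruguay have points quoted in this post, so it uses a three-team subset of Pool D rather than a full pool.

```javascript
// Points quoted in the text. This is a three-team subset of Pool D, not the
// full pool, because the other teams' points are not given in the post.
const pool = [
  { name: "Uruguay", points: 65.18 },
  { name: "Wales", points: 89.43 },
  { name: "Australia", points: 84.05 },
];

// `poolStandings` is a hypothetical helper: every team plays every other
// team once, and the side with more points is awarded the win.
const poolStandings = (teams) =>
  teams
    .map((team) => ({
      name: team.name,
      wins: teams.filter((other) => other.points < team.points).length,
    }))
    .sort((a, b) => b.wins - a.wins);

const standings = poolStandings(pool);
console.log(standings);
// [ { name: 'Wales', wins: 2 },
//   { name: 'Australia', wins: 1 },
//   { name: 'Uruguay', wins: 0 } ]
```

Because the favourite always wins in this model, a five-team pool with no tied points always finishes with 4, 3, 2, 1 and 0 wins.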
Pool A matches
Pool A results
Pos. | Team | Wins |
---|---|---|
1st | Ireland | 4 wins |
2nd | Scotland | 3 wins |
3rd | Japan | 2 wins |
4th | Samoa | 1 win |
5th | Russia | 0 wins |
Pool B matches
Pool B results
Pos. | Team | Wins |
---|---|---|
1st | New Zealand | 4 wins |
2nd | South Africa | 3 wins |
3rd | Italy | 2 wins |
4th | Canada | 1 win |
5th | Namibia | 0 wins |
Pool C matches
Pool C results
Pos. | Team | Wins |
---|---|---|
1st | England | 4 wins |
2nd | France | 3 wins |
3rd | Argentina | 2 wins |
4th | United States | 1 win |
5th | Tonga | 0 wins |
Pool D matches
Pool D results
Pos. | Team | Wins |
---|---|---|
1st | Wales | 4 wins |
2nd | Australia | 3 wins |
3rd | Fiji | 2 wins |
4th | Georgia | 1 win |
5th | Uruguay | 0 wins |