Using world ranking to predict the results of the 2019 Rugby World Cup pool stages

Data visualisation for the pool stages of the 2019 Rugby World Cup

This year (2019) is a rugby world cup year. I like data visualisation, and I like rugby. So here's my primitive attempt to calculate the results of the pool stages of the 2019 World Cup. My simple heuristic is that world ranking (as of Aug 22nd 2019) is a predictor of a team's success. In effect, my simple algorithm states that a team with a higher ranking will always beat a team with a lower ranking.

Just here for the predictions? Skip to the end.

Note: I'm not trying to predict who will win a match, just who is expected to win. When the inevitable "upsets" occur (like Japan vs. South Africa in 2015), I want to be able to say "wow! They only had a {N}% chance of winning, but they did it!"

So let's translate that calculation into code:

const chanceOfWinning = (teamRank, oppositionRank) => {
    const combinedRanks = teamRank + oppositionRank;
    const invertedRanking = combinedRanks - teamRank; // Because a rank of 1 is the best
    const percentage = (invertedRanking / combinedRanks) * 100;
    return percentage.toFixed(1); // Round to 1 decimal place (returns a string)
};

chanceOfWinning(2, 13); // "86.7"
// NZL are ranked `2`, and ITA are ranked `13`
// Therefore NZL have an 86.7% chance of beating ITA

With this primitive algorithm, I can produce a "likelihood of winning" percentage for any pairing of teams. And on first inspection it looks pretty good (based on my own subjective opinion of who should win a given match). New Zealand are currently ranked #2 in the world, and should be expected to crush a #13 side like Italy. Samoa and Russia (#16 and #20 respectively) should be a much closer match, but you'd expect Samoa to emerge victorious.

#2 NZL 86.7% v 13.3% ITA #13
#16 SAM 55.6% v 44.4% RUS #20
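As a quick sanity check, the two pairings above can be reproduced in a couple of lines. This is only a sketch: `rankChance` is a hypothetical name (not from the original code), it relies on the observation that `combinedRanks - teamRank` is simply the opposition's rank, and it returns a number rather than the string that `toFixed` produces.

```javascript
// Hypothetical helper: same maths as chanceOfWinning, returned as a number.
const rankChance = (teamRank, oppositionRank) => {
    // combinedRanks - teamRank is just the opposition's rank
    const share = oppositionRank / (teamRank + oppositionRank);
    return Math.round(share * 1000) / 10; // percentage, 1 decimal place
};

console.log(rankChance(2, 13));  // NZL vs ITA: 86.7
console.log(rankChance(16, 20)); // SAM vs RUS: 55.6
```

Returning a number makes the result easier to reuse in further calculations, at the cost of losing the fixed "one decimal place" formatting.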

There are problems with using rank

But this method starts to look a bit shaky when we include the #1 ranked team in the world (a crown recently claimed by Wales at the time of writing). Not because Wales are particularly special, but because this algorithm massively favours whoever holds the top ranking. I'd expect Wales to crush Uruguay (#19 in the world), but would not expect them to have such an easy time against Australia (ranked #6). The ranking-based algorithm predicts both matches would be walkovers:

#1 WAL 95% v 5% URA #19
#1 WAL 85.7% v 14.3% AUS #6

And there's another problem with using world rankings. Rankings, by their very nature, are ordinal. By ranking alone, the difference between #1 and #2 is the same as the difference between #2 and #3, and so on... Whereas in reality, some teams are much closer than their mere ranking would suggest.
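The size of the problem is easier to see with a few numbers. The sketch below is an illustration, not part of the original code: `rankChance` is a hypothetical stand-in for the rank-based calculation. It shows how the #1 side fares against sides ranked #2, #6 and #19, and how the same one-place gap is treated very differently at the top of the table than near the bottom.

```javascript
// Hypothetical helper: rank-based chance of winning, as a percentage.
const rankChance = (teamRank, oppositionRank) =>
    (oppositionRank * 100) / (teamRank + oppositionRank);

// The #1 side against sides ranked #2, #6 and #19
for (const rank of [2, 6, 19]) {
    console.log(`#1 vs #${rank}: ${rankChance(1, rank).toFixed(1)}%`);
}

// A one-place gap at the top versus a one-place gap lower down
console.log(`#1 vs #2: ${rankChance(1, 2).toFixed(1)}%`);     // 66.7%
console.log(`#19 vs #20: ${rankChance(19, 20).toFixed(1)}%`); // 51.3%
```

Because the formula works on the ratio of the two ranks, being one place ahead at #1 is worth roughly a two-in-three chance, while being one place ahead at #19 is close to a coin toss.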

Using points rather than ranking

A better metric to use as the base for our calculation would be points. World Rugby, the sport's governing body, uses a points system to determine the world rankings. These points are based on match performance, and range from zero to one hundred (the top side generally has a rating of somewhere near 90 points). In late August 2019, Wales have 89.43 points and New Zealand have 89.40 - it's tight at the top! Australia are on 84.05 and Uruguay have 65.18 points.

Using points rather than rank changes our algorithm slightly (we no longer need to invert the team's value, as higher points are better).

const chanceOfWinning = (teamPoints, oppositionPoints) => {
    const combinedPoints = teamPoints + oppositionPoints;
    const percentage = (teamPoints / combinedPoints) * 100;
    return percentage.toFixed(1); // Round to 1 decimal place (returns a string)
};

Plumbing our examples into this calculation produces a much tighter set of matches. The end results are still the same (in this system, a team with higher points will always beat a team with lower points, in just the same way as the team with the better ranking always wins).

#2 NZL 55.4% v 44.6% ITA #13
#16 SAM 51.6% v 48.4% RUS #20
#1 WAL 57.8% v 42.2% URA #19
#1 WAL 51.6% v 48.4% AUS #6

These results look a little better than the ranking-only method. The delta between WAL/URA and WAL/AUS looks more realistic, and whoever is in the #1 spot has less of an unfair advantage. But now the amounts look wrong. Any theory that gives Italy a 44.6% chance of beating New Zealand must be inaccurate.
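For reference, the Wales figures can be reproduced from the four points values quoted earlier. This is a minimal sketch: `pointsChance` and the `points` object are hypothetical names used only for illustration, and only teams whose points appear in the text are included.

```javascript
// World Rugby points quoted earlier in the text (late August 2019)
const points = {
    wales: 89.43,
    newZealand: 89.40,
    australia: 84.05,
    uruguay: 65.18,
};

// Hypothetical helper: points-based chance of winning, 1 decimal place
const pointsChance = (teamPoints, oppositionPoints) =>
    ((teamPoints / (teamPoints + oppositionPoints)) * 100).toFixed(1);

console.log(pointsChance(points.wales, points.uruguay));    // "57.8"
console.log(pointsChance(points.wales, points.australia));  // "51.6"
console.log(pointsChance(points.wales, points.newZealand)); // "50.0"
```

The last line shows the flip side of using raw points: with only 0.03 points between them, Wales and New Zealand come out as an almost exact coin toss.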

Increasing the weighting

The points-based system is a better reflection of the team's relative chance of winning, but to my eyes the results aren't extreme enough. It gives too much credit to the lower-tier teams, and not enough to the top-tier ones. For the calculation to better match my expectations, it needs to favour the teams at the top of the rankings. Not only that, but it needs to do it progressively - so a team in the middle gets a bit of a boost, but not as much as those at the top get.

I need to write a function that will adjust the points value of each team. The easiest way to get the result I'm after is to raise each team's points to a power.

const adjustment = num => Math.pow(num, 5);

const chanceOfWinning = (teamPoints, oppositionPoints) => {
    const combinedRanks = adjustment(teamPoints) + adjustment(oppositionPoints);
    const percentage = (adjustment(teamPoints) / combinedRanks) * 100;
    return percentage.toFixed(1);
};

I started with 2 as the exponent, and that was better than nothing, but still not enough. 10 was too extreme, and in the end I settled on 5. Raising each team's points to the power of 5 gave me a set of probabilities that looked about right. That formula added just enough of a notch in the middle of the graph, thereby increasing the likelihood of a top-tier team beating a lower-tier one.

#2 NZL 74.6% v 25.4% ITA #13
#16 SAM 57.9% v 42.1% RUS #20
#1 WAL 82.9% v 17.1% URA #19
#1 WAL 57.7% v 42.3% AUS #6
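To get a feel for how the exponent changes things, the trial-and-error described above can be sketched as a loop. This is an illustration under a couple of assumptions: `weightedChance` is a hypothetical variant of `chanceOfWinning` that takes the exponent as a parameter, and it only uses the Wales, Australia and Uruguay points quoted earlier.

```javascript
// World Rugby points quoted earlier in the text (late August 2019)
const wales = 89.43;
const australia = 84.05;
const uruguay = 65.18;

// Hypothetical variant of chanceOfWinning with the exponent as a parameter
const weightedChance = (teamPoints, oppositionPoints, exponent) => {
    const team = Math.pow(teamPoints, exponent);
    const opposition = Math.pow(oppositionPoints, exponent);
    return ((team / (team + opposition)) * 100).toFixed(1);
};

// Compare the candidate exponents side by side
for (const exponent of [1, 2, 5, 10]) {
    console.log(
        `exponent ${exponent}:`,
        `WAL v AUS ${weightedChance(wales, australia, exponent)}%,`,
        `WAL v URA ${weightedChance(wales, uruguay, exponent)}%`
    );
}
```

An exponent of 1 gives back the plain points-based figures, and an exponent of 5 should match the percentages shown for Wales above.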

Results for all the pools

This is of course only based on my experience of rugby and my own highly subjective opinions. But it is still anchored in reality because I'm using the points as a starting point, and treating each team equally (as much as I want to give England a boost, the algorithm doesn't support it).

Ironically, this calculation shows that the draw for this world cup does give England a slight boost. The top 8 teams make it through to the quarter finals as you would expect. But when it comes to the semis, 4th ranked South Africa miss out, while 5th ranked England manage to sneak in. A side-effect of the pools being drawn years before the event. On the other hand, it probably shows that the draw process works fairly well, given that all of the top 8 make it into the quarters (or at least shows that the rankings have been comparatively static).

I'm not expecting these predictions to come true - there's a lot more to success in rugby than simple rankings. But I do find this kind of objective analysis useful for setting expectations. Looking at these predictions, I'll make more of an effort to see matches I might otherwise have passed on. Tonga vs. USA, for instance, looks like it'll be a close one. As do Scotland vs. Japan and New Zealand vs. South Africa (although after this year's Championship you don't need an algorithm to tell you that'll be a real grudge match!).
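The pool tables below follow from one rule: in every pairing, the side with more points is counted as the winner. Here is a rough sketch of that tallying step. `predictPool` is a hypothetical helper, and the sample data only covers the three Pool D teams whose points are quoted in this article (Fiji and Georgia are left out because their points aren't given here).

```javascript
// Points quoted earlier in the article; Fiji and Georgia are omitted
// because their points are not given in the text.
const poolD = [
    { name: "Uruguay", points: 65.18 },
    { name: "Wales", points: 89.43 },
    { name: "Australia", points: 84.05 },
];

// Hypothetical helper: play every pairing once, award the win to the
// side with more points (ties go to the first-listed side), sort by wins.
const predictPool = teams => {
    const wins = new Map(teams.map(team => [team.name, 0]));
    for (let i = 0; i < teams.length; i++) {
        for (let j = i + 1; j < teams.length; j++) {
            const winner = teams[i].points >= teams[j].points ? teams[i] : teams[j];
            wins.set(winner.name, wins.get(winner.name) + 1);
        }
    }
    return [...wins.entries()]
        .map(([name, count]) => ({ name, wins: count }))
        .sort((a, b) => b.wins - a.wins);
};

console.log(predictPool(poolD));
// Wales 2 wins, Australia 1 win, Uruguay 0 wins
```

With all five teams in a pool, the same loop covers ten matches and produces the 4-3-2-1-0 pattern seen in the tables.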

Pool A matches

Pool A results

Pos. Team Wins
1st Ireland 4 wins
2nd Scotland 3 wins
3rd Japan 2 wins
4th Samoa 1 win
5th Russia 0 wins

Pool B matches

Pool B results

Pos. Team Wins
1st New Zealand 4 wins
2nd South Africa 3 wins
3rd Italy 2 wins
4th Canada 1 win
5th Namibia 0 wins

Pool C matches

Pool C results

Pos. Team Wins
1st England 4 wins
2nd France 3 wins
3rd Argentina 2 wins
4th United States 1 win
5th Tonga 0 wins

Pool D matches

Pool D results

Pos. Team Wins
1st Wales 4 wins
2nd Australia 3 wins
3rd Fiji 2 wins
4th Georgia 1 win
5th Uruguay 0 wins

