# Using world ranking to predict the results of the 2019 Rugby World Cup pool stages

## Data visualisation for the pool stages of the 2019 Rugby World Cup

This year (2019) is a rugby world cup year. I like data visualisation, and I like rugby. So here's my primitive attempt to calculate the results of the pool stages of the 2019 World Cup. My simple heuristic is that **world ranking** (as of Aug 22^{nd} 2019) is a predictor of a team's success. In effect, my simple algorithm states that a team with a higher ranking will always beat a team with a lower ranking.

Just here for the predictions? skip to the end

Note: I'm not trying to predict who

willwin a match, just who isexpectedto win. When the inevitable "upsets" occur (like Japan vs. South Africa in 2015), I want to be able to say "wow! They only had a`{N}%`

chance of winning, but they did it!"

So let's translate that calculation into code:

```
const chanceOfWinning = (teamRank, oppositionRank) => {
const combinedRanks = teamRank + oppositionRank;
const invertedRanking = combinedRanks - teamRank; // Becasue a rank of 1 is the best
const percentage = (invertedRanking / combinedRanks) * 100;
return percentage.toFixed(1); // Round to 1 decimal place
};
chanceOfWinning(2, 13); // 86.7
// NZL are ranked `2`, and ITA are ranked `13`
// Therefore NZL have an 86.7% chance of beating ITA
```

With this primitive algorithm, I can produce a *"likelihood of winning"* percentage for any pairing of teams. And on first inspection it looks pretty good (based on my own *subjective* opinion of who should win a given match). New Zealand are currently ranked #2 in the world, and should be expected to crush a #13 side like Italy. Samoa and Russia (#16 and #20 respectively) should be a much closer match, but you'd expect Samoa to emerge victorious.

## There are problems with using rank

But this method starts to look a bit shaky when we include the #1 ranked team in the world (a crown recently claimed by Wales at the time of writing). Not because Wales are particularly special, but because this algorithm massively favours lower rankings. I'd expect Wales to crush Uraguay (#19 in the world), but would *not* expect them to have such an easy time against Australia (ranked #6). The ranking-based algorithm predicts *both* matches would be walkovers:

And there's another problem with using world rankings. Rankings, by their very nature, are ordinal. By ranking alone, the difference between #1 and #2 is the same as the difference between #2 and #3, and so on... Whereas in reality, some teams are much closer than their mere ranking would suggest.

## Using points rather than ranking

A better metric to use as the base for our calculation would be *points*. Word Rugby, the sport's governing body, uses a points system to determine the world rankings. These points are based on match performance, and range from zero to one hundred (the top side generally has a rating of somewhere near 90 points). In late August 2019, Wales have 89.43 points and New Zealand have 89.40 - it's tight at the top! Australia are on 84.05 and Uraguay have 65.18 points.

Using `points`

rather than `rank`

changes our algorithm slightly (we no longer need to invert the team's value, as higher points are better).

```
const chanceOfWinning = (teamPoints, oppositionPoints) => {
const combinedRanks = teamPoints + oppositionPoints;
const percentage = (teamPoints / combinedRanks) * 100;
return percentage.toFixed(1);
};
```

Plumbing our examples into this calculation produces a much tighter set of matches. The end results are still the same (in this system, a team with higher points will always beat a team with lower points, in just the same way as the team with the better ranking always wins).

These results look a little better than the ranking-only method. The delta between WAL/URA and WAL/AUS looks more realistic, and whoever is in the #1 spot has less of an unfair advantage. But now the *amounts* look wrong. Any theory that gives Italy a 44.6% of beating New Zealand **must** be inaccurate.

## Increasing the weighting

The points-based system is a better reflection of the team's relative chance of winning, but to my eyes the results aren't extreme enough. It gives too much credit to the lower-tier teams, and not enough to the top-tier ones. For the calculation to better match my expectations, it needs to favour the teams at the top of the rankings. Not only that, but it needs to do it *progressively* - so a team in the middle gets a bit of a boost, but not as much as those at the top get.

I need to write a function that will adjust the points value of each team. The easiest way to get the result I'm after is to multiply each team's points by a power.

```
const adjustment = num => Math.pow(num, 5);
```

```
const chanceOfWinning = (teamPoints, oppositionPoints) => {
const combinedRanks = adjustment(teamPoints) + adjustment(oppositionPoints);
const percentage = (adjustment(teamPoints) / combinedRanks) * 100;
return percentage.toFixed(1);
};
```

I started with `2`

as the exponent, and that was better than nothing, but still not enough. `10`

was too extreme, and in the end I settled on `5`

. Increasing each team's points by a power of `5`

gave me a set of probabilities that looked about right. That formula added just enough of a notch in the middle of the graph - and thereby increasing the likelihood of a top-tier team beating a lower-tier one.

## Results for all the pools

This is of course only based on my experience of rugby and my own highly subjective opinions. But it is still anchored in reality because I'm using the points as a starting point, and treating each team equally (as much as I *want* to give England a boost, the algorithm doesn't support it).

Ironically, this calculation shows that the draw for this world cup *does* give England a slight boost. The top 8 teams make it through to the quarter finals as you would expect. But when it comes to the semis, 4^{th} ranked South Africa miss out, while 5^{th} ranked England manage to sneak in. A side-effect of the pools being drawn years before the event. On the other hand, it probably shows that the draw-process works fairly well if, given all the top 8 make it into the quarters (or at least shows that the rankings have been comparatively static).

I'm not expecting these predictions to come true - there's a lot more to success in rugby than simple rankings. But I do find this kind of objective analysis useful for setting expectations. Looking at these predictions, I'll make more of an effort to see matches I might otherwise have passed on. Tonga vs. USA, for instance, looks like it'll be a close one. As do Scotland vs. Japan and New Zealand vs. South Africa (although after this year's Championship you don't need an algorithm to tell you that'll be a real grudge match!).

### Pool A matches

### Pool A results

Pos. | Team | Wins |
---|---|---|

1st | Ireland | 4 wins |

2nd | Scotland | 3 wins |

3rd | Japan | 2 wins |

4th | Samoa | 1 win |

5th | Russia | 0 wins |

### Pool B matches

### Pool B results

Pos. | Team | Wins |
---|---|---|

1st | New Zealand | 4 wins |

2nd | South Africa | 3 wins |

3rd | Italy | 2 wins |

4th | Canada | 1 win |

5th | Namibia | 0 wins |

### Pool C matches

### Pool C results

Pos. | Team | Wins |
---|---|---|

1st | England | 4 wins |

2nd | France | 3 wins |

3rd | Argentina | 2 wins |

4th | United States | 1 win |

5th | Tonga | 0 wins |

### Pool D matches

### Pool D results

Pos. | Team | Wins |
---|---|---|

1st | Wales | 4 wins |

2nd | Australia | 3 wins |

3rd | Fiji | 3 wins |

4th | Georgia | 1 win |

5th | Uraguay | 0 wins |

*Thanks for reading! It would be great if you could share this post on Twitter, if you can spare the time. It really helps increase my reach, and helps me decide what sort of content to create in the future.*