RSS Olympic prediction competition 2024

Context

Each summer the Royal Statistical Society’s Statistics in Sport Section organizes a prediction competition. For 2024 the competition involves predicting the medals table for the Paris Olympics.

This year’s competition is novel in several ways. Firstly, forecasters will be asked to provide a single predicted ranking whose distance from the true ranking at the end of the games will determine their score. This stands in contrast to previous forecasting competitions, in which forecasters have been required to specify probabilities for the outcomes of individual sports matches in a tournament. Secondly, the ‘true ranking’ for the Olympics is not as well defined or officially endorsed as it is for other events.

For the purposes of this competition the true final ranking will be determined by the number of gold medals won by each country. Silver medals will be used only to break ties between countries with equal numbers of gold medals. Bronze medals will be used to break ties between countries after silver medals are accounted for. Countries still tied after accounting for bronze medals will remain tied.
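The tie-breaking rule above amounts to comparing (gold, silver, bronze) triples in order. The following is a minimal sketch of that idea, not the organisers' code: the function name `medal_ranks` is hypothetical, the counts are those of the fictional five-NOC table used in the example scoring calculation on this page, and giving tied NOCs the largest position they jointly occupy is an assumption (only the ordering matters for scoring).

```python
def medal_ranks(medals):
    """Rank NOCs by gold, then silver, then bronze (1 = top of the table).

    `medals` maps an NOC name to a (gold, silver, bronze) tuple.
    Tuples compare element by element, so gold is decisive, silver breaks
    gold ties and bronze breaks gold-and-silver ties. NOCs with identical
    tuples stay tied; as an assumption, they share the largest position
    occupied by their group.
    """
    return {
        noc: sum(1 for other in medals.values() if other >= counts)
        for noc, counts in medals.items()
    }


# Fictional counts taken from the toy example on this page.
toy_medals = {
    "Cambodia": (2, 2, 0),
    "Denmark": (0, 1, 0),
    "Bahamas": (0, 0, 1),
    "Afghanistan": (0, 0, 0),
    "Ecuador": (0, 0, 0),
}

print(medal_ranks(toy_medals))
# {'Cambodia': 1, 'Denmark': 2, 'Bahamas': 3, 'Afghanistan': 5, 'Ecuador': 5}
```

Afghanistan and Ecuador have no medals at all, so they remain tied after the bronze comparison.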

Scoring

Predicted rankings will be scored using a statistic called Kendall’s tau, or the Kendall rank correlation coefficient. There are (at least) two ways to compute this statistic, each of which sheds light on its meaning. To understand what the statistic quantifies it is first necessary to think about all the possible pairwise comparisons between countries. With \(n\) participating countries there are \(\binom{n}{2}=n(n-1)/2\) of these. A forecaster will gain a point for every pair which they have placed in the correct order, and they will lose a point for every pair which they have placed in the wrong order. If a pair of countries is tied, either in the predicted ranking or the true ranking, this pair does not contribute to the score. The score is then scaled by the total number of pairs so that it falls between -1 and 1.

We can write this down as

\[\begin{align} \tau & = \frac{\text{number of concordant pairs}-\text{number of discordant pairs}}{\text{number of pairs}} \\ & = \frac{2}{n(n-1)} \sum_{i<j}\text{sign}(R_i-R_j)\text{sign}(S_i-S_j) \end{align}\]

where ‘concordant’ is a word used in the academic literature to mean ‘in agreement’. Similarly, ‘discordant’ means ‘in disagreement’. For example, if a forecaster ranks Kiribati higher than Tanzania and, indeed, Kiribati is higher than Tanzania in the final true ranking, then we say that the forecaster has a concordant pair with the true ranking. As alluded to in the second line of the equation above, it is also possible to show that Kendall’s tau is the correlation between the signs of the differences in ranks. In this expression \(R_i\) denotes the numerical rank assigned by a forecaster to country \(i\) (e.g. if a forecaster predicted country \(i\) to be second from the top in the medals table they would set \(R_i=2\)). The quantity \(S_i\) denotes the true rank for country \(i\). The \(\text{sign}\) function used here returns the value one when its argument is positive, minus one when its argument is negative and zero when its argument is zero.
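The second line of the formula translates almost directly into code. The sketch below is one possible implementation, assuming the forecast and true ranks are supplied as two equal-length lists in the same NOC order; the names `sign` and `kendall_tau` and the three-NOC inputs are illustrative only.

```python
from itertools import combinations


def sign(x):
    """Return 1 for positive x, -1 for negative x and 0 for zero."""
    return (x > 0) - (x < 0)


def kendall_tau(forecast, truth):
    """Kendall's tau as written above.

    Each pair i < j contributes sign(R_i - R_j) * sign(S_i - S_j), which is
    +1 for a concordant pair, -1 for a discordant pair and 0 if the pair is
    tied in either ranking. The sum is divided by n(n-1)/2, the number of
    pairs.
    """
    n = len(forecast)
    if n != len(truth):
        raise ValueError("forecast and truth must have the same length")
    if n < 2:
        raise ValueError("at least two NOCs are needed to form a pair")
    total = sum(
        sign(forecast[i] - forecast[j]) * sign(truth[i] - truth[j])
        for i, j in combinations(range(n), 2)
    )
    return 2 * total / (n * (n - 1))


print(kendall_tau([1, 2, 3], [1, 2, 3]))  # 1.0, every pair concordant
print(kendall_tau([3, 2, 1], [1, 2, 3]))  # -1.0, every pair discordant
print(kendall_tau([7, 7, 7], [1, 2, 3]))  # 0.0, every pair tied in the forecast
```

Because the sum is divided by the total number of pairs, with no adjustment for ties, the result can differ from library routines that apply a tie correction (often called tau-b).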

There are 206 countries participating in the 2024 Olympic games, although officially they are referred to as National Olympic Committees (NOCs) rather than countries. In the submission template you will find a list of the NOCs and, as a default, they are all ranked jointly at 206th. This default entry will receive a Kendall’s tau score of zero because, as mentioned above, tied pairs contribute nothing to the score - they are considered neither concordant nor discordant. Forecasters are invited to modify the ranks in any way they see fit. You could, for example,

  • cluster countries into groups all with the same rank,
  • use non-integer ranks,
  • use ranks that go below 1!

The only thing that matters is that your ranks allow for the pairwise comparisons that contribute to Kendall’s tau.
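One way to see why this freedom is harmless is sketched here; the function \(f\) is introduced purely for illustration and stands for any strictly increasing relabelling of the ranks. For such an \(f\),

\[\text{sign}\big(f(R_i)-f(R_j)\big)=\text{sign}(R_i-R_j)\quad\text{for every pair } i<j,\]

so replacing each \(R_i\) by \(f(R_i)\) leaves every term in the sum defining \(\tau\), and hence \(\tau\) itself, unchanged. For instance, \(f(R)=R/10-3\) turns the ranks 1, 2, 3 into -2.9, -2.8, -2.7, which are non-integer and below 1 but produce exactly the same pairwise comparisons.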

Submission template

You can download the submission template here. Only modify the numbers in the Rank column. In particular, do not reorder the rows of the NOCs.

Email completed submissions to ben.powell@york.ac.uk with the subject line SiS forecasting competition 2024. This email should contain an attached .csv file called “RSS_pred_comp_submission_NAME.csv” where NAME is the name you would like to appear on the leader board.
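For forecasters who prefer to fill in the template programmatically, here is a rough sketch using Python's standard `csv` module. It assumes the template has a header row containing a column called `Rank` (as described above) and a column identifying each NOC, taken here to be `Code`; that second column name, the function name `fill_template` and the file paths are assumptions, so check them against the actual template.

```python
import csv


def fill_template(template_path, output_path, my_ranks, key_column="Code"):
    """Copy the template, changing only the Rank column.

    `my_ranks` maps the value in `key_column` (an assumed column name) to the
    rank you want to submit. Rows keep their original order, and any NOC
    missing from `my_ranks` keeps the rank already in the template.
    """
    with open(template_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        rows = list(reader)

    for row in rows:
        row["Rank"] = my_ranks.get(row[key_column], row["Rank"])

    with open(output_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

A call might look like `fill_template("template.csv", "RSS_pred_comp_submission_NAME.csv", {"USA": 1, "CHN": 2})`, where `template.csv` is a hypothetical local name for the downloaded file and NAME is replaced by your leader board name.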

If you would like to be considered for the methodology prize, please include a brief description of how you made your predictions.

Please make your submissions before Monday 22nd July. The games start on Friday 26th July but it will be necessary to have submission files a bit earlier in order to initialize the forecast scoring procedure and the leader board.

Prizes

The principal and most valuable prize for winning the forecasting competition is prestige (and a certificate)! Special attention and corresponding certificates will be awarded for

  1. Best overall score
  2. Best student score
  3. Most innovative methodology

The winners of these three awards will also be invited to the Royal Statistical Society conference to present their methods. This year’s conference will be held in Brighton in early September.

Leader board

Note: the leader board is currently populated with simulated data for testing purposes.

Forecaster Rank Tau
user3 1 0.0753493
user2 2 0.0103718
user1 3 -0.0576841

Medals table

Note: the table below is populated with data from 2020 for the purposes of testing the leader board.

Code NOC Rank Gold Silver Bronze
USA United States 1 39 41 33
CHN China 2 38 32 18
JPN Japan 3 27 14 17
GBR Great Britain 4 22 20 22
RUS Russia 5 20 28 23
AUS Australia 6 17 7 22
NED Netherlands 7 10 12 14
FRA France 8 10 12 11
GER Germany 9 10 11 16
ITA Italy 10 10 10 20
CAN Canada 11 7 6 11
BRA Brazil 12 7 6 8
NZL New Zealand 13 7 6 7
CUB Cuba 14 7 3 5
HUN Hungary 15 6 7 7
POL Poland 16 4 5 5
CZE Czechia 17 4 4 3
KEN Kenya 18 4 4 2
NOR Norway 19 4 2 2
JAM Jamaica 20 4 1 4
ESP Spain 21 3 8 6
SWE Sweden 22 3 6 0
SUI Switzerland 23 3 4 6
DEN Denmark 24 3 4 4
CRO Croatia 25 3 3 2
IRI Iran 26 3 2 2
SRB Serbia 27 3 1 5
BEL Belgium 28 3 1 3
BUL Bulgaria 29 3 1 2
SLO Slovenia 30 3 1 1
UZB Uzbekistan 31 3 0 2
GEO Georgia 32 2 5 1
TPE Chinese Taipei 33 2 4 6
TUR Turkey 34 2 2 9
GRE Greece 36 2 1 1
UGA Uganda 36 2 1 1
ECU Ecuador 37 2 1 0
IRL Ireland 39 2 0 2
ISR Israel 39 2 0 2
QAT Qatar 40 2 0 1
BAH Bahamas 42 2 0 0
KOS Kosovo 42 2 0 0
UKR Ukraine 43 1 6 12
BLR Belarus 44 1 3 3
ROU Romania 46 1 3 0
VEN Venezuela 46 1 3 0
IND India 47 1 2 4
HKG Hong Kong, China 48 1 2 3
PHI Philippines 50 1 2 1
SVK Slovakia 50 1 2 1
RSA South Africa 51 1 2 0
AUT Austria 52 1 1 5
EGY Egypt 53 1 1 4
INA Indonesia 54 1 1 3
ETH Ethiopia 56 1 1 2
POR Portugal 56 1 1 2
TUN Tunisia 57 1 1 0
EST Estonia 61 1 0 1
FIJ Fiji 61 1 0 1
LAT Latvia 61 1 0 1
THA Thailand 61 1 0 1
BER Bermuda 64 1 0 0
MAR Morocco 64 1 0 0
PUR Puerto Rico 64 1 0 0
COL Colombia 65 0 4 1
AZE Azerbaijan 66 0 3 4
DOM Dominican Republic 67 0 3 2
ARM Armenia 68 0 2 2
KGZ Kyrgyzstan 69 0 2 1
MGL Mongolia 70 0 1 3
ARG Argentina 72 0 1 2
SMR San Marino 72 0 1 2
JOR Jordan 75 0 1 1
MAS Malaysia 75 0 1 1
NGR Nigeria 75 0 1 1
BRN Bahrain 81 0 1 0
KSA Saudi Arabia 81 0 1 0
LTU Lithuania 81 0 1 0
MKD North Macedonia 81 0 1 0
NAM Namibia 81 0 1 0
TKM Turkmenistan 81 0 1 0
KAZ Kazakhstan 82 0 0 8
MEX Mexico 83 0 0 4
FIN Finland 84 0 0 2
BOT Botswana 92 0 0 1
BUR Burkina Faso 92 0 0 1
CIV Ivory Coast 92 0 0 1
GHA Ghana 92 0 0 1
GRN Grenada 92 0 0 1
KUW Kuwait 92 0 0 1
MDA Moldova 92 0 0 1
SYR Syria 92 0 0 1
AFG Afghanistan 206 0 0 0
ALB Albania 206 0 0 0
ALG Algeria 206 0 0 0
AND Andorra 206 0 0 0
ANG Angola 206 0 0 0
ANT Antigua and Barbuda 206 0 0 0
ARU Aruba 206 0 0 0
ASA American Samoa 206 0 0 0
BAN Bangladesh 206 0 0 0
BAR Barbados 206 0 0 0
BDI Burundi 206 0 0 0
BEN Benin 206 0 0 0
BHU Bhutan 206 0 0 0
BIH Bosnia and Herzegovina 206 0 0 0
BIZ Belize 206 0 0 0
BOL Bolivia 206 0 0 0
BRU Brunei 206 0 0 0
CAF Central African Republic 206 0 0 0
CAM Cambodia 206 0 0 0
CAY Cayman Islands 206 0 0 0
CGO Republic of the Congo 206 0 0 0
CHA Chad 206 0 0 0
CHI Chile 206 0 0 0
CMR Cameroon 206 0 0 0
COD Democratic Republic of the Congo 206 0 0 0
COK Cook Islands 206 0 0 0
COM Comoros 206 0 0 0
CPV Cape Verde 206 0 0 0
CRC Costa Rica 206 0 0 0
CYP Cyprus 206 0 0 0
DJI Djibouti 206 0 0 0
DMA Dominica 206 0 0 0
ERI Eritrea 206 0 0 0
ESA El Salvador 206 0 0 0
FSM Federated States of Micronesia 206 0 0 0
GAB Gabon 206 0 0 0
GAM The Gambia 206 0 0 0
GBS Guinea-Bissau 206 0 0 0
GEQ Equatorial Guinea 206 0 0 0
GUA Guatemala 206 0 0 0
GUI Guinea 206 0 0 0
GUM Guam 206 0 0 0
GUY Guyana 206 0 0 0
HAI Haiti 206 0 0 0
HON Honduras 206 0 0 0
IRQ Iraq 206 0 0 0
ISL Iceland 206 0 0 0
ISV Virgin Islands 206 0 0 0
IVB British Virgin Islands 206 0 0 0
KIR Kiribati 206 0 0 0
KOR South Korea 206 0 0 0
LAO Laos 206 0 0 0
LBA Libya 206 0 0 0
LBN Lebanon 206 0 0 0
LBR Liberia 206 0 0 0
LCA Saint Lucia 206 0 0 0
LES Lesotho 206 0 0 0
LIE Liechtenstein 206 0 0 0
LUX Luxembourg 206 0 0 0
MAD Madagascar 206 0 0 0
MAW Malawi 206 0 0 0
MDV Maldives 206 0 0 0
MHL Marshall Islands 206 0 0 0
MLI Mali 206 0 0 0
MLT Malta 206 0 0 0
MNE Montenegro 206 0 0 0
MON Monaco 206 0 0 0
MOZ Mozambique 206 0 0 0
MRI Mauritius 206 0 0 0
MTN Mauritania 206 0 0 0
MYA Myanmar 206 0 0 0
NCA Nicaragua 206 0 0 0
NEP Nepal 206 0 0 0
NIG Niger 206 0 0 0
NRU Nauru 206 0 0 0
OMA Oman 206 0 0 0
PAK Pakistan 206 0 0 0
PAN Panama 206 0 0 0
PAR Paraguay 206 0 0 0
PER Peru 206 0 0 0
PLE Palestine 206 0 0 0
PLW Palau 206 0 0 0
PNG Papua New Guinea 206 0 0 0
PRK North Korea 206 0 0 0
RWA Rwanda 206 0 0 0
SAM Samoa 206 0 0 0
SEN Senegal 206 0 0 0
SEY Seychelles 206 0 0 0
SGP Singapore 206 0 0 0
SKN Saint Kitts and Nevis 206 0 0 0
SLE Sierra Leone 206 0 0 0
SOL Solomon Islands 206 0 0 0
SOM Somalia 206 0 0 0
SRI Sri Lanka 206 0 0 0
SSD South Sudan 206 0 0 0
STP São Tomé and Príncipe 206 0 0 0
SUD Sudan 206 0 0 0
SUR Suriname 206 0 0 0
SWZ Eswatini 206 0 0 0
TAN Tanzania 206 0 0 0
TGA Tonga 206 0 0 0
TJK Tajikistan 206 0 0 0
TLS East Timor 206 0 0 0
TOG Togo 206 0 0 0
TTO Trinidad and Tobago 206 0 0 0
TUV Tuvalu 206 0 0 0
UAE United Arab Emirates 206 0 0 0
URU Uruguay 206 0 0 0
VAN Vanuatu 206 0 0 0
VIE Vietnam 206 0 0 0
VIN Saint Vincent and the Grenadines 206 0 0 0
YEM Yemen 206 0 0 0
ZAM Zambia 206 0 0 0
ZIM Zimbabwe 206 0 0 0

Data & resources

The International Olympic Committee maintains a suite of webpages that contain a large amount of data for all previous games. You can access them here, although it is not easy to scrape the data in an automated way. Wikipedia has a useful all-time medals table here. Perhaps most useful, however, is the set of medals tables provided by Rob Wood on his website topendsports.com. Even more data is available from a generous Kaggle user (R.Griffin) here.

You are free to use any data you can find to inform your predictions.

You might also be interested in YouTube video-seminars from participants in previous prediction competitions. These include presentations from the winners of the 2020 prediction competition and the winners of the 2023 competition. Finally, if you would like to be kept up to date on the activities of the RSS’s Statistics in Sports Section you can sign up to their mailing list here.

Example scoring calculation

To help forecasters understand the scoring system we provide the following cartoon example. Below is a fictional medals table that is used to compute a true rank for each of five NOCs. The table also includes a fictional set of predicted ranks. The numbers on the far left of the table are just labels arbitrarily (alphabetically) assigned to the NOCs to help refer to them - a bit like the NOC codes in the real medals table.

Label NOC True_rank Forecast_rank Gold Silver Bronze
3 Cambodia 1 4 2 2 0
4 Denmark 2 4 0 1 0
2 Bahamas 3 2 0 0 1
1 Afghanistan 5 2 0 0 0
5 Ecuador 5 4 0 0 0

To score the predicted ranks we enumerate all pairs of NOCs and, for each one, check whether the true ordering and the predicted ordering match. These orderings are quantified using the \(\text{sign}\) function (which returns values -1, 0 or 1) applied to the difference between the ranks. If the orderings agree the forecaster scores a point; if they disagree the forecaster loses a point. If either the true or predicted ranks are tied for a particular pair of NOCs then no points are scored or lost. To quantify this sort of agreement we multiply the signs of the rank differences together. The results of these calculations for this toy example are presented in the table below.

i j NOC_i NOC_j True_rank_i True_rank_j True_sign Forecast_rank_i Forecast_rank_j Forecast_sign Forecast_score
1 2 Afghanistan Bahamas 5 3 1 2 2 0 0
1 3 Afghanistan Cambodia 5 1 1 2 4 -1 -1
1 4 Afghanistan Denmark 5 2 1 2 4 -1 -1
1 5 Afghanistan Ecuador 5 5 0 2 4 -1 0
2 3 Bahamas Cambodia 3 1 1 2 4 -1 -1
2 4 Bahamas Denmark 3 2 1 2 4 -1 -1
2 5 Bahamas Ecuador 3 5 -1 2 4 -1 1
3 4 Cambodia Denmark 1 2 -1 4 4 0 0
3 5 Cambodia Ecuador 1 5 -1 4 4 0 0
4 5 Denmark Ecuador 2 5 -1 4 4 0 0

The forecaster’s tau score is their cumulative score (the sum of the right-most column) divided by the number of pairs, i.e. -3/10=-0.3.
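The pairwise table can be reproduced with a few lines of code. This is only a sketch of the calculation described above, using the true and forecast ranks from the toy table; the variable names are illustrative.

```python
from itertools import combinations


def sign(x):
    """Return 1 for positive x, -1 for negative x and 0 for zero."""
    return (x > 0) - (x < 0)


# (NOC, true rank, forecast rank), listed in label order 1 to 5.
nocs = [
    ("Afghanistan", 5, 2),
    ("Bahamas", 3, 2),
    ("Cambodia", 1, 4),
    ("Denmark", 2, 4),
    ("Ecuador", 5, 4),
]

pair_scores = []
for (name_i, true_i, pred_i), (name_j, true_j, pred_j) in combinations(nocs, 2):
    # Product of the two signs: +1 agree, -1 disagree, 0 if either is tied.
    score = sign(true_i - true_j) * sign(pred_i - pred_j)
    pair_scores.append(score)
    print(f"{name_i:12s}{name_j:10s}{score:3d}")

tau = sum(pair_scores) / len(pair_scores)
print(tau)  # -0.3
```

The only pair scored in the forecaster's favour is Bahamas and Ecuador; four pairs are discordant and the remaining five are tied in at least one of the two rankings.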

Extra info

Bonus fact: Kendall’s tau was devised and analysed by Maurice Kendall, who served as President of the Institute of Statisticians, which broke away from, and later merged with, the Royal Statistical Society.

Bonus reminder: We are orienting our ranks so that smaller numbers correspond to more medals. When we talk about the medals table, countries at the top are those with the most medals and the smallest ranks.
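This orientation matters if your model produces a quantity where bigger is better, such as a predicted gold medal count. Since only pairwise comparisons matter and ranks may be non-integer or below 1, one simple option is to negate that quantity. The sketch below assumes a hypothetical dictionary of predicted counts keyed by NOC code; the numbers are invented for illustration.

```python
def scores_to_ranks(predicted_strength):
    """Flip 'bigger is better' scores into 'smaller is better' ranks.

    Negation reverses every pairwise ordering, so the NOC with the largest
    predicted score receives the smallest rank. Equal scores stay tied.
    """
    return {noc: -value for noc, value in predicted_strength.items()}


# Invented predicted gold counts, for illustration only.
ranks = scores_to_ranks({"AAA": 10.5, "BBB": 2.0, "CCC": 0.0})
print(ranks["AAA"] < ranks["BBB"] < ranks["CCC"])  # True
```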