RSS Olympic prediction competition 2024

Context

Each summer the Royal Statistical Society’s Statistics in Sport Section organizes a prediction competition. For 2024 the competition, which is being sponsored by Amelco, involves predicting the medals table for the Paris Olympics.

This year’s competition is novel in several ways. Firstly, forecasters will be asked to provide a single predicted ranking whose distance from the true ranking at the end of the games will determine their score. This stands in contrast to previous forecasting competitions, in which forecasters have been required to specify probabilities for the outcomes of individual sports matches in a tournament. Secondly, the ‘true ranking’ for the Olympics is not as well defined or officially endorsed as it is for other events.

For the purposes of this competition the true final ranking will be determined by the number of gold medals won by each country. Silver medals will be used only to break ties between countries with equal numbers of gold medals. Bronze medals will be used to break ties between countries after silver medals are accounted for. Countries still tied after accounting for bronze medals will remain tied.
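The tie-breaking rule above amounts to comparing (gold, silver, bronze) counts in that order. The following Python sketch is one way to turn medal counts into ranks. The function name `medal_ranks` is hypothetical, and the medal counts are borrowed from the fictional example table further down this page. It assumes tied NOCs share the largest position in their group, which is how tied ranks appear in the medals table on this page; Kendall's tau itself does not depend on that choice.

```python
def medal_ranks(medals):
    """Map each NOC to its rank, given a dict NOC -> (gold, silver, bronze)."""
    ranks = {}
    for noc, counts in medals.items():
        # Tuples compare lexicographically: gold first, then silver, then bronze.
        # Counting every NOC that does at least as well (including the NOC
        # itself) gives tied NOCs the largest position in their tied group.
        ranks[noc] = sum(1 for other in medals.values() if other >= counts)
    return ranks


# Fictional medal counts (gold, silver, bronze).
medals = {
    "Afghanistan": (0, 0, 0),
    "Bahamas": (0, 0, 1),
    "Cambodia": (2, 2, 0),
    "Denmark": (0, 1, 0),
    "Ecuador": (0, 0, 0),
}

ranks = medal_ranks(medals)
for noc in sorted(ranks, key=ranks.get):
    print(noc, ranks[noc])
```

Here Denmark's single silver places it above Bahamas' single bronze, while Afghanistan and Ecuador, with identical counts, remain tied.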

Scoring

Predicted rankings will be scored using a statistic called Kendall’s tau, or the Kendall rank correlation coefficient. There are (at least) two ways to compute this statistic, each of which sheds light on its meaning. To understand what the statistic quantifies it is first necessary to think about all the possible pairwise comparisons between countries. With \(n\) participating countries there are \(n(n-1)/2=\binom{n}{2}\) of these. A forecaster will gain a point for every pair which they have placed in the correct order, and they will lose a point for every pair which they placed in the wrong order. If a pair of countries is tied, either in the predicted ranking or the true ranking, this pair does not contribute to the score. The score is then scaled by the total number of pairs so that it falls between -1 and 1.

We can write this down as

\[\begin{align} \tau & = \frac{\text{number of concordant pairs}-\text{number of discordant pairs}}{\text{number of pairs}} \\ & = \frac{2}{n(n-1)} \sum_{i<j}\text{sign}(R_i-R_j)\text{sign}(S_i-S_j) \end{align}\]

where ‘concordant’ is a word used in the academic literature to mean ‘in agreement’. Similarly ‘discordant’ means ‘in disagreement’. For example, if a forecaster ranks Kiribati higher than Tanzania and Kiribati is indeed higher than Tanzania in the final true ranking, then we say that the forecaster has a concordant pair with the true ranking. As the second line of the equation above suggests, it is also possible to show that Kendall’s tau is the correlation between the signs of the differences in ranks. In this expression \(R_i\) denotes the numerical rank assigned by a forecaster to country \(i\) (e.g. if a forecaster predicted country \(i\) to be second from the top in the medals table they would set \(R_i=2\)). The quantity \(S_i\) denotes the true rank for country \(i\). The \(\text{sign}\) function used here returns one when its argument is positive, minus one when its argument is negative and zero when its argument is zero.
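For readers who prefer code to notation, here is a minimal Python sketch of the second line of the equation above. The names `sign` and `kendall_tau` are illustrative, not part of any official scoring script. Note that the sum is divided by the total number of pairs, ties included, so library routines that apply a tie correction (often called tau-b) may return different values when ties are present.

```python
from itertools import combinations


def sign(x):
    """Return 1 for positive x, -1 for negative x and 0 for zero."""
    return (x > 0) - (x < 0)


def kendall_tau(predicted, true):
    """Kendall's tau as in the formula: 2 / (n(n-1)) times the sum of sign products."""
    n = len(predicted)
    if len(true) != n or n < 2:
        raise ValueError("need two rank lists of equal length, at least 2 long")
    total = sum(
        sign(predicted[i] - predicted[j]) * sign(true[i] - true[j])
        for i, j in combinations(range(n), 2)  # every pair with i < j
    )
    return 2 * total / (n * (n - 1))


# Four hypothetical NOCs; the forecaster swaps the last two.
print(kendall_tau([1, 2, 3, 4], [1, 2, 4, 3]))
```

In this small case five of the six pairs are concordant and one is discordant, so the score is (5 - 1)/6.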

There are 206 countries participating in the 2024 Olympic games, although officially they are referred to as National Olympic Committees (NOCs) rather than countries. In the submission template you will find a list of the NOCs and, as a default, they are all ranked jointly at 206th. This default entry will receive a Kendall’s tau score of zero because, as mentioned above, tied pairs contribute nothing to the score - they are considered neither concordant nor discordant. Forecasters are invited to modify the ranks in any way they see fit. You could, for example,

  • cluster countries into groups all with the same rank,
  • use non-integer ranks,
  • use ranks that go below 1!

The only thing that matters is that your ranks allow for the pairwise comparisons that contribute to Kendall’s tau.
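As a rough illustration of why only the ordering of the ranks matters, the Python sketch below scores several hypothetical forecasts against the same made-up true ranking. The helper `tau` and all the rank vectors are invented for this example; the helper assumes the pairwise scoring rule described in the Scoring section.

```python
from itertools import combinations


def tau(predicted, true):
    """Pairwise score: +1 per agreeing pair, -1 per disagreeing pair, 0 for ties."""
    pairs = list(combinations(range(len(true)), 2))
    score = 0
    for i, j in pairs:
        a = predicted[i] - predicted[j]
        b = true[i] - true[j]
        score += ((a > 0) - (a < 0)) * ((b > 0) - (b < 0))
    return score / len(pairs)


true_ranks = [1, 2, 3, 5, 5]                 # made-up truth with one tie

integer_ranks = [1, 3, 2, 4, 5]              # an ordinary forecast
fractional_ranks = [0.5, 2.5, 1.7, 3.14, 99.0]  # same ordering, non-integer
shifted_ranks = [-10, -3, -7, 0, 4]          # same ordering, goes below 1
grouped_ranks = [1, 1, 1, 2, 2]              # two clusters of tied NOCs
all_tied = [206] * 5                         # the template default

for name, ranks in [
    ("integer", integer_ranks),
    ("fractional", fractional_ranks),
    ("shifted", shifted_ranks),
    ("grouped", grouped_ranks),
    ("all tied", all_tied),
]:
    print(name, tau(ranks, true_ranks))
```

The first three forecasts order the NOCs identically and so receive the same score, while the all-tied default scores zero.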

Submission template

You can download the submission template here. Only modify the numbers in the Rank column. In particular do not permute the rows of the NOCs.
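Before sending a file, it may help to check that only the Rank column has changed. The Python sketch below is one possible check and not an official validator. The function `check_submission` is hypothetical, and the `Code` and `NOC` column names in the usage example are assumptions about the template's layout; only the `Rank` column name comes from the instructions above.

```python
import csv
import io
import math


def check_submission(template_text, submission_text, rank_column="Rank"):
    """Return a list of problems found; an empty list means the checks passed."""
    template = list(csv.DictReader(io.StringIO(template_text)))
    submission = list(csv.DictReader(io.StringIO(submission_text)))
    if len(template) != len(submission):
        return [f"expected {len(template)} rows, found {len(submission)}"]

    problems = []
    for row_number, (t_row, s_row) in enumerate(zip(template, submission), start=1):
        # Everything except the rank column must match the template, row by row,
        # which also catches permuted rows.
        t_fixed = {k: v for k, v in t_row.items() if k != rank_column}
        s_fixed = {k: v for k, v in s_row.items() if k != rank_column}
        if t_fixed != s_fixed:
            problems.append(f"row {row_number}: differs from the template")
        # The rank itself only needs to be a finite number.
        try:
            if not math.isfinite(float(s_row.get(rank_column, ""))):
                raise ValueError
        except (TypeError, ValueError):
            problems.append(f"row {row_number}: rank is not a finite number")
    return problems


# Tiny made-up template and submission (column names are assumed).
template_csv = "Code,NOC,Rank\nAFG,Afghanistan,206\nALB,Albania,206\n"
submission_csv = "Code,NOC,Rank\nAFG,Afghanistan,150\nALB,Albania,83.5\n"
print(check_submission(template_csv, submission_csv))
```

In practice the two strings would be read from the downloaded template and from the completed submission file.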

Email completed submissions to ben.powell@york.ac.uk with the subject line SiS forecasting competition 2024. This email should contain an attached .csv file called “RSS_pred_comp_submission_NAME.csv” where NAME is the name you would like to appear on the leader board.

If you would like to be considered for the methodology prize, please provide a brief description of how you made your forecast predictions.

Please make your submissions before Monday 22nd July. The games start on Friday 26th July but it will be necessary to have submission files a bit earlier in order to initialize the forecast scoring procedure and the leader board.

Prizes

The principal and most valuable prize for winning the forecasting competition is prestige (and a certificate)! Special attention and corresponding certificates will be awarded for

  1. Best overall score
  2. Best student score
  3. Most innovative methodology

The winners of these three awards will also be invited to the Royal Statistical Society conference to present their methods. This year’s conference will be held in Brighton in early September. Conference fees and travel/accommodation costs are being provided by sponsors Amelco.

Leader board

The competition is now over and our winner is John Edwards. Well done John and everyone who took part!

Forecaster Rank Tau
John Edwards 1 0.599
mohamed 2 0.564
OlymPicks 3 0.548
Hammers_O’Callaghan 4 0.542
ALEC 5 0.538
fa2410 6 0.536
AJM 7 0.533
ChristopherWharton 8 0.531
JKH 9 0.512
Orla_S 10 0.510
KaitoGoto 11 0.508
JoePenn 13 0.507
JohnDSouza 13 0.507
JPN2020 14 0.506
HarrySnart 15 0.499
WilliamBowers 16 0.494
LexieB 17 0.493
ANIK 18 0.485
Rank_Deficient 19 0.479
WeatherQuant 20 0.458
Kizmet24 21 0.426
tghaynes 22 0.395
Everyone_is_(equally)_awesome 23 0.000

Medals table

Last updated: 2024-08-11 21:13:10

Code NOC Rank Gold Silver Bronze
USA United States 1 40 44 42
CHN China 2 40 27 24
JPN Japan 3 20 12 13
AUS Australia 4 18 19 16
FRA France 5 16 26 22
NED Netherlands 6 15 7 12
GBR Great Britain 7 14 22 29
KOR South Korea 8 13 9 10
ITA Italy 9 12 13 15
GER Germany 10 12 13 8
NZL New Zealand 11 10 7 3
CAN Canada 12 9 7 11
UZB Uzbekistan 13 8 2 3
HUN Hungary 14 6 7 6
ESP Spain 15 5 4 9
SWE Sweden 16 4 4 3
KEN Kenya 17 4 2 5
NOR Norway 18 4 1 3
IRL Ireland 19 4 0 3
BRA Brazil 20 3 7 10
IRI Iran 21 3 6 3
UKR Ukraine 22 3 5 4
ROU Romania 23 3 4 2
GEO Georgia 24 3 3 1
BEL Belgium 25 3 1 6
BUL Bulgaria 26 3 1 3
SRB Serbia 27 3 1 1
CZE Czechia 28 3 0 2
DEN Denmark 29 2 2 5
AZE Azerbaijan 31 2 2 3
CRO Croatia 31 2 2 3
CUB Cuba 32 2 1 6
BRN Bahrain 33 2 1 1
SLO Slovenia 34 2 1 0
TPE Chinese Taipei 35 2 0 5
AUT Austria 36 2 0 3
HKG Hong Kong, China 38 2 0 2
PHI Philippines 38 2 0 2
ALG Algeria 40 2 0 1
INA Indonesia 40 2 0 1
ISR Israel 41 1 5 1
POL Poland 42 1 4 5
KAZ Kazakhstan 43 1 3 3
JAM Jamaica 46 1 3 2
RSA South Africa 46 1 3 2
THA Thailand 46 1 3 2
ETH Ethiopia 47 1 3 0
SUI Switzerland 48 1 2 5
ECU Ecuador 49 1 2 2
POR Portugal 50 1 2 1
GRE Greece 51 1 1 6
ARG Argentina 54 1 1 1
EGY Egypt 54 1 1 1
TUN Tunisia 54 1 1 1
BOT Botswana 58 1 1 0
CHI Chile 58 1 1 0
LCA Saint Lucia 58 1 1 0
UGA Uganda 58 1 1 0
DOM Dominican Republic 59 1 0 2
GUA Guatemala 61 1 0 1
MAR Morocco 61 1 0 1
DMA Dominica 63 1 0 0
PAK Pakistan 63 1 0 0
TUR Turkey 64 0 3 5
MEX Mexico 65 0 3 2
ARM Armenia 67 0 3 1
COL Colombia 67 0 3 1
KGZ Kyrgyzstan 69 0 2 4
PRK North Korea 69 0 2 4
LTU Lithuania 70 0 2 2
IND India 71 0 1 5
MDA Moldova 72 0 1 3
KOS Kosovo 73 0 1 1
CYP Cyprus 78 0 1 0
FIJ Fiji 78 0 1 0
JOR Jordan 78 0 1 0
MGL Mongolia 78 0 1 0
PAN Panama 78 0 1 0
TJK Tajikistan 79 0 0 3
ALB Albania 83 0 0 2
GRN Grenada 83 0 0 2
MAS Malaysia 83 0 0 2
PUR Puerto Rico 83 0 0 2
CIV Ivory Coast 90 0 0 1
CPV Cape Verde 90 0 0 1
PER Peru 90 0 0 1
QAT Qatar 90 0 0 1
SGP Singapore 90 0 0 1
SVK Slovakia 90 0 0 1
ZAM Zambia 90 0 0 1
AFG Afghanistan 204 0 0 0
AND Andorra 204 0 0 0
ANG Angola 204 0 0 0
ANT Antigua and Barbuda 204 0 0 0
ARU Aruba 204 0 0 0
ASA American Samoa 204 0 0 0
BAH Bahamas 204 0 0 0
BAN Bangladesh 204 0 0 0
BAR Barbados 204 0 0 0
BDI Burundi 204 0 0 0
BEN Benin 204 0 0 0
BER Bermuda 204 0 0 0
BHU Bhutan 204 0 0 0
BIH Bosnia and Herzegovina 204 0 0 0
BIZ Belize 204 0 0 0
BOL Bolivia 204 0 0 0
BRU Brunei 204 0 0 0
BUR Burkina Faso 204 0 0 0
CAF Central African Republic 204 0 0 0
CAM Cambodia 204 0 0 0
CAY Cayman Islands 204 0 0 0
CGO Republic of the Congo 204 0 0 0
CHA Chad 204 0 0 0
CMR Cameroon 204 0 0 0
COD Democratic Republic of the Congo 204 0 0 0
COK Cook Islands 204 0 0 0
COM Comoros 204 0 0 0
CRC Costa Rica 204 0 0 0
DJI Djibouti 204 0 0 0
ERI Eritrea 204 0 0 0
ESA El Salvador 204 0 0 0
EST Estonia 204 0 0 0
FIN Finland 204 0 0 0
FSM Federated States of Micronesia 204 0 0 0
GAB Gabon 204 0 0 0
GAM Gambia 204 0 0 0
GBS Guinea-Bissau 204 0 0 0
GEQ Equatorial Guinea 204 0 0 0
GHA Ghana 204 0 0 0
GUI Guinea 204 0 0 0
GUM Guam 204 0 0 0
GUY Guyana 204 0 0 0
HAI Haiti 204 0 0 0
HON Honduras 204 0 0 0
IRQ Iraq 204 0 0 0
ISL Iceland 204 0 0 0
ISV Virgin Islands 204 0 0 0
IVB British Virgin Islands 204 0 0 0
KIR Kiribati 204 0 0 0
KSA Saudi Arabia 204 0 0 0
KUW Kuwait 204 0 0 0
LAO Laos 204 0 0 0
LAT Latvia 204 0 0 0
LBA Libya 204 0 0 0
LBN Lebanon 204 0 0 0
LBR Liberia 204 0 0 0
LES Lesotho 204 0 0 0
LIE Liechtenstein 204 0 0 0
LUX Luxembourg 204 0 0 0
MAD Madagascar 204 0 0 0
MAW Malawi 204 0 0 0
MDV Maldives 204 0 0 0
MHL Marshall Islands 204 0 0 0
MKD North Macedonia 204 0 0 0
MLI Mali 204 0 0 0
MLT Malta 204 0 0 0
MNE Montenegro 204 0 0 0
MON Monaco 204 0 0 0
MOZ Mozambique 204 0 0 0
MRI Mauritius 204 0 0 0
MTN Mauritania 204 0 0 0
MYA Myanmar 204 0 0 0
NAM Namibia 204 0 0 0
NCA Nicaragua 204 0 0 0
NEP Nepal 204 0 0 0
NGR Nigeria 204 0 0 0
NIG Niger 204 0 0 0
NRU Nauru 204 0 0 0
OMA Oman 204 0 0 0
PAR Paraguay 204 0 0 0
PLE Palestine 204 0 0 0
PLW Palau 204 0 0 0
PNG Papua New Guinea 204 0 0 0
RWA Rwanda 204 0 0 0
SAM Samoa 204 0 0 0
SEN Senegal 204 0 0 0
SEY Seychelles 204 0 0 0
SKN Saint Kitts and Nevis 204 0 0 0
SLE Sierra Leone 204 0 0 0
SMR San Marino 204 0 0 0
SOL Solomon Islands 204 0 0 0
SOM Somalia 204 0 0 0
SRI Sri Lanka 204 0 0 0
SSD South Sudan 204 0 0 0
STP São Tomé and Príncipe 204 0 0 0
SUD Sudan 204 0 0 0
SUR Suriname 204 0 0 0
SWZ Eswatini 204 0 0 0
SYR Syria 204 0 0 0
TAN Tanzania 204 0 0 0
TGA Tonga 204 0 0 0
TKM Turkmenistan 204 0 0 0
TLS East Timor 204 0 0 0
TOG Togo 204 0 0 0
TTO Trinidad and Tobago 204 0 0 0
TUV Tuvalu 204 0 0 0
UAE United Arab Emirates 204 0 0 0
URU Uruguay 204 0 0 0
VAN Vanuatu 204 0 0 0
VEN Venezuela 204 0 0 0
VIE Vietnam 204 0 0 0
VIN Saint Vincent and the Grenadines 204 0 0 0
YEM Yemen 204 0 0 0
ZIM Zimbabwe 204 0 0 0

Data & resources

The International Olympic Committee maintains a suite of webpages that contain a large amount of data for all previous games. You can access them here although it is not easy to scrape the data in an automated way. Wikipedia has a useful all-time medals table here. Perhaps most useful, however, is the set of medals tables provided by Rob Wood on his website topendsports.com. Even more data is available from a generous Kaggle user (R.Griffin) here.

You are free to use any data you can find to inform your predictions.

You might also be interested in YouTube video-seminars from participants in previous prediction competitions. These include presentations from the winners of the 2020 prediction competition and the winners of the 2023 competition. Finally, if you would like to be kept up to date on the activities of the RSS’s Statistics in Sports Section you can sign up to their mailing list here.

In this YouTube video Dr. Jess Hargreaves (Univ. York) hears from Dr. Johan Rewilak (Univ. South Carolina) on his thoughts about predicting the Olympics.

Lexie Bonas, an undergraduate student at the University of York, has also provided a quick explanation of her submission.

Example scoring calculation

To help forecasters understand the scoring system we provide the following cartoon example. Below is a fictional medals table that is used to compute a true rank for each of five NOCs. The table also includes a fictional set of predicted ranks. The numbers on the far left of the table are just labels arbitrarily (alphabetically) assigned to the NOCs to help refer to them - a bit like the NOC codes in the real medals table.

Label NOC True_rank Forecast_rank Gold Silver Bronze
3 Cambodia 1 4 2 2 0
4 Denmark 2 4 0 1 0
2 Bahamas 3 2 0 0 1
1 Afghanistan 5 2 0 0 0
5 Ecuador 5 4 0 0 0

To score the predicted ranks we enumerate all pairs of NOCs and, for each one, check whether the true ordering and the predicted ordering match. These orderings are quantified using the \(\text{sign}\) function (which returns values -1, 0 or 1) applied to the difference between the ranks. If the orderings agree the forecaster scores a point; if they disagree the forecaster loses a point. If either the true or predicted ranks are tied for a particular pair of NOCs then no points are scored or lost. To quantify this sort of agreement we multiply the signs of the rank differences together, which gives 1 for agreement, -1 for disagreement and 0 when either rank is tied. The results of these calculations for this toy example are presented in the table below.

i j NOC_i NOC_j True_rank_i True_rank_j True_sign Forecast_rank_i Forecast_rank_j Forecast_sign Forecast_score
1 2 Afghanistan Bahamas 5 3 1 2 2 0 0
1 3 Afghanistan Cambodia 5 1 1 2 4 -1 -1
1 4 Afghanistan Denmark 5 2 1 2 4 -1 -1
1 5 Afghanistan Ecuador 5 5 0 2 4 -1 0
2 3 Bahamas Cambodia 3 1 1 2 4 -1 -1
2 4 Bahamas Denmark 3 2 1 2 4 -1 -1
2 5 Bahamas Ecuador 3 5 -1 2 4 -1 1
3 4 Cambodia Denmark 1 2 -1 4 4 0 0
3 5 Cambodia Ecuador 1 5 -1 4 4 0 0
4 5 Denmark Ecuador 2 5 -1 4 4 0 0

The forecaster’s tau score is their cumulative score (the sum of the right-most column) divided by the number of pairs, i.e. -3/10=-0.3.
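The same result can be reached through the first form of the formula, by counting concordant and discordant pairs. The short Python sketch below does this for the toy example; the variable names are illustrative only.

```python
from itertools import combinations

# (true_rank, forecast_rank) for each NOC in the toy example.
toy = {
    "Afghanistan": (5, 2),
    "Bahamas": (3, 2),
    "Cambodia": (1, 4),
    "Denmark": (2, 4),
    "Ecuador": (5, 4),
}

concordant = discordant = tied = 0
for a, b in combinations(toy, 2):
    true_diff = toy[a][0] - toy[b][0]
    forecast_diff = toy[a][1] - toy[b][1]
    # The product of the differences has the same sign as the product of
    # their signs, so it identifies agreement, disagreement or a tie.
    product = true_diff * forecast_diff
    if product > 0:
        concordant += 1
    elif product < 0:
        discordant += 1
    else:
        tied += 1

n_pairs = concordant + discordant + tied
tau = (concordant - discordant) / n_pairs
print(concordant, discordant, tied, tau)  # 1 4 5 -0.3
```

One concordant pair, four discordant pairs and five tied pairs give (1 - 4)/10 = -0.3, matching the sum of the right-most column.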

Extra info

NOCs without state affiliation: The submission template provided did not include rows for the Olympic committees for neutral athletes or refugees. Their ranks will not contribute to the prediction competition scores. The template did include rows for the Belarusian and Russian NOCs, which are not taking part in the games. Predictions for these NOCs will not be used when calculating the prediction competition scores.

Bonus fact: Kendall’s tau was devised and analysed by Maurice Kendall, who served as President of the Institute of Statisticians, which broke away from, then merged with, the Royal Statistical Society.

Bonus reminder: We are orienting our ranks so that smaller numbers correspond to more medals. When we talk about the medals table, countries at the top are those with the most medals and the smallest ranks.