RSS Olympic prediction competition 2024

Context

Each summer the Royal Statistical Society’s Statistics in Sport Section organizes a prediction competition. For 2024 the competition, which is being sponsored by Amelco, involves predicting the medals table for the Paris Olympics.

This year’s competition is novel in several ways. Firstly, forecasters will be asked to provide a single predicted ranking whose distance from the true ranking at the end of the games will determine their score. This stands in contrast to previous forecasting competitions, in which forecasters have been required to specify probabilities for the outcomes of individual sports matches in a tournament. Secondly, the ‘true ranking’ for the Olympics is not as well defined or officially endorsed as it is for other events.

For the purposes of this competition the true final ranking will be determined by the number of gold medals won by each country. Silver medals will be used only to break ties between countries with equal numbers of gold medals. Bronze medals will be used to break ties between countries after silver medals are accounted for. Countries still tied after accounting for bronze medals will remain tied.
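
The ranking rule above amounts to comparing NOCs on the triple (gold, silver, bronze), with gold taking priority, then silver, then bronze. The following is a minimal Python sketch of that rule, not the organisers' code. The function name `true_ranks` is hypothetical, and giving tied NOCs the largest position in their tied block is an assumed convention: it matches the pattern seen in the tables on this page, but any convention that keeps tied NOCs equal yields the same pairwise comparisons. The medal counts are those of the fictional five-NOC example at the end of this page.

```python
def true_ranks(medals):
    """Map each NOC to a rank, given a dict of (gold, silver, bronze) tuples.

    Tied NOCs share the largest position in their tied block (an assumed
    convention; only the ordering matters for the scoring).
    """
    ranks = {}
    for noc, counts in medals.items():
        # Tuples compare lexicographically: gold first, then silver, then bronze.
        # The rank is the number of NOCs doing at least as well as this one.
        ranks[noc] = sum(1 for other in medals.values() if other >= counts)
    return ranks


medals = {
    "Cambodia": (2, 2, 0),
    "Denmark": (0, 1, 0),
    "Bahamas": (0, 0, 1),
    "Afghanistan": (0, 0, 0),
    "Ecuador": (0, 0, 0),
}
print(true_ranks(medals))
# {'Cambodia': 1, 'Denmark': 2, 'Bahamas': 3, 'Afghanistan': 5, 'Ecuador': 5}
```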

Scoring

Predicted rankings will be scored using a statistic called Kendall’s tau, or the Kendall rank correlation coefficient. There are (at least) two ways to compute this statistic, both of which shed light on its meaning. To understand what the statistic quantifies it is first necessary to think about all the possible pairwise comparisons between countries. With \(n\) participating countries there are \(n(n-1)/2=\binom{n}{2}\) of these. A forecaster will gain a point for every pair which they have placed in the correct order, and they will lose a point for every pair which they placed in the wrong order. If a pair of countries is tied, either in the predicted ranking or the true ranking, this pair does not contribute to the score. The score is then divided by the total number of pairs so that it falls between -1 and 1.

We can write this down as

\[\begin{align} \tau & = \frac{\text{number of concordant pairs}-\text{number of discordant pairs}}{\text{number of pairs}} \\ & = \frac{2}{n(n-1)} \sum_{i<j}\text{sign}(R_i-R_j)\text{sign}(S_i-S_j) \end{align}\]

where ‘concordant’ is a word used in the academic literature to mean ‘in agreement’. Similarly ‘discordant’ means ‘in disagreement’. For example, if a forecaster ranks Kiribati higher than Tanzania and, indeed, Kiribati is higher than Tanzania in the final true ranking then we say that the forecaster has a concordant pair with the true ranking. As alluded to in the equation above, it is also possible to show that Kendall’s tau is the correlation between the signs of the differences in ranks. In this expression \(R_i\) denotes the numerical rank assigned by a forecaster to a country (e.g. if a forecaster predicted country \(i\) to be second from the top in the medals table they would set \(R_i=2\)). The quantity \(S_i\) denotes the true rank for country \(i\). The \(\text{sign}\) function used here returns value one when its argument is positive, minus one when its argument is negative and zero when its argument is zero.
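
The second line of the formula translates directly into code. The sketch below is an illustrative implementation under the definitions given here, with hypothetical function names `sign` and `kendall_tau`; it is not the organisers' scoring script. Note that it divides by the total number of pairs, as in the formula. Library routines such as `scipy.stats.kendalltau` default to the tau-b variant, which adjusts the denominator for ties, so they may give a different value when ranks are tied.

```python
from itertools import combinations


def sign(x):
    """Return 1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def kendall_tau(predicted, true):
    """Kendall's tau as defined above: tied pairs contribute zero and the
    sum is divided by the total number of pairs, n(n-1)/2."""
    n = len(predicted)
    if n != len(true) or n < 2:
        raise ValueError("need two rank lists of equal length, at least 2 long")
    total = sum(
        sign(predicted[i] - predicted[j]) * sign(true[i] - true[j])
        for i, j in combinations(range(n), 2)
    )
    return 2 * total / (n * (n - 1))


print(kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0 (every pair concordant)
print(kendall_tau([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0 (every pair discordant)
```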

There are 206 countries participating in the 2024 Olympic games, although officially they are referred to as National Olympic Committees (NOCs) rather than countries. In the submission template you will find a list of the NOCs and, as a default, they are all ranked jointly at 206th. This default entry will receive a Kendall’s tau score of zero because, as mentioned above, tied pairs contribute nothing to the score - they are considered neither concordant nor discordant. Forecasters are invited to modify the ranks in any way they see fit. You could, for example,

  • cluster countries into groups all with the same rank,
  • use non-integer ranks,
  • use ranks that go below 1!

The only thing that matters is that your ranks allow for the pairwise comparisons that contribute to Kendall’s tau.
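
To see why the numerical values of the ranks do not matter, only their ordering, the following sketch scores three hypothetical entries against a made-up true ranking for five NOCs. All names and numbers here are illustrative assumptions. Two entries encode the same ordering in different ways and receive the same score, while an all-tied entry like the template default scores zero.

```python
from itertools import combinations


def kendall_tau(predicted, true):
    """Sum of sign products over all pairs, divided by the number of pairs."""

    def sign(x):
        return (x > 0) - (x < 0)

    pairs = list(combinations(range(len(true)), 2))
    score = sum(
        sign(predicted[i] - predicted[j]) * sign(true[i] - true[j])
        for i, j in pairs
    )
    return score / len(pairs)


true = [1, 2, 3, 5, 5]                 # hypothetical true ranks for five NOCs
integer_ranks = [1, 3, 2, 4, 5]        # ordinary integer ranks
odd_ranks = [-7.5, 0.2, 0.1, 10, 99]   # same ordering, non-integer and below 1
default_entry = [206] * 5              # every NOC tied, as in the template

print(kendall_tau(integer_ranks, true))  # 0.7
print(kendall_tau(odd_ranks, true))      # 0.7
print(kendall_tau(default_entry, true))  # 0.0
```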

Submission template

You can download the submission template here. Only modify the numbers in the Rank column. In particular do not permute the rows of the NOCs.

Email completed submissions to ben.powell@york.ac.uk with the subject line SiS forecasting competition 2024. This email should contain an attached .csv file called “RSS_pred_comp_submission_NAME.csv” where NAME is the name you would like to appear on the leader board.
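
As a rough illustration of filling in the template programmatically without permuting its rows, here is a sketch using Python's standard `csv` module. The function name `fill_template`, the three-row stand-in template, and the column names `Code` and `NOC` are assumptions made for the example (only the `Rank` column is named in the instructions above), so check them against the real file before relying on this.

```python
import csv
import io


def fill_template(template_text, my_ranks, key="Code", rank_col="Rank"):
    """Return CSV text with only the rank column changed and row order kept.

    `key` and `rank_col` are assumed column names; check them against the
    real template. NOCs missing from `my_ranks` keep their template rank.
    """
    reader = csv.DictReader(io.StringIO(template_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[key] in my_ranks:
            row[rank_col] = str(my_ranks[row[key]])
        writer.writerow(row)
    return out.getvalue()


# Hypothetical three-row stand-in for the real template.
template = "Code,NOC,Rank\nAFG,Afghanistan,206\nFRA,France,206\nUSA,United States,206\n"
filled = fill_template(template, {"USA": 1, "FRA": 2.5})
print(filled)

# File name pattern from the instructions above; "Jane_Doe" is a placeholder.
filename = "RSS_pred_comp_submission_{}.csv".format("Jane_Doe")
```

Writing `filled` to `filename` then gives a file in the requested naming pattern, with the rows in their original order.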

If you would like to be considered for the methodology prize, please provide a brief description of how you made your forecast predictions.

Please make your submissions before Monday 22nd July. The games start on Friday 26th July but it will be necessary to have submission files a bit earlier in order to initialize the forecast scoring procedure and the leader board.

Prizes

The principal and most valuable prize for winning the forecasting competition is prestige (and a certificate)! Special attention and corresponding certificates will be awarded for

  1. Best overall score
  2. Best student score
  3. Most innovative methodology

The winners of these three awards will also be invited to the Royal Statistical Society conference to present their methods. This year’s conference will be held in Brighton in early September. Conference fees and travel/accommodation costs are being provided by sponsors Amelco.

Leader board

Note: the leader board is currently populated with simulated data for testing purposes.

Forecaster Rank Tau
Rank_Deficient 1 0.563
user2 2 0.013
Everyone_is_(equally)_awesome 3 0.000
user1 4 -0.047

Medals table

Note: the table below is populated with data from 2020 for the purposes of testing the leader board.

X Code NOC Rank Gold Silver Bronze
198 USA United States 1 39 41 33
39 CHN China 2 38 32 18
97 JPN Japan 3 27 14 17
69 GBR Great Britain 4 22 20 22
159 RUS Russia 5 20 28 23
11 AUS Australia 6 17 7 22
136 NED Netherlands 7 10 12 14
65 FRA France 8 10 12 11
73 GER Germany 9 10 11 16
93 ITA Italy 10 10 10 20
34 CAN Canada 11 7 6 11
27 BRA Brazil 12 7 6 8
142 NZL New Zealand 13 7 6 7
49 CUB Cuba 14 7 3 5
84 HUN Hungary 15 6 7 7
102 KOR Korea 16 6 4 10
152 POL Poland 17 4 5 5
51 CZE Czechia 18 4 4 3
99 KEN Kenya 19 4 4 2
140 NOR Norway 20 4 2 2
95 JAM Jamaica 21 4 1 4
60 ESP Spain 22 3 8 6
179 SWE Sweden 23 3 6 0
176 SUI Switzerland 24 3 4 6
52 DEN Denmark 25 3 4 4
48 CRO Croatia 26 3 3 2
87 IRI Iran 27 3 2 2
171 SRB Serbia 28 3 1 5
18 BEL Belgium 29 3 1 3
30 BUL Bulgaria 30 3 1 2
167 SLO Slovenia 31 3 1 1
199 UZB Uzbekistan 32 3 0 2
71 GEO Georgia 33 2 5 1
189 TPE Chinese Taipei 34 2 4 6
192 TUR Turkey 35 2 2 9
75 GRE Greece 37 2 1 1
195 UGA Uganda 37 2 1 1
56 ECU Ecuador 38 2 1 0
88 IRL Ireland 40 2 0 2
91 ISR Israel 40 2 0 2
156 QAT Qatar 41 2 0 1
14 BAH Bahamas 43 2 0 0
103 KOS Kosovo 43 2 0 0
196 UKR Ukraine 44 1 6 12
24 BLR Belarus 45 1 3 3
157 ROU Romania 47 1 3 0
201 VEN Venezuela 47 1 3 0
86 IND India 48 1 2 4
82 HKG Hong Kong, China 49 1 2 3
148 PHI Philippines 51 1 2 1
178 SVK Slovakia 51 1 2 1
158 RSA South Africa 52 1 2 0
12 AUT Austria 53 1 1 5
57 EGY Egypt 54 1 1 4
85 INA Indonesia 55 1 1 3
62 ETH Ethiopia 57 1 1 2
153 POR Portugal 57 1 1 2
191 TUN Tunisia 58 1 1 0
61 EST Estonia 62 1 0 1
63 FIJ Fiji 62 1 0 1
107 LAT Latvia 62 1 0 1
184 THA Thailand 62 1 0 1
20 BER Bermuda 65 1 0 0
117 MAR Morocco 65 1 0 0
155 PUR Puerto Rico 65 1 0 0
44 COL Colombia 66 0 4 1
13 AZE Azerbaijan 67 0 3 4
55 DOM Dominican Republic 68 0 3 2
8 ARM Armenia 69 0 2 2
100 KGZ Kyrgyzstan 70 0 2 1
123 MGL Mongolia 71 0 1 3
7 ARG Argentina 73 0 1 2
168 SMR San Marino 73 0 1 2
96 JOR Jordan 76 0 1 1
118 MAS Malaysia 76 0 1 1
138 NGR Nigeria 76 0 1 1
28 BRN Bahrain 82 0 1 0
104 KSA Saudi Arabia 82 0 1 0
114 LTU Lithuania 82 0 1 0
125 MKD North Macedonia 82 0 1 0
134 NAM Namibia 82 0 1 0
186 TKM Turkmenistan 82 0 1 0
98 KAZ Kazakhstan 83 0 0 8
122 MEX Mexico 84 0 0 4
64 FIN Finland 85 0 0 2
26 BOT Botswana 93 0 0 1
31 BUR Burkina Faso 93 0 0 1
40 CIV Ivory Coast 93 0 0 1
74 GHA Ghana 93 0 0 1
76 GRN Grenada 93 0 0 1
105 KUW Kuwait 93 0 0 1
120 MDA Moldova 93 0 0 1
181 SYR Syria 93 0 0 1
1 AFG Afghanistan 206 0 0 0
2 ALB Albania 206 0 0 0
3 ALG Algeria 206 0 0 0
4 AND Andorra 206 0 0 0
5 ANG Angola 206 0 0 0
6 ANT Antigua and Barbuda 206 0 0 0
9 ARU Aruba 206 0 0 0
10 ASA American Samoa 206 0 0 0
15 BAN Bangladesh 206 0 0 0
16 BAR Barbados 206 0 0 0
17 BDI Burundi 206 0 0 0
19 BEN Benin 206 0 0 0
21 BHU Bhutan 206 0 0 0
22 BIH Bosnia and Herzegovina 206 0 0 0
23 BIZ Belize 206 0 0 0
25 BOL Bolivia 206 0 0 0
29 BRU Brunei 206 0 0 0
32 CAF Central African Republic 206 0 0 0
33 CAM Cambodia 206 0 0 0
35 CAY Cayman Islands 206 0 0 0
36 CGO Republic of the Congo 206 0 0 0
37 CHA Chad 206 0 0 0
38 CHI Chile 206 0 0 0
41 CMR Cameroon 206 0 0 0
42 COD Democratic Republic of the Congo 206 0 0 0
43 COK Cook Islands 206 0 0 0
45 COM Comoros 206 0 0 0
46 CPV Cape Verde 206 0 0 0
47 CRC Costa Rica 206 0 0 0
50 CYP Cyprus 206 0 0 0
53 DJI Djibouti 206 0 0 0
54 DMA Dominica 206 0 0 0
58 ERI Eritrea 206 0 0 0
59 ESA El Salvador 206 0 0 0
66 FSM Federated States of Micronesia 206 0 0 0
67 GAB Gabon 206 0 0 0
68 GAM The Gambia 206 0 0 0
70 GBS Guinea-Bissau 206 0 0 0
72 GEQ Equatorial Guinea 206 0 0 0
77 GUA Guatemala 206 0 0 0
78 GUI Guinea 206 0 0 0
79 GUM Guam 206 0 0 0
80 GUY Guyana 206 0 0 0
81 HAI Haiti 206 0 0 0
83 HON Honduras 206 0 0 0
89 IRQ Iraq 206 0 0 0
90 ISL Iceland 206 0 0 0
92 ISV Virgin Islands 206 0 0 0
94 IVB British Virgin Islands 206 0 0 0
101 KIR Kiribati 206 0 0 0
106 LAO Laos 206 0 0 0
108 LBA Libya 206 0 0 0
109 LBN Lebanon 206 0 0 0
110 LBR Liberia 206 0 0 0
111 LCA Saint Lucia 206 0 0 0
112 LES Lesotho 206 0 0 0
113 LIE Liechtenstein 206 0 0 0
115 LUX Luxembourg 206 0 0 0
116 MAD Madagascar 206 0 0 0
119 MAW Malawi 206 0 0 0
121 MDV Maldives 206 0 0 0
124 MHL Marshall Islands 206 0 0 0
126 MLI Mali 206 0 0 0
127 MLT Malta 206 0 0 0
128 MNE Montenegro 206 0 0 0
129 MON Monaco 206 0 0 0
130 MOZ Mozambique 206 0 0 0
131 MRI Mauritius 206 0 0 0
132 MTN Mauritania 206 0 0 0
133 MYA Myanmar 206 0 0 0
135 NCA Nicaragua 206 0 0 0
137 NEP Nepal 206 0 0 0
139 NIG Niger 206 0 0 0
141 NRU Nauru 206 0 0 0
143 OMA Oman 206 0 0 0
144 PAK Pakistan 206 0 0 0
145 PAN Panama 206 0 0 0
146 PAR Paraguay 206 0 0 0
147 PER Peru 206 0 0 0
149 PLE Palestine 206 0 0 0
150 PLW Palau 206 0 0 0
151 PNG Papua New Guinea 206 0 0 0
154 PRK North Korea 206 0 0 0
160 RWA Rwanda 206 0 0 0
161 SAM Samoa 206 0 0 0
162 SEN Senegal 206 0 0 0
163 SEY Seychelles 206 0 0 0
164 SGP Singapore 206 0 0 0
165 SKN Saint Kitts and Nevis 206 0 0 0
166 SLE Sierra Leone 206 0 0 0
169 SOL Solomon Islands 206 0 0 0
170 SOM Somalia 206 0 0 0
172 SRI Sri Lanka 206 0 0 0
173 SSD South Sudan 206 0 0 0
174 STP São Tomé and Príncipe 206 0 0 0
175 SUD Sudan 206 0 0 0
177 SUR Suriname 206 0 0 0
180 SWZ Eswatini 206 0 0 0
182 TAN Tanzania 206 0 0 0
183 TGA Tonga 206 0 0 0
185 TJK Tajikistan 206 0 0 0
187 TLS East Timor 206 0 0 0
188 TOG Togo 206 0 0 0
190 TTO Trinidad and Tobago 206 0 0 0
193 TUV Tuvalu 206 0 0 0
194 UAE United Arab Emirates 206 0 0 0
197 URU Uruguay 206 0 0 0
200 VAN Vanuatu 206 0 0 0
202 VIE Vietnam 206 0 0 0
203 VIN Saint Vincent and the Grenadines 206 0 0 0
204 YEM Yemen 206 0 0 0
205 ZAM Zambia 206 0 0 0
206 ZIM Zimbabwe 206 0 0 0

Data & resources

The International Olympic Committee maintains a suite of webpages that contain a large amount of data for all previous games. You can access them here, although it is not easy to scrape the data in an automated way. Wikipedia has a useful all-time medals table here. Perhaps most useful, however, is the set of medals tables provided by Rob Wood on his website topendsports.com. Even more data is available from a generous Kaggle user (R.Griffin) here.

You are free to use any data you can find to inform your predictions.

You might also be interested in YouTube video-seminars from participants in previous prediction competitions. These include presentations from the winners of the 2020 prediction competition and the winners of the 2023 competition. Finally, if you would like to be kept up to date on the activities of the RSS’s Statistics in Sports Section you can sign up to their mailing list here.

Example scoring calculation

To help forecasters understand the scoring system we provide the following cartoon example. Below is a fictional medals table that is used to compute a true rank for each of five NOCs. The table also includes a fictional set of predicted ranks. The numbers on the far left of the table are just labels arbitrarily (alphabetically) assigned to the NOCs to help refer to them - a bit like the NOC codes in the real medals table.

Label NOC True_rank Forecast_rank Gold Silver Bronze
3 Cambodia 1 4 2 2 0
4 Denmark 2 4 0 1 0
2 Bahamas 3 2 0 0 1
1 Afghanistan 5 2 0 0 0
5 Ecuador 5 4 0 0 0

To score the predicted ranks we enumerate all pairs of NOCs and, for each one, check whether the true ordering and the predicted ordering match. These orderings are quantified using the \(\text{sign}\) function (which returns values -1, 0 or 1) applied to the difference between the ranks of NOC \(i\) and NOC \(j\). To quantify the agreement between the two orderings we multiply the signs of the rank differences together. If the orderings agree, the product is 1 and the forecaster scores a point; if they disagree, the product is -1 and the forecaster loses a point. If either the true or predicted ranks are tied for a particular pair of NOCs, the product is zero and no points are scored or lost. The results of these calculations for this toy example are presented in the table below.

i j NOC_i NOC_j True_rank_i True_rank_j True_sign Forecast_rank_i Forecast_rank_j Forecast_sign Forecast_score
1 2 Afghanistan Bahamas 5 3 1 2 2 0 0
1 3 Afghanistan Cambodia 5 1 1 2 4 -1 -1
1 4 Afghanistan Denmark 5 2 1 2 4 -1 -1
1 5 Afghanistan Ecuador 5 5 0 2 4 -1 0
2 3 Bahamas Cambodia 3 1 1 2 4 -1 -1
2 4 Bahamas Denmark 3 2 1 2 4 -1 -1
2 5 Bahamas Ecuador 3 5 -1 2 4 -1 1
3 4 Cambodia Denmark 1 2 -1 4 4 0 0
3 5 Cambodia Ecuador 1 5 -1 4 4 0 0
4 5 Denmark Ecuador 2 5 -1 4 4 0 0

The forecaster’s tau score is their cumulative score (the sum of the right-most column) divided by the number of pairs, i.e. -3/10=-0.3.
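
The toy calculation can be reproduced in a few lines of Python. This is an illustrative sketch using the ranks from the fictional table above, with variable names chosen for the example; it prints one line per pair and then the final score.

```python
from itertools import combinations

# NOCs in label order 1-5 (alphabetical), with ranks from the toy table above.
nocs = ["Afghanistan", "Bahamas", "Cambodia", "Denmark", "Ecuador"]
true_rank = [5, 3, 1, 2, 5]
forecast_rank = [2, 2, 4, 4, 4]


def sign(x):
    """Return 1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


pair_scores = []
for i, j in combinations(range(len(nocs)), 2):
    true_sign = sign(true_rank[i] - true_rank[j])
    forecast_sign = sign(forecast_rank[i] - forecast_rank[j])
    pair_scores.append(true_sign * forecast_sign)
    print(nocs[i], nocs[j], true_sign, forecast_sign, pair_scores[-1])

# Cumulative score divided by the number of pairs.
tau = sum(pair_scores) / len(pair_scores)
print(tau)  # -0.3
```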

Extra info

Bonus fact: Kendall’s tau was devised and analysed by Maurice Kendall, who served as President of the Institute of Statisticians, which broke away from, then merged with, the Royal Statistical Society.

Bonus reminder: We are orienting our ranks so that smaller numbers correspond to more medals. When we talk about the medals table, countries at the top are those with the most medals and the smallest ranks.