RSS Olympic prediction competition 2024

Context

Each summer the Royal Statistical Society’s Statistics in Sport Section organizes a prediction competition. For 2024 the competition, which is being sponsored by Amelco, involves predicting the medals table for the Paris Olympics.

This year’s competition is novel in two ways. Firstly, forecasters will be asked to provide a single predicted ranking whose distance from the true ranking at the end of the games will determine their score. This stands in contrast to previous forecasting competitions, in which forecasters were required to specify probabilities for the outcomes of individual sports matches in a tournament. Secondly, the ‘true ranking’ for the Olympics is not as well defined or officially endorsed as it is for other events.

For the purposes of this competition the true final ranking will be determined by the number of gold medals won by each country. Silver medals will be used only to break ties between countries with equal numbers of gold medals. Bronze medals will be used to break ties between countries after silver medals are accounted for. Countries still tied after accounting for bronze medals will remain tied.
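The tie-breaking rule above amounts to comparing (gold, silver, bronze) tallies lexicographically. Below is a minimal Python sketch, not the organisers' own code, assuming medal counts are held in a plain dictionary keyed by NOC code (a hypothetical input format). Tied NOCs share the largest position their group occupies, which appears to match how ties are shown in the medals table further down this page (e.g. Azerbaijan and Croatia both at 31st).

```python
def true_ranks(medals: dict[str, tuple[int, int, int]]) -> dict[str, int]:
    """Rank NOCs by gold, then silver, then bronze (1 = top of the table).

    Python compares tuples lexicographically, so (gold, silver, bronze)
    tuples are ordered by gold first, with silver and then bronze breaking
    ties. NOCs with identical tallies remain tied: each receives the number
    of NOCs whose tally is at least as good as its own.
    """
    return {
        noc: sum(1 for other in medals.values() if other >= tally)
        for noc, tally in medals.items()
    }


# Usage with a few rows taken from the 2024 medals table.
example = {
    "USA": (40, 44, 42),
    "CHN": (40, 27, 24),
    "DEN": (2, 2, 5),
    "AZE": (2, 2, 3),
    "CRO": (2, 2, 3),
}
print(true_ranks(example))  # → {'USA': 1, 'CHN': 2, 'DEN': 3, 'AZE': 5, 'CRO': 5}
```

Counting all pairs is quadratic in the number of NOCs, which is negligible for about 200 entries and keeps the tie handling explicit.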

Scoring

Predicted rankings will be scored using a statistic called Kendall’s tau, or the Kendall rank correlation coefficient. There are (at least) two ways to compute this statistic, and each sheds light on its meaning. To understand what the statistic quantifies, it is first necessary to think about all the possible pairwise comparisons between countries. With \(n\) participating countries there are \(n(n-1)/2=\binom{n}{2}\) of these. A forecaster gains a point for every pair they have placed in the correct order and loses a point for every pair they have placed in the wrong order. If a pair of countries is tied, either in the predicted ranking or in the true ranking, that pair does not contribute to the score. The score is then divided by the total number of pairs so that it falls between -1 and 1.

We can write this down as

\[\begin{align} \tau & = \frac{\text{number of concordant pairs}-\text{number of discordant pairs}}{\text{number of pairs}} \\ & = \frac{2}{n(n-1)} \sum_{i<j}\text{sign}(R_i-R_j)\text{sign}(S_i-S_j) \end{align}\]

where ‘concordant’ is a word used in the academic literature to mean ‘in agreement’ and, similarly, ‘discordant’ means ‘in disagreement’. For example, if a forecaster ranks Kiribati higher than Tanzania and Kiribati does indeed finish higher than Tanzania in the final true ranking, then that pair is concordant with the true ranking. As the second line of the equation above suggests, Kendall’s tau can also be shown to be a correlation between the signs of the rank differences. In this expression \(R_i\) denotes the numerical rank assigned by a forecaster to country \(i\) (e.g. if a forecaster predicted country \(i\) to be second from the top of the medals table they would set \(R_i=2\)), and \(S_i\) denotes the true rank of country \(i\). The \(\text{sign}\) function returns the value one when its argument is positive, minus one when its argument is negative and zero when its argument is zero.
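The second line of the equation above translates almost directly into code. The following Python function is an illustrative sketch, not the official scoring script; `forecast_ranks` plays the role of \(R\) and `true_ranks` the role of \(S\), and both are assumed to list the NOCs in the same order.

```python
from itertools import combinations
from typing import Sequence


def sign(x: float) -> int:
    """Return 1 for a positive argument, -1 for a negative one and 0 for zero."""
    return (x > 0) - (x < 0)


def kendall_tau(forecast_ranks: Sequence[float], true_ranks: Sequence[float]) -> float:
    """Sum of sign(R_i - R_j) * sign(S_i - S_j) over pairs i < j, times 2 / (n(n-1)).

    Pairs tied in either ranking contribute zero to the sum.
    """
    n = len(forecast_ranks)
    if n != len(true_ranks) or n < 2:
        raise ValueError("need two rankings of equal length with at least two entries")
    total = sum(
        sign(forecast_ranks[i] - forecast_ranks[j]) * sign(true_ranks[i] - true_ranks[j])
        for i, j in combinations(range(n), 2)
    )
    return 2 * total / (n * (n - 1))


print(kendall_tau([1, 2, 3], [1, 2, 3]))  # → 1.0 (perfect agreement)
print(kendall_tau([1, 2, 3], [3, 2, 1]))  # → -1.0 (perfect disagreement)
```

Note that `scipy.stats.kendalltau` computes tau-b by default, which uses a different denominator when ties are present, so it would not generally reproduce scores defined as above when either ranking contains ties.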

There are 206 countries participating in the 2024 Olympic Games, although officially they are referred to as National Olympic Committees (NOCs) rather than countries. In the submission template you will find a list of the NOCs and, as a default, they are all ranked jointly at 206th. This default entry will receive a Kendall’s tau score of zero because, as mentioned above, tied pairs contribute nothing to the score; they are considered neither concordant nor discordant. Forecasters are invited to modify the ranks in any way they see fit. You could, for example,

cluster countries into groups all with the same rank,
use non-integer ranks,
use ranks that go below 1!

The only thing that matters is that your ranks allow for the pairwise comparisons that contribute to Kendall’s tau.
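Because only the signs of rank differences enter the score, any relabelling of your ranks that preserves their order leaves the score unchanged, and a submission in which every NOC shares one rank scores exactly zero. The short Python check below illustrates both points, plus the effect of accidentally flipping the orientation; the five-NOC rankings are made up purely for illustration.

```python
from itertools import combinations


def kendall_tau(forecast, truth):
    """Sum of sign(forecast_i - forecast_j) * sign(truth_i - truth_j) over
    all pairs i < j, divided by the number of pairs."""
    def sign(x):
        return (x > 0) - (x < 0)

    pairs = list(combinations(range(len(forecast)), 2))
    total = sum(sign(forecast[i] - forecast[j]) * sign(truth[i] - truth[j])
                for i, j in pairs)
    return total / len(pairs)


truth = [1, 2, 3, 5, 5]                   # hypothetical true ranks, last two tied
integer_ranks = [1, 3, 2, 4, 5]           # hypothetical forecast
non_integer = [0.5, 2.7, 1.1, 9.0, 9.5]   # same ordering, non-integer values
below_one = [-10, -3, -7, 0, 4]           # same ordering, ranks below 1

for label, ranks in [("integer", integer_ranks),
                     ("non-integer", non_integer),
                     ("below one", below_one)]:
    print(label, kendall_tau(ranks, truth))       # prints 0.7 each time

print("all tied", kendall_tau([206] * 5, truth))  # → all tied 0.0

# Using larger numbers for more medals (the wrong orientation here) negates the score.
print("flipped", kendall_tau([-r for r in integer_ranks], truth))  # → flipped -0.7
```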

Submission template

You can download the submission template here. Only modify the numbers in the Rank column. In particular, do not reorder the rows of NOCs.

Email completed submissions to ben.powell@york.ac.uk with the subject line SiS forecasting competition 2024. This email should contain an attached .csv file called “RSS_pred_comp_submission_NAME.csv” where NAME is the name you would like to appear on the leader board.
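If you build your forecast programmatically, one way to respect the rule of changing only the Rank column and keeping the row order is to copy the template row by row and overwrite just that field. The sketch below assumes the template is a CSV whose header row contains columns named `Code` and `Rank` (as in the medals table on this page); please check the real column names in the downloaded file. The helper name `fill_template` is hypothetical.

```python
import csv
import io
from typing import Mapping, TextIO


def fill_template(template: TextIO, out: TextIO, ranks: Mapping[str, float]) -> None:
    """Copy the submission template, changing only the Rank column.

    ASSUMPTION: the template has a header row with "Code" and "Rank" columns.
    Rows are written in their original order; NOCs missing from `ranks`
    keep the rank already in the template.
    """
    reader = csv.DictReader(template)
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row["Code"] in ranks:
            row["Rank"] = ranks[row["Code"]]
        writer.writerow(row)


# Demonstration with an in-memory stand-in for the template file.
template = io.StringIO("Code,NOC,Rank\nAFG,Afghanistan,206\nUSA,United States,206\n")
out = io.StringIO()
fill_template(template, out, {"USA": 1})
print(out.getvalue())
```

With real files you would pass objects opened with `open(path, newline="")`, as the `csv` module documentation recommends, writing to a file named following the “RSS_pred_comp_submission_NAME.csv” pattern.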

If you would like to be considered for the methodology prize, please provide a brief description of how you made your forecast predictions.

Please make your submissions before Monday 22nd July. The games start on Friday 26th July but it will be necessary to have submission files a bit earlier in order to initialize the forecast scoring procedure and the leader board.

Prizes

The principal and most valuable prize for winning the forecasting competition is prestige (and a certificate)! Special attention and corresponding certificates will be awarded for

Best overall score
Best student score
Most innovative methodology

The winners of these three awards will also be invited to the Royal Statistical Society conference to present their methods. This year’s conference will be held in Brighton in early September. Conference fees and travel and accommodation costs are being covered by the sponsor, Amelco.

Leader board

The competition is now over and our winner is John Edwards. Well done John and everyone who took part!

Forecaster	Rank	Tau
John Edwards	1	0.596
mohamed	2	0.564
OlymPicks	3	0.544
Hammers_O’Callaghan	4	0.543
ALEC	5	0.540
AJM	6	0.534
fa2410	7	0.533
ChristopherWharton	8	0.532
JKH	9	0.513
Orla_S	10	0.511
KaitoGoto	11	0.509
JoePenn	13	0.508
JohnDSouza	13	0.508
JPN2020	14	0.507
HarrySnart	16	0.495
WilliamBowers	16	0.495
LexieB	17	0.493
ANIK	18	0.486
Rank_Deficient	19	0.480
WeatherQuant	20	0.455
Kizmet24	21	0.426
tghaynes	22	0.395
Everyone_is_(equally)_awesome	23	0.000

Medals table

Last updated: 2024-08-11 21:13:10

ID	Code	NOC	Rank	Gold	Silver	Bronze
198	USA	United States	1	40	44	42
39	CHN	China	2	40	27	24
97	JPN	Japan	3	20	12	13
11	AUS	Australia	4	18	19	16
65	FRA	France	5	16	26	22
136	NED	Netherlands	6	15	7	12
69	GBR	Great Britain	7	14	22	29
102	KOR	South Korea	8	13	9	10
93	ITA	Italy	9	12	13	15
73	GER	Germany	10	12	13	8
142	NZL	New Zealand	11	10	7	3
34	CAN	Canada	12	9	7	11
199	UZB	Uzbekistan	13	8	2	3
84	HUN	Hungary	14	6	7	6
60	ESP	Spain	15	5	4	9
179	SWE	Sweden	16	4	4	3
99	KEN	Kenya	17	4	2	5
140	NOR	Norway	18	4	1	3
88	IRL	Ireland	19	4	0	3
27	BRA	Brazil	20	3	7	10
87	IRI	Iran	21	3	6	3
196	UKR	Ukraine	22	3	5	4
157	ROU	Romania	23	3	4	2
71	GEO	Georgia	24	3	3	1
18	BEL	Belgium	25	3	1	6
30	BUL	Bulgaria	26	3	1	3
171	SRB	Serbia	27	3	1	1
51	CZE	Czechia	28	3	0	2
52	DEN	Denmark	29	2	2	5
13	AZE	Azerbaijan	31	2	2	3
48	CRO	Croatia	31	2	2	3
49	CUB	Cuba	32	2	1	6
28	BRN	Bahrain	33	2	1	1
167	SLO	Slovenia	34	2	1	0
189	TPE	Chinese Taipei	35	2	0	5
12	AUT	Austria	36	2	0	3
82	HKG	Hong Kong, China	38	2	0	2
148	PHI	Philippines	38	2	0	2
3	ALG	Algeria	40	2	0	1
85	INA	Indonesia	40	2	0	1
91	ISR	Israel	41	1	5	1
152	POL	Poland	42	1	4	5
98	KAZ	Kazakhstan	43	1	3	3
95	JAM	Jamaica	46	1	3	2
158	RSA	South Africa	46	1	3	2
184	THA	Thailand	46	1	3	2
62	ETH	Ethiopia	47	1	3	0
176	SUI	Switzerland	48	1	2	5
56	ECU	Ecuador	49	1	2	2
153	POR	Portugal	50	1	2	1
75	GRE	Greece	51	1	1	6
7	ARG	Argentina	54	1	1	1
57	EGY	Egypt	54	1	1	1
191	TUN	Tunisia	54	1	1	1
26	BOT	Botswana	58	1	1	0
38	CHI	Chile	58	1	1	0
111	LCA	Saint Lucia	58	1	1	0
195	UGA	Uganda	58	1	1	0
55	DOM	Dominican Republic	59	1	0	2
77	GUA	Guatemala	61	1	0	1
117	MAR	Morocco	61	1	0	1
54	DMA	Dominica	63	1	0	0
144	PAK	Pakistan	63	1	0	0
192	TUR	Turkey	64	0	3	5
122	MEX	Mexico	65	0	3	2
8	ARM	Armenia	67	0	3	1
44	COL	Colombia	67	0	3	1
100	KGZ	Kyrgyzstan	69	0	2	4
154	PRK	North Korea	69	0	2	4
114	LTU	Lithuania	70	0	2	2
86	IND	India	71	0	1	5
120	MDA	Moldova	72	0	1	3
103	KOS	Kosovo	73	0	1	1
50	CYP	Cyprus	78	0	1	0
63	FIJ	Fiji	78	0	1	0
96	JOR	Jordan	78	0	1	0
123	MGL	Mongolia	78	0	1	0
145	PAN	Panama	78	0	1	0
185	TJK	Tajikistan	79	0	0	3
2	ALB	Albania	83	0	0	2
76	GRN	Grenada	83	0	0	2
118	MAS	Malaysia	83	0	0	2
155	PUR	Puerto Rico	83	0	0	2
40	CIV	Ivory Coast	89	0	0	1
46	CPV	Cape Verde	89	0	0	1
147	PER	Peru	89	0	0	1
156	QAT	Qatar	89	0	0	1
164	SGP	Singapore	89	0	0	1
178	SVK	Slovakia	89	0	0	1
1	AFG	Afghanistan	204	0	0	0
4	AND	Andorra	204	0	0	0
5	ANG	Angola	204	0	0	0
6	ANT	Antigua and Barbuda	204	0	0	0
9	ARU	Aruba	204	0	0	0
10	ASA	American Samoa	204	0	0	0
14	BAH	Bahamas	204	0	0	0
15	BAN	Bangladesh	204	0	0	0
16	BAR	Barbados	204	0	0	0
17	BDI	Burundi	204	0	0	0
19	BEN	Benin	204	0	0	0
20	BER	Bermuda	204	0	0	0
21	BHU	Bhutan	204	0	0	0
22	BIH	Bosnia and Herzegovina	204	0	0	0
23	BIZ	Belize	204	0	0	0
25	BOL	Bolivia	204	0	0	0
29	BRU	Brunei	204	0	0	0
31	BUR	Burkina Faso	204	0	0	0
32	CAF	Central African Republic	204	0	0	0
33	CAM	Cambodia	204	0	0	0
35	CAY	Cayman Islands	204	0	0	0
36	CGO	Republic of the Congo	204	0	0	0
37	CHA	Chad	204	0	0	0
41	CMR	Cameroon	204	0	0	0
42	COD	Democratic Republic of the Congo	204	0	0	0
43	COK	Cook Islands	204	0	0	0
45	COM	Comoros	204	0	0	0
47	CRC	Costa Rica	204	0	0	0
53	DJI	Djibouti	204	0	0	0
58	ERI	Eritrea	204	0	0	0
59	ESA	El Salvador	204	0	0	0
61	EST	Estonia	204	0	0	0
64	FIN	Finland	204	0	0	0
66	FSM	Federated States of Micronesia	204	0	0	0
67	GAB	Gabon	204	0	0	0
68	GAM	The Gambia	204	0	0	0
70	GBS	Guinea-Bissau	204	0	0	0
72	GEQ	Equatorial Guinea	204	0	0	0
74	GHA	Ghana	204	0	0	0
78	GUI	Guinea	204	0	0	0
79	GUM	Guam	204	0	0	0
80	GUY	Guyana	204	0	0	0
81	HAI	Haiti	204	0	0	0
83	HON	Honduras	204	0	0	0
89	IRQ	Iraq	204	0	0	0
90	ISL	Iceland	204	0	0	0
92	ISV	Virgin Islands	204	0	0	0
94	IVB	British Virgin Islands	204	0	0	0
101	KIR	Kiribati	204	0	0	0
104	KSA	Saudi Arabia	204	0	0	0
105	KUW	Kuwait	204	0	0	0
106	LAO	Laos	204	0	0	0
107	LAT	Latvia	204	0	0	0
108	LBA	Libya	204	0	0	0
109	LBN	Lebanon	204	0	0	0
110	LBR	Liberia	204	0	0	0
112	LES	Lesotho	204	0	0	0
113	LIE	Liechtenstein	204	0	0	0
115	LUX	Luxembourg	204	0	0	0
116	MAD	Madagascar	204	0	0	0
119	MAW	Malawi	204	0	0	0
121	MDV	Maldives	204	0	0	0
124	MHL	Marshall Islands	204	0	0	0
125	MKD	North Macedonia	204	0	0	0
126	MLI	Mali	204	0	0	0
127	MLT	Malta	204	0	0	0
128	MNE	Montenegro	204	0	0	0
129	MON	Monaco	204	0	0	0
130	MOZ	Mozambique	204	0	0	0
131	MRI	Mauritius	204	0	0	0
132	MTN	Mauritania	204	0	0	0
133	MYA	Myanmar	204	0	0	0
134	NAM	Namibia	204	0	0	0
135	NCA	Nicaragua	204	0	0	0
137	NEP	Nepal	204	0	0	0
138	NGR	Nigeria	204	0	0	0
139	NIG	Niger	204	0	0	0
141	NRU	Nauru	204	0	0	0
143	OMA	Oman	204	0	0	0
146	PAR	Paraguay	204	0	0	0
149	PLE	Palestine	204	0	0	0
150	PLW	Palau	204	0	0	0
151	PNG	Papua New Guinea	204	0	0	0
160	RWA	Rwanda	204	0	0	0
161	SAM	Samoa	204	0	0	0
162	SEN	Senegal	204	0	0	0
163	SEY	Seychelles	204	0	0	0
165	SKN	Saint Kitts and Nevis	204	0	0	0
166	SLE	Sierra Leone	204	0	0	0
168	SMR	San Marino	204	0	0	0
169	SOL	Solomon Islands	204	0	0	0
170	SOM	Somalia	204	0	0	0
172	SRI	Sri Lanka	204	0	0	0
173	SSD	South Sudan	204	0	0	0
174	STP	São Tomé and Príncipe	204	0	0	0
175	SUD	Sudan	204	0	0	0
177	SUR	Suriname	204	0	0	0
180	SWZ	Eswatini	204	0	0	0
181	SYR	Syria	204	0	0	0
182	TAN	Tanzania	204	0	0	0
183	TGA	Tonga	204	0	0	0
186	TKM	Turkmenistan	204	0	0	0
187	TLS	East Timor	204	0	0	0
188	TOG	Togo	204	0	0	0
190	TTO	Trinidad and Tobago	204	0	0	0
193	TUV	Tuvalu	204	0	0	0
194	UAE	United Arab Emirates	204	0	0	0
197	URU	Uruguay	204	0	0	0
200	VAN	Vanuatu	204	0	0	0
201	VEN	Venezuela	204	0	0	0
202	VIE	Vietnam	204	0	0	0
203	VIN	Saint Vincent and the Grenadines	204	0	0	0
204	YEM	Yemen	204	0	0	0
205	ZAM	Zambia	204	0	0	0
206	ZIM	Zimbabwe	204	0	0	0

Data & resources

The International Olympic Committee maintains a suite of webpages that contain a large amount of data for all previous games. You can access them here, although it is not easy to scrape the data in an automated way. Wikipedia has a useful all-time medals table here. Perhaps most useful, however, is the set of medals tables provided by Rob Wood on his website topendsports.com. Even more data is available from a generous Kaggle user (R.Griffin) here.

You are free to use any data you can find to inform your predictions.

You might also be interested in YouTube video seminars from participants in previous prediction competitions. These include presentations from the winners of the 2020 prediction competition and the winners of the 2023 competition. Finally, if you would like to be kept up to date on the activities of the RSS’s Statistics in Sport Section, you can sign up to their mailing list here.

In this YouTube video Dr. Jess Hargreaves (Univ. York) hears from Dr. Johan Rewilak (Univ. South Carolina) on his thoughts about predicting the Olympics.

Lexie Bonas, an undergraduate student at the University of York, has also provided a quick explanation of her submission.

Example scoring calculation

To help forecasters understand the scoring system, we provide the following toy example. Below is a fictional medals table that is used to compute a true rank for each of five NOCs, together with a fictional set of predicted ranks. The numbers on the far left of the table are arbitrary labels, assigned alphabetically, that make it easier to refer to the NOCs, a bit like the NOC codes in the real medals table.

ID	NOC	True_rank	Forecast_rank	Gold	Silver	Bronze
3	Cambodia	1	4	2	2	0
4	Denmark	2	4	0	1	0
2	Bahamas	3	2	0	0	1
1	Afghanistan	5	2	0	0	0
5	Ecuador	5	4	0	0	0

To score the predicted ranks we enumerate all pairs of NOCs and, for each one, check whether the true ordering and the predicted ordering match. These orderings are quantified using the \(\text{sign}\) function (which returns the values -1, 0 or 1) applied to the difference between the ranks. If the orderings agree, the forecaster scores a point; if they disagree, the forecaster loses a point. If either the true or the predicted ranks are tied for a particular pair of NOCs, no points are scored or lost. To quantify this agreement we multiply the signs of the two rank differences together. The results of these calculations for this toy example are presented in the table below.

i	j	NOC_i	NOC_j	True_rank_i	True_rank_j	True_sign	Forecast_rank_i	Forecast_rank_j	Forecast_sign	Forecast_score
1	2	Afghanistan	Bahamas	5	3	1	2	2	0	0
1	3	Afghanistan	Cambodia	5	1	1	2	4	-1	-1
1	4	Afghanistan	Denmark	5	2	1	2	4	-1	-1
1	5	Afghanistan	Ecuador	5	5	0	2	4	-1	0
2	3	Bahamas	Cambodia	3	1	1	2	4	-1	-1
2	4	Bahamas	Denmark	3	2	1	2	4	-1	-1
2	5	Bahamas	Ecuador	3	5	-1	2	4	-1	1
3	4	Cambodia	Denmark	1	2	-1	4	4	0	0
3	5	Cambodia	Ecuador	1	5	-1	4	4	0	0
4	5	Denmark	Ecuador	2	5	-1	4	4	0	0

The forecaster’s tau score is their cumulative score (the sum of the right-most column) divided by the number of pairs, i.e. -3/10=-0.3.
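The same calculation can be reproduced in a few lines of Python. This is an illustrative sketch rather than the competition's scoring code: it walks through the ten pairs in the same order as the table above, prints each pair's contribution and then the final score.

```python
from itertools import combinations

# Toy example from above: (NOC, true rank, forecast rank), in label order 1..5.
nocs = [
    ("Afghanistan", 5, 2),
    ("Bahamas", 3, 2),
    ("Cambodia", 1, 4),
    ("Denmark", 2, 4),
    ("Ecuador", 5, 4),
]


def sign(x: int) -> int:
    """Return 1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


total = 0
for (noc_i, true_i, fc_i), (noc_j, true_j, fc_j) in combinations(nocs, 2):
    # Product of signs: +1 concordant, -1 discordant, 0 if tied in either ranking.
    score = sign(true_i - true_j) * sign(fc_i - fc_j)
    total += score
    print(f"{noc_i:<12} {noc_j:<12} {score:+d}")

n_pairs = len(nocs) * (len(nocs) - 1) // 2
tau = total / n_pairs
print(f"tau = {total}/{n_pairs} = {tau}")  # → tau = -3/10 = -0.3
```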

Extra info

NOCs without state affiliation: The submission template did not include rows for the teams of neutral athletes or refugees, so their ranks will not contribute to the prediction competition scores. The template did include rows for the Belarusian and Russian NOCs, which are not taking part in the games. Predictions for these NOCs will not be used when calculating the prediction competition scores.
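In practice this means scoring only NOCs that appear in both the submission and the final medals table, after dropping the excluded entries. A hedged Python sketch of that alignment step follows. It assumes ranks are stored in dictionaries keyed by NOC code, and the codes `BLR` and `RUS` used for the Belarusian and Russian rows are an assumption about the template's contents, not something stated above.

```python
from typing import Iterable, Mapping


def paired_ranks(
    forecast: Mapping[str, float],
    truth: Mapping[str, float],
    excluded: Iterable[str] = ("BLR", "RUS"),  # ASSUMED codes for non-participating NOCs
) -> tuple[list[float], list[float]]:
    """Return forecast and true ranks for the NOCs that can be scored.

    An NOC is kept only if it appears in both mappings and is not excluded.
    Codes are sorted so that both lists share the same NOC order.
    """
    skip = set(excluded)
    codes = sorted((forecast.keys() & truth.keys()) - skip)
    return [forecast[c] for c in codes], [truth[c] for c in codes]


forecast = {"USA": 1, "CHN": 2, "RUS": 3, "GBR": 4}
truth = {"USA": 1, "CHN": 2, "GBR": 7, "ZZZ": 50}  # ZZZ: hypothetical code only in the true table
print(paired_ranks(forecast, truth))  # → ([2, 4, 1], [2, 7, 1])
```

The two lists can then be passed to any function implementing the tau formula defined earlier.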

Bonus fact: Kendall’s tau was devised and analysed by Maurice Kendall, who served as President of the Institute of Statisticians, a body that broke away from, and later merged with, the Royal Statistical Society.

Bonus reminder: We are orienting our ranks so that smaller numbers correspond to more medals. When we talk about the medals table, countries at the top are those with the most medals and the smallest ranks.