
France vs. Croatia: Why was everyone’s World Cup prediction WRONG?

· machine learning,prediction,AI,models,Analytics insight

In hindsight, a month seems like an eternity when you review everything that happened in the 2018 World Cup. But if you can remember that far back, you might recall headlines dominated not only by bookmakers but also by AI and machine learning models.

Not surprisingly, the favorites: Brazil, Germany and Spain

The MIT Technology Review came up with a few different predictions. First, they combined selections from multiple bookmakers to build a list of the most likely winners, giving Brazil a 16.6% chance of winning, Germany 12.8% and Spain 12.5%. The article then shared results from a model developed by Andreas Groll at the Technical University of Dortmund in Germany, which applied a combination of machine learning and statistics. This model covered a wide range of variables, including economic factors like GDP and population (hello Belgium and Croatia!), FIFA rankings, and team/player factors like age, number of superstars, home advantage, etc. Spain emerged from this model as the likely winner, with the added twist that Germany would take the cup if they cleared the quarterfinals.
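For readers curious how bookmaker odds can be turned into a single list of win probabilities, here is a minimal sketch. The article does not say how the MIT Technology Review combined its bookmakers, so the approach (normalize each bookmaker's implied probabilities, then average them) is an assumption, and the team names and odds are made-up placeholders rather than real 2018 figures.

```python
def implied_probabilities(decimal_odds):
    """Convert one bookmaker's decimal odds into probabilities that sum to 1."""
    raw = {team: 1.0 / odds for team, odds in decimal_odds.items()}
    total = sum(raw.values())  # above 1 when the bookmaker builds in a margin
    return {team: p / total for team, p in raw.items()}


def consensus(bookmakers):
    """Average the normalized probabilities across several bookmakers."""
    per_book = [implied_probabilities(book) for book in bookmakers]
    teams = per_book[0].keys()
    return {t: sum(p[t] for p in per_book) / len(per_book) for t in teams}


# Hypothetical odds for an imaginary three-team field (not real data).
books = [
    {"Team A": 1.8, "Team B": 3.6, "Team C": 3.6},
    {"Team A": 2.25, "Team B": 2.25, "Team C": 4.5},
]

result = consensus(books)
for team, prob in result.items():
    print(f"{team}: {prob:.1%}")
```

Dividing by the total removes the bookmaker's margin, which is why the raw inverse odds cannot be read directly as probabilities.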

Even less surprising: Brazil vs. Germany

Goldman Sachs also took a stab at predicting the tournament, with predictable results (pun intended) showing Brazil taking home the cup.

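Just for reference, Paul the psychic octopus correctly predicted all eight of the 2010 World Cup matches he was asked about, including the final, bringing his career record to 12 out of 14 once his Euro 2008 picks are counted.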

So what went wrong?

For starters, predictive models depend on feedback: they need to be updated with new information, especially when the underlying factors change. In this tournament many things changed soon after the models were released, including:

  • the firing of Spain’s head coach
  • players receiving red and yellow cards
  • Iranian fans singing all night outside of Ronaldo’s room

The nature of the sport also makes prediction very difficult. A paper written by Brazilian data scientists looks at the challenges of predicting winners in four different sports: basketball, volleyball, handball and soccer. Their findings show that skill and luck contribute differently depending on the sport. In sports with high point totals, skill is a very strong predictor. Likewise, “best-of-x” formats reward player skill. Luck plays a much larger role in low-scoring sports like soccer and in knockout tournament formats.
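A toy simulation makes the intuition concrete. This is not the method used in the paper; it is a minimal sketch under simple assumptions: a game consists of a fixed, odd number of scoring events (so there are no draws), and the stronger side wins each event with the same hypothetical probability `p_strong`.

```python
import random


def upset_rate(scoring_events, p_strong=0.55, games=5_000, seed=0):
    """Share of simulated games that the stronger side fails to win."""
    rng = random.Random(seed)
    upsets = 0
    for _ in range(games):
        # Count how many scoring events go to the stronger side.
        strong = sum(rng.random() < p_strong for _ in range(scoring_events))
        weak = scoring_events - strong
        if weak > strong:
            upsets += 1
    return upsets / games


low_scoring = upset_rate(scoring_events=3)     # soccer-like score line
high_scoring = upset_rate(scoring_events=101)  # basketball-like score line

print(f"Upset rate with 3 scoring events:   {low_scoring:.1%}")
print(f"Upset rate with 101 scoring events: {high_scoring:.1%}")
```

With the same skill gap, the weaker side wins far more often when only a handful of scoring events decide the game, because a small sample gives luck more room to outweigh skill.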

What can we learn from this?

First, machine learning is not a silver bullet. A static model that is never updated can leave your organization worse off than if it had never built the model at all.

Second, it’s important to know when and where to deploy a statistical or machine learning model. The poor souls who built World Cup prediction models were playing against a stacked deck in a sport where luck is a huge factor. Likewise, in a corporate environment statistical models may not always be appropriate, or at the very least they might be better used for different problems.

And finally, always remember that industry insight, especially from your domain experts, is as important as the models, if not more so. Presumably, the creators of these machine learning models did not have the opportunity to review their assumptions with players, coaches or other insiders. Ironically, the person who came up with the best predictions was a former player, Steve McManaman, who correctly predicted 12 out of 14 World Cup knockout matches. Don’t ever forget that your domain experts have a wealth of knowledge that the machine doesn’t.

Make data-driven decisions. At 3AG Systems we help businesses improve by transforming their data into actionable insights. Learn more at
