FIFA World Cup Historical Analysis

by Omar Baig

1994 World Cup

With the World Cup concluding tomorrow, I thought it would be interesting to look for trends in past tournaments and use them to see which of this year's matches had surprising outcomes. In the end, I hope to offer insights that add context and enjoyment to this year's tournament.

The World Cup is a passion-fueled competition that allows nations to transcend their differences and come together, regardless of political and geographic boundaries. Over the years the tournament has grown, with attendance rising at nearly every edition. The most attended World Cup was the 1994 tournament in the United States, with almost 3.6 million attendees. While the final numbers are not in yet, it is possible that this year will break that record.

World Cup Attendance

The first question I wanted to answer was which nation, historically, has won the most World Cups. More specifically, I wanted to see who finished in the Top 3 of the tournament. The graphs below show just this.

World Cup Wins

World Cup Runner Up

World Cup Third Place

To make this more meaningful, the three counts are summed below to show which nations have performed best across all World Cups since the first tournament in 1930. As we can see, Brazil is the strongest nation, followed by Germany FR (West Germany), Italy, Germany, and Argentina. Note that the data lists West Germany and Germany as separate entries.

World Cup Top 3
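The Top 3 tally described above can be sketched in a few lines. This is a minimal illustration, not the notebook's actual code: the `PODIUMS` list and the `top3_counts` helper are names made up for this example, and only five recent tournaments are included, so the counts will not match the full chart.

```python
from collections import Counter

# (year, winner, runner-up, third place) for five recent tournaments.
# Country names follow common usage; the dataset's own labels may differ.
PODIUMS = [
    (1998, "France", "Brazil", "Croatia"),
    (2002, "Brazil", "Germany", "Turkey"),
    (2006, "Italy", "France", "Germany"),
    (2010, "Spain", "Netherlands", "Germany"),
    (2014, "Germany", "Argentina", "Netherlands"),
]


def top3_counts(podiums):
    """Count how often each nation finished first, second, or third."""
    counts = Counter()
    for _year, *finishers in podiums:
        counts.update(finishers)
    return counts


counts = top3_counts(PODIUMS)
for nation, appearances in counts.most_common():
    print(f"{nation}: {appearances}")
```

Summing the three placings into one counter treats a win and a third-place finish equally, which matches the simple summation used for the chart. A weighted score would be a different design choice.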

Another interesting point is the total number of goals scored in each World Cup. At first glance the total seems to be trending upwards, with a few high-scoring outliers in earlier tournaments. I wanted to believe that these athletes are becoming super-performing cyborgs, but the truth is less satisfying. A closer look shows that the tournament format changed over time: more teams compete and more matches are played, which raises the total number of goals scored.

World Cup Total Goals Scored

World Cup Changes
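One way to separate scoring rates from tournament size is to divide total goals by matches played. The sketch below assumes a per-tournament summary table with columns named `Year`, `GoalsScored`, `QualifiedTeams`, and `MatchesPlayed`. Those column names are an assumption, and the four sample rows use commonly reported figures that should be checked against the dataset.

```python
import csv
import io

# A few rows in the shape of a per-tournament summary table.
# Column names are assumed; figures are as commonly reported.
SAMPLE = """Year,GoalsScored,QualifiedTeams,MatchesPlayed
1930,70,13,18
1954,140,16,26
1998,171,32,64
2014,171,32,64
"""


def goals_per_match(csv_text):
    """Return {year: goals scored per match} for each row in the CSV text."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return {
        int(row["Year"]): int(row["GoalsScored"]) / int(row["MatchesPlayed"])
        for row in rows
    }


rates = goals_per_match(SAMPLE)
for year, rate in sorted(rates.items()):
    print(f"{year}: {rate:.2f} goals per match")
```

On these sample rows the per-match rate is lower in the 64-match tournaments than in the early ones, even though their goal totals are higher, which is consistent with the format change explaining the rise in totals.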

Another thing I was interested in was whether being labeled the "Home Team" had any effect on the score. I also wondered whether more goals tend to be scored after half-time. Based on the data, both appear to be true. Of course, only one team in the tournament is the real home team, but it seems there may be power to that formality.

Plot 1
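Both comparisons reduce to simple averages over match records. The sketch below uses four hypothetical matches, invented purely to show the calculation, and the field names (`home`, `away`, `ht_home`, `ht_away`) are assumptions rather than the dataset's actual columns. Second-half goals are derived as full-time goals minus half-time goals.

```python
from statistics import mean

# Hypothetical match rows for illustration only (not real results).
# Fields: full-time home/away goals and half-time home/away goals.
MATCHES = [
    {"home": 2, "away": 1, "ht_home": 1, "ht_away": 0},
    {"home": 0, "away": 0, "ht_home": 0, "ht_away": 0},
    {"home": 3, "away": 2, "ht_home": 1, "ht_away": 1},
    {"home": 1, "away": 2, "ht_home": 0, "ht_away": 1},
]


def summarize(matches):
    """Average goals for home vs. away sides and for each half."""
    first_half = [m["ht_home"] + m["ht_away"] for m in matches]
    # Second-half goals are whatever was scored after the half-time score.
    second_half = [
        m["home"] + m["away"] - scored_by_half
        for m, scored_by_half in zip(matches, first_half)
    ]
    return {
        "mean_home_goals": mean(m["home"] for m in matches),
        "mean_away_goals": mean(m["away"] for m in matches),
        "mean_first_half_goals": mean(first_half),
        "mean_second_half_goals": mean(second_half),
    }


summary = summarize(MATCHES)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A difference in averages like this is only descriptive. With the real data, a significance test would be needed before reading much into the gap.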

So why was this year's tournament so special?

Based on the data, Brazil, Germany, Italy, and Argentina should all have made it out of their groups to the Round of 16. Surprisingly, Italy did not even qualify for the World Cup this year (neither did the USA, but that is less surprising). Germany was knocked out in the group stage after losing to both South Korea, a nation that has finished fourth once in a World Cup, and Mexico, a nation that has never finished in the top four. Argentina narrowly made it out of their group after a loss to Croatia (the heroes of this year's tournament) and a draw with Iceland. Brazil did very well in their group, as one would expect.

Argentina was then knocked out by France, a formidable team that has placed in the top 3 four times, including a home victory in 1998. Brazil was knocked out in the quarter-finals by Belgium, a nation that has finished fourth only once. At this point, none of the teams you would expect to reach the final based on previous tournaments are still in contention.

Now we come to the final: Croatia versus France. What makes this matchup so interesting is that Croatia is a country of fewer than 5 million people, a smaller population than many of the world's cities. Nobody was betting on them to make it this far. They have, however, done well in the past, finishing third at the 1998 World Cup in France, their first appearance at the tournament. They fought hard to get to this point this year, beating Denmark, Russia, and England.

World Cup Changes World Cup Changes

In short, this year's tournament could not have been predicted from historical World Cup data. Hopefully this gave some insight into who the top dogs are and how the little guys can come out and surprise you. I know this year's tournament will be one to remember regardless of who wins.

  • The dataset used for this story can be found on Kaggle.
  • The source code for this analysis can be found in this Jupyter Notebook.