Analyzing the German Soccer (Bundesliga) League Using R & The dplyr Package

 

Hello. I have been playing around with German soccer (Bundesliga) data in R using the dplyr package. I have a previous post where I go over dplyr.


Topics

The Bundesliga Data Using bundesligR

The Dataset

Selecting Data Using R’s dplyr Package


The Bundesliga Data Using bundesligR

There is a neat data package in R called bundesligR. The package ships a dataset of the same name, bundesligR, which contains all final tables of Germany’s top tier soccer league, the Bundesliga.

Notable teams from the Bundesliga include FC Bayern München (Munich), Borussia Dortmund, Bayer 04 Leverkusen, and Borussia Mönchengladbach.

If you have not installed the bundesligR or dplyr packages, you can install both with R’s install.packages() function.
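A minimal sketch of the installation step, assuming both packages are available from your configured CRAN mirror. The to_install variable is just a name I picked for illustration.

```r
# Install only the packages that are not already present
needed <- c("bundesligR", "dplyr")
to_install <- needed[!vapply(needed, requireNamespace, logical(1), quietly = TRUE)]
if (length(to_install) > 0) install.packages(to_install)

# Load both packages for the current session
library(bundesligR)
library(dplyr)
```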


The Dataset

After installation, we convert the bundesligR dataset into a data frame and name it soccer. We also take a look at the data. The data spans from 1964 to 2016.
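The conversion and first look could be done roughly like this. I am assuming the dataset object is available as bundesligR once the package is loaded.

```r
library(bundesligR)
library(dplyr)

# Copy the packaged dataset into a plain data frame called soccer
soccer <- as.data.frame(bundesligR)

head(soccer)          # first six rows
str(soccer)           # column names and types
range(soccer$Season)  # first and last season in the data
```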

The team with the most points at the end of a season is the title winner for that season (ties on points are broken by goal difference). The Season variable is the year in which the season starts, so the 2015 season refers to the season from late summer 2015 to spring 2016.

Position refers to the ranking in the table. Team is the football team. Played is the number of games played in the season. W, D and L are the team’s wins, draws and losses. GF (goals for) is how many goals the team scored in the season, GA (goals against) is how many it conceded, and GD is the goal difference, GF – GA.

For points, a win gives the winning team 3 points, a draw gives each team 1 point and a loss gives zero points. Before 1995 a win was worth only 2 points; points under that older system are stored in the variable Pts_pre_95.
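To make the two points systems concrete, here is a small sketch. The league_points() helper is my own illustration and not part of bundesligR or dplyr. The example record of 2 wins and 4 draws is the SC Tasmania 1900 Berlin season discussed later in this post.

```r
# Points = wins * points per win + draws * 1 (losses give nothing)
league_points <- function(wins, draws, points_per_win = 3) {
  wins * points_per_win + draws
}

league_points(2, 4)                      # 3-point system: 2 * 3 + 4 = 10
league_points(2, 4, points_per_win = 2)  # 2-point system: 2 * 2 + 4 = 8
```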

The full R Documentation of the bundesligR dataset can be found here.

The last column, Pts_pre_95, will be removed from the dataset. A few columns will also be renamed.

The %>% pipe operator is used for easier reading. Instead of select(soccer, -Pts_pre_95), we use soccer %>% select(-Pts_pre_95). The negative sign in front of the column Pts_pre_95 inside select() tells R to remove the specified column. Removing a single column is easier than listing every other column.

The rename() call gives columns new names, using the form new_name = old_name.
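Here is a sketch of that step. The exact set of renamed columns is not shown in this post, so I only illustrate one rename: the comments below refer to a lower-case points column, and I am assuming the dataset's original name for it is Points.

```r
library(bundesligR)
library(dplyr)

soccer <- as.data.frame(bundesligR)

soccer <- soccer %>%
  select(-Pts_pre_95) %>%   # drop the pre-1995 points column
  rename(points = Points)   # new_name = old_name (assumed original name)

names(soccer)
```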


Selecting Data Using R’s dplyr Package

Now we use dplyr to pull some interesting facts out of the Bundesliga’s history.

2015-2016 Season

Here are the results from the 2015-2016 Bundesliga season, the most recent one in the data. The filter() function is used here.
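A sketch of the filter, using the dataset's original column names so that it runs on its own. Season == 2015 picks the season that started in 2015.

```r
library(bundesligR)
library(dplyr)

soccer <- as.data.frame(bundesligR)

# Keep only rows from the season starting in 2015, ordered by table position
season_2015 <- soccer %>%
  filter(Season == 2015) %>%
  arrange(Position)

season_2015
```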

 

This season was interesting in that Borussia Dortmund had a very good season, yet they still finished 10 points behind FC Bayern München. The gap between Borussia Dortmund in second place and the third-placed team was 18 points.


Best Season Of All Time

The best season of all time in the Bundesliga belongs to the team which had the most points at the end of the season.
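One way to find that row, assuming the points column is called Points as in the original dataset:

```r
library(bundesligR)
library(dplyr)

soccer <- as.data.frame(bundesligR)

# Keep the row(s) whose points total equals the overall maximum
best_season <- soccer %>%
  filter(Points == max(Points))

best_season
```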

 

For the 2012-2013 season, FC Bayern München won the Bundesliga with a record 91 points. They also won the DFB-Pokal and the UEFA Champions League that season, completing the treble. (Winning the treble is very difficult.) More information on this season can be found here.


Worst Season Of All Time

The worst season of all time in the Bundesliga belongs to the last-placed team with the fewest points (a team that was also relegated to the 2. Bundesliga, the tier below).
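A possible way to express this: take the last-placed team of every season, then sort those by points. This is my own sketch of the idea and may differ from the query originally used for this post.

```r
library(bundesligR)
library(dplyr)

soccer <- as.data.frame(bundesligR)

worst_season <- soccer %>%
  group_by(Season) %>%
  filter(Position == max(Position)) %>%  # last-placed team of each season
  ungroup() %>%
  arrange(Points) %>%                    # fewest points first
  head(1)

worst_season
```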

 

SC Tasmania 1900 Berlin came in dead last in 1965 with 10 points from 2 wins, 4 draws, and 28 losses.


Top 5 Teams Per Season

We can find the top 5 teams per season in this data. As this subset is quite large, we look at the top 5 teams from the 2010-2011 season to the 2015-2016 season.
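A sketch of the top-5 query for those six seasons, again using the original column names:

```r
library(bundesligR)
library(dplyr)

soccer <- as.data.frame(bundesligR)

# Seasons starting 2010 through 2015, positions 1 to 5 only
top5 <- soccer %>%
  filter(Season >= 2010, Season <= 2015, Position <= 5) %>%
  arrange(desc(Season), Position)

top5
```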

 

 


Number of Titles for FC Bayern München

We can also find the number of times a certain team wins the Bundesliga title by placing first in a season. Here, we look at FC Bayern München and their number of Bundesliga titles.
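A sketch of the count. I filter on Position == 1 rather than on the maximum points per season, because ties on points are decided by goal difference (see the comments at the end of this post). I match the team with grepl("Bayern", Team) because I am not certain of the exact spelling used in the dataset.

```r
library(bundesligR)
library(dplyr)

soccer <- as.data.frame(bundesligR)

# Count the seasons in which a team matching "Bayern" finished first
bayern_titles <- soccer %>%
  filter(Position == 1, grepl("Bayern", Team)) %>%
  summarise(titles = n())

bayern_titles
```

The same pattern works for any other club by changing the text passed to grepl().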

 

From 1964 to now (2016), FC Bayern München has won the Bundesliga title 25 times, an impressive feat.


Number Of Titles For Borussia Dortmund

Here is the number of titles for Borussia Dortmund.

 


List of Title Winning Teams In The Bundesliga

Here is the full list of title winning teams in the Bundesliga.
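A sketch of the full list. As noted in the comments at the end of this post, Borussia Moenchengladbach also appears in the data as "Bor. Moenchengladbach", so I expand the "Bor. " prefix before counting. That clean-up step is my own assumption about how to reconcile the two spellings.

```r
library(bundesligR)
library(dplyr)

soccer <- as.data.frame(bundesligR)

title_winners <- soccer %>%
  filter(Position == 1) %>%                               # champions only
  mutate(Team = sub("^Bor\\. ", "Borussia ", Team)) %>%   # unify abbreviated spelling
  count(Team, name = "titles", sort = TRUE)               # titles per team, most first

title_winners
```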


Total Number of Games Played, Wins, Draws, Losses & Goals For FC Bayern München
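The totals can be sketched with filter() and summarise(). The names of the summary columns (seasons, played, wins and so on) are my own choices.

```r
library(bundesligR)
library(dplyr)

soccer <- as.data.frame(bundesligR)

# Sum every season's record for the team matching "Bayern"
bayern_totals <- soccer %>%
  filter(grepl("Bayern", Team)) %>%
  summarise(
    seasons       = n(),
    played        = sum(Played),
    wins          = sum(W),
    draws         = sum(D),
    losses        = sum(L),
    goals_for     = sum(GF),
    goals_against = sum(GA)
  )

bayern_totals
```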

 


Total Number of Games Played, Wins, Draws, Losses & Goals For FC Bayern München, Borussia Dortmund, Borussia Mönchengladbach & Bayer 04 Leverkusen

 

We can add win rates as a new column, where the win rate is the number of wins divided by the number of games played. The dplyr function mutate() is used to add a new column to the data.
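A sketch for the four clubs with the win rate added. I select teams by pattern because I am not sure of the exact spellings in the data, and I expand the "Bor. " abbreviation mentioned in the comments below so that each club ends up in a single group.

```r
library(bundesligR)
library(dplyr)

soccer <- as.data.frame(bundesligR)

four_clubs <- soccer %>%
  mutate(Team = sub("^Bor\\. ", "Borussia ", Team)) %>%
  filter(grepl("Bayern|Dortmund|Moenchengladbach|Leverkusen", Team)) %>%
  group_by(Team) %>%
  summarise(
    played        = sum(Played),
    wins          = sum(W),
    draws         = sum(D),
    losses        = sum(L),
    goals_for     = sum(GF),
    goals_against = sum(GA)
  ) %>%
  mutate(win_rate = wins / played) %>%   # new column: share of games won
  arrange(desc(win_rate))

four_clubs
```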

 


Total Number of Games Played, Wins, Draws, Losses & Goals For All Teams Who Played In The Bundesliga
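Dropping the team filter gives the all-time table for every club. This sketch sorts by total games played. Only the "Bor. " abbreviation is cleaned up here, so other spelling variants in the data, if there are any, would still show up as separate teams.

```r
library(bundesligR)
library(dplyr)

soccer <- as.data.frame(bundesligR)

all_teams <- soccer %>%
  mutate(Team = sub("^Bor\\. ", "Borussia ", Team)) %>%
  group_by(Team) %>%
  summarise(
    seasons       = n(),
    played        = sum(Played),
    wins          = sum(W),
    draws         = sum(D),
    losses        = sum(L),
    goals_for     = sum(GF),
    goals_against = sum(GA)
  ) %>%
  arrange(desc(played))

head(all_teams, 10)   # the ten clubs with the most games played
```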

 

 


References

The featured image is from http://arysports.tv/wp-content/uploads/2015/11/bundesliga.jpg.

3 thoughts on “Analyzing the German Soccer (Bundesliga) League Using R & The dplyr Package”

  1. Claus-Dieter Mayer

    Nice work, but of course one can always find something to nag about ;-). As a Borussia Moenchengladbach supporter I couldn't help but notice that you awarded us 6 league titles instead of the 5 that we have. It seems that is down to the (points == max(points)) statement and the fact that in 1984 3 teams ended up on the same points but Stuttgart won the title on goal difference. But thanks for trying to boost our records, I hope one day that sixth title will come…

    1. dkmathstats Post author

      Hmm. Did not notice that. Then it would be 5 titles for Borussia Moenchengladbach and some other teams would be added to the winning teams list.

      The correct filter statement could be something like filter(points == max(points) & GD == max(GD)). I would need to check/test this out.

    2. dkmathstats Post author

      Okay. I have fixed it. Instead of filter(points == max(points)), use filter(Position == 1).

      I have also noticed that this output in R gives 4 titles to Borussia Moenchengladbach and 1 title to Bor. Moenchengladbach which would be 5 in total. This slight error in spelling was not noticed in the dataset. Some slight data cleaning would be needed to find the winning years and change the one with Bor. into Borussia.

      Thank you for pointing the error out.
