Hello. I have been playing around with German soccer (Bundesliga) data in R using the dplyr package. I have a previous post where I go over dplyr.
The Bundesliga Data Using bundesligR
There is a neat data package in R called bundesligR. It ships a dataset, also named bundesligR, which contains all final tables of Germany’s top-tier soccer league, the Bundesliga.
Notable teams from the Bundesliga include FC Bayern München (Munich), Borussia Dortmund, Bayer 04 Leverkusen, and Borussia Mönchengladbach.
If you have not installed the bundesligR or the dplyr package, you can install them both using:
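A minimal sketch of the installation step, assuming both packages are available from CRAN and that a CRAN mirror is configured. Each package is installed only if it is missing, and dplyr is then loaded for the session.

```r
# Install each package only when it is not already present.
for (pkg in c("bundesligR", "dplyr")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
}

# Load dplyr for the current session; the dataset is loaded separately.
library(dplyr)
```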
After installation, we convert the bundesligR dataset into a data frame and name it soccer. We also take a look at the data. The data spans from 1964 to 2016.
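One way to do this conversion and take a first look is sketched below. It assumes the dataset can be loaded by name with data(). The column name Points for the 3-point total is also an assumption.

```r
library(dplyr)

# Load the packaged dataset and copy it into a plain data frame.
data("bundesligR", package = "bundesligR")
soccer <- as.data.frame(bundesligR)

head(soccer)          # first six rows
str(soccer)           # column names and types
range(soccer$Season)  # earliest and latest season start year
```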
The team with the most points at the end of a season is the title winner for that season. The Season variable is the year in which the season starts. For example, the 2015 season refers to the season from late summer 2015 to spring 2016.
Position refers to the ranking in the table. Team is the football team. Played refers to the number of games played in the season. W, D and L refer to wins, draws and losses for the team. GF is goals for, or how many goals the team scored in the season; GA is goals against, or how many it conceded; and GD is goal difference, which is GF - GA.
For points, a win gives the winning team 3 points, a draw gives each team 1 point and a loss gives zero points. Before 1995 a win was worth 2 points; the points totals under that older system are stored in the variable Pts_pre_95.
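The arithmetic of the two points systems can be sketched with two small helpers. The function names points_3pt and points_2pt are hypothetical and are not part of bundesligR or dplyr. The example record (2 wins, 4 draws, 28 losses) is the SC Tasmania 1900 Berlin season mentioned later in this post.

```r
# Hypothetical helpers, written only to illustrate the arithmetic.
points_3pt <- function(wins, draws) 3 * wins + 1 * draws  # current rule
points_2pt <- function(wins, draws) 2 * wins + 1 * draws  # pre-1995 rule

# Losses add nothing under either rule, so they do not appear.
points_3pt(wins = 2, draws = 4)  # 10
points_2pt(wins = 2, draws = 4)  # 8
```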
The full R Documentation of the bundesligR dataset can be found here.
The last column, Pts_pre_95, will be removed from the dataset, and a few columns will be renamed.
The %>% pipe operator is used for easier reading. Instead of select(soccer, -Pts_pre_95), we use soccer %>% select(-Pts_pre_95). The negative sign in front of the column Pts_pre_95 inside select() tells R to remove the specified column. Removing one column is easier than listing every column we want to keep.
The rename() call gives existing columns new names, using the form new_name = old_name.
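A sketch of this cleanup step follows. The new names Wins, Draws and Losses are assumptions chosen for illustration, since the names used in the original code are not shown here.

```r
library(dplyr)

data("bundesligR", package = "bundesligR")
soccer <- as.data.frame(bundesligR)

soccer <- soccer %>%
  select(-Pts_pre_95) %>%                  # drop the pre-1995 points column
  rename(Wins = W, Draws = D, Losses = L)  # new_name = old_name (assumed names)

names(soccer)
```

The later sketches in this post reload the data and use the column names as shipped (W, D, L), so each one runs on its own.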
Selecting Data Using R’s dplyr Package
Now we use dplyr to pull some interesting facts out of the Bundesliga's history.
Here are the results from the previous Bundesliga season (2015-2016). The filter() function is used to keep only the rows for that season.
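A minimal sketch of that filter, assuming the 3-point total is stored in a column named Points and using the column names as shipped in the package:

```r
library(dplyr)

data("bundesligR", package = "bundesligR")
soccer <- as.data.frame(bundesligR)

# Season holds the starting year, so 2015 means the 2015-2016 season.
season_2015 <- soccer %>%
  filter(Season == 2015) %>%
  select(Position, Team, Played, W, D, L, Points)

season_2015
```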
This season was interesting: Borussia Dortmund had a very good season yet still finished 10 points behind FC Bayern München. The gap between second-placed Borussia Dortmund and the third-placed team was 18 points.
Best Season Of All Time
The best season of all time in the Bundesliga belongs to the team which had the most points at the end of the season.
For the 2012-2013 season, FC Bayern München won the Bundesliga with a record 91 points. They also won the DFB-Pokal and the UEFA Champions League that season, completing the treble. (Winning the treble is very difficult.) More information on this season can be found here.
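One way to find the season with the highest points total is to keep the rows whose points equal the overall maximum. This sketch assumes the 3-point total is in a column named Points.

```r
library(dplyr)

data("bundesligR", package = "bundesligR")
soccer <- as.data.frame(bundesligR)

# Keep every row that matches the highest points total in the data.
best_season <- soccer %>%
  filter(Points == max(Points, na.rm = TRUE))

best_season
```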
Worst Season Of All Time
The worst season of all time in the Bundesliga belongs to the last-placed team with the fewest points. A last-placed team is also relegated to the lower-tier league, the 2. Bundesliga.
SC Tasmania 1900 Berlin came in dead last in 1965 with 10 points from 2 wins, 4 draws, and 28 losses.
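A sketch of this lookup under the same assumptions (a Points column, and Position numbered from 1 at the top of the table): first keep the last-placed team of each season, then keep the one with the fewest points.

```r
library(dplyr)

data("bundesligR", package = "bundesligR")
soccer <- as.data.frame(bundesligR)

worst_season <- soccer %>%
  group_by(Season) %>%
  filter(Position == max(Position)) %>%         # last place in each season
  ungroup() %>%
  filter(Points == min(Points, na.rm = TRUE))   # fewest points among those

worst_season
```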
Top 5 Teams Per Season
We can find the top 5 teams per season in this data. As this subset is quite large, we look at the top 5 teams from the 2010-2011 season to the 2015-2016 season.
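This subset can be sketched with two conditions inside filter(): one on the season and one on the table position. Column names are as shipped in the package, with Points assumed as before.

```r
library(dplyr)

data("bundesligR", package = "bundesligR")
soccer <- as.data.frame(bundesligR)

top5 <- soccer %>%
  filter(Season >= 2010, Position <= 5) %>%  # 2010-2011 onward, places 1 to 5
  arrange(desc(Season), Position) %>%        # newest season first
  select(Season, Position, Team, Points)

top5
```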
Number of Titles for FC Bayern München
We can also find the number of times a certain team wins the Bundesliga title by placing first in a season. Here, we look at FC Bayern München and their number of Bundesliga titles.
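A minimal sketch of the title count. The exact spelling of the team name in the dataset is not shown in this post, so the code matches on the pattern "Bayern" with grepl() rather than an exact string; that choice is an assumption.

```r
library(dplyr)

data("bundesligR", package = "bundesligR")
soccer <- as.data.frame(bundesligR)

bayern_titles <- soccer %>%
  filter(Position == 1, grepl("Bayern", Team)) %>%  # first-place finishes only
  summarise(Titles = n())                           # one row per title, counted

bayern_titles
```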
From 1964 to now (2016), FC Bayern München has won the Bundesliga title 25 times, an impressive feat.
Number Of Titles For Borussia Dortmund
Here is the number of titles for Borussia Dortmund.
List of Title Winning Teams In The Bundesliga
Here is the full list of title winning teams in the Bundesliga.
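One way to build this list is to keep the first-placed row of each season and count rows per team. The output column name Titles is chosen here for illustration.

```r
library(dplyr)

data("bundesligR", package = "bundesligR")
soccer <- as.data.frame(bundesligR)

champions <- soccer %>%
  filter(Position == 1) %>%                  # one champion per season
  count(Team, sort = TRUE, name = "Titles")  # titles per team, most first

champions
```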
Total Number of Games Played, Wins, Draws, Losses & Goals For FC Bayern München
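A sketch of these totals using summarise(). As before, the team is matched with the pattern "Bayern" because the exact spelling in the dataset is an assumption, and the output column names are chosen for illustration.

```r
library(dplyr)

data("bundesligR", package = "bundesligR")
soccer <- as.data.frame(bundesligR)

bayern_totals <- soccer %>%
  filter(grepl("Bayern", Team)) %>%
  summarise(
    Seasons       = n(),          # seasons played in the Bundesliga
    Games         = sum(Played),
    Wins          = sum(W),
    Draws         = sum(D),
    Losses        = sum(L),
    Goals_For     = sum(GF),
    Goals_Against = sum(GA)
  )

bayern_totals
```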
Total Number of Games Played, Wins, Draws, Losses & Goals For FC Bayern München, Borussia Dortmund, Borussia Mönchengladbach & Bayer 04 Leverkusen
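The same totals for several clubs can be sketched by grouping on Team before summarising. The pattern below is an assumption about how the club names are spelled in the dataset; if a club appears under more than one spelling, it will show up as more than one row.

```r
library(dplyr)

data("bundesligR", package = "bundesligR")
soccer <- as.data.frame(bundesligR)

# Assumed name fragments for the four clubs of interest.
club_pattern <- "Bayern|Dortmund|gladbach|Leverkusen"

club_totals <- soccer %>%
  filter(grepl(club_pattern, Team)) %>%
  group_by(Team) %>%
  summarise(
    Games         = sum(Played),
    Wins          = sum(W),
    Draws         = sum(D),
    Losses        = sum(L),
    Goals_For     = sum(GF),
    Goals_Against = sum(GA)
  ) %>%
  arrange(desc(Wins))

club_totals
```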
We can add win rates as a new column, where the win rate is the number of wins divided by the number of games played. The dplyr function mutate() is used to add a new column to the data.
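A minimal sketch of mutate(), applied here to each season row rather than to career totals. The column name Win_Rate is chosen for illustration.

```r
library(dplyr)

data("bundesligR", package = "bundesligR")
soccer <- as.data.frame(bundesligR)

with_rates <- soccer %>%
  mutate(Win_Rate = W / Played) %>%   # share of games won in that season
  select(Season, Team, Played, W, Win_Rate) %>%
  arrange(desc(Win_Rate))

head(with_rates)
```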
Total Number of Games Played, Wins, Draws, Losses & Goals For All Teams Who Played In The Bundesliga
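Dropping the team filter gives the all-time totals for every club. This sketch also adds a win rate computed from the totals; the output column names are assumptions chosen for illustration.

```r
library(dplyr)

data("bundesligR", package = "bundesligR")
soccer <- as.data.frame(bundesligR)

all_totals <- soccer %>%
  group_by(Team) %>%
  summarise(
    Games         = sum(Played),
    Wins          = sum(W),
    Draws         = sum(D),
    Losses        = sum(L),
    Goals_For     = sum(GF),
    Goals_Against = sum(GA)
  ) %>%
  mutate(Win_Rate = Wins / Games) %>%  # all-time share of games won
  arrange(desc(Wins))

all_totals
```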
The featured image is from http://arysports.tv/wp-content/uploads/2015/11/bundesliga.jpg.