Using R For Text Analysis On A Few Song Lyrics

Hi there. This post features experimental R programming work for text analysis and text mining on a few song lyrics.





Text Mining And Text Analysis With R

The R programming language can handle all kinds of statistical work and data analysis, including text mining and text analysis. Text analysis can be applied to reviews, YouTube comments, articles and song lyrics.

For this project, the R packages needed are dplyr for data wrangling, ggplot2 for plotting and tidytext for splitting text into words and for its stop word and sentiment lexicons. Text analysis will be done on three songs. The lyrics of these songs were copied from lyrics websites and pasted into separate .txt files.

To install a package, run install.packages("pkg_name") once. To load an installed package into the current R session, use library() or require().
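A minimal setup sketch for the three packages named above. The install line is commented out because it only needs to run once per machine; the library() calls go at the top of every script.

```r
# Run once per machine to install the packages used in this post:
# install.packages(c("dplyr", "ggplot2", "tidytext"))

# Load the packages at the start of every session.
# require() would also work; it returns FALSE with a warning
# (instead of stopping with an error) if a package is missing.
library(dplyr)     # data wrangling: anti_join(), count(), mutate()
library(tidytext)  # unnest_tokens(), stop_words, get_sentiments()
library(ggplot2)   # plotting
```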


Example One: Armin Van Buuren Feat. Fiora – Waiting For The Night

For this first example, I have chosen the track Waiting For The Night from DJ/Producer Armin Van Buuren featuring the vocals of Fiora. (This song falls under the Dance category.)


I have named the lyrics text file armin_waitingForTheNight.txt. When reading local text files, R looks for relative file names in the working directory. In my case, the file is placed inside a folder called songLyrics_project on my PC, so the working directory is set to that folder (for example with setwd()).

The lyrics are then put into a data frame in R.
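The reading step might look like the sketch below. The helper read_lyrics() is hypothetical (not taken from the original post), and the setwd() path is a placeholder to replace with your own folder. It reads the file line by line, drops blank lines between verses and stores the rest in a data frame with one row per line.

```r
library(dplyr)

# Hypothetical helper: read one lyrics file into a data frame
# with one row per non-empty line.
read_lyrics <- function(path) {
  lines <- readLines(path, warn = FALSE)  # warn = FALSE: no warning if the last line lacks a newline
  lines <- lines[nzchar(trimws(lines))]   # drop blank lines between verses
  tibble(line = seq_along(lines), text = lines)
}

# Placeholder path: point this at the folder that holds your .txt files.
# setwd("C:/path/to/songLyrics_project")
# armin_lyrics <- read_lyrics("armin_waitingForTheNight.txt")
```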

Some English words carry little meaning on their own, but they make sentences flow and keep them grammatical. Words such as the, and, of, me, that and this are called stop words.

The anti_join() function from R’s dplyr package removes every word in the lyrics that also appears in stop_words, a data set of common English stop words that comes with the tidytext package.
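For anti_join() to match stop words, the lyrics first need one word per row. A sketch, assuming tidytext's unnest_tokens() is used for that split; the two lyric lines here are made-up placeholders.

```r
library(dplyr)
library(tidytext)

# Made-up placeholder lines standing in for the lyrics data frame
lyrics_df <- tibble(
  line = 1:2,
  text = c("The night is calling me", "I wait for the night and the stars")
)

tidy_words <- lyrics_df %>%
  unnest_tokens(word, text) %>%       # one lowercase word per row, in column "word"
  anti_join(stop_words, by = "word")  # keep only words NOT listed in stop_words

tidy_words
```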



The count() function from R’s dplyr package tallies how many times each word appears. Adding the argument sort = TRUE lists the most frequent words first.
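A small sketch of count() with sort = TRUE on placeholder tokens (not the real lyrics). The result has a word column and a new column n holding each word's count.

```r
library(dplyr)

# Placeholder tokens standing in for the stop-word-filtered lyrics
tidy_words <- tibble(word = c("night", "stars", "night", "calling", "night", "stars"))

word_counts <- tidy_words %>%
  count(word, sort = TRUE)  # adds column n; sort = TRUE puts the largest n first

word_counts
# night 3, stars 2, calling 1
```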

We can now make a plot of the word counts.
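One common way to plot word counts with ggplot2 is a horizontal bar chart ordered by count. This is a sketch on placeholder counts; the colour and titles are arbitrary choices.

```r
library(dplyr)
library(ggplot2)

# Placeholder counts standing in for the output of count(word, sort = TRUE)
word_counts <- tibble(word = c("night", "stars", "calling"), n = c(3L, 2L, 1L))

p <- ggplot(word_counts, aes(x = reorder(word, n), y = n)) +
  geom_col(fill = "steelblue") +  # bar length = word count
  coord_flip() +                  # horizontal bars keep the words readable
  labs(x = NULL, y = "Count", title = "Word counts (stop words removed)")

print(p)
```

reorder(word, n) sorts the bars by count instead of alphabetically, so the most frequent word ends up at the top after coord_flip().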


It appears that the word night is the most frequent word with a count of 12.

Sentiment Analysis Of Armin Van Buuren – Waiting For The Night

Applied to song lyrics, sentiment analysis scores the words of a song to estimate whether its lyrics are positive or negative. (This kind of analysis ignores sound, melody and the like; how those come across is left to each listener's subjective judgment.)

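The tidytext package offers three general-purpose sentiment lexicons for scoring words: AFINN, bing and nrc. AFINN gives each word an integer score from -5 (most negative) to +5 (most positive), bing labels words as either positive or negative, and nrc assigns words to positive and negative categories as well as emotions such as joy, fear and anger.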

The AFINN lexicon is used here.
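In recent tidytext versions, get_sentiments("afinn") returns the lexicon with columns word and value. AFINN (like nrc) is fetched through the textdata package, which asks you to confirm a one-time download; bing ships with tidytext itself. To keep this sketch runnable offline, it uses a tiny stand-in lexicon whose scores are placeholders, not real AFINN values, and placeholder word counts.

```r
library(dplyr)

# Real lexicon (needs the textdata package and a one-time download):
# afinn <- tidytext::get_sentiments("afinn")   # columns: word, value

# Offline stand-in with placeholder scores, for illustration only
afinn <- tibble(word  = c("love", "lost", "alone", "bright"),
                value = c(3L, -2L, -2L, 1L))

# Placeholder word counts standing in for count(word, sort = TRUE)
word_counts <- tibble(word = c("night", "love", "alone", "sky"),
                      n    = c(12L, 4L, 2L, 1L))

# inner_join() keeps only words found in the lexicon,
# so unscored words such as "night" drop out
scored <- word_counts %>%
  inner_join(afinn, by = "word")

scored
```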



Next, each word's sentiment score is computed as its word count multiplied by its AFINN score, and these scores are plotted by word. (If the word yes had a word count of 5 and an AFINN score of +3, its sentiment score would be 5 × 3 = 15.)
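The count-times-score rule can be sketched with mutate(). The joined table below is a placeholder and includes the hypothetical yes example from the text.

```r
library(dplyr)

# Placeholder joined table: word counts (n) plus an AFINN-style score (value)
scored <- tibble(word = c("yes", "alone"), n = c(5L, 2L), value = c(3L, -2L))

contributions <- scored %>%
  mutate(contribution = n * value) %>%     # word count x lexicon score
  arrange(desc(abs(contribution)))         # largest effect first, either sign

contributions
# yes: 5 x 3 = 15; alone: 2 x -2 = -4
```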


The results can be plotted with the ggplot2 package.
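A sketch of one way to draw such a plot: a bar per word, coloured by whether its sentiment score is positive or negative. The words and scores are placeholders.

```r
library(dplyr)
library(ggplot2)

# Placeholder sentiment scores (word count x AFINN score)
contributions <- tibble(word = c("yes", "love", "alone", "lost"),
                        contribution = c(15L, 12L, -4L, -6L))

p <- contributions %>%
  mutate(word = reorder(word, contribution),  # order bars by score
         direction = ifelse(contribution > 0, "Positive", "Negative")) %>%
  ggplot(aes(x = word, y = contribution, fill = direction)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Word count x AFINN score", fill = "Sentiment")

print(p)
```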



Example Two: Linkin Park – New Divide

In the second example, I have chosen to look at the song New Divide by Linkin Park. This track was featured in the Transformers 2 movie. The code here is very similar to the code from the first example.
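Because the same steps repeat for every song, they could be wrapped in one function. count_song_words() and the file name in the usage comment are hypothetical, not from the original post; the function reads a lyrics file, splits it into words, removes stop words and counts what is left.

```r
library(dplyr)
library(tidytext)

# Hypothetical helper: the first example's steps in one reusable function
count_song_words <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(trimws(lines))]      # drop blank lines
  tibble(text = lines) %>%
    unnest_tokens(word, text) %>%            # one lowercase word per row
    anti_join(stop_words, by = "word") %>%   # remove stop words
    count(word, sort = TRUE)                 # most frequent words first
}

# Hypothetical file name for the second song:
# newdivide_counts <- count_song_words("linkinpark_newDivide.txt")
```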


The anti_join() function from R’s dplyr package will remove stop words from New Divide’s lyrics.


The word counts and their plot are produced the same way as in the first example.



Sentiment Analysis


The AFINN lexicon is used here.




The results can be plotted with the ggplot2 package.



Example Three: Cosmic Gate Feat. Emma Hewitt – Tonight

This third example features the vocal trance track Tonight from Cosmic Gate featuring Emma Hewitt.



Here are the word counts and their plot for Tonight.


After the stop words are filtered out, only a few words are left in the plot. The word tonight has the highest count, at four.
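To check how much a song shrinks after stop-word removal, the number of words can be compared before and after anti_join(). This is a sketch on two made-up placeholder lines, not the actual lyrics of Tonight.

```r
library(dplyr)
library(tidytext)

# Made-up placeholder lines standing in for a short song
lyrics_df <- tibble(text = c("I will be there tonight", "Tonight is all we have"))

all_words  <- lyrics_df %>% unnest_tokens(word, text)           # every word
kept_words <- all_words %>% anti_join(stop_words, by = "word")  # non-stop words

c(before = nrow(all_words), after = nrow(kept_words))
```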

Sentiment Analysis



The results can be plotted with the ggplot2 package.






Remarks

  • Song lyrics contain far fewer words than articles, papers and books, so the sample size is small.
  • Many song lyrics repeat certain phrases or words for emphasis, which inflates the counts of those words.
  • Not all songs have lyrics, since some are instrumentals. For those, you would have to listen and judge for yourself whether a song sounds positive.
  • I have used the AFINN lexicon. A different lexicon would score the words differently.
  • Text mining and analysis may work better on music albums than on individual songs.
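Following the last remark, several songs could be stacked with a song column and analysed together. A sketch, assuming each song's lyrics are already in a data frame; the two tracks and their lines are made-up placeholders.

```r
library(dplyr)
library(tidytext)

# Placeholder lyrics for two hypothetical tracks on one album
album <- bind_rows(
  tibble(song = "Track 1", text = c("night falls on the city", "the night is young")),
  tibble(song = "Track 2", text = c("waiting for the morning light"))
)

album_counts <- album %>%
  unnest_tokens(word, text) %>%         # the song column is carried along
  anti_join(stop_words, by = "word") %>%
  count(song, word, sort = TRUE)        # per-song counts; use count(word) to pool the album

album_counts
```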

References include DataCamp courses, R Graphics Cookbook by Winston Chang, and Text Mining with R: A Tidy Approach by Julia Silge and David Robinson (Website version:
