R Programming and Text Analysis On Linkin Park – Meteora Song Lyrics

Hi. On this page, I share some experimental work I have done in the programming language R: a text analysis of the lyrics on Linkin Park’s album Meteora. I look at word counts and determine whether the album is positive or negative overall. Results are shown with code outputs and plots.

 

Source: https://images-na.ssl-images-amazon.com/images/I/31CFVY1GX9L.jpg

 


Sections

  • The Meteora Album
  • Loading In The Lyrics Into R
  • Word Counts In Meteora
  • Sentiment Analysis In Linkin Park – Meteora
  • Bigrams In Meteora

 


The Meteora Album

Meteora is the second studio album by the American band Linkin Park. Released in 2003, it contains the singles Somewhere I Belong, Faint, Numb, From The Inside and Breaking The Habit.

 

Song List In Linkin Park – Meteora

1) Foreword
2) Don’t Stay
3) Somewhere I Belong
4) Lying From You
5) Hit The Floor
6) Easier To Run
7) Faint
8) Figure.09
9) Breaking The Habit
10) From The Inside
11) Nobody’s Listening
12) Session [Instrumental]
13) Numb

 


Loading In The Lyrics Into R

The lyrics from all the songs on the Meteora album were copied from a lyrics website and then pasted into a .txt file. Before loading a .txt file into R (for example, in RStudio), set the working directory to the folder where the .txt file is located. Once the working directory is set, the .txt file can be read into R.

For this step, I load the dplyr, ggplot2 and tidytext packages into R. I then read in the lyrics and convert them into a data frame. The head() function previews the first few rows of the lyrics.

Finally, the unnest_tokens() function reshapes the data frame so that each word has its own row.
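The loading and tokenising steps above can be sketched as follows. This is a minimal sketch, not the original code from this post: the folder path and the file name meteora_lyrics.txt are hypothetical, so a few placeholder lines (not the real lyrics) stand in for the file.

```r
library(dplyr)     # tibble(), %>%
library(tidytext)  # unnest_tokens()

# In the post the lyrics come from a .txt file. With a hypothetical path and file name:
# setwd("~/lyrics")                           # folder holding the file (hypothetical)
# lyrics <- readLines("meteora_lyrics.txt")   # one element per line (hypothetical)
# Placeholder lines so the sketch runs without the file (not the real lyrics):
lyrics <- c("I feel the pain again",
            "Time will heal the pain",
            "Nobody is listening")

# One row per line of text
lyrics_df <- tibble(line = seq_along(lyrics), text = lyrics)
head(lyrics_df)   # preview the first rows

# One row per word; unnest_tokens() lowercases and strips punctuation by default
meteora_words <- lyrics_df %>%
  unnest_tokens(word, text)
```

The line column is kept so each word can be traced back to the line it came from.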



Word Counts In Meteora

Once the unnest_tokens() step is done, we can start to find word counts in the lyrics. One thing to consider is that song lyrics contain words that make sentences flow but carry little or no meaning on their own. These words are called stop words. Examples of stop words include the, and, me, you and of. An anti_join() from R’s dplyr package removes these stop words from the lyrics in meteora_words.

The count() function then tallies how often each remaining word appears.
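A minimal sketch of the stop-word removal and counting. It assumes tidytext’s built-in stop_words data frame as the stop-word list (the post does not say which list was used) and uses placeholder lines instead of the real lyrics.

```r
library(dplyr)
library(tidytext)   # unnest_tokens() and the stop_words data frame

# Placeholder lines standing in for the lyrics (not the real lyrics)
lyrics_df <- tibble(text = c("I feel the pain again",
                             "Time will heal the pain",
                             "I wanna feel the time"))
meteora_words <- lyrics_df %>% unnest_tokens(word, text)

word_counts <- meteora_words %>%
  anti_join(stop_words, by = "word") %>%   # drop rows whose word is a stop word
  count(word, sort = TRUE)                 # tally each word; counts go in column n
word_counts
```

sort = TRUE puts the most frequent words first, which is convenient for the bar graph that follows.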

Now that we have word counts, the results can be displayed as a bar graph with R’s ggplot2 package.

For the bar graph, the filter() function keeps the words with a count greater than 8. In geom_col(), the fill argument makes the bars blue. Labels are added with the labs() and geom_text() functions.
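Here is a sketch of that bar graph, assuming a word_counts data frame with columns word and n like the one count() produces. The counts below are made up for illustration and are not the album’s real counts. reorder() is one common way to sort the bars by count (see the Stack Overflow reference at the end).

```r
library(dplyr)
library(ggplot2)

# Made-up counts for illustration only (not the album's real counts)
word_counts <- tibble(word = c("wanna", "feel", "pain", "time", "heal"),
                      n    = c(20, 15, 12, 10, 5))

top_words_plot <- word_counts %>%
  filter(n > 8) %>%                                # keep words counted more than 8 times
  ggplot(aes(x = reorder(word, -n), y = n)) +      # reorder() sorts bars, largest first
  geom_col(fill = "blue") +                        # blue bars
  geom_text(aes(label = n), vjust = -0.5) +        # count printed above each bar
  labs(x = "Word", y = "Count",
       title = "Most frequent words (made-up counts)")
top_words_plot
```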


From the plot, the top words are wanna, feel, pain and time. The word wanna is slang for want to.

 


Sentiment Analysis In Linkin Park – Meteora

Sentiment analysis determines whether a piece of text is viewed as positive or negative. This analysis is subjective, as different people have different views on the connotations of certain words. The lyrics in Linkin Park – Meteora are analysed with three lexicons: bing, AFINN and nrc.

 

Bing Lexicon

Starting from our word counts in R, we keep only the words from the lyrics that also appear in the bing lexicon’s word list. Each of these words is labelled as either positive or negative.

The words with their counts and sentiment are plotted with ggplot2.
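A minimal sketch of the bing step. It assumes an inner_join() with get_sentiments("bing"), which is bundled with tidytext (no download needed), mirroring the inner_join() this post uses for the nrc lexicon. The counts are made up.

```r
library(dplyr)
library(tidytext)   # get_sentiments("bing") is bundled with tidytext
library(ggplot2)

# Made-up counts standing in for the real word counts
word_counts <- tibble(word = c("pain", "time", "heal"),
                      n    = c(12, 10, 6))

bing_counts <- word_counts %>%
  inner_join(get_sentiments("bing"), by = "word")   # keep lexicon words, add sentiment

bing_plot <- ggplot(bing_counts,
                    aes(x = n, y = reorder(word, n), fill = sentiment)) +
  geom_col() +                                      # horizontal bars, coloured by sentiment
  labs(x = "Count", y = NULL, fill = "Sentiment")
bing_plot
```

Words that are not in the lexicon simply drop out of the join, so only words with a bing label are plotted.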

Under the bing lexicon, the most frequent word is pain. The word pain has a red bar, which signals that pain is a negative word. Heal, the second most frequent word, has a positive sentiment. Judging by the colours of the bars, most of the frequent words have a negative sentiment. This plot suggests that Meteora is a mostly negative (or dark) album.

In ggplot2, you can also produce separate sentiment bar graphs in one plot: one bar graph for positive words and a second for negative words. The facet_wrap() function places these graphs side by side within one plot.
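A sketch of the faceted version, again with made-up counts. scales = "free_y" is a choice made here so that each panel lists only its own words.

```r
library(dplyr)
library(tidytext)
library(ggplot2)

# Made-up counts standing in for the real word counts
word_counts <- tibble(word = c("pain", "heal", "fear", "free", "lost"),
                      n    = c(12, 6, 5, 3, 4))

bing_counts <- word_counts %>%
  inner_join(get_sentiments("bing"), by = "word")

bing_facets <- ggplot(bing_counts,
                      aes(x = n, y = reorder(word, n), fill = sentiment)) +
  geom_col(show.legend = FALSE) +                  # panel titles replace the legend
  facet_wrap(~ sentiment, scales = "free_y") +     # one panel per sentiment
  labs(x = "Count", y = NULL)
bing_facets
```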

From the plot, you can clearly see that there are more negative words than positive words from the album lyrics.

 

AFINN Lexicon

The AFINN lexicon can be expected to produce different results from the bing lexicon. Instead of classifying a word as positive or negative, as the bing lexicon does, the AFINN lexicon gives each word in its list a score from -5 to +5. Negative scores are for words with negative sentiment, while positive scores are for words with positive sentiment. Scores of zero are associated with neutral words.

The most frequent word is pain, which has an AFINN sentiment score of -2. Using dplyr’s mutate() function, I have added an extra column called is_positive, which classifies each word as positive or negative depending on its score. This is_positive variable is helpful for plotting.
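A minimal sketch of the is_positive column. tidytext’s get_sentiments("afinn") needs the textdata package and a one-time download, so this sketch uses a tiny hypothetical lexicon in the same shape (columns word and value, as in recent tidytext versions). Apart from pain’s -2 mentioned above, the scores and all counts are illustrative only.

```r
library(dplyr)

# Hypothetical stand-in for get_sentiments("afinn"): columns word and value
afinn <- tibble(word  = c("pain", "heal", "alone"),
                value = c(-2, 2, -2))        # illustrative scores
word_counts <- tibble(word = c("pain", "time", "heal"),
                      n    = c(12, 10, 6))   # made-up counts

afinn_counts <- word_counts %>%
  inner_join(afinn, by = "word") %>%        # keep only words that have a score
  mutate(is_positive = value > 0)           # TRUE for positive scores, FALSE otherwise
afinn_counts
```

Note that with value > 0, a score of zero would be grouped with the negative words.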

Plotting the results under the AFINN lexicon is not much different from plotting them for bing.

The facet_wrap() function is used again to produce side by side bar graphs in one plot, with a slight modification to the labels. The word_labels object is a vector that maps FALSE to the label “Negative Words” and TRUE to “Positive Words”. This word_labels vector is passed to as_labeller() in the labeller argument of facet_wrap().
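A sketch of the relabelled facets, using made-up AFINN-style counts and scores. The names of word_labels must match the facet values as text ("FALSE" and "TRUE"), because as_labeller() looks the labels up by name.

```r
library(dplyr)
library(ggplot2)

# Made-up counts and scores standing in for the AFINN results
afinn_counts <- tibble(word  = c("pain", "alone", "heal", "free"),
                       n     = c(12, 5, 6, 3),
                       value = c(-2, -2, 2, 1)) %>%
  mutate(is_positive = value > 0)

# Facet strip labels: FALSE -> "Negative Words", TRUE -> "Positive Words"
word_labels <- c(`FALSE` = "Negative Words", `TRUE` = "Positive Words")

afinn_plot <- ggplot(afinn_counts,
                     aes(x = n, y = reorder(word, n), fill = is_positive)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ is_positive, scales = "free_y",
             labeller = as_labeller(word_labels)) +
  labs(x = "Count", y = NULL)
afinn_plot
```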


nrc Lexicon

With the nrc lexicon, there are no sentiment scores associated with the words. Instead, each word is tagged with one or more categories: negative, positive, anger, anticipation, disgust, fear, joy, sadness, surprise and trust.

The inner_join() function from R’s dplyr is used to extract the words from the lyrics that are also in get_sentiments(“nrc”). Next, the filter() function keeps only the words whose sentiment is positive or negative.
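A minimal sketch of these two steps. The real get_sentiments("nrc") needs the textdata package and a download, so a hypothetical lexicon with the same columns (word, sentiment) and a few made-up rows stands in for it. Because one word can have several rows, the join can return more than one row per word before the filter.

```r
library(dplyr)

# Hypothetical stand-in for get_sentiments("nrc"); these rows are made up
nrc <- tibble(word      = c("pain", "pain", "heal", "heal", "time"),
              sentiment = c("negative", "sadness", "positive", "trust", "anticipation"))
word_counts <- tibble(word = c("pain", "time", "heal"),
                      n    = c(12, 10, 6))   # made-up counts

nrc_counts <- word_counts %>%
  inner_join(nrc, by = "word") %>%                     # one row per word-category pair
  filter(sentiment %in% c("positive", "negative"))     # keep only the two polarity labels
nrc_counts
```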


Once the counts are extracted, they can be plotted with the ggplot2 package.


NOTE

These lexicons are helpful for classifying words as positive or negative, but they are NOT perfect. Some English words can act as either a verb or a noun, and the lexicons do not tell these uses apart. In addition, the lexicons may not pick up context, jokes, sarcasm and the like.

 


Bigrams In Meteora

Analyzing single words on their own may not be enough, so we can also look at two-word phrases. Two-word phrases are called bigrams.

Extracting bigrams only requires a slight modification to the unnest_tokens() function.
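A sketch of that modification, assuming the token = "ngrams" and n = 2 arguments of unnest_tokens(). The placeholder lines are not the real lyrics. A line column is included so bigrams are built within each line and never span two lines.

```r
library(dplyr)
library(tidytext)

# Placeholder lines standing in for the lyrics (not the real lyrics)
lyrics_df <- tibble(line = 1:2,
                    text = c("Nobody is listening", "I wanna feel"))

meteora_bigrams <- lyrics_df %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)   # overlapping two-word phrases
meteora_bigrams
```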


The bigrams have now been extracted from the lyrics. Before producing counts, stop words need to be removed. To do this, the separate() function from R’s tidyr package splits each bigram into two separate words, and the filter() function then removes the bigrams that contain a stop word.

Counts can be produced with the count() function from R’s dplyr package.
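A minimal sketch of the separate, filter and count steps, assuming tidytext’s stop_words list and placeholder lines in place of the lyrics.

```r
library(dplyr)
library(tidyr)      # separate()
library(tidytext)   # unnest_tokens(), stop_words

# Placeholder lines standing in for the lyrics (not the real lyrics)
lyrics_df <- tibble(line = 1:3,
                    text = c("I wanna feel the pain",
                             "I wanna feel again",
                             "Listening to the words"))

bigram_counts <- lyrics_df %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  separate(bigram, into = c("word1", "word2"), sep = " ") %>%  # "wanna feel" -> "wanna", "feel"
  filter(!word1 %in% stop_words$word,                         # drop the bigram if either
         !word2 %in% stop_words$word) %>%                     # word is a stop word
  count(word1, word2, sort = TRUE)
bigram_counts
```

Filtering on both columns means a bigram such as "the pain" is dropped even though pain itself is not a stop word.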

Notice how these counts list the two words in separate columns. The unite() function joins the two words back into bigrams, keeping their counts.
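A short sketch of unite() from tidyr, using made-up counts with the two words in separate columns.

```r
library(dplyr)
library(tidyr)   # unite()

# Made-up bigram counts with the two words in separate columns
bigram_counts <- tibble(word1 = c("nobody's", "wanna"),
                        word2 = c("listening", "feel"),
                        n     = c(3L, 2L))

bigrams_united <- bigram_counts %>%
  unite(bigram, word1, word2, sep = " ")   # "wanna" + "feel" -> "wanna feel"; n is kept
bigrams_united
```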

 

Now that we have counts of the bigrams, these counts can be plotted with ggplot2.
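A sketch of a bigram bar graph with made-up counts. The fill colour is an arbitrary choice made here, not taken from the original plot.

```r
library(dplyr)
library(ggplot2)

# Made-up counts standing in for the united bigram counts
bigrams_united <- tibble(bigram = c("nobody's listening", "wanna feel"),
                         n      = c(3L, 2L))

bigram_plot <- ggplot(bigrams_united, aes(x = n, y = reorder(bigram, n))) +
  geom_col(fill = "darkgreen") +                 # one horizontal bar per bigram
  labs(x = "Count", y = NULL,
       title = "Most common bigrams (made-up counts)")
bigram_plot
```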

 

The most common bigram is nobody’s listening, which is the title of one of the songs on the Meteora album. In second place is the bigram wanna feel, which features often in the single Somewhere I Belong.

I don’t think this bigram plot provides much useful information, but it was an interesting exercise.

 


References

  • Text Mining with R: A Tidy Approach by Julia Silge and David Robinson
  • Stack Overflow, "ggplot2: sorting a plot": https://stackoverflow.com/questions/3744178/ggplot2-sorting-a-plot
  • R Graphics Cookbook by Winston Chang
