Loading A .JSON File Into R

Hi. I have been teaching myself how to load a .JSON file into the statistical programming language R. JSON files are somewhat new to me, as they were not covered in my university studies. Through self-learning, trial and error, and a few resources, I have been able to work with .JSON files in R.


Sections

Introduction

Loading A .JSON File Into R

Some Data Manipulation Work

Resources/References

 


Introduction

According to this website, JSON files store data in a human-readable text format. JSON stands for JavaScript Object Notation. JSON files can be read into R using a package such as jsonlite.

 


Loading A .JSON File Into R

If the jsonlite package is not installed, you can type in install.packages("jsonlite") to install it in R/RStudio.

After installation, the jsonlite package can be loaded into R by typing in library(jsonlite).

Since I am loading the .JSON file from the internet, I copy the URL and save it to a variable. The fromJSON() function then reads the JSON from the URL stored in that variable.
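As a rough sketch of this step: the URL itself is not shown here, so the example below parses a small made-up JSON string instead. The two sample records and their field names are assumptions for illustration only. fromJSON() accepts a JSON string, a file path or a URL, so a variable holding the URL would be passed in the same way.

```r
library(jsonlite)

# Hypothetical two-record sample standing in for the real file.
# With the real data, json_source would hold the URL string instead.
json_source <- '{"pokemon": [
  {"id": 1, "num": "001", "name": "Bulbasaur", "type": ["Grass", "Poison"]},
  {"id": 4, "num": "004", "name": "Charmander", "type": ["Fire"]}
]}'

# fromJSON() parses the JSON into R objects.
pokemon_data <- fromJSON(json_source)

# The result is a named list; its "pokemon" element is a data frame.
str(pokemon_data, max.level = 1)
```

Because the two sample records have a different number of types, jsonlite keeps the type field as a list column rather than a plain character column.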

You could preview this data by typing in head(pokemon_data), but the output would be very large, so I do not include it here.

To get the data into a more usable format for data analysis in R, I use the data.frame() function. I also preview the data frame with the head() and tail() functions.
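A minimal sketch of the conversion, again on a made-up sample rather than the real file. The name pokemon_df and the sample fields are assumptions. Note how data.frame() prefixes each column with the name of the list element it came from, which would explain column names such as pokemon.id.

```r
library(jsonlite)

# Hypothetical sample in place of the real file.
json_source <- '{"pokemon": [
  {"id": 1, "num": "001", "name": "Bulbasaur", "spawn_chance": 0.69},
  {"id": 4, "num": "004", "name": "Charmander", "spawn_chance": 0.253}
]}'
pokemon_data <- fromJSON(json_source)

# Flatten the parsed list into a single data frame.
pokemon_df <- data.frame(pokemon_data)

# Column names carry the "pokemon." prefix from the list element.
names(pokemon_df)

# Preview the first and last rows.
head(pokemon_df)
tail(pokemon_df)
```

If the prefix is not wanted, pokemon_data$pokemon is already a data frame and can be used directly.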

This data frame contains many rows and many columns. Due to the nature of this data, some entries are NULL or NA.

 


Some Data Manipulation Work

After loading in data, you may have to do some data manipulation to get the data ready for data analysis and other statistical work. In this section, I show a few things I have done to this dataset (through experimentation and trial and error).

I first remove the pokemon.id column, as it is redundant with the pokemon.num column.

You can verify that the pokemon.id (first column) is removed with the head() function.
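One possible way to drop the first column, shown on a small made-up data frame (the values and the name pokemon_df are assumptions):

```r
# Hypothetical stand-in for the loaded data frame.
pokemon_df <- data.frame(
  pokemon.id = c(1, 4),
  pokemon.num = c("001", "004"),
  pokemon.name = c("Bulbasaur", "Charmander")
)

# Negative indexing drops the first column (pokemon.id).
pokemon_df <- pokemon_df[, -1]

# Check that pokemon.id is gone.
head(pokemon_df)
```

Removing by name, with pokemon_df$pokemon.id <- NULL, does the same thing and does not depend on the column's position.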

Next, I rename the columns manually using the colnames() function.
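A short sketch of the renaming, assuming a three-column data frame. The new names are illustrative, apart from Spawn Chance, which is the column name used later on.

```r
# Hypothetical stand-in for the data frame after removing pokemon.id.
pokemon_df <- data.frame(
  pokemon.num = c("001", "004"),
  pokemon.name = c("Bulbasaur", "Charmander"),
  pokemon.spawn_chance = c(0.69, 0.253)
)

# Assign one new name per column, in column order.
colnames(pokemon_df) <- c("Number", "Name", "Spawn Chance")

head(pokemon_df)
```

The vector of new names has to match the number of columns. A name that contains a space, like Spawn Chance, needs backticks when used with $, as in pokemon_df$`Spawn Chance`.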

 

In the Spawn Chance column, I convert the decimals into percentages. I use the round() function to round the values and the paste0() function to append the percentage sign.
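A sketch of this conversion under two assumptions that are not confirmed here: the values are proportions that need multiplying by 100, and two decimal places are kept. The sample values are made up.

```r
# Hypothetical data frame with proportions in the Spawn Chance column.
pokemon_df <- data.frame(Name = c("Bulbasaur", "Charmander"))
pokemon_df$`Spawn Chance` <- c(0.25, 0.031)

# Scale to percent, round to two decimals, then append the "%" sign.
pokemon_df$`Spawn Chance` <- paste0(round(pokemon_df$`Spawn Chance` * 100, 2), "%")

head(pokemon_df)
```

After paste0() the column holds text rather than numbers, so it is worth keeping a numeric copy if the values are needed for calculations later.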

 

Splitting The Pokémon Types Column Into Two Separate Columns

In the Types column of the Pokémon dataset, some of the entries contain two types. I want to split this Types column into two separate columns, where the first column holds the first Pokémon type and the second column holds the second Pokémon type. If the Pokémon has only one type, the entry in the second column would be <NA>.

To separate the column, I load the tidyr and dplyr packages into R.

After loading in the packages, the separate() function is used to separate the Types column into two columns named Type_One and Type_Two. I split the entries at the comma followed by one whitespace character with sep = ",\\s", where ",\\s" is a regular expression (the backslash is doubled because it sits inside an R string).

I am not sure why the result from separate() has the form c("Type1" in the Type_One column and "Type2") in the Type_Two column.
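One likely explanation, offered as an assumption rather than a confirmed diagnosis: if the Types column is a list column (which is how jsonlite stores arrays of varying length), separate() first converts it to text, and the text form of a two-element character vector is c("Grass", "Poison"). Splitting that text at the comma gives exactly the pieces described above. The sketch below reproduces this on a made-up data frame. The fill = "right" argument is my addition; it fills the second column with NA for single-type rows without a warning.

```r
library(tidyr)

# Hypothetical data frame with Types stored as a list column.
pokemon_df <- data.frame(Name = c("Bulbasaur", "Charmander"))
pokemon_df$Types <- list(c("Grass", "Poison"), "Fire")

# The text form of each list element, before any splitting.
as.character(pokemon_df$Types)

# Split at a comma followed by one whitespace character.
pokemon_df2 <- separate(
  pokemon_df, Types,
  into = c("Type_One", "Type_Two"),
  sep = ",\\s",
  fill = "right"
)

pokemon_df2
```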

Next, I want to remove the leading c, the brackets and the quotes from the two columns. I load the stringr package into R, which provides functions for working with strings/text. After loading stringr, I use the str_replace_all() function to remove the leading c, the brackets and the quotes. Regular expressions are used here.

In the code for pokemon_df2$Type_One, str_replace_all("\\(|\\)", "") replaces each bracket with an empty string. The str_replace_all() call with '^c\"' removes the c at the beginning and the first quote. The third str_replace_all(), with '\"$', removes the quote at the end of the string.

In pokemon_df2$Type_Two, the first str_replace_all() removes brackets, the second one removes the ending quote, and the third str_replace_all() removes the quote at the start of the string.
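The cleanup steps described above can be sketched as follows. The two vectors are made-up stand-ins for the Type_One and Type_Two columns, and the calls are written one per line here, which may differ from how the original code chained them.

```r
library(stringr)

# Hypothetical stand-ins for the two columns produced by separate().
type_one <- c('c("Grass"', "Fire")
type_two <- c('"Poison")', NA)

# Type_One: drop brackets, then the leading c and quote, then the trailing quote.
type_one <- str_replace_all(type_one, "\\(|\\)", "")
type_one <- str_replace_all(type_one, '^c\"', "")
type_one <- str_replace_all(type_one, '\"$', "")

# Type_Two: drop brackets, then the trailing quote, then the leading quote.
type_two <- str_replace_all(type_two, "\\(|\\)", "")
type_two <- str_replace_all(type_two, '\"$', "")
type_two <- str_replace_all(type_two, '^\"', "")

type_one
type_two
```

Entries that had only one type pass through unchanged, and NA values stay NA.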

 


Resources/References

  • https://stackoverflow.com/questions/2617600/importing-data-from-a-json-file-into-r
  • https://www.datacamp.com/community/tutorials/r-data-import-tutorial#javascript
  • http://www.tutorialspoint.com/r/r_json_files.htm
  • https://stackoverflow.com/questions/9704213/r-remove-part-of-string
  • http://stat545.com/block022_regular-expression.html
  • https://github.com/jdorfman/awesome-json-datasets
  • https://stackoverflow.com/questions/7195805/removing-brackets-from-a-string

 
