Data Manipulation In R: The melt Function

Hi there. I was playing around with the melt function from the reshape2 package in R. The melt function converts data from a wide format to a long format that is ready for data analysis and plotting.

Here is some of the work I have come up with.


Sections

Installing and Loading The reshape2 Package In R

Example One: Kangaroo Data From Wide Format To Long Format

Example Two: Melting A Table

Example Three: Melting A Table Of Results From Coin Flipping and Rolling A Die

References

 


Installing and Loading The reshape2 Package In R

To install the reshape2 package in R, call install.packages() with the package name in quotes.

To enable the functions in the reshape2 package, load it with library().

Once the reshape2 package is loaded into R, the melt function can be used to convert data from a wide format to a long format or convert tables into a (long) format which would be ready for analysis and plotting.
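As a minimal sketch of the install and load steps (the install line is commented out here so that re-running the script does not reinstall the package every time):

```r
# Install once per machine; uncomment on first use.
# install.packages("reshape2")

# Load the package so that melt() is available in this session.
library(reshape2)
```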

 


Example One: Kangaroo Data From Wide Format To Long Format

In this first example, I deal with a kangaroo measurements dataset from the faraway library in R.

The dataset called kanga from the faraway package is saved into kangaroo_data. Then the head() and tail() functions are used to preview the data.

The summary() and str() functions are used to check some summary statistics and the variable types in the data.
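The loading and inspection steps might look like the sketch below. It assumes the faraway package is already installed; the output is omitted here.

```r
library(faraway)  # provides the kanga dataset

# Save the dataset under the name used in this post.
kangaroo_data <- kanga

head(kangaroo_data)     # first six rows
tail(kangaroo_data)     # last six rows

summary(kangaroo_data)  # per-column summary statistics
str(kangaroo_data)      # column types and dimensions
```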

Notice that 18 of the 20 columns are measurement variables. These 18 columns can be combined into one column which represents the measurement type, with an additional column holding the measurement value associated with that measurement type. This is where the melt() function from the reshape2 package comes in.

This melted data has 2607 rows and 4 columns versus 148 rows and 20 columns from the non-melted data. The column named variable holds the measurement type and Measurement_Value holds the measurement value associated with that measurement type.
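A melt call along these lines would produce that shape. This is a sketch rather than the exact original call: it assumes species and sex are the two identifier columns, and na.rm = TRUE is an assumption that would explain a row count below 148 x 18 = 2664.

```r
library(reshape2)
library(faraway)

kangaroo_data <- kanga

# Keep species and sex as identifiers; stack the remaining
# measurement columns into variable / Measurement_Value pairs.
kangaroo_melted <- melt(
  kangaroo_data,
  id.vars    = c("species", "sex"),
  value.name = "Measurement_Value",
  na.rm      = TRUE  # assumption: drop rows with a missing measurement
)

dim(kangaroo_melted)
head(kangaroo_melted)
```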

The columns of the melted data can be renamed by using colnames() to make things look professional.
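For instance, the renaming could be done as follows. The new names other than Measurement_Value are illustrative choices, not necessarily the ones used originally.

```r
library(reshape2)
library(faraway)

kangaroo_melted <- melt(kanga, id.vars = c("species", "sex"),
                        value.name = "Measurement_Value")

# Assign all four column names at once, in column order.
colnames(kangaroo_melted) <- c("Species", "Sex",
                               "Measurement_Type", "Measurement_Value")

head(kangaroo_melted)
```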

 


Example Two: Melting A Table 

This next example will feature the melt function decomposing a table into a format ready for data analysis and plotting in R.

I first create two vectors: one holds a small list of colours and the other holds sizes.

Next, I create a Cartesian Product of colours and sizes in R using the expand.grid() function. This function will create all combinations from each of the colours with each of the sizes.

(For example, I would have [Yellow, Small], [Yellow, Medium], all the way to [White, Large].)
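A sketch of these two steps is shown below. Only Yellow, White and the three sizes appear in the text above; the other colours are placeholders chosen for illustration.

```r
# Two character vectors; "Red" and "Blue" are assumed placeholders.
colours <- c("Yellow", "Red", "Blue", "White")
sizes   <- c("Small", "Medium", "Large")

# Every colour paired with every size: 4 x 3 = 12 rows.
combos <- expand.grid(Colour = colours, Size = sizes)

nrow(combos)
combos
```

Note that expand.grid() varies its first argument fastest, so the rows run through all colours for Small before moving on to Medium.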

Suppose I wanted to create a table which shows the number of cases for each combination. In this scenario, I have a count of 1 for each combination. The table() function in R creates such a table/matrix.
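As a rough sketch, passing the data frame of combinations to table() cross-tabulates its two columns. The colour values other than Yellow and White are assumed placeholders.

```r
combos <- expand.grid(
  Colour = c("Yellow", "Red", "Blue", "White"),  # middle colours assumed
  Size   = c("Small", "Medium", "Large")
)

# Rows are colours, columns are sizes; each cell counts how often
# that combination occurs in combos.
count_table <- table(combos)
count_table
```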

This table has its uses but it is not ideal for data analysis and plotting in R. This is where the melt() function comes in. The melt() function will create the Cartesian product from before along with another column with the counts.

The column names can then be renamed with colnames().
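Melting the table and renaming the result might look like this. The colours other than Yellow and White and the name Count are illustrative assumptions.

```r
library(reshape2)

combos <- expand.grid(
  Colour = c("Yellow", "Red", "Blue", "White"),  # middle colours assumed
  Size   = c("Small", "Medium", "Large")
)
count_table <- table(combos)

# melt() on a table returns one row per cell, with the counts
# in a column called "value" by default.
melted_table <- melt(count_table)

# Rename so the count column has a descriptive name.
colnames(melted_table) <- c("Colour", "Size", "Count")
melted_table
```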


Example Three: Melting A Table Of Results From Coin Flipping and Rolling A Die

In this third and last example, I simulate results from a coin flip and a die roll. I repeat this coin flip and die roll 300 times and display the results in a table with counts. This table is then melted using the melt function to convert the table into a long format.

In order to simulate the coin flips and die rolls, the sample function is used. Zeroes correspond to tails and ones correspond to heads.

As usual the column names are renamed.
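The simulation could be sketched as below. The seed, the object names and the column names are assumptions added for illustration, so the draws will not match the ones behind the numbers quoted in this post.

```r
set.seed(1)  # assumed seed, only so the sketch is repeatable

n_trials <- 300

# Sample with replacement: 0 = tails, 1 = heads; die faces 1 to 6.
coin_flips <- sample(c(0, 1), size = n_trials, replace = TRUE)
die_rolls  <- sample(1:6, size = n_trials, replace = TRUE)

results <- data.frame(coin_flips, die_rolls)

# Rename the columns (names are illustrative).
colnames(results) <- c("Coin_Flip", "Die_Roll")
head(results)
```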

A table is created using the table() function in R to create a table of counts depending on the coin flip outcome and the die roll number.

The table output comes out nicely. As an example, a coin flip of heads and a roll number of 5 appears 32 times.
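A sketch of building such a table follows. Because the seed here is an assumption, the individual counts will differ from the ones mentioned above, but they still sum to 300.

```r
set.seed(1)  # assumed seed

results <- data.frame(
  Coin_Flip = sample(c(0, 1), size = 300, replace = TRUE),
  Die_Roll  = sample(1:6, size = 300, replace = TRUE)
)

# Cross-tabulate: rows are coin outcomes, columns are die faces.
results_table <- table(results)
results_table

sum(results_table)  # total number of trials
```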

The melt function in this case converts the table into a data frame in the long format.

The column names need some tweaking.
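Melting the table of counts and tidying the column names could look like this sketch. The seed and the column names are assumptions.

```r
library(reshape2)

set.seed(1)  # assumed seed

results <- data.frame(
  Coin_Flip = sample(c(0, 1), size = 300, replace = TRUE),
  Die_Roll  = sample(1:6, size = 300, replace = TRUE)
)
results_table <- table(results)

# One row per (coin outcome, die face) cell of the table.
melted_results <- melt(results_table)

# The count column is called "value" by default; rename it.
colnames(melted_results) <- c("Coin_Flip", "Die_Roll", "Count")
melted_results
```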

Plotting The Data

Now this data is ready for data analysis and plotting.
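One way to plot the melted counts is a grouped bar chart with ggplot2. The chart type, seed, labels and column names are all assumptions for this sketch.

```r
library(reshape2)
library(ggplot2)

set.seed(1)  # assumed seed

results <- data.frame(
  Coin_Flip = sample(c(0, 1), size = 300, replace = TRUE),
  Die_Roll  = sample(1:6, size = 300, replace = TRUE)
)

# Long format with the count column named directly via value.name.
melted_results <- melt(table(results), value.name = "Count")

# Label the coin outcomes: 0 = Tails, 1 = Heads.
melted_results$Coin <- factor(melted_results$Coin_Flip,
                              levels = c(0, 1),
                              labels = c("Tails", "Heads"))

# Side-by-side bars for tails and heads at each die face.
p <- ggplot(melted_results,
            aes(x = factor(Die_Roll), y = Count, fill = Coin)) +
  geom_col(position = "dodge") +
  labs(x = "Die roll", y = "Count", fill = "Coin flip")

print(p)
```

The long format matters here because ggplot2 maps one column to each aesthetic, which the wide table of counts does not allow directly.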


References

R Graphics Cookbook by Winston Chang (2012)

http://seananderson.ca/2013/10/19/reshape.html
