Data Visualization In R With The ggvis Package

Hi there. In this post, I feature some functions from R’s ggvis package for data visualization. This work is based on some trial and error in RStudio/RMarkdown.

 



 


Introduction & Setup

When it comes to data visualization in R, ggplot2 usually comes to mind. The ggvis package in R provides a good alternative to ggplot2 and it also includes some interactive plot features.

The screenshot below is from http://ggvis.rstudio.com/ and it gives a brief explanation of what ggvis is about.

For ggvis installation into R use the code:
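A minimal sketch, assuming the package comes from CRAN and that the cloud mirror is reachable (the mirror URL is my choice, not something the package requires). The check skips the install if ggvis is already present.

```r
# Install ggvis from CRAN only if it is not already installed.
# The repos value is an assumed mirror; any CRAN mirror should work.
if (!requireNamespace("ggvis", quietly = TRUE)) {
  install.packages("ggvis", repos = "https://cloud.r-project.org")
}
```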

To load in the ggvis package, use the code:
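A minimal sketch of the load step. Loading ggvis also makes the %>% pipe operator available, which the plotting code in this post relies on.

```r
# Attach ggvis for the current R session.
library(ggvis)
```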

 


A Histogram Example

In this histogram example, I simulate 10000 standard uniform random variables and display the results.
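Here is a sketch of how such a histogram could be built. The seed, bin width, axis titles, plot title and the names unif_sim and hist_plot are my own assumptions for illustration. The last add_axis() call is a commonly used title workaround: an extra top axis with no ticks, a white axis line and zero-size labels, whose title serves as the plot title.

```r
library(ggvis)

set.seed(123)  # assumed seed so the sketch is reproducible

# Simulate 10000 draws from the standard uniform distribution on [0, 1].
unif_sim <- data.frame(x = runif(10000))

hist_plot <- unif_sim %>%
  ggvis(~x) %>%
  layer_histograms(width = 0.05, boundary = 0) %>%
  add_axis("x", title = "Simulated Value", title_offset = 40) %>%
  add_axis("y", title = "Count", title_offset = 50) %>%
  # Title workaround: a hidden top axis whose title acts as the plot title.
  add_axis("x", orient = "top", ticks = 0,
           title = "Histogram of 10000 Standard Uniform Draws",
           properties = axis_props(
             axis = list(stroke = "white"),
             labels = list(fontSize = 0)
           ))

# Render the plot only in an interactive session (e.g. RStudio).
if (interactive()) print(hist_plot)
```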

 

The layer_histograms() part draws the histogram, while the add_axis() parts give the axis labels and the plot title. A workaround was used for the plot title, since ggvis does not have a dedicated title function. The title_offset argument controls the spacing between the axis labels and the axes.

Our histogram of simulated uniform random variables is not a perfect rectangle, but it is close to the flat shape of the uniform density function.

 


Bar Graph Example

This bar graph example is based on a simulation of 10000 dice rolls. (The die is six sided.)

The sample() function in R allows for random selection of values or strings.

From the output of str(dice_sim), the variable dice_sim comes out as numeric. I want dice_sim to be a factor variable with levels 1 to 6.
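A sketch of the simulation and the conversion to a factor. The seed is an assumption, so the counts will differ from the ones reported in this post. Note that with 1:6 as the input, str() reports an integer vector, which is still a numeric type.

```r
set.seed(123)  # assumed seed so the sketch is reproducible

# Roll a fair six-sided die 10000 times (sampling with replacement).
dice_sim <- sample(1:6, size = 10000, replace = TRUE)
str(dice_sim)

# Convert the rolls to a factor with levels 1 to 6.
dice_sim <- factor(dice_sim, levels = 1:6)
str(dice_sim)
```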

To obtain bar graphs in R’s ggvis package, you need the layer_bars() function.
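A sketch of the bar graph, assuming the rolls are stored in a data frame (dice_df and bar_plot are names chosen for illustration, as are the axis titles and seed). When layer_bars() is given only an x variable, it counts the rows at each x value.

```r
library(ggvis)

set.seed(123)  # assumed seed so the sketch is reproducible

# Simulated rolls as a factor column in a data frame.
dice_df <- data.frame(
  roll = factor(sample(1:6, size = 10000, replace = TRUE), levels = 1:6)
)

bar_plot <- dice_df %>%
  ggvis(~roll) %>%
  layer_bars() %>%
  add_axis("x", title = "Dice Outcome") %>%
  add_axis("y", title = "Counts", title_offset = 50)

# Render the plot only in an interactive session.
if (interactive()) print(bar_plot)
```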

From our simulation results, the number 2 appears the most often (it is the mode). The observed counts do not exactly match the theoretical/expected count of 10000/6 ≈ 1667 for each outcome. Remember that theoretical results do not necessarily match what happens in practice, because random variation keeps the counts from being identical.

The title_offset argument in add_axis() is used so that the tick numbers do not overlap the axis titles (e.g. the title Counts and the tick labels 800 and 1000).
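The comparison between observed and expected counts can be checked with a few lines of base R. This is a sketch with an assumed seed, so the mode it reports may not be 2.

```r
set.seed(123)  # assumed seed; a different seed gives different counts

dice_sim <- sample(1:6, size = 10000, replace = TRUE)

# Observed count of each face.
observed <- table(factor(dice_sim, levels = 1:6))

# Expected count per face for a fair die.
expected <- length(dice_sim) / 6

observed
round(expected)

# The face that appears most often (the mode).
names(observed)[which.max(observed)]

# Difference between the observed and expected counts.
observed - expected
```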

 

 


Scatterplots & Linear Regression Lines

For this section, the cats dataset from the MASS library is used.
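A short sketch for loading and previewing the data. The cats data frame has the columns Sex, Bwt (body weight) and Hwt (heart weight).

```r
library(MASS)

# Load the cats dataset and preview its structure.
data(cats)
head(cats)
str(cats)
```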

For the scatterplot, I want to take a look at body weight versus the heart weights of the cats in the dataset. In ggvis, you need to specify which variables would be x and y respectively. Also, you need to use layer_points() to obtain the data points.

In layer_points(), the fill = ~Sex option was used to indicate which points are for males and which are for females. It appears that there are more male cats than female cats in this dataset. (Counts would need to be obtained to check.)
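A sketch of the scatterplot, with body weight (Bwt) on the x axis and heart weight (Hwt) on the y axis. The axis titles and the name scatter_plot are assumptions. The table() call at the end obtains the counts by sex.

```r
library(ggvis)
library(MASS)
data(cats)

scatter_plot <- cats %>%
  ggvis(x = ~Bwt, y = ~Hwt) %>%
  layer_points(fill = ~Sex) %>%
  add_axis("x", title = "Body Weight") %>%
  add_axis("y", title = "Heart Weight")

# Render the plot only in an interactive session.
if (interactive()) print(scatter_plot)

# Number of female (F) and male (M) cats.
sex_counts <- table(cats$Sex)
sex_counts
```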

 

Regression Line Regardless Of Gender

Here is an example of a regression line through the points regardless of gender. In statistics, a regression line is another name for a line of best fit. We want to fit a line through the points such that the sum of the squared vertical distances between the points and the line is minimized.
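To illustrate the least squares idea outside of ggvis, here is a sketch using base R's lm(). The helper sse() is a hypothetical function written for this example. It shows that shifting the slope away from the fitted value increases the sum of squared distances.

```r
library(MASS)
data(cats)

# Fit heart weight as a linear function of body weight.
fit <- lm(Hwt ~ Bwt, data = cats)
coef(fit)

# Hypothetical helper: sum of squared vertical distances for a given line.
sse <- function(intercept, slope) {
  sum((cats$Hwt - (intercept + slope * cats$Bwt))^2)
}

# The fitted line versus a line with a slightly different slope.
sse_fit <- sse(coef(fit)[1], coef(fit)[2])
sse_other <- sse(coef(fit)[1], coef(fit)[2] + 0.5)

sse_fit < sse_other
```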

You may notice that the code for a linear regression line is not much different from the code for the scatterplot. We just add layer_model_predictions(model = "lm", se = TRUE). In model = "lm", lm means linear model, and se = TRUE draws a confidence band around the line. (The default confidence level is 95%.)
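A sketch of the scatterplot with a single regression line added (reg_plot is a name chosen for illustration). Since no formula is supplied, ggvis guesses it from the x and y variables.

```r
library(ggvis)
library(MASS)
data(cats)

reg_plot <- cats %>%
  ggvis(x = ~Bwt, y = ~Hwt) %>%
  layer_points(fill = ~Sex) %>%
  # One linear fit through all points, with a confidence band.
  layer_model_predictions(model = "lm", se = TRUE)

# Render the plot only in an interactive session.
if (interactive()) print(reg_plot)
```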

 

Regression Line For Each Gender
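One way to get a separate line for each sex is to group the data before adding the model layer. This is a sketch under that assumption, using dplyr's group_by() (dplyr is installed alongside ggvis). The name by_sex_plot is my own.

```r
library(ggvis)
library(MASS)
data(cats)

by_sex_plot <- cats %>%
  ggvis(x = ~Bwt, y = ~Hwt) %>%
  layer_points(fill = ~Sex) %>%
  # Group by Sex so that a separate linear model is fitted for each group.
  dplyr::group_by(Sex) %>%
  layer_model_predictions(model = "lm", se = TRUE)

# Render the plot only in an interactive session.
if (interactive()) print(by_sex_plot)
```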

 


Interactive Plots In ggvis

Unlike ggplot2, ggvis is capable of creating interactive plots. Interactive plots allow the user to change parameter values, colours, and other visual settings.

This section features two examples of interactive plots in ggvis. The faraway library in R is used here.

 

Example One

 

This first example looks at a USA wages dataset.
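A sketch for loading and previewing the data, which is the uswages data frame in the faraway package.

```r
library(faraway)

# Load the US wages dataset and preview the first rows.
data(uswages)
head(uswages)
```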

The head() function allows for previewing the dataset. We have the variables (columns) wage, educ, exper, race, smsa, ne, mw, so, we, and pt. In the interactive scatterplot, I want to compare years of experience with the weekly wages.
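A sketch of the interactive scatterplot. The slider range (10 to 100), the control labels, the axis titles and the name wage_plot are assumptions for illustration. The colour choices and the starting size of 30 follow the description in this post. Note the := operator, which sets a property to the raw value chosen in the control rather than mapping it through a scale.

```r
library(ggvis)
library(faraway)
data(uswages)

wage_plot <- uswages %>%
  ggvis(x = ~exper, y = ~wage) %>%
  layer_points(
    # Drop-down menu for the point colour.
    fill := input_select(
      choices = c("red", "blue", "green", "black"),
      selected = "red",
      label = "Colour"
    ),
    # Slider for the point size, starting at 30.
    size := input_slider(min = 10, max = 100, value = 30, label = "Point Size")
  ) %>%
  add_axis("x", title = "Years of Experience") %>%
  add_axis("y", title = "Weekly Wage", title_offset = 50)

# Interactive controls need a live R session, so render only interactively.
if (interactive()) print(wage_plot)
```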

 

 

The R code is not much different from the code for scatterplots earlier. What is new is adding the input_select and input_slider options to fill and size. The user can choose between the colours red, blue, green and black from the colour list. In the slider for size, the user can adjust the size of the points by dragging the slider left or right. Moving the slider to the right increases the size of the points while moving the slider to the left decreases the size of the points. I have set the initial size value at 30 as indicated by value = 30 in the code.

(The screenshot image above shows the red points with a size of 30.)

 

Example Two

This second example features the star data from R’s faraway dataset library. I compare star temperature with light intensity.
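A sketch for loading and previewing the star data. The columns used here are temp (temperature) and light (light intensity).

```r
library(faraway)

# Load the star dataset and preview the first rows.
data(star)
head(star)
```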

 

In this screenshot, I show the colour of the points as darkgreen, the size of the points as 45 and the opacity as 0.6. (Opacity measures how solid the points look: 1 is fully opaque and 0 is fully transparent.)

To include the opacity feature I add in:
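A sketch of the full plot with an opacity slider added. The opacity line is the new piece, and the rest mirrors the first example. The colour list (apart from darkgreen), the slider ranges, the labels and the name star_plot are assumptions. The starting values of 45 and 0.6 follow the description above.

```r
library(ggvis)
library(faraway)
data(star)

star_plot <- star %>%
  ggvis(x = ~temp, y = ~light) %>%
  layer_points(
    fill := input_select(
      choices = c("darkgreen", "red", "blue", "black"),
      selected = "darkgreen",
      label = "Colour"
    ),
    size := input_slider(min = 10, max = 100, value = 45, label = "Point Size"),
    # Opacity slider: 0 is fully transparent, 1 is fully opaque.
    opacity := input_slider(min = 0, max = 1, value = 0.6, label = "Opacity")
  ) %>%
  add_axis("x", title = "Temperature") %>%
  add_axis("y", title = "Light Intensity", title_offset = 50)

# Interactive controls need a live R session, so render only interactively.
if (interactive()) print(star_plot)
```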

 


References

  • http://ggvis.rstudio.com/ggvis-basics.html
  • https://www.dezyre.com/data-science-in-r-programming-tutorial/ggvis
  • http://ggvis.rstudio.com/cookbook.html
  • http://ggvis.rstudio.com/axes-legends.html
