Boxplots In R

Hi. Here is some work I have done in R with regard to boxplots.


Table Of Contents

A Short Guide To Boxplots

Creating Boxplots In R Using The ggplot2 Package

References


A Short Guide To Boxplots

Boxplots are simple visuals that show the distribution of a dataset (or a set of values). From a boxplot, you can read off the 25th percentile (the value below which 25 percent of the values fall), the median (the middle number, or 50th percentile) and the 75th percentile. The whiskers extend to the smallest and largest values that are not flagged as extreme; when there are no extreme points, these are the minimum and maximum values. Any extreme points beyond the whiskers are known as outliers and are drawn individually.

The range is equal to the maximum value minus the minimum value.
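As a rough numerical sketch of the quantities described above, base R can compute them directly. The vector x is made-up data for illustration only, and the 1.5 x IQR rule used here to flag outliers is the usual convention (it is the default in geom_boxplot()), not something specific to this post.

```r
# Hypothetical sample values, chosen only for illustration
x <- c(2, 4, 4, 5, 7, 8, 9, 11, 12, 30)

# Minimum, lower hinge, median, upper hinge, maximum
five_num <- fivenum(x)
names(five_num) <- c("min", "lower", "median", "upper", "max")
print(five_num)

# Range = maximum value minus minimum value
value_range <- max(x) - min(x)
print(value_range)

# Interquartile range = 75th percentile minus 25th percentile
q <- quantile(x, c(0.25, 0.75))
iqr <- IQR(x)

# Values more than 1.5 * IQR beyond the quartiles are flagged as outliers
outliers <- x[x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr]
print(outliers)
```

Here the value 30 is flagged as an outlier, so a boxplot of x would draw it as a separate point and end the upper whisker at 12 rather than at the maximum.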

Here are a few visual guides.

Box-and-Whisker Plot Explained

Source: http://flowingdata.com/2008/02/15/how-to-read-and-use-a-box-and-whisker-plot/

and

Source: https://dr282zn36sxxg.cloudfront.net/datastreams/f-d%3A0aa004fd8840e42e1f7b68e5688f867a2c8e00281be5fdd4986f3ead%2BIMAGE%2BIMAGE.1


Creating Boxplots In R Using The ggplot2 Package

In R, I use a dataset called anaesthetic from the faraway package. The image below provides the details of this dataset.

Load the faraway and ggplot2 packages into R.

I save this anaesthetic data into a variable called hosp_data. Then, I preview the data using the head() and tail() functions.
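The loading and preview steps might look like the following sketch. It assumes the faraway and ggplot2 packages are already installed (for example via install.packages()).

```r
library(faraway)   # provides the anaesthetic dataset
library(ggplot2)   # plotting package used for the boxplots

# Keep a working copy of the dataset under the name used in this post
hosp_data <- anaesthetic

head(hosp_data)   # first six rows
tail(hosp_data)   # last six rows
```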

The data structure and the summary of this dataset can be examined using str() and summary() respectively.
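A minimal sketch of this inspection step is given here. The data are reloaded so that the snippet runs on its own; the table() call is an extra check that is not part of the original walkthrough, and it assumes the treatment group column is named tgrp as in the faraway documentation.

```r
library(faraway)

hosp_data <- anaesthetic

str(hosp_data)       # column types and dimensions
summary(hosp_data)   # five-number summary and mean for numeric columns, counts for factors

# Optional extra check: number of observations in each treatment group
group_counts <- table(hosp_data$tgrp)
print(group_counts)
```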

In the summary of hosp_data, we are given the five-number summary and the mean of the column breath. This five-number summary is the numerical version of the boxplot. We also see that each of the four treatment groups has 20 observations.

The columns are renamed using colnames().
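One way the renaming could be done is sketched here. The new names Breath and Treatment_Group are hypothetical placeholders, since the names actually chosen are not stated in the text, and the original column name tgrp is taken from the faraway documentation.

```r
library(faraway)

hosp_data <- anaesthetic

# Rename by matching the existing names, so the result does not depend on column order.
# "Breath" and "Treatment_Group" are hypothetical names used only for illustration.
colnames(hosp_data)[colnames(hosp_data) == "breath"] <- "Breath"
colnames(hosp_data)[colnames(hosp_data) == "tgrp"] <- "Treatment_Group"

colnames(hosp_data)   # confirm the new names
```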

A Boxplot In R

Creating a boxplot in R is not very difficult. The main parts for creating a boxplot using ggplot2 are the ggplot() function and geom_boxplot(). The harder part is adding labels and changing some visual features. The code and the resulting boxplot are shown below.
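A minimal sketch of such a boxplot is given here. It uses the dataset's original column names (breath and tgrp) so that it runs on its own, and the axis labels, title and centring tweak are illustrative choices rather than the exact ones used for the original figure.

```r
library(faraway)
library(ggplot2)

hosp_data <- anaesthetic

# One box per treatment group, with breath on the vertical axis
box_plot <- ggplot(hosp_data, aes(x = tgrp, y = breath)) +
  geom_boxplot() +
  labs(
    x = "Treatment group",                  # illustrative label
    y = "Breath",                           # illustrative label
    title = "Breath by treatment group"     # illustrative title
  ) +
  theme(plot.title = element_text(hjust = 0.5))   # centre the title

print(box_plot)
```

The ggplot() call sets up the data and the aesthetic mapping, geom_boxplot() draws the boxes, and labs() and theme() handle the labels and visual adjustments.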

Adding Means To Boxplots

The boxplot above gives information on minimums, maximums, 25th percentiles, 75th percentiles, medians, ranges and outliers. However, the boxplot above does not have means. Means can be added to boxplots by adding stat_summary(). (The means are represented by red squares.)
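Adding the means could look like the sketch below. It assumes ggplot2 version 3.3.0 or later, where the argument is called fun (older versions used fun.y), and the point size is an illustrative choice. The tapply() line is an extra check that prints the group means numerically.

```r
library(faraway)
library(ggplot2)

hosp_data <- anaesthetic

# Boxplot with the mean of each group overlaid as a red square (shape 15)
mean_plot <- ggplot(hosp_data, aes(x = tgrp, y = breath)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", shape = 15, size = 3, colour = "red")

print(mean_plot)

# Optional: the same group means as numbers
group_means <- tapply(hosp_data$breath, hosp_data$tgrp, mean)
print(group_means)
```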


References

  • R Graphics Cookbook by Winston Chang (2012)
  • http://www.purplemath.com/modules/boxwhisk2.htm
  • http://flowingdata.com/2008/02/15/how-to-read-and-use-a-box-and-whisker-plot/
