Hello. In this post, I use the ggplot2 package in R to produce informative, clean-looking bar graphs.

**Table of Contents**

- The Data: Simulating Dice Rolls
- Data Cleaning for Graphing
- Producing the Bar Graph
- Notes and Thoughts

**The Data: Simulating Dice Rolls**

No dataset is imported here; instead, we generate simulation results. We simulate rolling a fair six-sided die 1000 times using the sample() function in R.

```r
# Making Bar Graphs in R Using ggplot2:

library("ggplot2")

# Roll a 6 sided die 1000 times and produce a bar graph of the results:

# Run die roll simulations:
result <- sample(1:6, size = 1000, replace = TRUE)

# Get counts of the die rolls:
table(result)

## result
##   1   2   3   4   5   6
## 148 191 197 164 161 139
```

Calling table(result) gives us the frequency of each outcome in this experiment.

The mode (the most frequent result) is three, which came up 197 times. Another 1000 rolls would not necessarily reproduce these counts; in fact, the probability of getting exactly the same counts again is very low.
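If you want a run that you can reproduce, or want to put a number on "very low", base R can do both. The sketch below is my own addition rather than part of the original workflow: set.seed() fixes the random number stream (the seed 123 is an arbitrary choice, so the counts will differ from the table above), which.max() picks out the mode, and dmultinom() gives the probability that a fresh set of 1000 fair rolls lands on exactly the counts reported in this post.

```r
# Reproducibility: the same seed always gives the same sequence of rolls.
set.seed(123)  # 123 is arbitrary
result <- sample(1:6, size = 1000, replace = TRUE)
counts <- table(result)

# The mode is the face with the largest count
# (which.max() returns the first face if there is a tie).
mode_face <- as.integer(names(counts)[which.max(counts)])
mode_face

# Probability that 1000 fair rolls give exactly the counts from the post:
# 148, 191, 197, 164, 161, 139 for faces 1 to 6.
p_same <- dmultinom(c(148, 191, 197, 164, 161, 139), prob = rep(1 / 6, 6))
p_same
```

The value of p_same is well below one in a million, which puts a concrete figure on how unlikely an exact repeat is.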

**Data Cleaning for Graphing**

Now that we have our results, we need to clean the data so that it is well formatted for graphing.

On the x-axis (horizontal) of our graph, we would rather not see 1, 2, 3, 4, 5, 6. Labels "One", "Two", up to "Six" read better for each bar, so we create a vector in R with these names.

```r
die_names <- c("One", "Two", "Three", "Four", "Five", "Six")
```

Alongside the outcome names, we need the corresponding counts from the experiment.

```r
# Count how many times each face (1 to 6) came up:
outcome_data <- rep(NA, 6)

for (i in 1:6) {
  outcome_data[i] <- as.numeric(sum(result == i))
}
```
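The loop works, but base R can produce the same six counts in a single call. Here is a sketch of two vectorised alternatives (not the method used in this post): tabulate() counts how often each integer from 1 to nbins occurs, and table() on a factor with levels 1:6 keeps any face that never came up with a count of 0, which a plain table(result) would leave out. The seed is arbitrary and only makes the sketch reproducible.

```r
set.seed(1)  # arbitrary seed, only to make this sketch reproducible
result <- sample(1:6, size = 1000, replace = TRUE)

# Loop version, as in the post
outcome_data <- rep(NA, 6)
for (i in 1:6) {
  outcome_data[i] <- as.numeric(sum(result == i))
}

# Alternative 1: tabulate() counts occurrences of each integer 1..nbins
outcome_tab <- as.numeric(tabulate(result, nbins = 6))

# Alternative 2: table() on a factor with all six levels;
# a face that never came up still gets a count of 0
outcome_fac <- as.numeric(table(factor(result, levels = 1:6)))

identical(outcome_data, outcome_tab)
identical(outcome_data, outcome_fac)
```

Both comparisons print TRUE. The nbins argument matters when a face is missing: for example, tabulate(c(1, 1, 3), nbins = 6) returns 2 0 1 0 0 0.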

Next, we combine the die_names vector and the outcome_data vector into a data frame. This data frame is what we pass to ggplot().

```r
results_data <- data.frame(Number = factor(die_names, levels = die_names),
                           Counts = outcome_data)
results_data

##   Number Counts
## 1    One    148
## 2    Two    191
## 3  Three    197
## 4   Four    164
## 5   Five    161
## 6    Six    139
```

**Producing the Bar Graph**

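We now plot our bar graph using the ggplot() function from ggplot2. The labels One, Two, up to Six go on the x-axis and the counts (frequencies) on the y-axis. Because results_data already holds the counts, geom_bar() needs stat = "identity" so that the bar heights come from the Counts column instead of from counting rows. We also add a title and set the y-axis limits to 0 and 200.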

```r
# \n for newline and spacing of labels on the axes:
ggplot(data = results_data, aes(x = Number, y = Counts)) +
  geom_bar(stat = "identity", alpha = 0.8) +
  xlab("\n Result on Die") +
  ylab("Frequency\n") +
  ggtitle("Die Roll Results") +
  ylim(0, 200)
```
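Two details of this call are worth knowing. geom_bar(stat = "identity") can also be written as geom_col(), which uses the y values as bar heights by default. And ylim(0, 200) sets the scale limits, so a bar taller than 200 would be dropped (with a warning) rather than cut off, whereas coord_cartesian(ylim = c(0, 200)) only zooms the view. The sketch below rebuilds results_data from the counts shown earlier and draws the same plot with these alternatives; it is an equivalent way of writing the plot, not a change to the post's approach.

```r
library(ggplot2)

# Rebuild results_data from the counts in the table shown earlier
die_names <- c("One", "Two", "Three", "Four", "Five", "Six")
results_data <- data.frame(Number = factor(die_names, levels = die_names),
                           Counts = c(148, 191, 197, 164, 161, 139))

p <- ggplot(data = results_data, aes(x = Number, y = Counts)) +
  geom_col(alpha = 0.8) +            # equivalent to geom_bar(stat = "identity")
  labs(x = "\n Result on Die",
       y = "Frequency\n",
       title = "Die Roll Results") +
  coord_cartesian(ylim = c(0, 200))  # zooms the view; bars above 200 are kept
print(p)
```

labs() sets the axis labels and title in one call, replacing the separate xlab(), ylab() and ggtitle() layers.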

**Notes and Thoughts**

Without the levels argument in factor(), the bars would not appear in the order One, Two, Three, …, Six. Instead, factor() sorts the levels alphabetically, so the x-axis reads Five, Four, One, Six, Three, Two. The following code illustrates this.

```r
# No levels argument: factor() sorts the levels alphabetically
results_data <- data.frame(Number = factor(die_names),
                           Counts = outcome_data)
results_data

# (These counts come from a separate run of the simulation.)
##   Number Counts
## 1    One    158
## 2    Two    162
## 3  Three    178
## 4   Four    162
## 5   Five    184
## 6    Six    156

# \n for newline and spacing of labels on the axes:
ggplot(data = results_data, aes(x = Number, y = Counts)) +
  geom_bar(stat = "identity", alpha = 0.8) +
  xlab("\n Result on Die") +
  ylab("Frequency\n") +
  ggtitle("Die Roll Results") +
  ylim(0, 200)
```

Creating results_data with Number = factor(die_names, levels = die_names) and Counts = outcome_data, as in the earlier section, fixes the issue.
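You can see the difference directly by inspecting the factor levels, since ggplot2 draws the bars of a discrete x-axis in level order. This small check, my own addition, prints the level order with and without the levels argument.

```r
die_names <- c("One", "Two", "Three", "Four", "Five", "Six")

# Default: factor() sorts the unique values to form the levels
levels(factor(die_names))
# "Five"  "Four"  "One"   "Six"   "Three" "Two"

# Explicit levels keep the order in which the names were written
levels(factor(die_names, levels = die_names))
# "One"   "Two"   "Three" "Four"  "Five"  "Six"
```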

**References**

The following websites, along with DataCamp courses, have been very useful.

- http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_(ggplot2)/
- http://docs.ggplot2.org/0.9.2.1/labs.html