Plotting Continuous Probability Distributions In R With ggplot2

Hello there. This page is about plotting various continuous probability distributions in R with ggplot2. Through experimentation and trial and error, here is what I have come up with. As there are many different probability distributions, I will go through a sample of them.

To install the ggplot2 package in R, run install.packages("ggplot2").

To load the ggplot2 package into R, run library(ggplot2).

Most of the R & ggplot2 code in the below sections will have a format similar to:
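A minimal sketch of that general format, using dnorm() purely as a placeholder density. The axis limits, colours and labels here are my own choices for illustration and can be swapped freely.

```r
library(ggplot2)

# Data frame that only holds the limits of the horizontal axis
x_limits <- data.frame(x = c(-4, 4))

p <- ggplot(x_limits, aes(x = x)) +
  # Layer 1: shaded area under the density curve
  stat_function(fun = dnorm, args = list(mean = 0, sd = 1),
                geom = "area", fill = "steelblue", alpha = 0.3) +
  # Layer 2: the density curve itself
  stat_function(fun = dnorm, args = list(mean = 0, sd = 1)) +
  # Labels and text adjustments
  labs(x = "x", y = "Density", title = "Placeholder Density Plot") +
  theme(plot.title = element_text(hjust = 0.5))

print(p)
```

In the sections that follow, only the function passed to fun, the values in args = list() and the axis limits need to change.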

 


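Sections

  • Standard Uniform Distribution
  • Exponential Distribution Plot
  • Weibull Distribution
  • Gamma Distributions
  • Cauchy Distributions
  • Pareto Distribution Plots With Custom Function
  • Notes
  • References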

You may want to refer to a few of my other pages:

Plotting Normal Distributions In R Using ggplot2

Plotting Uniform Distributions In R With ggplot2

 


Standard Uniform Distribution

Given values of a and b, the random variable U follows a uniform distribution with a probability density function (pdf) of:

    \[f(u) = \dfrac{1}{b - a}\]

for a \leq u \leq b.

If a = 0 and b = 1, the uniform distribution becomes the standard uniform distribution. We would now have

    \[f(u) = 1\]

for 0 \leq u \leq 1.

R Code

In R, the uniform density function is dunif(x, min = 0, max = 1, log = FALSE), where x is the value at which the density is evaluated, min is like a and max is like b. (Setting log = TRUE returns the logarithm of the density; for plotting I would leave it at the FALSE default.)

The code for this plot starts with the ggplot() function taking in 0 and 1 as limits for the horizontal axis. The optional xlim() and ylim() functions adjust the axes to suit the a and b parameters. Two stat_function() layers are used: one for the colour fill that represents the area under the curve and one for the probability density line. Add-on functions such as labs() and theme() are for labels and adjusting text.

Inside stat_function, it is important to include args = list(). Inside this list(), you input the parameters/values for the function that you are using. In this case, the uniform distribution function dunif() requires a minimum and a maximum.
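A sketch of the standard uniform plot along those lines. The fill colour, vertical axis limit and label text are assumptions on my part.

```r
library(ggplot2)

p_unif <- ggplot(data.frame(x = c(0, 1)), aes(x = x)) +
  xlim(0, 1) +    # horizontal axis runs from a = 0 to b = 1
  ylim(0, 1.5) +  # leave some room above the density height of 1
  # Colour fill for the area under the density
  stat_function(fun = dunif, args = list(min = 0, max = 1),
                geom = "area", fill = "green", alpha = 0.35) +
  # Probability density line
  stat_function(fun = dunif, args = list(min = 0, max = 1)) +
  labs(x = "u", y = "f(u)", title = "Standard Uniform Distribution") +
  theme(plot.title = element_text(hjust = 0.5))

print(p_unif)
```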

 


Exponential Distribution Plot

Given a rate of \lambda (lambda), the probability density function for the exponential distribution is:

    \[f(x; \lambda) = \lambda \text{e}^{-\lambda x}\]

for x \geq 0.

In the R documentation, the exponential distribution's density function is dexp(x, rate = 1, log = FALSE).

 

This first plot deals with the case when the rate/lambda is equal to 1 in the exponential distribution.

This plot is expected when \lambda = 1 as this is simply exponential decay (i.e. \text{e}^{-x}).
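A sketch of the rate = 1 case, assuming a horizontal range of 0 to 5 (my choice) and the same two-layer structure as the uniform plot.

```r
library(ggplot2)

p_exp <- ggplot(data.frame(x = c(0, 5)), aes(x = x)) +
  # Shaded area under the exponential density
  stat_function(fun = dexp, args = list(rate = 1),
                geom = "area", fill = "orange", alpha = 0.35) +
  # Density line
  stat_function(fun = dexp, args = list(rate = 1)) +
  labs(x = "x", y = "f(x)", title = "Exponential Distribution (rate = 1)") +
  theme(plot.title = element_text(hjust = 0.5))

print(p_exp)
```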

 

Plotting Multiple Exponential Distribution Plots

Suppose you want to compare multiple exponential distribution plots with different rates. This can be done in the ggplot2 framework by adding multiple stat_function() layers, each with a different rate value in its args = list().
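One way to sketch this, assuming rates of 0.5, 1 and 2 (my choice). Mapping a label to colour inside aes() is one common way to get a legend from stat_function(); the colours themselves are arbitrary.

```r
library(ggplot2)

p_exp_multi <- ggplot(data.frame(x = c(0, 5)), aes(x = x)) +
  # One layer per rate; the colour label becomes the legend entry
  stat_function(fun = dexp, args = list(rate = 0.5), aes(colour = "0.5")) +
  stat_function(fun = dexp, args = list(rate = 1), aes(colour = "1")) +
  stat_function(fun = dexp, args = list(rate = 2), aes(colour = "2")) +
  scale_colour_manual("Rate",
                      values = c("0.5" = "blue", "1" = "red", "2" = "darkgreen")) +
  labs(x = "x", y = "f(x)", title = "Exponential Distributions") +
  theme(plot.title = element_text(hjust = 0.5))

print(p_exp_multi)
```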

 


Weibull Distribution

The Weibull distribution depends on shape and scale parameters. A special case of the Weibull distribution is the Exponential distribution, obtained when the Weibull shape parameter is one (the exponential rate is then the reciprocal of the Weibull scale).

In R, the Weibull density function is dweibull(x, shape, scale = 1, log = FALSE).

The code for the Weibull distribution plot is very similar to the code for the first Exponential distribution plot above, with dweibull() in place of dexp(). Do note the changes in the args = list() parts of the two stat_function() calls.
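A sketch of a single Weibull plot, assuming shape = 2 and scale = 1 and a horizontal range of 0 to 3. These values are illustrative only.

```r
library(ggplot2)

p_weibull <- ggplot(data.frame(x = c(0, 3)), aes(x = x)) +
  # Shaded area: note shape and scale in args instead of rate
  stat_function(fun = dweibull, args = list(shape = 2, scale = 1),
                geom = "area", fill = "purple", alpha = 0.35) +
  # Density line with the same parameters
  stat_function(fun = dweibull, args = list(shape = 2, scale = 1)) +
  labs(x = "x", y = "f(x)",
       title = "Weibull Distribution (shape = 2, scale = 1)") +
  theme(plot.title = element_text(hjust = 0.5))

print(p_weibull)
```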

 

Multiple Weibull Distribution Plots

I have included code and a plot of three Weibull distributions with varying shape and scale parameters. Fitting multiple densities into one plot is good for comparisons.
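A sketch with three Weibull densities. The particular shape and scale pairs, the colours and the legend labels are assumptions chosen for illustration.

```r
library(ggplot2)

p_weibull_multi <- ggplot(data.frame(x = c(0, 4)), aes(x = x)) +
  # Each layer gets its own shape and scale values
  stat_function(fun = dweibull, args = list(shape = 1, scale = 1),
                aes(colour = "shape = 1, scale = 1")) +
  stat_function(fun = dweibull, args = list(shape = 2, scale = 1),
                aes(colour = "shape = 2, scale = 1")) +
  stat_function(fun = dweibull, args = list(shape = 3, scale = 2),
                aes(colour = "shape = 3, scale = 2")) +
  scale_colour_manual("Parameters",
                      values = c("shape = 1, scale = 1" = "blue",
                                 "shape = 2, scale = 1" = "red",
                                 "shape = 3, scale = 2" = "darkgreen")) +
  labs(x = "x", y = "f(x)", title = "Weibull Distributions") +
  theme(plot.title = element_text(hjust = 0.5))

print(p_weibull_multi)
```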

 


Gamma Distributions

The Gamma distribution is a continuous probability distribution which depends on shape and rate parameters. In R, the gamma density function is dgamma(x, shape, rate = 1, scale = 1/rate, log = FALSE). In the code comment, I have put in a note that you have to specify the rate or the scale but not both.

The code and output below are one example of plotting a Gamma distribution.
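A sketch of one Gamma plot, assuming shape = 2 and rate = 0.5 over a horizontal range of 0 to 15. These values are my own.

```r
library(ggplot2)

# Note: specify either rate or scale in args, but not both
p_gamma <- ggplot(data.frame(x = c(0, 15)), aes(x = x)) +
  # Shaded area under the gamma density
  stat_function(fun = dgamma, args = list(shape = 2, rate = 0.5),
                geom = "area", fill = "skyblue", alpha = 0.5) +
  # Density line
  stat_function(fun = dgamma, args = list(shape = 2, rate = 0.5)) +
  labs(x = "x", y = "f(x)",
       title = "Gamma Distribution (shape = 2, rate = 0.5)") +
  theme(plot.title = element_text(hjust = 0.5))

print(p_gamma)
```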

 

 

Multiple Gamma Distributions

Since the Gamma distribution depends on shape and rate parameters, you can play around with different values of the rate and shape parameters and plot multiple Gamma distributions.
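A sketch overlaying three Gamma densities. The shape and rate pairs and the colours are illustrative assumptions.

```r
library(ggplot2)

p_gamma_multi <- ggplot(data.frame(x = c(0, 15)), aes(x = x)) +
  # Vary shape and rate from layer to layer
  stat_function(fun = dgamma, args = list(shape = 1, rate = 0.5),
                aes(colour = "shape = 1, rate = 0.5")) +
  stat_function(fun = dgamma, args = list(shape = 2, rate = 0.5),
                aes(colour = "shape = 2, rate = 0.5")) +
  stat_function(fun = dgamma, args = list(shape = 5, rate = 1),
                aes(colour = "shape = 5, rate = 1")) +
  scale_colour_manual("Parameters",
                      values = c("shape = 1, rate = 0.5" = "blue",
                                 "shape = 2, rate = 0.5" = "red",
                                 "shape = 5, rate = 1" = "darkgreen")) +
  labs(x = "x", y = "f(x)", title = "Gamma Distributions") +
  theme(plot.title = element_text(hjust = 0.5))

print(p_gamma_multi)
```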

 


Cauchy Distributions

The Cauchy distribution is one that is taught in some higher-level probability and statistics courses. One could compare this distribution to the normal distribution, as both are symmetric and bell-shaped, although the Cauchy distribution has much heavier tails.

In R, dcauchy() is the function for the Cauchy density. Make sure to specify the location and scale parameters for the Cauchy distribution.
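A sketch of a standard Cauchy plot, assuming location = 0 and scale = 1 with a horizontal range of -10 to 10 (my choice).

```r
library(ggplot2)

p_cauchy <- ggplot(data.frame(x = c(-10, 10)), aes(x = x)) +
  # Shaded area under the Cauchy density
  stat_function(fun = dcauchy, args = list(location = 0, scale = 1),
                geom = "area", fill = "tomato", alpha = 0.35) +
  # Density line
  stat_function(fun = dcauchy, args = list(location = 0, scale = 1)) +
  labs(x = "x", y = "f(x)",
       title = "Cauchy Distribution (location = 0, scale = 1)") +
  theme(plot.title = element_text(hjust = 0.5))

print(p_cauchy)
```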

 

 

Multiple Cauchy Distribution Plots
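As in the earlier sections, several Cauchy densities can be overlaid with one stat_function() layer each. The sketch below assumes three location and scale pairs of my own choosing.

```r
library(ggplot2)

p_cauchy_multi <- ggplot(data.frame(x = c(-10, 10)), aes(x = x)) +
  # Vary location and scale from layer to layer
  stat_function(fun = dcauchy, args = list(location = 0, scale = 1),
                aes(colour = "location = 0, scale = 1")) +
  stat_function(fun = dcauchy, args = list(location = 0, scale = 2),
                aes(colour = "location = 0, scale = 2")) +
  stat_function(fun = dcauchy, args = list(location = 3, scale = 1),
                aes(colour = "location = 3, scale = 1")) +
  scale_colour_manual("Parameters",
                      values = c("location = 0, scale = 1" = "blue",
                                 "location = 0, scale = 2" = "red",
                                 "location = 3, scale = 1" = "darkgreen")) +
  labs(x = "x", y = "f(x)", title = "Cauchy Distributions") +
  theme(plot.title = element_text(hjust = 0.5))

print(p_cauchy_multi)
```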

 


Pareto Distribution Plots With Custom Function

In the previous sections, we have used a built-in R function inside of stat_function(). However, not all probability distributions have a built-in R density function that is ready to use.

With the Pareto distribution, a custom function needs to be made. The parameters for the Pareto distribution are lambda and k. (Yes, I forgot to put an if statement which would consider the support of the distribution.)
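A sketch of such a custom function, assuming the parameterization f(x) = k * lambda^k / x^(k + 1) with lambda as the minimum value and k as the shape. The name dpareto_custom is hypothetical, and this version does include a support check through ifelse(), returning 0 for x below lambda.

```r
library(ggplot2)

# Hypothetical custom Pareto density (not a base R function).
# Assumes f(x) = k * lambda^k / x^(k + 1) for x >= lambda, and 0 otherwise.
dpareto_custom <- function(x, lambda = 1, k = 1) {
  ifelse(x >= lambda, k * lambda^k / x^(k + 1), 0)
}

p_pareto <- ggplot(data.frame(x = c(1, 5)), aes(x = x)) +
  # Shaded area: the custom function goes in fun, its parameters in args
  stat_function(fun = dpareto_custom, args = list(lambda = 1, k = 2),
                geom = "area", fill = "gold", alpha = 0.5) +
  # Density line
  stat_function(fun = dpareto_custom, args = list(lambda = 1, k = 2)) +
  labs(x = "x", y = "f(x)",
       title = "Pareto Distribution (lambda = 1, k = 2)") +
  theme(plot.title = element_text(hjust = 0.5))

print(p_pareto)
```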

 

 

Multiple Pareto Distributions

For plotting multiple distributions, the custom function is needed as well.
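A sketch for several Pareto densities, again assuming the hypothetical dpareto_custom() function and the f(x) = k * lambda^k / x^(k + 1) parameterization. The k values and colours are my own choices.

```r
library(ggplot2)

# Hypothetical custom Pareto density, 0 outside the support x >= lambda
dpareto_custom <- function(x, lambda = 1, k = 1) {
  ifelse(x >= lambda, k * lambda^k / x^(k + 1), 0)
}

p_pareto_multi <- ggplot(data.frame(x = c(1, 5)), aes(x = x)) +
  # Same lambda, different k in each layer
  stat_function(fun = dpareto_custom, args = list(lambda = 1, k = 1),
                aes(colour = "k = 1")) +
  stat_function(fun = dpareto_custom, args = list(lambda = 1, k = 2),
                aes(colour = "k = 2")) +
  stat_function(fun = dpareto_custom, args = list(lambda = 1, k = 3),
                aes(colour = "k = 3")) +
  scale_colour_manual("Shape",
                      values = c("k = 1" = "blue", "k = 2" = "red",
                                 "k = 3" = "darkgreen")) +
  labs(x = "x", y = "f(x)", title = "Pareto Distributions (lambda = 1)") +
  theme(plot.title = element_text(hjust = 0.5))

print(p_pareto_multi)
```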


Notes

  • The code provided could be extended with if statements that let the user know when certain x-values are not valid.
  • Making plots for other probability distributions requires a simple adjustment in the stat_function() part.
  • If there is no built-in function for you to use, you would need to write a custom function for that probability density function.

 


References

R Graphics Cookbook By Winston Chang (2012)

http://www.math.wm.edu/~leemis/chart/UDR/PDFs/Pareto.pdf

https://stackoverflow.com/questions/31792634/adding-legend-to-ggplot2-with-multiple-lines-on-plot

https://stackoverflow.com/questions/19950219/using-legend-with-stat-function-in-ggplot2
