Mean, Median, Mode & Variance In Python

Hi there. This is a short page on computing means, medians, modes, and variances in Python. As someone who is coming from a heavy R background, I wanted to revisit Python and brush up on some things.

 


Sections

Means

Medians

Modes

Variances

Standard Deviation

References

 


Means

The sample mean is not very difficult to implement in Python (or in any programming language). I add the realized values together and divide this sum by the number of values. In math notation, I would have:

    \[\bar{x} = \dfrac{1}{n} \sum_{i = 1}^{n} x_{i} = \dfrac{1}{n} (x_1 + x_2 + x_3 + ... + x_n)\]

In Python, I would define the mean function to take a list as input and return the sum of the elements in the list divided by the number of elements in the list. An example is shown below.
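A minimal sketch of such a function, assuming the input is a non-empty list of numbers (the name `mean` and the sample data are illustrative):

```python
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("mean() requires at least one value")
    # Add the values together, then divide by how many there are
    return sum(values) / len(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data))  # 5.0
```

Python's standard library also offers `statistics.mean`, which should agree with this on ordinary numeric lists.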

 


Medians

In a sorted list of numbers, the median is the middle number. The median is also the 50th percentile: half of the sorted numbers lie at or below it and the other half lie at or above it.

If a list has an odd number of values, the median is simply the middle value. If it has an even number of values, there is no single middle value, so the median is the average of the two middle values (add them together and divide by two).

 

Remember that the first element of a list in Python has index 0, whereas in R the first element has index 1.
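A minimal sketch of a median function that follows the odd/even rule, assuming a non-empty list of numbers (the function name and inputs are illustrative). Because of 0-based indexing, for a sorted list of length n the middle element sits at index `n // 2`, and for even n the two middle elements sit at `n // 2 - 1` and `n // 2`.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    if not values:
        raise ValueError("median() requires at least one value")
    ordered = sorted(values)   # sort a copy so the input list is unchanged
    n = len(ordered)
    mid = n // 2               # 0-based index of the (upper) middle element
    if n % 2 == 1:
        # Odd length: return the single middle element
        return ordered[mid]
    # Even length: average the two middle elements
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 1, 2]))      # 2
print(median([4, 1, 3, 2]))   # 2.5
```

The standard library's `statistics.median` implements the same rule and can be used as a cross-check.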

Here are a few examples with the median.

 


Modes

In a list or sequence of numbers, the mode is the value that occurs most frequently. A list can have more than one mode if several values tie for the highest count.

I have found two ways to find the mode. The first uses the Counter class from the collections module, and the second uses the mode() function from the statistics module in Python's standard library.

The Counter approach gives both the mode and its frequency.
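A minimal sketch of both approaches on an illustrative list (the data are made up for this example). `Counter.most_common` returns `(value, count)` pairs, which is why the Counter route yields the frequency as well. One caveat: when several values tie, `statistics.mode` in Python 3.8 and later returns the first mode it encounters (earlier versions raised `StatisticsError`), so `statistics.multimode` is the safer choice if you need all of them.

```python
from collections import Counter
import statistics

data = [1, 2, 2, 3, 3, 3, 4]

# Way 1: Counter tallies how often each value appears.
# most_common(1) returns a list holding the single (value, count)
# pair with the highest count.
counts = Counter(data)
value, frequency = counts.most_common(1)[0]
print(value, frequency)                        # 3 3

# Way 2: statistics.mode returns the most frequent value directly.
print(statistics.mode(data))                   # 3

# With ties, statistics.multimode (Python 3.8+) returns every mode.
print(statistics.multimode([1, 1, 2, 2, 3]))   # [1, 2]
```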

 


Variances

Means, medians, and modes are measures of central tendency; they describe where the data are located. Measures of spread, such as the variance, instead describe how spread out the data are.

The sample variance is computed by subtracting the mean from each data point, squaring each difference, adding up all of the squared differences, and dividing the total by the number of data points minus one. In math notation it looks like this:

    \[s^2 = \dfrac{1}{n - 1} \sum_{i = 1}^{n} (x_{i} - \bar{x})^2 = \dfrac{1}{n - 1} \left( (x_{1} - \bar{x})^2 + (x_{2} - \bar{x})^2 + ... + (x_{n} - \bar{x})^2 \right)\]

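Note that the sample variance divides by n - 1 because this makes it an unbiased estimator of the population variance. The population variance itself, computed over an entire population, divides by the number of values rather than by that number minus one.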
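To see the two denominators side by side, here is a short sketch using Python's standard-library `statistics` module, where `variance` is the sample (n - 1) version and `pvariance` divides by n. The data are illustrative: the squared deviations from the mean 2.5 sum to 5, so the two results are 5/3 and 5/4.

```python
import statistics

data = [1, 2, 3, 4]
n = len(data)

sample_var = statistics.variance(data)        # divides by n - 1
population_var = statistics.pvariance(data)   # divides by n

print(sample_var)       # 5 / 3, roughly 1.667
print(population_var)   # 1.25

# Rescaling the divide-by-n result by n / (n - 1) gives the sample variance
print(population_var * n / (n - 1))           # roughly 1.667 again
```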

 

Here is the Python code for the variance with an example.
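A minimal sketch that translates the formula term by term, assuming the input is a list of at least two numbers (the function name and sample data are illustrative). The result is cross-checked against `statistics.variance` from the standard library.

```python
import statistics


def variance(values):
    """Sample variance of a list of at least two numbers (n - 1 denominator)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance requires at least two values")
    x_bar = sum(values) / n
    # Sum of squared deviations of each point from the mean
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return squared_deviations / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data))              # 32 / 7, roughly 4.571
print(statistics.variance(data))   # standard-library value for comparison
```

Here the mean is 5, the squared deviations sum to 32, and dividing by n - 1 = 7 gives the sample variance.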

 

Standard Deviation

The standard deviation is the square root of the variance. Because the variance is a sum of squares divided by a positive number, it is never negative, so there is no risk of taking the square root of a negative number, and the standard deviation is non-negative as well.
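A minimal sketch of the sample standard deviation, computed directly as the square root of the n - 1 variance so the block stands on its own. The function name and data are illustrative; `statistics.stdev` is the standard-library counterpart.

```python
import math
import statistics


def std_dev(values):
    """Sample standard deviation: square root of the n - 1 sample variance."""
    n = len(values)
    if n < 2:
        raise ValueError("sample standard deviation requires at least two values")
    x_bar = sum(values) / n
    var = sum((x - x_bar) ** 2 for x in values) / (n - 1)
    # var is a sum of squares divided by a positive number, so it is never
    # negative and math.sqrt is always defined here
    return math.sqrt(var)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(std_dev(data))            # sqrt(32 / 7), roughly 2.138
print(statistics.stdev(data))   # standard-library value for comparison
```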

 


References

  • https://stackoverflow.com/questions/2600191/how-can-i-count-the-occurrences-of-a-list-item-in-python
  • https://stackoverflow.com/questions/10797819/finding-the-mode-of-a-list
  • https://stackoverflow.com/questions/16670658/python-variance-of-a-list-of-defined-numbers
  • Data Science from Scratch: First Principles with Python by Joel Grus
