Hi there. This is a short page on computing means, medians, modes, variances, and standard deviations in Python. Coming from a heavy R background, I wanted to revisit Python and brush up on some things.

**Means**

The sample mean is not very difficult to implement in Python (or in any programming language). I add the realized values together and divide this sum by the number of values. In math notation, with $n$ values $x_1, \dots, x_n$, I would have:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

In Python, I would define the mean function to take a list as input and return the sum of its elements divided by the number of elements. An example is shown below.

    # Mean, Median, Mode & Variance In Python Work

    # Only matters on Python 2; a __future__ import must come before other imports
    from __future__ import division
    import math

    ### Mean Example:

    test_ex = [5, 9, 12, 2, 4, 18, 11]

    def mean(x):
        return sum(x) / len(x)

    print("Mean:", round(mean(test_ex), 2))
    # Output: Mean: 8.71
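One case the function above does not handle is an empty list: `len(x)` is 0, so the division raises `ZeroDivisionError`. Below is a minimal sketch of a guarded version. The name `safe_mean` is hypothetical (it is not part of the original code), and `statistics.mean` from the standard library is used only as a cross-check.

```python
from statistics import mean as stats_mean  # standard-library reference

def safe_mean(x):
    """Arithmetic mean of a non-empty sequence (hypothetical helper)."""
    if len(x) == 0:
        # Fail with a clearer message than ZeroDivisionError
        raise ValueError("mean requires at least one value")
    return sum(x) / len(x)

test_ex = [5, 9, 12, 2, 4, 18, 11]

print("Mean:", round(safe_mean(test_ex), 2))  # Mean: 8.71
# Compare against the standard library, allowing for floating-point error
print("Agrees with statistics.mean:",
      abs(safe_mean(test_ex) - stats_mean(test_ex)) < 1e-9)
```

Raising `ValueError` is one possible design choice; returning `None` or `float("nan")` would also be defensible, depending on how the result is used.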

**Medians**

In a sorted list of numbers, the median is the middle number. The median number is also the 50th percentile. This means that half of the sorted numbers are above the median and the other half are below the median.

If the list holds an odd number of values, the median is simply the middle value of the sorted list. If it holds an even number of values, the median is the average of the two middle values (add them together and divide by two).

    ### Median Code & Example:

    def median(x):
        # Input: list of numbers; Output: the "middle" number of an ordered list of #s
        sorted_x = sorted(x)
        length_n = len(x)
        middle = length_n // 2  # Integer division

        # Even numbered amount in list:
        if length_n % 2 == 0:
            # Remember index 0 as 1st element.
            median_even = (sorted_x[middle - 1] + sorted_x[middle]) / 2
            return median_even
        else:
            return sorted_x[middle]  # Return middle number

Remember that the first element of a list in Python has an index of 0. In R, the first element index is 1.

Here are a few examples with the median.

    # Test Cases:

    test_ex2 = [5, 1, 4, 2]

    print("Median:", median(test_ex))
    # Output: Median: 9

    print("Median:", median(test_ex2))
    # Output: Median: 3.0
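As a sanity check, the hand-written function can be compared with `statistics.median` from the standard library. The sketch below repeats the median logic so that it runs on its own. The last two test lists are made up for illustration and do not come from the examples above.

```python
from statistics import median as stats_median  # standard-library reference

def median(x):
    """Middle value of the sorted data; mean of the two middle values if len(x) is even."""
    sorted_x = sorted(x)
    n = len(sorted_x)
    middle = n // 2
    if n % 2 == 0:
        return (sorted_x[middle - 1] + sorted_x[middle]) / 2
    return sorted_x[middle]

cases = [
    [5, 9, 12, 2, 4, 18, 11],  # odd length, from the mean example
    [5, 1, 4, 2],              # even length, from the median example
    [7],                       # made-up: single element
    [3, 3, 1],                 # made-up: repeated values
]

for data in cases:
    assert median(data) == stats_median(data)

print("All cases agree with statistics.median")
```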

**Modes**

In a list or sequence of numbers, the mode is the number which occurs the most frequently.

I have found two ways to find the mode. The first uses the Counter class from the collections module, and the second uses the mode function from Python's statistics module.

    ### Mode Example:

    # Finding most occurring number/object in a list.

    from collections import Counter

    # References: https://stackoverflow.com/questions/2600191/how-can-i-count-the-occurrences-of-a-list-item-in-python
    # https://stackoverflow.com/questions/10797819/finding-the-mode-of-a-list
    # https://stackoverflow.com/questions/16670658/python-variance-of-a-list-of-defined-numbers

    test_ex3 = [5, 5, 0, 1, 4, 2, -1, 4, 3, 2, 7, 5]

    print(Counter(test_ex3).most_common(1))
    # Output: [(5, 3)]

The output is a list containing one (value, count) pair: the mode, 5, and its frequency, 3.

    # Alternate Way:

    from statistics import mode

    print("Mode:", mode(test_ex3))
    # Output: Mode: 5
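Both approaches report a single value, so a tie between two or more values does not show up in the output. (On Python versions before 3.8, `statistics.mode` raises `StatisticsError` on a tie instead.) A minimal sketch of returning every tied value is shown below. The helper name `all_modes` and the list `tied` are hypothetical and not part of the original examples.

```python
from collections import Counter

def all_modes(x):
    """Return every value that attains the highest count (hypothetical helper)."""
    counts = Counter(x)
    top = max(counts.values())  # raises ValueError if x is empty
    # Counter keeps first-seen order, so modes come back in order of first appearance
    return [value for value, count in counts.items() if count == top]

tied = [1, 1, 2, 2, 3]  # made-up data with two modes
print(all_modes(tied))  # [1, 2]

test_ex3 = [5, 5, 0, 1, 4, 2, -1, 4, 3, 2, 7, 5]
print(all_modes(test_ex3))  # [5]
```

On Python 3.8 and later, `statistics.multimode` in the standard library returns the same kind of list.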

**Variances**

Means, medians and modes are measures of central tendency and deal mostly with location. When it comes to measures such as variances, we deal with how the data is spread out.

The sample variance involves subtracting the mean from each data point. Each difference is squared, the squared differences are added together, and the total is divided by the number of data points minus one. In math notation, with sample mean $\bar{x}$ and $n$ data points, it looks like this:

$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

Note that we divide by $n - 1$ for the sample variance, where $n$ is the number of data points, as this makes it an unbiased estimator of the population variance (which divides by $n$ and not $n - 1$).

Here is the Python code for the variance with an example.

    ### Variances

    def variance(x):
        # Sample variance: sum of squared deviations from the mean, divided by n - 1
        n = len(x)
        x_bar = mean(x)
        # Note: the result is rounded to 2 decimal places before it is returned
        return round(sum((x_i - x_bar)**2 for x_i in x) / (n - 1), 2)

    test_ex2 = [5, 1, 4, 2]

    print("Variance:", variance(test_ex2))
    # Output: Variance: 3.33
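To make the difference between the two divisors concrete, here is a small sketch that works through the same four numbers step by step. It compares the results with `statistics.variance` (which divides by n - 1) and `statistics.pvariance` (which divides by n) from the standard library. The variable names are my own and used for illustration only.

```python
import math
from statistics import pvariance, variance as sample_variance

data = [5, 1, 4, 2]
n = len(data)
x_bar = sum(data) / n  # 3.0

# Squared deviations from the mean: 4 + 4 + 1 + 1 = 10
squared_devs = sum((x_i - x_bar) ** 2 for x_i in data)

by_n_minus_1 = squared_devs / (n - 1)  # sample variance, about 3.33
by_n = squared_devs / n                # population-style variance, 2.5

print("Divide by n - 1:", round(by_n_minus_1, 2))
print("Divide by n:", by_n)

# Cross-check against the standard library
print("Matches statistics.variance:", math.isclose(by_n_minus_1, sample_variance(data)))
print("Matches statistics.pvariance:", math.isclose(by_n, pvariance(data)))
```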

__Standard Deviation__

The standard deviation is the square root of the variance. Since the variance is at least 0 (because of the squares), the standard deviation is non-negative as well. There is no need to worry about having negatives inside a square root.

    # Uses math (imported in the first code block) and variance() defined above
    def standard_deviation(x):
        return math.sqrt(variance(x))

    print("Standard Deviation:", round(standard_deviation(test_ex2), 2))
    # Output: Standard Deviation: 1.82
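One thing to watch for: `variance()` above rounds its result to two decimal places before returning it, so the standard deviation is really the square root of 3.33 rather than of 10/3. That is enough to shift the second decimal place. A minimal sketch that rounds only at print time is shown below. The `_unrounded` function names are hypothetical and not part of the original code.

```python
import math

def variance_unrounded(x):
    """Sample variance (divides by n - 1) with no rounding; needs at least 2 values."""
    n = len(x)
    x_bar = sum(x) / n
    return sum((x_i - x_bar) ** 2 for x_i in x) / (n - 1)

def standard_deviation_unrounded(x):
    """Square root of the unrounded sample variance."""
    return math.sqrt(variance_unrounded(x))

test_ex2 = [5, 1, 4, 2]

# Round only when displaying the result
print("Standard Deviation:", round(standard_deviation_unrounded(test_ex2), 2))  # 1.83

# For comparison: square root of the already-rounded variance
print("From rounded variance:", round(math.sqrt(3.33), 2))  # 1.82
```

Keeping full precision inside the functions and rounding only for display avoids this kind of compounding error.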

**References**

- https://stackoverflow.com/questions/2600191/how-can-i-count-the-occurrences-of-a-list-item-in-python
- https://stackoverflow.com/questions/10797819/finding-the-mode-of-a-list
- https://stackoverflow.com/questions/16670658/python-variance-of-a-list-of-defined-numbers
- Data Science from Scratch: First Principles with Python by Joel Grus