The **variance** is a way to measure the spread of values in a dataset.

The formula to calculate **population variance** is:

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$

where:

- **Σ**: A symbol that means “sum”
- **μ**: Population mean
- **xᵢ**: The i-th element from the population
- **N**: Population size
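To make the formula concrete, here is a minimal sketch that computes the population variance step by step in plain Python. The helper name `population_variance` and the small dataset are hypothetical, chosen only for illustration; the result is compared against `statistics.pvariance`.

```python
from statistics import pvariance

def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n                             # population mean
    squared_devs = [(x - mu) ** 2 for x in values]   # (x_i - mu)^2 for each value
    return sum(squared_devs) / n

# hypothetical dataset, used only for illustration
values = [2, 4, 4, 4, 5, 5, 7, 9]

manual = population_variance(values)
print(manual)                        # 4.0
print(manual == pvariance(values))   # True
```

The mean of this dataset is 5 and the squared deviations sum to 32, so dividing by N = 8 gives a population variance of 4.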

The formula to calculate **sample variance** is:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

where:

- **x̄**: Sample mean
- **xᵢ**: The i-th element from the sample
- **n**: Sample size
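The sample formula differs only in its denominator. The sketch below, which assumes a hypothetical helper named `sample_variance` and an illustrative dataset, applies the formula directly and checks the result against `statistics.variance`.

```python
import math
from statistics import variance

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n                          # sample mean
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

# hypothetical dataset, used only for illustration
sample = [2, 4, 4, 4, 5, 5, 7, 9]

manual = sample_variance(sample)
print(round(manual, 3))                          # 4.571
print(math.isclose(manual, variance(sample)))    # True
```

The guard for fewer than two values is there because the denominator n - 1 would otherwise be zero.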

We can use the **variance** and **pvariance** functions from the statistics library in Python to quickly calculate the sample variance and population variance (respectively) for a given array.

```python
from statistics import variance, pvariance

#calculate sample variance
variance(x)

#calculate population variance
pvariance(x)
```

The following examples show how to use each function in practice.

**Example 1: Calculating Sample Variance in Python**

The following code shows how to calculate the sample variance of an array in Python:

```python
from statistics import variance

#define data
data = [4, 8, 12, 15, 9, 6, 14, 18, 12, 9, 16, 17, 17, 20, 14]

#calculate sample variance (rounded to three decimal places)
round(variance(data), 3)

22.067
```

The sample variance turns out to be **22.067**.

**Example 2: Calculating Population Variance in Python**

The following code shows how to calculate the population variance of an array in Python:

```python
from statistics import pvariance

#define data
data = [4, 8, 12, 15, 9, 6, 14, 18, 12, 9, 16, 17, 17, 20, 14]

#calculate population variance (rounded to three decimal places)
round(pvariance(data), 3)

20.596
```

The population variance turns out to be **20.596**.

**Notes on Calculating Sample & Population Variance**

Keep in mind the following when calculating the sample and population variance:

- You should calculate the **population variance** when the dataset you’re working with represents an entire population, i.e. every value that you’re interested in.
- You should calculate the **sample variance** when the dataset you’re working with represents a sample taken from a larger population of interest.
- The sample variance of a given array of data will always be larger than the population variance for the same array (unless every value is identical, in which case both are zero). This is because the sample formula divides by n - 1 rather than N, which accounts for the extra uncertainty of estimating the mean from a sample, so our estimate of the variance is larger.
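The relationship between the two results can be checked numerically. The following is a minimal sketch that reuses the dataset from the examples above; the list named `constant` is a hypothetical edge case added for illustration. Because the two formulas share the same numerator, the sample variance equals the population variance multiplied by n / (n - 1).

```python
import math
from statistics import variance, pvariance

data = [4, 8, 12, 15, 9, 6, 14, 18, 12, 9, 16, 17, 17, 20, 14]
n = len(data)

s2 = variance(data)        # divides by n - 1
sigma2 = pvariance(data)   # divides by n

print(s2 > sigma2)                               # True
print(math.isclose(s2, sigma2 * n / (n - 1)))    # True

# hypothetical edge case: identical values give zero for both measures
constant = [7, 7, 7, 7]
print(variance(constant) == pvariance(constant) == 0)   # True
```

The gap between the two shrinks as n grows, since n / (n - 1) approaches 1 for large datasets.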

**Additional Resources**

The following tutorials explain how to calculate other measures of spread in Python:

How to Calculate The Interquartile Range in Python

How to Calculate the Coefficient of Variation in Python

How to Calculate the Standard Deviation of a List in Python