Standard deviation and variance in statistics

Welcome! In this article, we will use an example to explain standard deviation and variance in statistics.

Variance

Suppose we record the number of flights arriving per minute at an airport over a period of time. With that data we can plot a histogram, and from the histogram we can read off which arrival counts are most frequent, which are less frequent, and so on. Please look at the figure below.

[Figure: Standard deviation and variance in statistics]
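
To make the histogram idea concrete, here is a minimal sketch that tallies per-minute arrival counts and finds the most frequent one. The values in `arrivals_per_minute` are made up for illustration; they are not data from this article.

```python
from collections import Counter

# Hypothetical data: number of flights that arrived in each observed minute.
arrivals_per_minute = [2, 3, 1, 2, 4, 2, 3, 0, 2, 3, 5, 2]

# Count how often each arrival count occurs (the heights of the histogram bars).
histogram = Counter(arrivals_per_minute)

# Print a simple text histogram, one row per distinct arrival count.
for count in sorted(histogram):
    print(f"{count} arrivals/min: {'#' * histogram[count]}")

# The most frequent value corresponds to the tallest bar.
most_common_value, frequency = histogram.most_common(1)[0]
print("Most frequent:", most_common_value, "seen", frequency, "times")
```

A plotting library could draw the same bars graphically; the text version is used here only to keep the sketch free of extra dependencies.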

Variance measures how “spread-out” the data is.

Variance (σ²) is simply the average of the squared differences from the mean.

Example:

  • What is the variance of the data set (1, 4, 5, 4, 8)?
  • First find the mean: (1+4+5+4+8)/5 = 4.4
  • Now find the differences from the mean: (-3.4, -0.4, 0.6, -0.4, 3.6)
  • Find the squared differences: (11.56, 0.16, 0.36, 0.16, 12.96)
  • Find the average of the squared differences:
  • σ² = (11.56 + 0.16 + 0.36 + 0.16 + 12.96) / 5 = 25.2 / 5 = 5.04
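
The steps above can be sketched in plain Python as follows. This is only an illustration of the arithmetic; the variable names are chosen here and are not part of any library.

```python
data = [1, 4, 5, 4, 8]

# Step 1: the mean of the data set.
mean = sum(data) / len(data)

# Step 2: the difference of each value from the mean.
differences = [x - mean for x in data]

# Step 3: the squared differences.
squared_differences = [d ** 2 for d in differences]

# Step 4: the average of the squared differences is the variance.
variance = sum(squared_differences) / len(data)

print("Mean:", round(mean, 2))
print("Variance:", round(variance, 2))
```

The results are rounded only for display, since floating-point arithmetic can leave tiny trailing errors in the intermediate values.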

Standard Deviation

The standard deviation (σ) is just the square root of the variance.

σ² = 5.04
σ = √5.04 ≈ 2.24
So the standard deviation of (1, 4, 5, 4, 8) is approximately 2.24.
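
As a quick check of this calculation, the following sketch takes the square root of the variance and compares it with `statistics.pstdev` from Python's standard library, which computes the population standard deviation directly.

```python
import math
import statistics

data = [1, 4, 5, 4, 8]

# Population variance: the average of the squared differences from the mean.
variance = statistics.pvariance(data)

# The standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print("Variance:", round(variance, 2))
print("Standard deviation:", round(std_dev, 2))

# statistics.pstdev computes the same quantity in one call.
print("pstdev:", round(statistics.pstdev(data), 2))
```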

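Moreover, the standard deviation is often used as a way to identify outliers. Data points that lie more than one standard deviation from the mean can be considered somewhat unusual, although in practice a cutoff of two or three standard deviations is more common, because roughly 32% of normally distributed data falls more than one standard deviation from the mean.
Also, you can describe how extreme a data point is by saying how many “sigmas”, or standard deviations, it lies away from the mean.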
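
The idea of "how many sigmas away" can be sketched as below. The helper `sigmas_from_mean` is a hypothetical name introduced for this illustration, and the example reuses the data set (1, 4, 5, 4, 8) from the variance section.

```python
import statistics

data = [1, 4, 5, 4, 8]
mean = statistics.mean(data)
sigma = statistics.pstdev(data)  # population standard deviation


def sigmas_from_mean(x, mean, sigma):
    """Return how many standard deviations x lies from the mean (signed)."""
    return (x - mean) / sigma


# Report the distance of every data point from the mean, in sigmas.
for x in data:
    print(x, "is", round(sigmas_from_mean(x, mean, sigma), 2), "sigmas from the mean")
```

A negative result means the point lies below the mean, and a larger absolute value means a more extreme point.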

The standard deviation is expressed in the same units as the data (and therefore the mean), whereas the variance is expressed in squared units. For describing how spread out a distribution is, you can use either.

Most importantly, you should know which one you are using. For example, a Normal distribution with mean = 10 and standard deviation = 3 is exactly the same thing as a Normal distribution with mean = 10 and variance = 9. Because each value determines the other, you don’t really need both: if you report one, you don’t need to report the other. The benefit of reporting the standard deviation is that it stays on the scale of the data. For example, if a sample of adult heights is measured in meters, the standard deviation will also be in meters.
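
Both points can be illustrated with a short sketch using `statistics.NormalDist` (available in Python 3.8 and later). The height values are hypothetical and exist only to show how the two measures react to a change of units.

```python
import statistics

# A Normal distribution defined by mean 10 and standard deviation 3
# has variance 9; the two descriptions are interchangeable.
dist = statistics.NormalDist(mu=10, sigma=3)
print("Standard deviation:", dist.stdev)
print("Variance:", dist.variance)

# Hypothetical adult heights, in meters and in centimeters.
heights_m = [1.60, 1.70, 1.75, 1.80, 1.65]
heights_cm = [h * 100 for h in heights_m]

# The standard deviation scales with the units (a factor of 100),
# while the variance scales with the squared units (a factor of 10,000).
std_ratio = statistics.pstdev(heights_cm) / statistics.pstdev(heights_m)
var_ratio = statistics.pvariance(heights_cm) / statistics.pvariance(heights_m)
print("Std ratio (cm vs m):", round(std_ratio))
print("Variance ratio (cm vs m):", round(var_ratio))
```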

Python Code

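import numpy as np

# Example data set
dataset = [1, 2, 3, 4, 5, 5, 5, 5, 5, 6, 7, 8]

# By default, np.var and np.std compute the population variance
# and population standard deviation (ddof=0).
print('Mean:', np.mean(dataset))
print('Variance:', np.var(dataset))
print('Standard Deviation:', np.std(dataset))

Output:

Mean: 4.666666666666667
Variance: 3.5555555555555554
Standard Deviation: 1.8856180831641267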
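
One detail worth knowing about the snippet above: `np.var` and `np.std` divide by n by default, which matches the population formula used in this article. If the data is a sample and you want the sample estimate that divides by n - 1, NumPy's `ddof` parameter handles that. The sketch below assumes the same data set and simply compares the two settings.

```python
import numpy as np

dataset = [1, 2, 3, 4, 5, 5, 5, 5, 5, 6, 7, 8]

# Population variance: divide by n (the NumPy default, ddof=0).
population_var = np.var(dataset)

# Sample variance and standard deviation: divide by n - 1 (ddof=1).
sample_var = np.var(dataset, ddof=1)
sample_std = np.std(dataset, ddof=1)

print('Population variance:', population_var)
print('Sample variance:', sample_var)
print('Sample standard deviation:', sample_std)
```

The sample values come out slightly larger than the population values, and the gap shrinks as the data set grows.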

Conclusion

First, we explored variance and standard deviation through a worked example and looked at their simple formulas. Then, we discussed how to report these metrics. Finally, we looked at a Python code snippet that computes the variance and standard deviation.

Hopefully, this article has helped shed some light on “standard deviation and variance in statistics”.
