Statistics

Histograms, Box Plots & Standard Deviation

When you collect data, such as test scores, heights, or temperatures, you need a way to see what’s going on. Raw numbers are hard to interpret, but a picture of the data shows a lot at a glance: where values cluster, how spread out they are, and whether anything unusual is happening.

Let’s explore the most important tools for visualizing and measuring data.

Part 1: The Normal (Bell) Curve

Many real-world data sets (heights of people, measurement errors, test scores) approximately follow a bell-shaped pattern called the normal distribution. Its formula is:

\displaystyle f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}

Don’t worry about memorizing that. What matters is that two numbers control everything:

μ (mu) — the center of the distribution (the mean)
σ (sigma) — the spread (the standard deviation)
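
To make the formula less abstract, here is a minimal Python sketch that evaluates it directly. The function name normal_pdf and its default arguments are illustrative choices, not part of the lesson.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x for center mu and spread sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# The curve is highest at the center and symmetric around it.
print(normal_pdf(0.0))                     # peak of the standard curve, about 0.3989
print(normal_pdf(1.0), normal_pdf(-1.0))   # equal heights one sigma either side
```

Calling normal_pdf(x, mu, sigma) with other values of mu and sigma gives the height of the shifted or reshaped curve at x.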

Let’s see them in action. Drag the sliders to move and reshape the curve:

[Interactive plot: normal curve. Sliders: Center (mu) from -5 to 5, starting at 0; Spread (sigma) from 0.3 to 3, starting at 1.]

\mu = 0, \quad \sigma = 1

Try This

Play with the sliders and notice:

Changing mu slides the whole curve left or right without changing its shape
Increasing sigma makes the curve wider and shorter (more spread out)
Decreasing sigma makes it narrower and taller (more concentrated)
The total area under the curve always stays the same (it equals 1)
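
These observations can also be checked numerically. The sketch below is one possible check, not the page's own code: it estimates the area with a simple trapezoid rule over a range of 8 sigma on each side of the mean (an arbitrary but generous cutoff) and prints the peak height for a few settings of mu and sigma.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def area_under_curve(mu, sigma, steps=20_000):
    """Trapezoid-rule estimate of the area between mu - 8*sigma and mu + 8*sigma."""
    lo, hi = mu - 8 * sigma, mu + 8 * sigma
    width = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * width, mu, sigma)
    return total * width

# Shifting mu leaves the peak height alone; changing sigma trades height for width.
for mu, sigma in [(0, 1), (2, 1), (0, 0.5), (0, 3)]:
    peak = normal_pdf(mu, mu, sigma)
    area = area_under_curve(mu, sigma)
    print(f"mu={mu}, sigma={sigma}: peak={peak:.3f}, area={area:.4f}")
```

The peak height is 1 / (sigma * sqrt(2 * pi)), so doubling sigma halves the peak while the area stays at 1.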

Part 2: Standard Deviation — Measuring Spread

The standard deviation (sigma) tells you how far typical data points sit from the mean. Here’s the key rule:

In a normal distribution, about 68% of data falls within 1 sigma of the mean, 95% within 2 sigma, and 99.7% within 3 sigma.

This is called the 68-95-99.7 Rule (or the Empirical Rule).
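
The three percentages are rounded. As a quick check, the fraction of a normal distribution within k standard deviations of the mean equals erf(k / sqrt(2)), which Python’s standard library can evaluate. The helper name fraction_within is just an illustrative choice.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {fraction_within(k):.4f}")
# prints 0.6827, 0.9545 and 0.9973
```

So the "95" in the rule is really about 95.45%, rounded down for easy recall.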

Let’s visualize this. Adjust sigma and see how the “zones” change:

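[Interactive plot: normal curve with the 1-sigma and 2-sigma zones shaded. Slider: Standard deviation (sigma) from 0.5 to 3, starting at 1.]

1\sigma = \pm 1, \quad 2\sigma = \pm 2 \times 1 = \pm 2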

Connection

Why does this matter? If a class’s test scores have mean 75 and standard deviation 10, then:

About 68% of students scored between 65 and 85
About 95% scored between 55 and 95
A score above 95 is more than 2 standard deviations above average, which only about 2.5% of students would reach
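
The same numbers can be worked out in a few lines of Python, assuming the scores are exactly normal with mean 75 and standard deviation 10. The helper normal_cdf is an illustrative name built on the standard library’s math.erf. Because the rule’s 95% is a rounded figure, the exact share above 95 comes out slightly below the 2.5% that the rule suggests.

```python
import math

def normal_cdf(x, mu, sigma):
    """Probability that a normal value is at most x."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

mean, sd = 75, 10  # values from the test-score example

# Share of students inside 1 and 2 standard deviations of the mean.
for k in (1, 2):
    low, high = mean - k * sd, mean + k * sd
    share = normal_cdf(high, mean, sd) - normal_cdf(low, mean, sd)
    print(f"{low} to {high}: {share:.1%}")

# Share of students scoring above 95 (more than 2 sd above the mean).
share_above_95 = 1 - normal_cdf(95, mean, sd)
print(f"above 95: {share_above_95:.1%}")
```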

Part 3: Narrow vs. Wide Distributions

Two data sets can have the same mean but very different spreads. Compare:

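[Interactive plot: two normal curves with the same mean. Sliders: Narrow sigma from 0.3 to 1.5, starting at 0.5; Wide sigma, starting at 2.]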

Try This

Think about real examples:

Narrow distribution: A machine that cuts bolts very precisely — almost all bolts are very close to the target length
Wide distribution: Human heights across the world — there’s a lot of variation

Which would you prefer for quality control in a factory?
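
One way to explore the factory question is to simulate it. The sketch below is purely illustrative: the 50 mm target length, the 1 mm tolerance, and the sample size are made-up values, and the two sigmas (0.5 and 2) are borrowed from the slider defaults above.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

target = 50.0  # hypothetical target bolt length in mm
narrow = [random.gauss(target, 0.5) for _ in range(10_000)]
wide = [random.gauss(target, 2.0) for _ in range(10_000)]

def share_within(values, center, tolerance):
    """Fraction of values no more than `tolerance` away from `center`."""
    return sum(abs(v - center) <= tolerance for v in values) / len(values)

for name, sample in [("narrow", narrow), ("wide", wide)]:
    print(
        name,
        "mean:", round(statistics.mean(sample), 2),
        "stdev:", round(statistics.stdev(sample), 2),
        "within 1 mm:", share_within(sample, target, 1.0),
    )
```

Both machines hit the same average length, but the narrow one keeps roughly 95% of bolts within 1 mm of the target, while the wide one manages only around 38%.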

Part 4: Shifting the Center

What happens when you shift the mean while keeping the spread constant? This is like comparing two different groups:

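[Interactive plot: two normal curves with the same spread. Sliders: Group A center from -4 to 0, starting at -2; Group B center, starting at 2.]

\text{Difference in means} = 2 - (-2) = 4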

Notice how the overlap between the two curves changes. When the means are close together, the distributions overlap a lot — it’s hard to tell which group a data point came from. When they’re far apart, the groups are clearly distinct.
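
The amount of overlap can be put into numbers. When two normal curves share the same sigma, they cross halfway between the two means, and the shared area is made of one tail from each curve. The sketch below computes that shared area. The function name overlap_area and the choice sigma = 1 are assumptions for illustration.

```python
import math

def overlap_area(mu_a, mu_b, sigma):
    """Shared area under two normal curves that have the same sigma."""
    half_gap = abs(mu_b - mu_a) / 2
    # Each curve contributes the tail that reaches past the midpoint,
    # and the two tails together equal erfc(half_gap / (sigma * sqrt(2))).
    return math.erfc(half_gap / (sigma * math.sqrt(2)))

for mu_a, mu_b in [(-0.5, 0.5), (-1, 1), (-2, 2), (-4, 4)]:
    print(f"means {mu_a} and {mu_b}: overlap = {overlap_area(mu_a, mu_b, 1):.3f}")
```

An overlap of 1 means the curves are identical. With sigma = 1 and means 4 apart, as in the slider defaults, less than 5% of the area is shared, so the groups are easy to tell apart.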

Part 5: Skewness — When Data Isn’t Symmetric

Not all data is perfectly bell-shaped. Sometimes data skews to one side. We can model this by comparing a symmetric curve with an offset one:

[Interactive plot: a symmetric curve compared with an offset curve. Slider: Skew shift, starting at 0.]

Connection

Real-world skewed data:

Right-skewed: Income distributions (most people earn moderate amounts, a few earn a lot)
Left-skewed: Age at retirement (most people retire around 65, a few retire very early)

When data is skewed, the mean gets pulled toward the tail, while the median stays near the center of the bulk of the data. That’s why median income is often more informative than mean income!
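
A tiny example shows the effect. The income figures below are invented for illustration (in thousands per year): nine moderate values and one very large one.

```python
import statistics

# Hypothetical annual incomes in thousands: most are moderate, one is very large.
incomes = [28, 32, 35, 38, 40, 42, 45, 50, 55, 400]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

print(f"mean = {mean_income}, median = {median_income}")
# prints: mean = 76.5, median = 41.0
```

The single large income drags the mean above nine of the ten values, while the median still sits in the middle of the bulk of the data.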

Wrapping Up

Here’s what you’ve discovered:

Concept	What It Tells You
Mean (mu)	The center of the distribution — where data clusters
Standard deviation (sigma)	How spread out the data is around the mean
68-95-99.7 Rule	Percentage of data within 1, 2, or 3 standard deviations
Narrow vs. wide	Low sigma = consistent data; high sigma = variable data
Skewness	When data piles up on one side instead of being symmetric

Challenge

A factory produces widgets with mean weight 100g and standard deviation 2g. A widget is rejected if it’s more than 2 standard deviations from the mean.

What weight range is acceptable?
About what percentage of widgets get rejected?
If the factory improves its machines so sigma drops to 1g, how does the acceptable range change?

Use the sliders above to visualize your answers!

Two numbers, the mean and the standard deviation, fully describe a normal distribution and give a useful first summary of many other data sets. Master them, and you can summarize thousands of data points in a single sentence.
