Histograms, Box Plots & Standard Deviation
When you collect data, such as test scores, heights, or temperatures, you need a way to see what’s going on. Raw numbers are hard to interpret, but a picture of the data shows the essentials at a glance: where values cluster, how spread out they are, and whether anything unusual is happening.
Let’s explore the most important tools for visualizing and measuring data.
Part 1: The Normal (Bell) Curve
Many real-world data sets (heights of people, measurement errors, test scores) roughly follow a bell-shaped pattern called the normal distribution. Its formula is:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Don’t worry about memorizing that. What matters is that two numbers control everything:
- μ (mu) — the center of the distribution (the mean)
- σ (sigma) — the spread (the standard deviation)
Let’s see them in action. Drag the sliders to move and reshape the curve:
Play with the sliders and notice:
- Changing mu slides the whole curve left or right without changing its shape
- Increasing sigma makes the curve wider and shorter (more spread out)
- Decreasing sigma makes it narrower and taller (more concentrated)
- The total area under the curve always stays the same (it equals 1)
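If you would rather check these observations with code than with sliders, here is a minimal Python sketch. The function names `normal_pdf` and `area_under_curve` and the sample values of mu and sigma are illustrative choices, not part of the lesson. It evaluates the bell curve directly and approximates the area with a simple sum.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x for a given mean and standard deviation."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def area_under_curve(mu, sigma, steps=20_000):
    """Approximate the total area with a midpoint sum from mu - 10*sigma to mu + 10*sigma."""
    lo, hi = mu - 10 * sigma, mu + 10 * sigma
    width = (hi - lo) / steps
    heights = (normal_pdf(lo + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return sum(heights) * width

# Same sigma with a shifted mu, then the same mu with a wider and a narrower sigma.
for mu, sigma in [(0, 1), (3, 1), (0, 2), (0, 0.5)]:
    peak = normal_pdf(mu, mu, sigma)   # the curve is tallest at its mean
    area = area_under_curve(mu, sigma)
    print(f"mu={mu}, sigma={sigma}: peak height={peak:.3f}, area={area:.3f}")
```

Shifting mu leaves the peak height unchanged, doubling sigma halves it, and the area stays at about 1 in every case.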
Part 2: Standard Deviation — Measuring Spread
The standard deviation (sigma) tells you how far typical data points sit from the mean. Here’s the key rule:
In a normal distribution, about 68% of data falls within 1 sigma of the mean, 95% within 2 sigma, and 99.7% within 3 sigma.
This is called the 68-95-99.7 Rule (or the Empirical Rule).
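The three percentages are rounded values. As a quick check, the share of a normal distribution within k standard deviations of the mean equals erf(k / √2), which Python’s standard library can evaluate. The helper name `fraction_within` is just an illustrative choice.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} sigma: {fraction_within(k):.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```

Notice that the result does not depend on mu or sigma at all: the rule holds for every normal distribution.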
Let’s visualize this. Adjust sigma and see how the “zones” change:
Why does this matter? If a class’s test scores are roughly normally distributed with mean 75 and standard deviation 10, then:
- About 68% of students scored between 65 and 85
- About 95% scored between 55 and 95
- A score above 95 is more than 2 standard deviations above average. Only about 2.5% of students score that high
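The test-score example can be reproduced with `statistics.NormalDist` from Python’s standard library (Python 3.8 or later). This is a sketch that assumes the scores are exactly normal with mean 75 and standard deviation 10, which real classes only approximate.

```python
from statistics import NormalDist

scores = NormalDist(mu=75, sigma=10)  # idealized model of the class's scores

for k in (1, 2):
    low = scores.mean - k * scores.stdev
    high = scores.mean + k * scores.stdev
    share = scores.cdf(high) - scores.cdf(low)  # probability of landing in [low, high]
    print(f"within {k} SD: {low:.0f} to {high:.0f} covers {share:.1%} of students")

# Upper tail: the share of students expected to score above 95.
share_above_95 = 1 - scores.cdf(95)
print(f"above 95: {share_above_95:.1%}")
```

The tail above 95 comes out near 2.3%, which is half of the roughly 5% that falls outside the 2-sigma zone.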
Part 3: Narrow vs. Wide Distributions
Two data sets can have the same mean but very different spreads. Compare:
Think about real examples:
- Narrow distribution: A machine that cuts bolts very precisely — almost all bolts are very close to the target length
- Wide distribution: Human heights across the world — there’s a lot of variation
Which would you prefer for quality control in a factory?
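To make the comparison concrete, the sketch below simulates two hypothetical bolt-cutting machines with the same target length but different spreads. The target of 50 mm, the two sigma values, and the 0.5 mm tolerance are made-up numbers chosen only for illustration.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so reruns give the same numbers
TARGET_MM = 50.0         # hypothetical target bolt length
TOLERANCE_MM = 0.5       # hypothetical acceptable deviation from the target

# Same mean, very different standard deviations.
precise = [rng.gauss(TARGET_MM, 0.1) for _ in range(1000)]
sloppy = [rng.gauss(TARGET_MM, 2.0) for _ in range(1000)]

def share_within(data, center, tolerance):
    """Fraction of values no farther than `tolerance` from `center`."""
    return sum(abs(x - center) <= tolerance for x in data) / len(data)

for name, data in [("precise", precise), ("sloppy", sloppy)]:
    print(
        f"{name}: mean={statistics.mean(data):.2f} mm, "
        f"sd={statistics.stdev(data):.2f} mm, "
        f"in tolerance={share_within(data, TARGET_MM, TOLERANCE_MM):.1%}"
    )
```

Both machines average close to 50 mm, yet the narrow one keeps almost every bolt within tolerance while the wide one does not.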
Part 4: Shifting the Center
What happens when you shift the mean while keeping the spread constant? This is like comparing two different groups:
Notice how the overlap between the two curves changes. When the means are close together, the distributions overlap a lot — it’s hard to tell which group a data point came from. When they’re far apart, the groups are clearly distinct.
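The amount of overlap can also be measured. `NormalDist.overlap` in Python’s standard library returns the shared area under two normal curves as a number between 0 and 1. The means and the common sigma below are illustrative values, not data from the lesson.

```python
from statistics import NormalDist

SIGMA = 10  # same spread for both groups (illustrative value)
group_a = NormalDist(mu=70, sigma=SIGMA)

# Slide the second group's mean farther and farther from the first.
for mu_b in (70, 75, 80, 90, 110):
    group_b = NormalDist(mu=mu_b, sigma=SIGMA)
    shared = group_a.overlap(group_b)  # overlapping area of the two curves
    print(f"means 70 vs {mu_b}: overlap = {shared:.1%}")
```

Identical means give complete overlap, and the shared area shrinks steadily as the gap between the means grows relative to sigma.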
Part 5: Skewness — When Data Isn’t Symmetric
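Not all data is perfectly bell-shaped. Sometimes data skews to one side, with one tail stretching out longer than the other. Keep in mind that simply shifting a normal curve does not make it skewed, since the shifted curve is still symmetric. Comparing a symmetric curve with an offset one only gives a rough feel for data that leans to one side: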
Real-world skewed data:
- Right-skewed: Income distributions (most people earn moderate amounts, a few earn a lot)
- Left-skewed: Age at retirement (most people retire around 65, a few retire very early)
When data is skewed, the mean gets pulled toward the tail, while the median stays near the center of the bulk of the data. That’s why median income is often more informative than mean income!
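A small made-up example shows the effect. The ten incomes below are invented for illustration: nine moderate values and one very large one that forms the right tail.

```python
import statistics

# Hypothetical annual incomes: nine moderate earners and one very high earner.
incomes = [28_000, 32_000, 35_000, 38_000, 41_000,
           45_000, 48_000, 52_000, 60_000, 900_000]

mean_income = statistics.mean(incomes)      # pulled up by the single large value
median_income = statistics.median(incomes)  # middle of the sorted list

print(f"mean:   {mean_income:,.0f}")    # mean:   127,900
print(f"median: {median_income:,.0f}")  # median: 43,000
```

Nine of the ten people earn less than half of the mean, so the median describes the typical person far better here.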
Wrapping Up
Here’s what you’ve discovered:
| Concept | What It Tells You |
|---|---|
| Mean (mu) | The center of the distribution — where data clusters |
| Standard deviation (sigma) | How spread out the data is around the mean |
| 68-95-99.7 Rule | Percentage of data within 1, 2, or 3 standard deviations |
| Narrow vs. wide | Low sigma = consistent data; high sigma = variable data |
| Skewness | When data piles up on one side instead of being symmetric |
Challenge: A factory produces widgets with mean weight 100g and standard deviation 2g. A widget is rejected if it’s more than 2 standard deviations from the mean.
- What weight range is acceptable?
- About what percentage of widgets get rejected?
- If the factory improves its machines so sigma drops to 1g, how does the acceptable range change?
Use the sliders above to visualize your answers!
Two numbers, the mean and the standard deviation, capture the essence of a bell-shaped data distribution. Master them, and you can summarize thousands of data points in a single sentence.