Statistics

Line of Best Fit

You have a bunch of data points scattered on a graph. They roughly follow a trend, but they’re not perfectly lined up. How do you draw the best line through them? That’s what linear regression is all about.

Part 1: Eyeballing a Trend

Imagine you tracked how many hours students studied and what they scored on a test. The data might look like a loose upward cloud. Your instinct is to draw a line through the middle of that cloud — and that instinct is exactly right.

Here’s a set of points. Use the slope and intercept sliders to try to fit a line through them:

[Interactive graph: sliders for slope (m) and intercept (b); initial line y = 3x + 40]

Try This

Try to fit these data points by eye:

  • (1, 45), (2, 50), (3, 55), (4, 58), (5, 65)
  • (6, 68), (7, 72), (8, 78), (9, 82), (10, 90)

Set the slope around 4-5 and the intercept around 40-42. The “best fit” line keeps the total vertical distance from the points to the line as small as possible (Part 2 makes this precise).


Part 2: What Makes a Line “Best”?

The standard method is called least-squares regression. For each data point, you measure the vertical distance from the point to the line; that distance is the residual (or error). Then you square each residual and add them up. The “best” line is the one that makes this total as small as possible.

$$\text{Total Error} = \sum (y_i - \hat{y}_i)^2$$

Why square the residuals? Because some points are above the line (positive error) and some are below (negative error). Squaring makes them all positive so they don’t cancel out.
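
To make the total-error idea concrete, here is a minimal Python sketch that adds up the squared residuals for a candidate line. It reuses the ten points from Part 1; the function name total_squared_error is illustrative, not something defined by the lesson.

```python
# Data from Part 1: (hours studied, test score).
points = [(1, 45), (2, 50), (3, 55), (4, 58), (5, 65),
          (6, 68), (7, 72), (8, 78), (9, 82), (10, 90)]


def total_squared_error(m, b, data):
    """Sum of squared residuals for the line y = m*x + b."""
    total = 0.0
    for x, y in data:
        predicted = m * x + b      # y-hat: the line's value at x
        residual = y - predicted   # signed vertical distance to the line
        total += residual ** 2     # squaring keeps every term non-negative
    return total


# Two lines that appear on this lesson's sliders.
print(total_squared_error(3, 40, points))    # 1234.0
print(total_squared_error(4.5, 42, points))  # 20.25
```

The line y = 4.5x + 42 has a far smaller total than y = 3x + 40, which matches what the eye sees: it runs through the middle of the cloud instead of below it.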

Connection

Think of it this way: If you have a rubber band connecting each data point to the line, the least-squares line is the one that minimizes the total stretch of all the rubber bands (well, the stretch squared).
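
You do not have to find the minimizing line by trial and error. For a single x variable, the least-squares slope and intercept have a standard closed form: the slope is the sum of (x - mean x)(y - mean y) divided by the sum of (x - mean x)^2, and the intercept is mean y minus slope times mean x. The lesson does not present this formula, so treat the following as a supplementary sketch; the name least_squares_fit is illustrative.

```python
# Data from Part 1: (hours studied, test score).
points = [(1, 45), (2, 50), (3, 55), (4, 58), (5, 65),
          (6, 68), (7, 72), (8, 78), (9, 82), (10, 90)]


def least_squares_fit(data):
    """Return (slope, intercept) of the least-squares line through data."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    # How x and y vary together, and how x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    m = sxy / sxx
    b = mean_y - m * mean_x  # the fitted line passes through (mean_x, mean_y)
    return m, b


m, b = least_squares_fit(points)
print(round(m, 2), round(b, 2))  # 4.78 40.0
```

The result agrees with the by-eye suggestion in Part 1 of a slope around 4-5 and an intercept around 40-42.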


Part 3: Slope and Intercept — What They Mean

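In the regression equation y = mx + b, the slope m is the rate of change (how much y changes per unit of x) and the intercept b is the starting value (the predicted y when x = 0).

[Interactive graph: sliders for slope (rate of change, initial value 4.5) and intercept (starting value, initial value 42); line shown: ŷ = 4.5x + 42]
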
Try This

Predict: If a student studies for 7 hours, what score does the line predict? Read the y-value at x = 7 on the graph, or compute it: y = 4.5(7) + 42 = 73.5. Try changing the slope to see how the prediction changes!
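
If you do not have the sliders at hand, the same prediction can be sketched in a few lines of Python. The function name predict and the alternative slopes 3.5 and 5.5 are illustrative choices, not values from the lesson.

```python
def predict(x, m=4.5, b=42):
    """Predicted score for x hours of study, using the line y = m*x + b."""
    return m * x + b


print(predict(7))  # 73.5

# Changing the slope changes the prediction at the same x.
for slope in (3.5, 4.5, 5.5):
    print(slope, predict(7, m=slope))
```

Each extra point of slope adds 7 to the prediction at x = 7, because the slope is the change in y per unit of x.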


Part 4: Correlation — How Strong Is the Trend?

Not all scatter plots have a clear linear trend. Correlation (written as r) measures how tightly the points cluster around a line:

[Interactive graph: noise-level slider (less noise = stronger correlation), initial value 0.5; curves shown: True trend, With noise]

When the noise is low, the two curves nearly overlap: that’s a high correlation (r close to 1). Crank up the noise and the noisy curve wiggles away from the true trend, and the correlation drops.
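
The effect of noise on r can also be sketched numerically. The code below computes the Pearson correlation coefficient from its standard definition and applies it to points scattered around a line. The trend y = 2x + 1, the 50 sample points, the noise levels, and the fixed random seed are all assumptions made for illustration; the lesson does not specify what the interactive graph uses.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def noisy_line(noise, seed=0):
    """Points on the assumed trend y = 2x + 1 plus Gaussian noise."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    xs = [i * 0.2 for i in range(50)]
    ys = [2 * x + 1 + rng.gauss(0, noise) for x in xs]
    return xs, ys


for noise in (0.5, 2.0, 5.0):
    xs, ys = noisy_line(noise)
    print(noise, round(pearson_r(xs, ys), 3))
```

With these settings, r should be close to 1 at the lowest noise level and noticeably smaller at the highest.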

Connection

Correlation does NOT mean causation! Just because two things are correlated (ice cream sales and drownings both go up in summer) doesn’t mean one causes the other. There may be a hidden variable (hot weather) causing both. Always think critically about why a correlation exists.


Part 5: Positive, Negative, and No Correlation

[Graph: three scatter patterns labeled Positive (r > 0), Negative (r < 0), and None (r = 0)]
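
As a rough numerical companion to the three patterns, the sketch below computes r for three tiny made-up datasets. The data values are invented for illustration and are not the points plotted in the lesson.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
rising = [1, 3, 2, 5, 4]    # y tends to go up as x goes up
falling = [5, 3, 4, 1, 2]   # y tends to go down as x goes up
no_trend = [3, 1, 2, 1, 3]  # no overall upward or downward tilt

print(pearson_r(xs, rising))    # 0.8
print(pearson_r(xs, falling))   # -0.8
print(pearson_r(xs, no_trend))  # 0.0
```

The third dataset is not random: it dips and rises symmetrically. An r of 0 says only that there is no linear trend, not that the variables are unrelated.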

Wrapping Up

| Concept | What It Means |
|---|---|
| Line of best fit | The line that minimizes total squared error |
| Slope (m) | How much y changes per unit of x |
| Intercept (b) | Predicted y when x = 0 |
| Correlation (r) | Strength and direction of linear relationship (-1 to 1) |
| r^2 | Proportion of y’s variation explained by x |

Challenge

Challenge: A dataset has a best-fit line y = -2x + 100 with r = -0.9.

  1. Is the relationship positive or negative?
  2. Is the correlation strong or weak?
  3. Predict y when x = 30.
  4. Should you trust a prediction at x = 500? Why or why not?
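
Once you have worked the challenge by hand, a short script like the following can check the arithmetic. It uses only the line and r value stated in the challenge; the function name predict is illustrative.

```python
def predict(x, m=-2, b=100):
    """Value of the challenge's best-fit line y = -2x + 100 at x."""
    return m * x + b


r = -0.9

print(predict(30))       # 40
print(predict(500))      # -900
print(round(r ** 2, 2))  # 0.81, the proportion of y's variation explained by x
```

The script computes a value at x = 500 without complaint. Whether that value means anything depends on whether x = 500 lies near the data the line was fitted to, which the challenge leaves for you to judge.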

Linear regression is one of the most widely used tools in all of statistics. From predicting house prices to analyzing scientific experiments, the humble “line of best fit” is everywhere.
