Statistics

Line of Best Fit

You have a bunch of data points scattered on a graph. They roughly follow a trend, but they’re not perfectly lined up. How do you draw the best line through them? That’s what linear regression is all about.

Part 1: Eyeballing a Trend

Imagine you tracked how many hours students studied and what they scored on a test. The data might look like a loose upward cloud. Your instinct is to draw a line through the middle of that cloud — and that instinct is exactly right.

Here’s a set of points. Use the slope and intercept sliders to try to fit a line through them:

(Interactive graph: Slope (m) and Intercept (b) sliders, starting at y = 3x + 40.)

Try This

Try to fit these data points by eye:

(1, 45), (2, 50), (3, 55), (4, 58), (5, 65)
(6, 68), (7, 72), (8, 78), (9, 82), (10, 90)

Set the slope around 4 to 5 and the intercept around 40 to 42. The “best fit” line is the one that makes the total of the squared vertical distances from the points to the line as small as possible.
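To put a number on “best”, here is a minimal Python sketch that scores candidate lines on the ten points above by their total squared vertical error (the least-squares criterion described in Part 2). The helper name `total_squared_error` and the particular guesses (the starting line y = 3x + 40, plus y = 4.5x + 42 and y = 4.8x + 40) are our own choices for illustration.

```python
# Data from the "Try This" box: (hours studied, test score)
points = [(1, 45), (2, 50), (3, 55), (4, 58), (5, 65),
          (6, 68), (7, 72), (8, 78), (9, 82), (10, 90)]

def total_squared_error(m, b, pts):
    """Sum of squared vertical distances from each point to the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in pts)

# A few guessed lines; a smaller total means a better fit.
for m, b in [(3, 40), (4.5, 42), (4.8, 40)]:
    print(f"y = {m}x + {b}: total squared error = {total_squared_error(m, b, points):.2f}")
```

The starting line scores 1234, while the two eyeballed lines score about 20.25 and 11.8, so the lower total marks the better fit.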

Part 2: What Makes a Line “Best”?

The standard method is called least squares regression. For each data point, you measure the vertical distance from the point to the line, y_i - ŷ_i, where ŷ_i (“y-hat”) is the value the line predicts at x_i. That signed distance is the residual (or error). Then you square each residual and add them up. The “best” line is the one that makes this total as small as possible.

\text{Total Error} = \sum_i (y_i - \hat{y}_i)^2

Why square the residuals? Because some points are above the line (positive error) and some are below (negative error). Squaring makes them all positive so they don’t cancel out.

Connection

Think of it this way: If you have a rubber band connecting each data point to the line, the least-squares line is the one that minimizes the total stretch of all the rubber bands (well, the stretch squared).

Part 3: Slope and Intercept — What They Mean

In the regression equation y = mx + b:

m (slope): For every 1-unit increase in x, the predicted y changes by m. If m = 4.5 in a study-hours-vs-score example, each extra hour of studying predicts about 4.5 more points on the test.
b (intercept): The predicted y when x = 0. In our example, b = 42 would mean a student who studies 0 hours is predicted to score 42. (This doesn’t always make real-world sense — use judgment!)
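Both readings can be checked directly by evaluating the example line ŷ = 4.5x + 42: the prediction at x = 0 is the intercept, and every 1-unit step in x changes the prediction by exactly the slope. A minimal Python sketch (the function name `predict` is our own):

```python
m, b = 4.5, 42          # the example line from the text: y = 4.5x + 42

def predict(x):
    """Predicted y on the line y = m*x + b."""
    return m * x + b

print(predict(0))                                         # 42.0, the intercept b
print([predict(x + 1) - predict(x) for x in range(5)])    # [4.5, 4.5, 4.5, 4.5, 4.5], the slope m each time
```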

(Interactive graph: Slope (rate of change) and Intercept (starting value) sliders, set to 4.5 and 42.)

\hat{y} = 4.5x + 42

Try This

Predict: If a student studies for 7 hours, what score does the line predict? Read the y-value at x = 7 on the graph, or compute it: y = 4.5(7) + 42 = 73.5. Try changing the slope to see how the prediction changes!
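The same arithmetic in Python, repeated for a few slopes with the intercept held at 42, shows how the 7-hour prediction moves as the slope changes. The function name `predict` and the slopes 4.0 and 5.0 are illustrative choices, not values from the lesson.

```python
def predict(hours, slope, intercept=42):
    """Predicted test score on the line y = slope * hours + intercept."""
    return slope * hours + intercept

# How the 7-hour prediction shifts as the slope changes (intercept fixed at 42).
for slope in [4.0, 4.5, 5.0]:
    print(f"slope {slope}: predicted score at 7 hours = {predict(7, slope)}")
```

With slope 4.5 this reproduces the 73.5 computed above; slopes of 4.0 and 5.0 give 70.0 and 77.0.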

Part 4: Correlation — How Strong Is the Trend?

Not all scatter plots have a clear linear trend. The correlation coefficient (written as r) describes both the direction of a linear trend and how tightly the points cluster around a line. It always falls between -1 and 1:

r = 1: Perfect positive linear relationship (all points on a rising line)
r = -1: Perfect negative linear relationship (all points on a falling line)
r = 0: No linear relationship at all

(Interactive graph: Noise level slider, set to 0.5; less noise gives a stronger correlation.)

When the noise is low, both curves nearly overlap — that’s a high correlation (r close to 1). Crank up the noise and the red line wiggles away — correlation drops.

Connection

Correlation does NOT mean causation! Just because two things are correlated (ice cream sales and drownings both go up in summer) doesn’t mean one causes the other. There may be a hidden variable (hot weather) causing both. Always think critically about why a correlation exists.

Part 5: Positive, Negative, and No Correlation

Positive correlation: As x increases, y increases (study more, score higher)
Negative correlation: As x increases, y decreases (skip class more, score lower)
No correlation: x and y have no linear relationship (shoe size vs. test score)

Wrapping Up

Concept	What It Means
Line of best fit	The line that minimizes total squared error
Slope (m)	How much y changes per unit of x
Intercept (b)	Predicted y when x = 0
Correlation (r)	Strength and direction of linear relationship (-1 to 1)
r^2	Proportion of y’s variation explained by x

Challenge

A dataset has a best-fit line y = -2x + 100 with r = -0.9.

Is the relationship positive or negative?
Is the correlation strong or weak?
Predict y when x = 30.
Should you trust a prediction at x = 500? Why or why not?

Linear regression is one of the most widely used tools in all of statistics. From predicting house prices to analyzing scientific experiments, the humble “line of best fit” is everywhere.
