Make judgments about the size of the standard error of the estimate
from a scatter plot

Compute the standard error of the estimate based on errors of prediction

Compute the standard error using Pearson's correlation

Estimate the standard error of the estimate based on a sample

Figure 1 shows two regression examples. In Graph A, the points are
closer to the line than they are in Graph B. Therefore, the
predictions in Graph A are more accurate than those in Graph B.

Figure 1. Regressions differing in accuracy
of prediction.
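
To see how the scatter around the line relates to the size of the standard error of the estimate, a small simulation can help. The sketch below does not use the data behind Figure 1: the sample size, intercept, slope, and noise levels are made-up values chosen only for illustration, and the helper name se_est is hypothetical. It assumes the population form of the formula (dividing by N).

```r
# Hypothetical data: same line and same noise pattern, different noise scale.
set.seed(1)                      # arbitrary seed so the run is repeatable
n <- 50
x <- runif(n, 0, 10)
e <- rnorm(n)                    # shared noise pattern
y_a <- 1 + 0.5 * x + 0.5 * e     # points close to the line (like Graph A)
y_b <- 1 + 0.5 * x + 2.0 * e     # points far from the line (like Graph B)

# Standard error of the estimate, population form:
# square root of (sum of squared errors of prediction / N)
se_est <- function(x, y) {
  errors <- resid(lm(y ~ x))     # Y - Y' for each point
  sqrt(sum(errors^2) / length(y))
}

sigma_a <- se_est(x, y_a)
sigma_b <- se_est(x, y_b)
c(A = sigma_a, B = sigma_b)      # B is 4 times A because the noise was scaled by 4

# Optional: draw both scatter plots with their regression lines
# par(mfrow = c(1, 2))
# plot(x, y_a, main = "A"); abline(lm(y_a ~ x))
# plot(x, y_b, main = "B"); abline(lm(y_b ~ x))
```

Because both data sets share the same noise pattern, the wider scatter in the second one translates directly into a proportionally larger standard error of the estimate.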

The
standard error of the estimate is a measure of the accuracy
of predictions. Recall that the regression line is the line that
minimizes the sum of squared deviations of prediction (also
called the sum
of squares error). The standard error of the estimate
is closely related to this quantity and is defined below:

\sigma_{est} = \sqrt{\frac{\sum (Y - Y')^{2}}{N}}

where σ_{est} is
the standard error of the estimate, Y is an actual score, Y' is
a predicted score, and N is the number of pairs of scores. The
numerator is the sum of squared differences between
the actual scores and the predicted scores.

Note the similarity of the formula for σ_{est} to the formula for σ. It turns out that σ_{est} is the standard deviation of the errors of prediction (each Y - Y' is an error of prediction).
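
The comparison can be written out explicitly. The display below is a sketch that assumes the usual population formula for σ and a regression line fitted with an intercept, in which case the errors of prediction sum to zero (the Y-Y' column of Table 1 sums to 0.000). The symbols e, μ_{e}, and σ_{e} are introduced here only for illustration.

```latex
\sigma = \sqrt{\frac{\sum (Y - \mu_{y})^{2}}{N}}
\qquad
\sigma_{est} = \sqrt{\frac{\sum (Y - Y')^{2}}{N}}

% Let e = Y - Y'. Because \sum e = 0, the mean error \mu_{e} is 0, so
\sigma_{e} = \sqrt{\frac{\sum (e - \mu_{e})^{2}}{N}}
           = \sqrt{\frac{\sum (Y - Y')^{2}}{N}}
           = \sigma_{est}
```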

Assume the
data in Table 1 are the data from a population of five X, Y pairs.

Table 1. Example data.

        X       Y       Y'      Y-Y'     (Y-Y')^{2}
        1.00    1.00    1.210   -0.210   0.044
        2.00    2.00    1.635    0.365   0.133
        3.00    1.30    2.060   -0.760   0.578
        4.00    3.75    2.485    1.265   1.600
        5.00    2.25    2.910   -0.660   0.436
Sum     15.00   10.30   10.30    0.000   2.791

The last column shows that the sum of the squared
errors of prediction is 2.791. Therefore, the standard error of
the estimate is

\sigma_{est} = \sqrt{\frac{2.791}{5}} = 0.747
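
The calculation can be checked in R. This is a minimal sketch that treats the five pairs in Table 1 as the whole population; the variable names are arbitrary choices made for this example.

```r
# Table 1 data, treated as a population of five X, Y pairs
x <- c(1, 2, 3, 4, 5)
y <- c(1, 2, 1.3, 3.75, 2.25)

fit <- lm(y ~ x)                     # least-squares regression line
y_pred <- fitted(fit)                # the Y' column
errors <- y - y_pred                 # the Y - Y' column
sse <- sum(errors^2)                 # sum of squared errors of prediction
sigma_est <- sqrt(sse / length(y))   # population form: divide by N

round(y_pred, 3)      # 1.210 1.635 2.060 2.485 2.910
round(sse, 3)         # 2.791
round(sigma_est, 3)   # 0.747
```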

There is a version of the formula for the standard error of the
estimate in terms of Pearson's correlation:

\sigma_{est} = \sqrt{\frac{(1 - \rho^{2})\, SSY}{N}}

where ρ is the population
value of Pearson's correlation and SSY is

SSY = \sum (Y - \mu_{y})^{2}
For the data in Table 1, μ_{y} =
2.06, SSY = 4.597 and ρ = 0.6268.
Therefore,

\sigma_{est} = \sqrt{\frac{(1 - 0.6268^{2})(4.597)}{5}} = \sqrt{\frac{2.791}{5}} = 0.747

which is the same value computed previously.
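
The correlation-based version can be checked the same way. The sketch below again treats the Table 1 pairs as a population. Note that R's cor() returns the same value whether the data are viewed as a sample or a population, because the divisors in the covariance and the standard deviations cancel. Variable names are arbitrary.

```r
# Table 1 data
x <- c(1, 2, 3, 4, 5)
y <- c(1, 2, 1.3, 3.75, 2.25)
n <- length(y)

rho <- cor(x, y)                 # Pearson's correlation
ssy <- sum((y - mean(y))^2)      # SSY: squared deviations of Y from its mean
sigma_est <- sqrt((1 - rho^2) * ssy / n)

# Mean of Y, SSY, rho, and the standard error of the estimate (about 0.747)
round(c(mean_y = mean(y), SSY = ssy, rho = rho, sigma_est = sigma_est), 4)
```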

Similar formulas are used when the standard error
of the estimate is computed from a sample rather than a population.
The only difference is that the denominator is N-2 rather than
N. The reason N-2 is used rather than N-1 is that two parameters
(the slope and the intercept) were estimated in order to estimate
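the sum of squares. Formulas for a sample comparable to the ones for a population
are shown below, where s_{est} denotes the sample estimate of the standard
error of the estimate, r is the sample value of Pearson's correlation, and
SSY is computed about the sample mean of Y.

s_{est} = \sqrt{\frac{\sum (Y - Y')^{2}}{N - 2}}

s_{est} = \sqrt{\frac{(1 - r^{2})\, SSY}{N - 2}}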

# Table 1 data
x <- c(1, 2, 3, 4, 5)
y <- c(1, 2, 1.3, 3.75, 2.25)

# Fit the regression; the "Residual standard error" in the output is the
# sample estimate, computed with N-2 in the denominator
summary(lm(y ~ x))

Call:
lm(formula = y ~ x)

Residuals:
     1      2      3      4      5
-0.210  0.365 -0.760  1.265 -0.660

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.785      1.012   0.776    0.494
x              0.425      0.305   1.393    0.258

Residual standard error: 0.9645 on 3 degrees of freedom
Multiple R-squared: 0.3929, Adjusted R-squared: 0.1906
F-statistic: 1.942 on 1 and 3 DF, p-value: 0.2578
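
The residual standard error in the output above can be reproduced by hand with N-2 in the denominator. The sketch below is one way to check this, now treating the Table 1 pairs as a sample; the variable names are arbitrary, and summary(fit)$sigma is the component of R's summary.lm object that holds the residual standard error.

```r
# Table 1 data, now treated as a sample
x <- c(1, 2, 3, 4, 5)
y <- c(1, 2, 1.3, 3.75, 2.25)
n <- length(y)
fit <- lm(y ~ x)

# From the errors of prediction, dividing by N - 2
s_from_errors <- sqrt(sum(resid(fit)^2) / (n - 2))

# From Pearson's correlation, dividing by N - 2
r <- cor(x, y)
s_from_r <- sqrt((1 - r^2) * sum((y - mean(y))^2) / (n - 2))

# As reported by R ("Residual standard error")
s_from_lm <- summary(fit)$sigma

round(c(s_from_errors, s_from_r, s_from_lm), 4)   # all three are 0.9645
```

The only change from the population calculation is the divisor, so the sample estimate (0.9645) is larger than the population value (0.747) for the same five pairs.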