We could find the line that minimizes the sum of all the squared distances from
the points to the line. That is,
$$
\underset{\beta_0, \beta_1}{\mathrm{argmin}} \sum_{i = 1}^{n} \left( y_i - (\beta_0 + \beta_1 x_i) \right)^2.
$$
This last option is called the method of least squares. It is essentially the
de facto method for fitting a line to data. (You may have even seen it before
in a linear algebra course.) Its popularity is largely due to the fact that it is
mathematically “easy.” (Which was important historically, as computers are a
modern contraption.) It is also very popular because many relationships are
well approximated by a linear function.
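
In R, the standard tool for this is `lm()`, which fits a line by least squares. A minimal sketch on simulated (hypothetical) data:

```r
# simulate a small (hypothetical) dataset with a roughly linear trend
set.seed(42)
x = runif(25, 0, 10)
y = 3 + 0.7 * x + rnorm(25, sd = 1.5)

# lm() fits the line that minimizes the sum of squared residuals
fit = lm(y ~ x)
coef(fit)  # estimated intercept and slope
```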
7.2 Least Squares Approach
Given observations $(x_i, y_i)$, for $i = 1, 2, \ldots, n$, we want to find values of $\beta_0$ and $\beta_1$ which minimize

$$
f(\beta_0, \beta_1) = \sum_{i = 1}^{n} \left( y_i - (\beta_0 + \beta_1 x_i) \right)^2 = \sum_{i = 1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2.
$$

We will call these values $\hat{\beta}_0$ and $\hat{\beta}_1$.
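
As a sketch (this is an illustration, not how `lm()` computes its estimates), we can code $f(\beta_0, \beta_1)$ directly and minimize it numerically with `optim()`, reusing the simulated `x` and `y` from above:

```r
# sum of squared residuals as a function of beta = c(beta_0, beta_1)
f = function(beta, x, y) {
  sum((y - (beta[1] + beta[2] * x)) ^ 2)
}

# numerical minimization; the starting values c(0, 0) are arbitrary
opt = optim(c(0, 0), f, x = x, y = y)
opt$par  # close to coef(fit) from lm()
```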
First, we take a partial derivative with respect to both $\beta_0$ and $\beta_1$.

$$
\frac{\partial f}{\partial \beta_0} = -2 \sum_{i = 1}^{n} (y_i - \beta_0 - \beta_1 x_i)
$$

$$
\frac{\partial f}{\partial \beta_1} = -2 \sum_{i = 1}^{n} (x_i) (y_i - \beta_0 - \beta_1 x_i)
$$
We then set each of the partial derivatives equal to zero and solve the resulting system of equations.

$$
\sum_{i = 1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0
$$

$$
\sum_{i = 1}^{n} (x_i) (y_i - \beta_0 - \beta_1 x_i) = 0
$$
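
As a quick numerical check, reusing the objects above, both partial derivatives should be (approximately) zero at the least squares estimates:

```r
# evaluate each partial derivative at the coefficients from lm()
b = coef(fit)
-2 * sum(y - b[1] - b[2] * x)        # df / d(beta_0), essentially 0
-2 * sum(x * (y - b[1] - b[2] * x))  # df / d(beta_1), essentially 0
```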
While solving the system of equations, one common algebraic rearrangement results in the normal equations.

$$
n \beta_0 + \beta_1 \sum_{i = 1}^{n} x_i = \sum_{i = 1}^{n} y_i
$$

$$
\beta_0 \sum_{i = 1}^{n} x_i + \beta_1 \sum_{i = 1}^{n} x_i^2 = \sum_{i = 1}^{n} x_i y_i
$$
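
Since the normal equations are linear in $\beta_0$ and $\beta_1$, we can also solve this $2 \times 2$ system directly; a sketch with `solve()`, again reusing `x` and `y`:

```r
# coefficient matrix and right-hand side of the normal equations
n = length(x)
A = matrix(c(n,      sum(x),
             sum(x), sum(x ^ 2)), nrow = 2, byrow = TRUE)
rhs = c(sum(y), sum(x * y))
solve(A, rhs)  # matches coef(lm(y ~ x))
```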