Recall that we use capital $Y$ to indicate a random variable, and lower case $y$ to denote a potential value of the random variable. Since we will have $n$ observations, we have $n$ random variables $Y_i$ and their possible values $y_i$.
In the simple linear regression model, the $x_i$ are assumed to be fixed, known constants, and are thus notated with a lower case variable. The response $Y_i$ remains a random variable because of the random behavior of the error variable, $\epsilon_i$. That is, each response $Y_i$ is tied to an observable $x_i$ and a random, unobservable, $\epsilon_i$.
Essentially, we could explicitly think of the $Y_i$ as having a different distribution for each value of $X_i$, written $x_i$. In other words, $Y_i$ has a conditional distribution dependent on the value of $x_i$. Doing so, we still make no distributional assumptions of the $x_i$, since we are only interested in the distribution of the $Y_i$ for a particular value $x_i$.

\[
Y_i \mid X_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)
\]
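To make this concrete, below is a minimal simulation sketch in R that generates data according to this model. The particular parameter values ($\beta_0 = 3$, $\beta_1 = 0.5$, $\sigma = 2$), the sample size, and the grid of $x$ values are chosen only for illustration and are not part of the model itself.

```r
# Sketch: simulate one data set from the SLR model.
# The parameter values below are illustrative assumptions, not from the text.
set.seed(42)
n      = 25
x      = seq(1, 10, length.out = n)        # fixed, known x values
beta_0 = 3
beta_1 = 0.5
sigma  = 2

epsilon = rnorm(n, mean = 0, sd = sigma)   # random, unobservable errors
y       = beta_0 + beta_1 * x + epsilon    # each Y_i is tied to its x_i and epsilon_i

head(data.frame(x, y))
```

Each simulated $y_i$ is one realization from the conditional distribution $N(\beta_0 + \beta_1 x_i, \sigma^2)$ at its own $x_i$.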
The random $Y_i$ are a function of $x_i$, thus we can write its mean as a function of $x_i$,

\[
\text{E}[Y_i \mid X_i = x_i] = \beta_0 + \beta_1 x_i.
\]
However, its variance remains constant for each $x_i$,

\[
\text{Var}[Y_i \mid X_i = x_i] = \sigma^2.
\]
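As a quick sanity check of these two facts, the following sketch fixes a few $x$ values, simulates a large number of responses at each, and compares the sample mean to $\beta_0 + \beta_1 x$ and the sample variance to $\sigma^2$. The parameter values are the same illustrative assumptions used above.

```r
# Sketch: empirical check that E[Y | X = x] = beta_0 + beta_1 * x
# and that Var[Y | X = x] = sigma ^ 2 at every x.
set.seed(1)
beta_0 = 3
beta_1 = 0.5
sigma  = 2
sims   = 100000

for (x in c(2, 5, 8)) {
  y = beta_0 + beta_1 * x + rnorm(sims, mean = 0, sd = sigma)
  cat("x =", x,
      "| mean(y) =", round(mean(y), 3), "vs", beta_0 + beta_1 * x,
      "| var(y) =",  round(var(y), 3),  "vs", sigma ^ 2, "\n")
}
```

The sample means track the line $\beta_0 + \beta_1 x$, while the sample variances stay near $\sigma^2$ regardless of $x$.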
This is visually displayed in the image below. We see that for any value $x$, the expected value of $Y$ is $\beta_0 + \beta_1 x$. At each value of $x$, $Y$ has the same variance $\sigma^2$.
Often, we directly talk about the assumptions that this model makes. They can
be cleverly shortened to LINE.
• Linear. The relationship between $Y$ and $x$ is linear, of the form $\beta_0 + \beta_1 x$.
• Independent. The errors are independent.
• Normal. The errors, $\epsilon$, are normally distributed. That is, the “error” around the line follows a normal distribution.
• Equal Variance. At each value of $x$, the variance of $Y$ is the same, $\sigma^2$.
We are also assuming that the values of $x$ are fixed, that is, not random. We do not make a distributional assumption about the predictor variable.
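As a brief preview, and only as a sketch, data generated under these assumptions can be fit in R with `lm()`; with enough data the estimated coefficients should land near the $\beta_0$ and $\beta_1$ used in the simulation. The simulated data here again use the illustrative parameter values from above.

```r
# Sketch: fit a simple linear regression to simulated data and compare
# the estimated coefficients to the true beta_0 and beta_1.
set.seed(42)
n      = 100
x      = runif(n, min = 0, max = 10)   # generated once, then treated as fixed and known
beta_0 = 3
beta_1 = 0.5
sigma  = 2
y      = beta_0 + beta_1 * x + rnorm(n, mean = 0, sd = sigma)

slr_fit = lm(y ~ x)    # models E[Y | X = x] = beta_0 + beta_1 * x
coef(slr_fit)          # estimates of beta_0 (intercept) and beta_1 (slope)
```

How $\beta_0$ and $\beta_1$ are actually estimated from data is the subject of the sections that follow.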
As a side note, we will often refer to simple linear regression as SLR. Some
explanation of the name SLR:

