7.1 Modeling


Recall that we use capital $Y$ to indicate a random variable, and lower case $y$ to denote a potential value of the random variable. Since we will have $n$ observations, we have $n$ random variables $Y_i$ and their possible values $y_i$.

In the simple linear regression model, the $x_i$ are assumed to be fixed, known constants, and are thus notated with a lower case variable. The response $Y_i$ remains a random variable because of the random behavior of the error variable, $\epsilon_i$. That is, each response $Y_i$ is tied to an observable $x_i$ and a random, unobservable, $\epsilon_i$.
Essentially, we could explicitly think of the $Y_i$ as having a different distribution for each $x_i$. In other words, $Y_i$ has a conditional distribution dependent on the value of $X_i$, written $Y_i \mid X_i$. Doing so, we still make no distributional assumptions of the $X_i$, since we are only interested in the distribution of the $Y_i$ for a particular value $x_i$.

$$
Y_i \mid X_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)
$$
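To make this concrete, the following is a minimal R sketch, not taken from the text, that simulates responses according to $Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ with $\epsilon_i \sim N(0, \sigma^2)$. The parameter values ($\beta_0 = 5$, $\beta_1 = -2$, $\sigma = 3$) and the grid of $x$ values are arbitrary choices for illustration.

```r
# A sketch of simulating from the SLR model: Y_i = beta_0 + beta_1 * x_i + e_i,
# with e_i ~ N(0, sigma^2). Parameter values below are arbitrary illustrations.
set.seed(42)
n      <- 100
x      <- seq(1, 10, length.out = n)       # fixed, known predictor values
beta_0 <- 5
beta_1 <- -2
sigma  <- 3

epsilon <- rnorm(n, mean = 0, sd = sigma)  # random, unobservable errors
y       <- beta_0 + beta_1 * x + epsilon   # observed responses

head(data.frame(x, y))
```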
The random $Y_i$ are a function of $x_i$, thus we can write its mean as a function of $x_i$,

$$
\text{E}[Y_i \mid X_i = x_i] = \beta_0 + \beta_1 x_i.
$$
However, its variance remains constant for each $x_i$,

$$
\text{Var}[Y_i \mid X_i = x_i] = \sigma^2.
$$
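As a quick numerical check, again only a sketch using the same arbitrary parameter values as above, simulating many responses at a single fixed $x$ gives a sample mean near $\beta_0 + \beta_1 x$ and a sample variance near $\sigma^2$:

```r
# Simulate many responses at one fixed x and compare the sample mean and
# variance to the model's claims. Parameter values are arbitrary illustrations.
set.seed(1)
beta_0 <- 5
beta_1 <- -2
sigma  <- 3
x_star <- 7                                # a single fixed predictor value

y_rep <- beta_0 + beta_1 * x_star + rnorm(1e5, mean = 0, sd = sigma)

mean(y_rep)  # close to beta_0 + beta_1 * x_star = -9
var(y_rep)   # close to sigma^2 = 9
```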
This is visually displayed in the image below. We see that for any value $x$, the expected value of $Y$ is $\beta_0 + \beta_1 x$. At each value of $x$, $Y$ has the same variance $\sigma^2$.
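As a rough stand-in for that picture, here is a base R sketch, again with arbitrary parameter values, that simulates responses at a few fixed values of $x$ and plots them around the true line $\beta_0 + \beta_1 x$; at every $x$ the points scatter with the same spread $\sigma^2$.

```r
# Plot simulated responses at several fixed x values around the true line,
# illustrating equal spread at each x. Parameter values are arbitrary illustrations.
set.seed(7)
beta_0 <- 5
beta_1 <- -2
sigma  <- 3
x_vals <- rep(c(2, 4, 6, 8, 10), each = 50)  # 50 simulated responses at each x

y_vals <- beta_0 + beta_1 * x_vals + rnorm(length(x_vals), mean = 0, sd = sigma)

plot(x_vals, y_vals, pch = 20, col = "darkgrey", xlab = "x", ylab = "y")
abline(a = beta_0, b = beta_1, col = "dodgerblue", lwd = 2)  # true mean line
```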
Often, we directly talk about the assumptions that this model makes. They can be cleverly shortened to LINE.
• Linear. The relationship between $Y$ and $x$ is linear, of the form $\beta_0 + \beta_1 x$.
• Independent. The errors $\epsilon$ are independent.
• Normal. The errors, $\epsilon$, are normally distributed. That is, the "error" around the line follows a normal distribution.
• Equal Variance. At each value of $x$, the variance of $Y$ is the same, $\sigma^2$.
We are also assuming that the values of $x$ are fixed, that is, not random. We do not make a distributional assumption about the predictor variable.
As a side note, we will often refer to simple linear regression as SLR. Some explanation of the name SLR: