Page 423 - Applied Statistics with R

17.2. BINARY RESPONSE



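                      # Simulate (y, x) pairs from a logistic regression model with one predictor.
                      sim_logistic_data = function(sample_size = 25, beta_0 = -2, beta_1 = 3) {
                        x = rnorm(n = sample_size)                       # predictor: standard normal draws
                        eta = beta_0 + beta_1 * x                        # linear predictor (the log-odds)
                        p = 1 / (1 + exp(-eta))                          # inverse logit: P[Y = 1 | X = x]
                        y = rbinom(n = sample_size, size = 1, prob = p)  # Bernoulli (0/1) responses
                        data.frame(y, x)
                      }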


                      You might ask: why not simply use ordinary linear regression? Even with a
                      binary response, our goal is still to model (some function of) $\text{E}[Y \mid X = x]$.
                      However, with a binary response coded as 0 and 1,
                      $\text{E}[Y \mid X = x] = P[Y = 1 \mid X = x]$, since


                      \[
                      \begin{aligned}
                      \text{E}[Y \mid X = x] &= 1 \cdot P[Y = 1 \mid X = x] + 0 \cdot P[Y = 0 \mid X = x] \\
                                             &= P[Y = 1 \mid X = x]
                      \end{aligned}
                      \]
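As a quick numerical check of this identity, here is a minimal sketch. The names `p_true` and `y_sim`, the seed, and the sample size are illustrative assumptions, not part of the text. For a 0/1 vector the sample mean is exactly the proportion of ones, and for Bernoulli draws both approximate the success probability.

```r
set.seed(42)                 # arbitrary seed, used only for reproducibility
p_true = 0.3                 # hypothetical success probability
y_sim = rbinom(n = 10000, size = 1, prob = p_true)

mean(y_sim)                  # sample analogue of E[Y]
mean(y_sim == 1)             # sample analogue of P[Y = 1]; same value as the line above
```

Both values should be close to `p_true`, and they agree with each other by construction.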

                      Then why can't we just use ordinary linear regression to estimate $\text{E}[Y \mid X = x]$,
                      and thus $P[Y = 1 \mid X = x]$?
                      To investigate, let’s simulate data from the following model:


                      \[
                      \log\left(\frac{p(x)}{1 - p(x)}\right) = -2 + 3x
                      \]
                      Another way to write this model, which better matches the function we're using to
                      simulate the data, is:

                      \[
                      \begin{aligned}
                      Y_i \mid X_i = x_i &\sim \text{Bern}(p_i) \\
                      p_i &= p(x_i) = \frac{1}{1 + e^{-\eta(x_i)}} \\
                      \eta(x_i) &= -2 + 3 x_i
                      \end{aligned}
                      \]
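The two ways of writing the model are equivalent: applying the inverse logit to the linear predictor gives the probability, and taking the log-odds of that probability gives back the linear predictor. A minimal sketch of this round trip follows, using base R's `plogis()` and `qlogis()` (the logistic distribution's CDF and quantile function) next to the explicit formulas. The grid `x_grid` is an illustrative assumption, not something from the text.

```r
x_grid = seq(-2, 2, by = 0.5)        # illustrative x values, not from the text
eta = -2 + 3 * x_grid                # linear predictor eta(x)
p = 1 / (1 + exp(-eta))              # inverse logit, as computed in sim_logistic_data()

all.equal(p, plogis(eta))            # plogis() computes the same inverse logit
all.equal(log(p / (1 - p)), eta)     # the log-odds of p recover eta(x)
all.equal(qlogis(p), eta)            # qlogis() is the logit (log-odds) function
```

Each comparison should return `TRUE` up to the default numerical tolerance of `all.equal()`.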
                      set.seed(1)
                      example_data = sim_logistic_data()
                      head(example_data)
                      ##   y          x
                      ## 1 0 -0.6264538
                      ## 2 1  0.1836433
                      ## 3 0 -0.8356286
                      ## 4 1  1.5952808
                      ## 5 0  0.3295078
                      ## 6 0 -0.8204684
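To connect these rows back to the model, the success probability behind each simulated response can be computed from its x value. The sketch below retypes the six printed x and y values by hand (so they carry only seven decimals) and adds a column `p`, which is an illustrative name and not part of `example_data`.

```r
# First six rows of example_data, copied from the printed output
x = c(-0.6264538, 0.1836433, -0.8356286, 1.5952808, 0.3295078, -0.8204684)
y = c(0, 1, 0, 1, 0, 0)

# plogis() is base R's inverse logit, so this is P[Y = 1 | X = x]
# under the simulated model with beta_0 = -2 and beta_1 = 3
p = plogis(-2 + 3 * x)

data.frame(y, x, p = round(p, 3))
```

Because the slope is positive, rows with larger x get probabilities closer to 1. Each y is still a single random draw, so a response of 1 can occur even when its probability is small.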