Page 78 - Applied Statistics with R
P. 78

78                 CHAPTER 5. PROBABILITY AND STATISTICS IN R


                                 Thus,


                                                    (0 <    < 2) =   (   < 2) −   (   < 0).

                                 This can then be calculated using R without a need to first standardize, or use
                                 a table.

                                 pnorm(2, mean = 1, sd = sqrt(0.32)) - pnorm(0, mean = 1, sd = sqrt(0.32))



                                 ## [1] 0.9229001

                                 An alternative approach, would be to simulate a large number of observations
                                 of    then use the empirical distribution to calculate the probability.
                                 Our strategy will be to repeatedly:

                                                                                               2
                                    • Generate a sample of 25 random observations from   (   = 6,    = 4).
                                                                                        1
                                                                  
                                      Call the mean of this sample ̄ .
                                                                 1  
                                                                                               2
                                    • Generate a sample of 25 random observations from   (   = 5,    = 4).
                                                                                        1
                                      Call the mean of this sample ̄   .
                                                                 2  
                                    • Calculate the differences of the means,    = ̄ 1    − ̄ .
                                                                               
                                                                                    
                                                                           
                                                                                   2  
                                 We will repeat the process a large number of times. Then we will use the
                                 distribution of the simulated observations of    as an estimate for the true
                                                                             
                                 distribution of   .
                                 set.seed(42)
                                 num_samples = 10000
                                 differences = rep(0, num_samples)
                                 Before starting our for loop to perform the operation, we set a seed for repro-
                                 ducibility, create and set a variable num_samples which will define the number
                                 of repetitions, and lastly create a variables differences which will store the
                                 simulate values,    .
                                                   
                                 By using set.seed() we can reproduce the random results of rnorm() each
                                 time starting from that line.
                                 for (s in 1:num_samples) {
                                   x1 = rnorm(n = 25, mean = 6, sd = 2)
                                   x2 = rnorm(n = 25, mean = 5, sd = 2)
                                   differences[s] = mean(x1) - mean(x2)
                                 }
   73   74   75   76   77   78   79   80   81   82   83