Page 303 - Applied Statistics with R

Chapter 14




                      Transformations







                           “Give me a lever long enough and a fulcrum on which to place it,
                           and I shall move the world.”
                           — Archimedes

                      Please note: some data currently used in this chapter was used, changed, and
                      passed around over the years in STAT 420 at UIUC. Its original sources, if they
                      exist, are at this time unknown to the author. As a result, the data should only be
                      considered for use with STAT 420. Going forward it will likely be replaced
                      with alternative sourceable data that illustrates the same concepts. At the end
                      of this chapter you can find code seen in videos for Week 8 for STAT 420 in
                      the MCS-DS program. It is currently in the process of being merged into the
                      narrative of this chapter.

                      After reading this chapter you will be able to:

                         • Understand the concept of a variance stabilizing transformation.
                         • Use transformations of the response to improve regression models.
                         • Use polynomial terms as predictors to fit more flexible regression models.

                      In the last chapter we checked the assumptions of regression models and looked
                      at ways to diagnose possible issues. In this chapter we will use transformations
                      of both response and predictor variables to correct issues revealed by model
                      diagnostics, and in some cases simply to make a model fit the data better.


                      14.1     Response Transformation


                      Let’s look at some (fictional) salary data from the (fictional) company Initech.
                      We will try to model salary as a function of years of experience. The data
                      can be found in initech.csv.
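
As a starting point, the setup described above might be sketched as follows. This is a minimal sketch, not the chapter's own code: it assumes initech.csv sits in the working directory and has numeric columns named `years` and `salary`, neither of which is confirmed by the text. If the file is missing, the sketch falls back to a small simulated stand-in (clearly not the real Initech data) so that it still runs.

```r
# Assumption: initech.csv is in the working directory and has numeric
# columns named `years` and `salary`; the text confirms neither detail.
if (file.exists("initech.csv")) {
  initech <- read.csv("initech.csv")
} else {
  # Simulated stand-in (NOT the real Initech data) so the sketch still runs.
  set.seed(42)
  years <- rep(1:25, each = 4)
  initech <- data.frame(
    years  = years,
    salary = exp(10.5 + 0.08 * years + rnorm(length(years), sd = 0.2))
  )
}

# Simple linear regression of salary on years of experience.
initech_fit <- lm(salary ~ years, data = initech)

# Scatterplot of the data with the fitted line overlaid.
plot(salary ~ years, data = initech, pch = 20, col = "grey",
     main = "Salary vs. years of experience")
abline(initech_fit, col = "darkorange", lwd = 2)

# Residuals against fitted values, to check for non-constant variance.
plot(fitted(initech_fit), resid(initech_fit), pch = 20, col = "grey",
     xlab = "Fitted", ylab = "Residuals", main = "Residuals vs. fitted")
abline(h = 0, col = "darkorange", lwd = 2)
```

If the spread of the residuals grows with the fitted values, that is the kind of pattern a transformation of the response is meant to address.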
