Page 373 - Python Data Science Handbook

Figure 5-18. The handwritten digits data; each sample is represented by one 8×8 grid of pixels

               In order to work with this data within Scikit-Learn, we need a two-dimensional,
               [n_samples, n_features] representation. We can accomplish this by treating each
               pixel in the image as a feature—that is, by flattening out the pixel arrays so that we
               have a length-64 array of pixel values representing each digit. Additionally, we need
               the target array, which gives the previously determined label for each digit. These two
               quantities are built into the digits dataset under the data and target attributes,
               respectively:

                   In[24]: X = digits.data
                           X.shape
                   Out[24]: (1797, 64)
                   In[25]: y = digits.target
                           y.shape
                   Out[25]: (1797,)
               We see here that there are 1,797 samples and 64 features.
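
The flattening described above can be checked directly. The following is a minimal sketch, assuming that `digits` is the object returned by `sklearn.datasets.load_digits()` (the loading step is not shown on this page) and that its `images` attribute holds the 8×8 grids. The names `images`, `n_samples`, and `X_flat` are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_digits

# Assumption: this mirrors how `digits` was created earlier in the chapter.
digits = load_digits()

# Each sample is stored as an 8x8 grid of pixel intensities.
images = digits.images
n_samples = images.shape[0]

# Flatten every 8x8 grid into one row of 64 pixel features,
# giving the [n_samples, n_features] layout Scikit-Learn expects.
X_flat = images.reshape(n_samples, -1)

print(images.shape)   # shape of the stacked image grids
print(X_flat.shape)   # shape after flattening

# The manually flattened array should match the prebuilt data attribute.
print(np.array_equal(X_flat, digits.data))
```

Passing `-1` as the second dimension lets NumPy infer the number of features from the image size, so the same reshape would work for images of a different resolution.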

               Unsupervised learning: Dimensionality reduction
               We’d like to visualize our points within the 64-dimensional parameter space, but it’s
               difficult to effectively visualize points in such a high-dimensional space. Instead we’ll
               reduce the dimensions to 2, using an unsupervised method. Here, we’ll make use of a

