Page 54 - Hematology_ Basic Principles and Practice ( PDFDrive )
P. 54

26    Part I  Molecular and Cellular Basis of Hematology


        heterogeneity  involves  analyzing  cells  at  the  single-cell  level,  as   all of these methods identify major associations within a given dataset
        described later.                                      if the signature is strong and robust. Great care must be taken in the
           At the level of DNA analysis (as opposed to RNA analysis), the   interpretation of clustering results because clusters with distinct gene
        admixture of nonmalignant cells within a tumor may not obscure the   expression profiles may be caused not only by biologically important
        presence of mutations in the tumor cells, even if those cells represent   distinctions but also by artefacts of sample processing. Unsupervised
        a  minority  population.  However,  the  detection  of  mutations  in  a   learning methods that have been used include hierarchical clustering,
        subset  of cells  within  a sample requires  extra  depth  of  sequencing   principal  component  analysis,  nonnegative  matrix  factorization,  k-
        beyond what would be required to sequence, for example, a normal   means clustering, and t-distributed stochastic neighbor embedding.
        diploid genome. Thus it becomes critical to have a rough estimate of   Supervised learning approaches are best suited for comparing data
        the purity of a given sample so that the appropriate genomic approach   between two or more classes of samples that can be distinguished by
        can be used subsequently.                             some known property (or class distinction), such as biologic subtype
                                                              or clinical outcome. For example, to determine the gene expression
                                                              differences between different leukemia subtypes with distinct genetic
        Analytical Considerations                             abnormalities, one would use a supervised approach (Fig. 3.1). The
                                                              same genes might be clustered together on the basis of unsupervised
        Unsupervised  learning  approaches  (often  referred  to  as  clustering)   approaches already described, but they might also be obscured by a
        have become an important part of the discovery process in genomic   more  dominant  gene  expression  signature  that  is  unrelated  to  the
        analysis. This type of analysis involves grouping samples solely on the   distinction  of  interest.  For  example,  if  there  were  another  major
        basis of data obtained without regard to any prior knowledge of the   signature within the data (i.e., a stage of differentiation signature),
        samples or the disease. Thus one can obtain the predominant “struc-  the differences that the investigator was searching for might be lost.
        ture” of the dataset without imposing any prior bias. For example,   A number of metrics can be used to identify genes that are differen-
        unsupervised learning approaches have been used to cluster leukemia   tially expressed between two groups of samples, all of which are best
        or lymphoma samples on the basis of their gene expression profiles   suited to identification of genes that are uniformly highly expressed
        with the goal of uncovering the most robust classification schemes.   in one group. Although the different metrics may generate slightly
        Clustering algorithms can also cluster genes that have a similar expres-  different lists of gene expression differences, if the gene expression
        sion  profile  in  a  gene  expression  dataset.  There  are  a  number  of   difference is robust, all should give comparable results.
        methods for clustering genes and samples, all of which have compu-  With  the  ability  to  generate  large-scale  genomic  datasets  come
        tational  strengths  and  weaknesses.  Comparison  of  the  clustering   a number of analytical issues that are unique to what has come to
        methods is beyond the scope of this chapter, but suffice it to say that   be known as “big data.” In particular, when the number of features


                     ALL                     MLL                       AML


                                                                                         J03779  MME (CD10)
                                                                                         L33930  CD24
                                                                                         Y12735  DYRK3
                                                                                         M11722  DNTT (TDT)
                                                                                         X83441  LIG4 (DNA Ligase IV)
                                                                                         AF032885  FOXO1A (FKHR)
                                                                                         M96803  SPTBN1 (Spectrum-β)
                                                                                         AB020674  KIAA0867
                                                                                         X59350  CD22
                                                                                         Z49194  POU2AF1(OBF-1)
                                                                                         AL049279  DKFZp5641083
                                                                                         U48959  MYLK (MLCK)
                                                                                         U29175  SMARCA4 (SNF2-β)
                                                                                         AF054825  VAMP5
                                                                                         A1761647  cDNA wg66h09
                                                                                         U02687  FLT3
                                                                                         AB007888  KIAA0428
                                                                                         AJ001687  NKG2D
                                                                                         AF009615  ADAM10
                                                                                         AF027208  PROML1 (AC133)
                                                                                         U02687  FLT3
                                                                                         AB028948  KIAA1025
                                                                                         AI535946  LGALS1 Galectin 1)
                                                                                         U66838  CCNA1 (Cyclin A1)
                                                                                         AL050157  DKFZp586o0120
                                                                                         Z48579  ADAM10
                                                                                         AF026816  ITPA (Inosine triphosphatase)
                                                                                         AA669799  cDNA ag36c04
                                                                                         AB023137  KIAA0920
                                                                                         X61118  LMO2
                                                                                         X04325  GJB1
                                                                                         X64364  BSG
                                                                                         X99906  ENSA (α-endosulfine)
                                                                                         M63138  CTSD (CathepsinD)
                                                                                         M84526  DF (Adipsin)
                                                                                         U35117  TFDP2
                                                                                         U41843  DRAP1
                                                                                         L27066  NF2
                                                                                         W27095  cDNA 20c10
                                                                                         Y08134  PDE3B (Phosphodiesterase 3B)
                                                                                         M22324  ANPEP (CD13)
                                                                                         AC005787  Chromosome 19 clone
                                                                                         DB7002  Chromosome 22q11 clone
                                                                                         AF004222  RTN2
                                                                                         U05569  CRYAA
                          -3σ   -2σ    -1σ    0    +1σ    +2σ   +3σ
                                  σ = standard deviation from mean
                        Fig. 3.1  COMPARISON OF GENE EXPRESSION IN ACUTE LYMPHOBLASTIC LEUKEMIA (ALL),
                        MIXED-LINEAGE  LEUKEMIA  (MLL)-REARRANGED  ALL  (DESIGNATED  MLL),  AND  ACUTE
                        MYELOGENOUS LEUKEMIA (AML) SAMPLES USING A SUPERVISED LEARNING APPROACH.
                        Gene expression in leukemia samples was analyzed using Affymetrix microarrays containing 12,600 unique
                        probe sets. Genes that are highly expressed in one type of leukemia relative to the other two are shown. Each
                        column represents a patient sample, and each row represents a gene. Red represents relative high-level expression
                        and blue relative low-level expression. (From Armstrong SA, Staunton JE, Silverman LB, et al: MLL translocations
                        specify a distinct gene expression profile that distinguishes a unique leukemia. Nat Genet 30:41, 2002.)
   49   50   51   52   53   54   55   56   57   58   59