Page 186 - Williams Hematology ( PDFDrive )
160 Part IV: Molecular and Cellular Hematology Chapter 11: Genomics 161
subclones can be defined by their somatic mutational landscape from high depth NGS, where the digital nature of the NGS data is exploited by algorithmic clustering of mutations that share the same variant allele fraction (VAF). In particular, the VAF of any mutation is defined as the fraction of sequencing reads that contain the somatic variant (as compared to the germline or inherited nucleotide at that locus). Changes in the heterogeneity of cancer cell populations can be studied by comparing data from temporal sampling of a patient, such as at diagnosis and disease relapse.

NEXT-GENERATION SEQUENCING–BASED COMPREHENSIVE GENOMICS: FROM STUDIES OF THE TRANSCRIPTOME TO DNA METHYLATION TO CHROMATIN ACCESSIBILITY AND MODIFICATIONS

The study of modern genomics by NGS methods is not limited to the sequencing of genomic DNA but also can include (1) the characterization of RNA transcripts, (2) the physical structure of genomes including chromatin organization and protein-DNA interactions, and (3) the identification of specific chemical modifications to nucleotides and histones. 37

Analysis of the Transcriptome: RNA Sequencing

RNA sequencing (RNA-seq) involves the conversion of RNA into complementary DNA (cDNA) by reverse transcription followed by NGS library construction. RNA-seq uses the digital nature of NGS technology to quantify levels of RNA transcripts. 38 Previously, microarrays (designed with a fixed content of gene-specific probes) were used to assay gene expression by hybridization to reverse-transcribed RNA isolates. By contrast, RNA-seq offers the advantages of comprehensive and less-biased data analysis, with a broader dynamic range for detection of high and low abundance transcripts. With the single base resolution provided by RNA-seq, one can determine the expression of specific mutant alleles present in the germline or in cancer samples, which may be highly relevant for implementing a small molecule or immunotherapy-based targeted therapeutic. RNA-seq data can be analyzed to detect the expression of alternatively spliced isoforms of transcribed genes or to detect the transcriptional product(s) of gene fusions in cancer cells. RNA-seq can be produced as either single- or paired-end reads, where the latter are better suited to detect alternative splicing and gene fusions. Additionally, RNA-seq data can identify strand specificity of the DNA template, wherein RNA derived from the antisense strand may play an important role in regulating gene expression. Finally, the insert size of the RNA-seq libraries can be targeted to enrich for different subsets of the transcriptome. Small fragment size libraries (approximately 15 to 70 bp) enrich for microRNA (miRNA), short-interfering RNA (siRNA) and PIWI-interacting RNA (piRNA), intermediate size libraries (approximately 70 to 200 bp) enrich for small nuclear (snRNA) and small nucleolar RNA (snoRNA), and larger fragment libraries (excluding fragments less than 200 bp) enrich for messenger RNA (mRNA) and long noncoding RNA (lncRNA).

There are many protocols for RNA-seq, including different commercially available kits that exploit the aforementioned experimental focus areas. For example, protocols to study the “transcriptome,” which is defined as all the expressed RNA from a given cell or cell population, are often optimized to preferentially target one (or more) types of RNA that are pertinent to a particular area of clinical or research interest. Thus, a researcher interested only in detecting gene expression of annotated mRNA transcripts would choose either an RNA-seq protocol that included ribosomal RNA (rRNA) depletion (rRNA may represent up to 60 percent of transcripts in a cell) or one that used an initial poly-A enrichment step (as rRNAs are not polyadenylated). By comparison, noncoding RNAs play a role in many cellular processes but are not polyadenylated, so even though poly-A enrichment would not be applied, a protocol that preserves strand specificity should be.

RNA is a less-stable molecule than DNA and hence assessing the quality of the isolated RNA prior to creating a sequencing library is of paramount importance. The source for the RNA may be fresh tissue, fresh-frozen tissue, or formalin-fixed, paraffin-embedded (FFPE) tissue, and each of these sources may influence the quality of the resulting RNA. RNA derived from FFPE tissue is often at least partially degraded because of formalin crosslinks with the RNA backbone that result in breakage. Similarly, the amount of RNA available from clinical specimens is often quite limited, making necessary the use of RNA amplification prior to library construction, or the use of hybrid capture probes to enrich the on-gene yield of sequencing data from low input sources. 39

As the analysis of RNA-seq data is distinct in many ways compared to DNA sequencing data analysis, multiple software tools are available to characterize differential gene expression, differential splicing, gene fusion detection, and allele-specific expression. 40,41 In regard to cancer-specific analyses of RNA, a paired “normal” comparator from adjacent nonmalignant cells is often not available (or even understood), which complicates the analysis and interpretation of RNA-seq data. However, efforts are now cataloguing expression in normal human tissues and providing these results in public databases for comparison purposes.

Next-Generation Sequencing–Based Studies of Chromatin Modifications

Chromatin immunoprecipitation followed by NGS-based whole-genome sequencing is known as ChIP-seq. 42 When studying chromatin modifications (Chap. 12), the targets are often transcription factors or specific histone modifications (such as methylation or acetylation) that may be important for regulation of gene expression. In brief, ChIP-seq begins with standard chromatin immunoprecipitation: protein and DNA are crosslinked in growing cell culture, the fixed and crosslinked DNA–protein complexes are fragmented, immunoprecipitated with an antibody specific for the protein of interest, and the DNA isolated from the precipitated material. After DNA isolation, a standard NGS library is prepared by adapter ligation and sizing, and the DNA is sequenced by standard NGS methods. Given the digital nature of NGS, the number of reads aligning to a particular area of the genome is directly proportional to the amount of input DNA from that region. Thus, one can determine “peaks” with a statistically significant increased number of aligned reads and infer that the genomic regions underlying the peaks are the specific areas where the protein of interest was bound to the DNA. 43,44 Antibody specificity and avidity remain key determinants for the validity of ChIP-seq data, as does identifying the appropriate coverage cutoff value that determines a “peak.”

Next-Generation Sequencing–Based Studies of Chromatin Accessibility

The interaction of DNA and proteins to form chromatin plays an increasingly recognized role in the study of genomics and epigenomics (Chap. 12). Several methods using NGS-based approaches can interrogate the physical structure of DNA. These methods, which fragment DNA based on the accessibility of chromatin, allow for the determination of nucleosome positioning and inferred protein–DNA binding sites. Although these studies are not a direct method for determining specific protein–DNA binding sites, one can use sequence from the inferred protein–DNA binding sites as an indirect method for assaying
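
The ChIP-seq discussion on this page describes "peaks" as regions with a statistically significant excess of aligned reads, and notes that the coverage cutoff matters. The Python sketch below illustrates one simple way such a cutoff could be expressed. It is an assumption-laden illustration, not the method used by any particular peak caller: the background rate is taken as the mean count over all bins, read counts are modeled as Poisson, and the names `poisson_sf`, `call_peaks`, and `alpha` are hypothetical. Real analyses typically estimate background from a control sample and correct for multiple testing.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), by direct summation of the CDF."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def call_peaks(bin_counts, alpha=1e-3):
    """Return indices of bins whose read count is unexpectedly high.

    Assumption: background rate = mean count across all bins.
    A bin is flagged when P(X >= observed count) < alpha.
    """
    lam = sum(bin_counts) / len(bin_counts)
    return [i for i, c in enumerate(bin_counts) if poisson_sf(c, lam) < alpha]


# Hypothetical read counts per fixed-width genomic bin.
counts = [2, 3, 1, 2, 40, 3, 2, 1, 2, 3]
peaks = call_peaks(counts)
print(peaks)
```

Lowering `alpha` makes the cutoff stricter, which mirrors the point in the text that the choice of coverage threshold determines what is reported as a peak.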
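
The variant allele fraction (VAF) defined at the top of this page, the fraction of sequencing reads at a locus that carry the somatic variant, can be made concrete with a short Python sketch. The grouping step is a deliberately crude stand-in for the algorithmic clustering the text mentions: it sorts mutations by VAF and starts a new group whenever the gap to the previous value exceeds a threshold. The mutation names, read counts, the function names `vaf` and `group_by_vaf`, and the `max_gap` value are all hypothetical, and real subclone inference uses statistical models that also account for copy number and tumor purity.

```python
def vaf(alt_reads, total_reads):
    """Variant allele fraction: variant-supporting reads / all reads at the locus."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def group_by_vaf(vafs, max_gap=0.05):
    """Greedy 1-D grouping of mutations with similar VAF.

    vafs: dict mapping mutation name -> VAF.
    Mutations are sorted by VAF; a new group starts when the difference
    from the previous (lower) VAF exceeds max_gap.
    """
    groups = []
    for name, value in sorted(vafs.items(), key=lambda kv: kv[1]):
        if groups and value - groups[-1][-1][1] <= max_gap:
            groups[-1].append((name, value))
        else:
            groups.append([(name, value)])
    return groups


# Hypothetical (variant reads, total reads) per mutation.
read_counts = {
    "mutA": (48, 100),
    "mutB": (45, 100),
    "mutC": (12, 100),
    "mutD": (10, 100),
}
vafs = {name: vaf(alt, total) for name, (alt, total) in read_counts.items()}
groups = group_by_vaf(vafs)
group_names = [[name for name, _ in g] for g in groups]
print(group_names)
```

Running the same grouping on samples taken at diagnosis and at relapse would give a simple view of how the mix of groups changes over time, which is the kind of temporal comparison the text describes.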

