Page 1345 - Clinical Immunology_ Principles and Practice ( PDFDrive )

1306  Part Eleven  Diagnostic Immunology


an informatics pipeline for each clinical application. New assay development requires validation conforming to Clinical Laboratory Improvement Amendments of 1988 (CLIA 1988) regulations, and laboratories must have an ongoing process to monitor data quality and ensure result accuracy. The US Food and Drug Administration (FDA) has released a draft guidance on standards for clinical NGS-based diagnostic tests (http://www.fda.gov/ucm/groups/fdagov-public/@fdagov-meddev-gen/documents/document/ucm509838.pdf).

Sample and Laboratory Process Management

Each diagnostic laboratory must deal with the generic operational problems of sample accession, tracking, and reporting. Clinical-grade laboratory information management systems (LIMS) are required to handle all these processes with associated regulatory compliance. DNA diagnostic laboratories have several unique problems and requirements that deserve comment. Automated data acquisition is an important component of DNA sequencing and genotyping, and it requires personnel specialized in information science and systems administration. Advanced statistical models are employed at many steps in the processes of base calling (primary analysis), alignment to the reference genome, and identification of positions that are different from the reference (together called secondary analysis). Once the raw data from arrays and NGS are produced, bioinformaticians develop, manage, and operate analysis pipelines that synthesize the results into forms comprehensible to the laboratory staff tasked with reporting the results. Bioinformaticians maintain or develop analysis information management systems (AIMS), which are also used to collect and monitor performance metrics and quality control. Specialized software is used to perform these functions and to report the metrics needed for quality control. The number of patient-specific data records and the complexity of relationships in family-based testing make it essentially impossible for manual processes to achieve the required reliability. Because of the broad intended use of genomic testing, it is increasingly important to collect patient phenotype data, which are needed for variant filtering and prioritization (see Tertiary analysis below).

Primary Data Analysis: Genotyping and Base Calling

Genotyping in the case of microarrays and base calling in the case of sequencers are platform specific, and the required software is supplied by instrument vendors. Microarray platforms, whether array comparative genomic hybridization or SNP chips, use hybridization signal intensity to estimate DNA copy number. Copy number calls are based on multiple adjacent assay positions with respect to the genome map (i.e., the identification of clinically important CNVs is always supported by many independent data points and on-chip assay replicates). The resolution and reliability of the CNV calls depend on the total number of positions on the array and their "responsiveness" to differences in DNA copy number. Laboratories using these methods must assess data quality with robust statistical procedures, specifying in advance the minimum size and composition of called CNVs.

Sequencing data, especially for exomes and whole genomes, present much more challenging problems in bioinformatics. Base calling from the instrument raw data (primary analysis) typically takes place on local computers dedicated to the sequencer, but cloud-based methods can be used. Base calling generates sequence "reads" with their base quality scores. Some of the important measures of data quality at the primary analysis step are base quality score, number of reads per sample, length of reads, and fall-off of base quality with read position.

Secondary Data Analysis: Demultiplexing, Alignment, and Variant Calling

The next steps in next-generation sequence data analysis involve aligning reads to a reference sequence and generating variant calls. In many high-throughput applications, patient DNA samples are tagged with index sequences during preparation for sequencing (library construction). Molecular indexing or multiplexing allows pooling of the samples on the instrument and then sorting them out after sequencing. Demultiplexing of sequence reads is another step that is subject to quality monitoring.

After demultiplexing, the reads are mapped and aligned to the reference genome. Alignment of short-read sequences to the reference genome involves systematically matching read fragments to their correct location in the genome. The most widely used tools exploit the Burrows-Wheeler transform to carry out this process efficiently and precisely (bio-bwa.sourceforge.net/). Typically, only uniquely mapping reads are passed to the later steps of sequence analysis. This makes it difficult to analyze some segments of the genome that are important in health and disease. Some elements in the genome are composed of nearly identical sequences, most often arranged in tandem on adjacent segments of chromosomes. Human leukocyte antigen (HLA) presents particular challenges: (i) certain HLA alleles may not be represented in the reference genomes; (ii) reads may align to more than one location in HLA, leading to discard of the read or to misalignment and false-positive variation; and (iii) identical reads may have origins in distinct haplotypes that cannot be easily recognized with short-read sequences.

Another important issue is that the reliability of variant calling differs among classes of variation. Small insertion and deletion variants (indels) are clinically important because they often lead to frameshift and premature termination of proteins, but calling indels and automatically applying consistent indel nomenclature are more difficult than calling single nucleotide variants (SNVs). The Genome Analysis Toolkit (http://www.broadinstitute.org/gatk/) is the most widely used software for variant calling.

Targeted resequencing and whole exome sequencing (WES) focus on protein-coding elements in the genome. Because of the complex and highly variable exon–intron structure of genes, there is considerable technical difficulty in using exon sequence data to call structural variants and CNVs. WGS, in contrast, surveys all the intron and intergenic sequence. New methods of PCR-free library construction enable the read count depth to be used as an accurate surrogate for copy number.32 In addition, gaps in aligned reads (called “split reads”) can be used to recognize deletions and other structural variants, including duplications and inversions. Although challenges still remain, it is possible that WGS data combined with standardized algorithms could allow a single test to be used for almost all classes of pathogenic alleles.

Variants are saved in a specified format called a genomic variant call format (gVCF) file. This format contains information not only about the positions that contain a nonreference genotype call but also about the quality of each site that is called with the reference homozygous base. This is important because it allows multiple samples to be aggregated (e.g., to analyze mother, father, and their offspring jointly). Format standardization allows exchange and aggregation of data among laboratories around the world. Data aggregation is now widely recognized as a key step in the development of molecular diagnostics, not only to reduce errors in variant calling but also to enable sophisticated approaches to the problem of genotype-phenotype relationships in rare genetic diseases.
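
The value of recording the quality of reference-homozygous sites, as described for the gVCF format, can be illustrated with a small sketch of joint analysis of a mother, father, and child at one position. This is not the logic of any particular variant caller. The record type `SiteCall`, the function `classify_child_variant`, the result labels, and the genotype-quality threshold of 20 are all hypothetical and chosen only for illustration. The point is that a variant in the child can be flagged as a candidate new mutation only when both parents have a confident reference call at that site; a parent with no record, or with a low-quality record, leaves the question open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SiteCall:
    """Simplified, hypothetical stand-in for one sample's gVCF record at one site."""
    genotype: str  # "0/0" = homozygous reference, "0/1" = heterozygous, "1/1" = homozygous alternate
    gq: int        # Phred-scaled genotype quality


MIN_GQ = 20  # illustrative threshold (assumption, not taken from the text)


def classify_child_variant(
    child: Optional[SiteCall],
    mother: Optional[SiteCall],
    father: Optional[SiteCall],
    min_gq: int = MIN_GQ,
) -> str:
    """Classify one site in a trio using per-site confidence from each sample."""
    # No record or a reference call in the child: nothing to interpret here.
    if child is None or child.genotype == "0/0":
        return "no_variant"
    # A low-quality call in the child should not be interpreted further.
    if child.gq < min_gq:
        return "low_confidence_child"
    # Without a confident call in BOTH parents, absence of the variant
    # cannot be distinguished from absence of data.
    for parent in (mother, father):
        if parent is None or parent.gq < min_gq:
            return "parent_uninformative"
    # Both parents are confidently homozygous reference.
    if mother.genotype == "0/0" and father.genotype == "0/0":
        return "candidate_de_novo"
    # At least one parent carries a non-reference allele.
    return "possibly_inherited"


if __name__ == "__main__":
    child = SiteCall("0/1", 60)
    print(classify_child_variant(child, SiteCall("0/0", 50), SiteCall("0/0", 45)))
    print(classify_child_variant(child, None, SiteCall("0/0", 45)))
```

A file that lists only non-reference positions would give the same empty result for a parent who is truly homozygous reference and for a parent whose site was poorly covered, so the second call above could not be separated from the first.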