Next-Generation Sequencing: Challenges and Clinical Translation
Original article published on the Illumina Blog on 30 Apr 2014.
As the research world advances the field of genomics, the clinical counterpart attempts to translate these technologies for patients. The intersection of these two worlds and a candid discussion of the application of genomics to cancer were on display at the recently concluded 105th annual meeting of the American Association for Cancer Research (AACR) in San Diego.
Promises and Challenges of Cancer Genomics
The first human genome to be sequenced cost a staggering $3 billion just over a decade ago. Today, we are close to sequencing an entire genome for $1000. This dramatic fall in price was made possible by unprecedented technological advances and the subsequent availability of increasingly efficient platforms. However, despite increased access and plummeting sequencing costs, considerable challenges remain, primarily in data analysis and interpretation. Understanding the relevance of genomic variation in cancer will require sequencing and analyzing a substantial number of samples to establish statistical correlations, along with validation by functional genomic approaches.
Sequencing the genome (whole-genome sequencing, WGS), exome (whole-exome sequencing, WES), and transcriptome (RNA sequencing, RNA-Seq) are three approaches that help researchers detect somatic cancer genome alterations such as nucleotide substitutions, insertions, deletions, copy number variations, and chromosomal rearrangements. Targeted sequencing can be used for regions of interest at significantly lower costs compared to the whole-genome approach. These investigations not only help in understanding the pathogenesis of cancers but also provide biomarkers that can identify novel targets for drug development. Importantly, these data can help guide cancer therapy using existing drugs against actionable molecular targets.
Cancer genome sequencing involves significant challenges, such as the quality (paraffin-embedded, variably degraded, heterogeneous) and quantity of the samples available. There are ways to overcome most of these challenges and obtain meaningful data. For instance, increasing sequencing depth can counter low sample purity and increased ploidy. Sequencing both ends of DNA library molecules (paired-end reads) can identify discordant pairs representing deletions, amplifications, inversions, or translocations, which has made paired-end sequencing a valuable strategy for cancer genomics. Finally, since most genetic abnormalities in cancer are somatic rather than germline, comparison with the patient’s matched “normal” genome is crucial for interpreting the alterations identified through deep sequencing.
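The discordant-pair idea can be sketched in a few lines of Python. This is a minimal illustration, not the logic of any particular structural-variant caller: the ReadPair record, the classify_pair function, and the 200 to 500 bp distance window are all hypothetical. It assumes a standard forward/reverse paired-end library, in which a normal pair maps to the same chromosome, with the two reads facing each other at roughly the expected distance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical record for one aligned read pair."""
    chrom1: str
    pos1: int      # leftmost aligned position of read 1
    strand1: str   # "+" or "-"
    chrom2: str
    pos2: int      # leftmost aligned position of read 2
    strand2: str   # "+" or "-"


def classify_pair(pair, min_dist=200, max_dist=500):
    """Label a read pair as concordant or as a candidate rearrangement.

    min_dist and max_dist are assumed bounds on the distance between the
    two alignment start positions; real pipelines estimate the expected
    range from the library's observed insert-size distribution.
    """
    # Mates on different chromosomes suggest a translocation.
    if pair.chrom1 != pair.chrom2:
        return "translocation_candidate"

    # Order the two reads by genomic position.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )

    # Both reads on the same strand suggest an inversion.
    if left_strand == right_strand:
        return "inversion_candidate"

    # Reads pointing away from each other suggest a tandem duplication
    # (one form of amplification).
    if left_strand == "-" and right_strand == "+":
        return "tandem_duplication_candidate"

    # Correct orientation: check whether the distance is as expected.
    distance = right_pos - left_pos
    if distance > max_dist:
        return "deletion_candidate"
    if distance < min_dist:
        return "insertion_candidate"
    return "concordant"


if __name__ == "__main__":
    example = ReadPair("chr1", 1000, "+", "chr1", 9000, "-")
    print(classify_pair(example))
```

A single discordant pair can also arise from a mapping artifact, so an event is normally reported only when many independent pairs support the same breakpoint.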
Christos Hatzis, Ph.D., from Yale Cancer Center expounded on additional challenges associated with sequencing in an AACR symposium, “NGS: From Bench To Bedside”. Most of the data obtained with state-of-the-art sequencers are in the form of short reads. Hence, analysis and interpretation of these data encounter several challenges, including those associated with base calling, sequence alignment and assembly, and variant calling. These challenges have led to the development of innovative computational tools and bioinformatics approaches to facilitate data analysis and clinical translation.
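Base-calling uncertainty is usually reported per base as a Phred quality score, Q = -10 log10(p), where p is the estimated probability that the call is wrong. FASTQ files commonly store these scores as ASCII characters offset by 33. The sketch below decodes such a quality string. The function names are illustrative, and the offset of 33 is an assumption that does not hold for some older file formats.

```python
def phred_to_error_prob(q):
    """Convert a Phred score Q to an error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_quality_string(qual, offset=33):
    """Decode a FASTQ quality string into a list of integer Phred scores.

    offset=33 assumes the common Phred+33 encoding.
    """
    return [ord(ch) - offset for ch in qual]


def mean_error_prob(qual, offset=33):
    """Average per-base error probability across one read."""
    scores = decode_quality_string(qual, offset)
    if not scores:
        raise ValueError("quality string is empty")
    return sum(phred_to_error_prob(q) for q in scores) / len(scores)


if __name__ == "__main__":
    quality = "IIII5555!!"  # made-up quality string for illustration
    print(decode_quality_string(quality))
    print(mean_error_prob(quality))
```

Averaging error probabilities rather than raw Phred scores keeps a few very poor bases from being masked by many good ones, which is one reason quality-control reports often show per-position distributions instead of a single number.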
Bioinformatics and Multiple ‘Omics’ Approaches
Nearly 600 bioinformatics tools were developed over the past two years and are being used to enable data analysis and interpretation. These include tools that assess the quality of short reads, such as FastQC and htSeqTools, and tools like MuTect, which analyzes aligned sequencing reads to detect somatic mutations with low allele fractions. One tool that Hatzis discussed at length was a mutational analysis pipeline that uses sequencing data together with clinical information to develop correlations among mutations, genes, and pathways. This pipeline, Mutational Significance in Cancer (MuSiC), can help differentiate passenger mutations from the so-called driver mutations. Similarly, MutSig is an interpretation tool that can detect significantly mutated genes. Cancer researchers use multiple computational tools, many of which have specific requirements because cancer genome data:
- Need to be analyzed in conjunction with a matched normal genome
- Involve highly rearranged genomes, and
- Have immense heterogeneity
A few examples include the ELAND aligner and CASAVA for mutation calling from Illumina, the BFAST alignment tool, and Pindel for detecting indels. A comprehensive database of tools for NGS analysis and interpretation can be found on SEQwiki.
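The first requirement in the list above, analysis alongside a matched normal genome, can be illustrated with a toy example. The sketch below treats each sample's variant calls as a set and separates tumor-only (candidate somatic) variants from those shared with the normal sample (likely germline). The Variant record and the split_somatic_germline function are hypothetical, and the coordinates are made up. Dedicated somatic callers work from read-level evidence in both samples rather than from simple set subtraction, so this shows only the underlying idea.

```python
from typing import NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record."""
    chrom: str
    pos: int
    ref: str
    alt: str


def split_somatic_germline(
    tumor: Set[Variant], normal: Set[Variant]
) -> Tuple[Set[Variant], Set[Variant]]:
    """Partition tumor variants by whether the matched normal shares them."""
    somatic = tumor - normal    # seen only in the tumor sample
    germline = tumor & normal   # seen in both tumor and normal
    return somatic, germline


if __name__ == "__main__":
    # Made-up calls for illustration only.
    tumor_calls = {
        Variant("chr1", 100, "A", "G"),
        Variant("chr7", 2500, "C", "T"),
    }
    normal_calls = {
        Variant("chr1", 100, "A", "G"),
        Variant("chr2", 5, "G", "C"),
    }
    somatic, germline = split_somatic_germline(tumor_calls, normal_calls)
    print(sorted(somatic))
    print(sorted(germline))
```

Exact set comparison is a simplification: a variant can be missed in the normal sample because of low coverage there, and this is one reason adequate depth in both samples matters.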
Analyzing the protein-encoding regions of the DNA by WES represents a powerful tool in the sequencing armamentarium. Hatzis, quoting from an article in Scientific American, pointed out, “Analyzing an exome to understand disease is, in some cases, like reading Cliff Notes to understand a classical textbook”. This is because the exome represents less than 2% of the total human DNA; WES therefore examines only this small part and misses the majority of the DNA. However, WES is a valuable tool because exons contain >85% of disease-causing mutations in all Mendelian disorders, in addition to the majority of single-nucleotide variations in the genome. A recent study has demonstrated the use of WES for characterizing circulating tumor cells. In addition to WGS and WES, transcriptome analysis through RNA-Seq can identify alternative splice variants and gene fusion events. RNA-Seq is also a powerful technique for expression profiling of therapeutically relevant transcripts.
Despite the advantages that sequencing technologies offer, rapidly falling costs are resulting in an ever-increasing amount of data. This threatens to overwhelm analytical capacity, thereby creating a bottleneck. However, owing to the multitude of research groups working on bioinformatics tools for sequencing, this may not be as big a problem as it is perceived to be.
Optimism, Collaboration, and the Way Forward
For one, Elaine Mardis, Ph.D., from The Genome Institute at Washington University is optimistic. In her talk at the same NGS symposium, she expressed her conviction that in spite of hurdles, sequencing data analysis “will get easier” as instrument manufacturers provide highly tuned pipelines. In addition, cloud-based or locally installable analytic pipelines are becoming commercially available. Moreover, inclusion of RNA and protein information in conjunction with sequencing data is important; this combination of data from different sources provides an orthogonality that often increases precision. Finally, the availability of long-read platforms would obviate the need for alignment.
Last year, Illumina’s MiSeqDx became the first next-generation sequencer to be approved by the FDA for IVD use. Furthermore, Illumina recently introduced a new sequencing platform for research, the NextSeq 500, which will likely fuel future advances in analytical and interpretation tools. But the technology will take us only so far. Ultimately, a truly successful bench-to-bedside translation requires a multidisciplinary approach where basic scientists, bioinformaticians, pathologists, genetic counselors, nurses, and physicians collaborate on genomic data. Discoveries driven by sequencing are set to revolutionize clinical practice, leading to the development of novel diagnostic and prognostic tools in addition to realizing the goal of truly personalized medicine.
- National Human Genome Research Institute (NHGRI), National Institutes of Health (NIH): The Human Genome Project
- Accurate whole human genome sequencing using reversible terminator chemistry. Bentley DR et al., 2008. doi:10.1038/nature07517
- BFAST: An Alignment Tool for Large Scale Genome Resequencing. Homer N et al., 2009. doi:10.1371/journal.pone.0007767
- Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Ye K et al., 2009. doi:10.1093/bioinformatics/btp394
- The SEQanswers wiki: a wiki database of tools for high-throughput sequencing analysis. Li J-W et al., 2011. doi:10.1093/nar/gkr1058
- 10 Things Exome Sequencing Can’t Do – but Why It’s Still Powerful. By Ricki Lewis. Scientific American. May 16, 2012.
- Whole-exome sequencing of circulating tumor cells provides a window into metastatic prostate cancer. Lohr JG et al., 2014. Nature Biotechnology. doi:10.1038/nbt.2892