Next-Generation Sequencing: Applications
Original article published on the Assay Depot Blog, 2 Jan 2013.
Compared with Sanger sequencing, next-generation sequencing has reduced costs and increased throughput. Importantly, it has expanded the capabilities of sequencing technologies to enable a wide variety of applications. The list below is by no means comprehensive; it covers some of the most prominent of the ever-expanding applications of next-generation sequencing.
The main application of whole-genome sequencing based on the newer technologies is the discovery of genetic variants. The first whole genome sequenced using next-generation technology was that of James D. Watson, the co-discoverer of the structure of DNA. Whole-genome sequencing can identify single nucleotide variations and structural variations in the genome, many of which may have a causative role in disease. It is being used as part of The Cancer Genome Atlas (TCGA) project, “to create a comprehensive catalogue of the genomic changes involved in cancer”.
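To make the idea of finding single nucleotide variations concrete, here is a minimal sketch in Python. It is not how production variant callers work (those use statistical models, base qualities, and handle heterozygous sites); it simply piles up already-aligned reads over a reference and reports positions where most reads disagree with it. The function name `call_snvs`, the thresholds, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter

def call_snvs(reference, aligned_reads, min_depth=3, min_fraction=0.8):
    """Toy SNV caller (illustrative only).

    aligned_reads is a list of (start, sequence) pairs: 0-based start
    position on the reference, gap-free alignment assumed.
    Returns {position: (reference_base, observed_base)}.
    """
    # Count the bases observed at every reference position (the "pileup").
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    variants = {}
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything
        base, n = counts.most_common(1)[0]
        # Report a variant when the dominant base differs from the reference.
        if base != reference[pos] and n / depth >= min_fraction:
            variants[pos] = (reference[pos], base)
    return variants

# Hypothetical toy data: four reads that all carry a G at position 5.
reference = "ACGTACGTAC"
reads = [(0, "ACGTAGGT"), (2, "GTAGGTAC"), (3, "TAGGTA"), (4, "AGGTAC")]
print(call_snvs(reference, reads))
```

With a `min_fraction` of 0.8 the sketch only reports sites where nearly all reads agree, which corresponds roughly to a homozygous variant; a heterozygous site would need a lower threshold and a proper statistical test.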
The transcriptome represents the complete set of transcripts in a cell. Analysis of the transcriptome gives information on the functional elements of the genome and is important for understanding development and disease. The recently developed RNA-sequencing (RNA-seq) is a powerful approach for profiling the transcriptome, wherein RNA analysis is carried out by next-generation sequencing of cDNA. The advantages of RNA-seq are:
- It is not limited to detecting transcripts that correspond to existing genomic sequences and can reveal sequence variations in any transcribed region.
- It has a high signal-to-noise ratio and a large dynamic range of expression levels over which transcripts can be detected.
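As a rough illustration of how read counts become comparable expression levels, the sketch below computes transcripts per million (TPM), one commonly used normalization. TPM is not mentioned in the original article; it is used here only as an example, and the gene names, counts, and lengths are made up.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    counts and lengths_bp are dicts keyed by gene name.
    """
    # Reads per base: corrects for longer transcripts attracting more reads.
    rates = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    # Scale so that the values in one sample sum to one million.
    return {gene: 1e6 * rate / total for gene, rate in rates.items()}

# Hypothetical counts: geneC is expressed about 100-fold lower than geneA.
counts = {"geneA": 1000, "geneB": 1000, "geneC": 10}
lengths_bp = {"geneA": 1000, "geneB": 2000, "geneC": 1000}
for gene, value in tpm(counts, lengths_bp).items():
    print(gene, round(value, 1))
```

Note that geneA and geneB have the same raw count, but geneB is twice as long, so its normalized expression is half that of geneA.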
Understanding transcriptional regulation requires global mapping of DNA-protein interactions. This can be achieved by chromatin immunoprecipitation (ChIP) followed by sequencing (ChIP-seq). In this technique, antibodies select the proteins of interest and thereby enrich the DNA fragments bound to them, which are then sequenced. ChIP-seq was one of the first applications of next-generation sequencing and has significantly improved our understanding of transcriptional cascades.
Technological advances in the field of genomics have made it possible to sequence most of the coding regions of the genome (the exome), an approach termed exome sequencing. These protein-coding regions constitute less than 2% of the entire genome. Combined with highly parallelized DNA sequencing technologies, exome sequencing provides an opportunity for discovering rare alleles underlying complex traits or Mendelian phenotypes. However, it will miss mutations in non-coding regions of the genome (for example, introns) that have been shown to be associated with disease.
Characterizing the human microbiome and its collective genes (termed the metagenome), for both infectious and commensal flora, has wide-ranging implications for health and disease. Recognizing the significance of metagenome sequencing for human health, the National Institutes of Health (NIH) in the US earmarked $140 million for the Human Microbiome Project in 2007.
RNA-binding proteins are involved in the processing of RNA and affect the regulation of gene expression. With recent technological developments, it is now possible to study both individual RNA-binding proteins and large complexes such as ribosomes.
Epigenetic modifications such as DNA methylation and covalent modifications of histone proteins can be mapped using next-generation sequencing. Sequencing of bisulphite-treated DNA can provide an estimate of the level of DNA methylation. Potential regulatory sequences in DNA can be determined by DNaseI hypersensitivity site footprinting followed by sequencing (DHS-seq).
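The bisulphite estimate rests on simple counting: bisulphite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines are still read as C. At a cytosine in the reference, the fraction of reads showing C therefore approximates the methylation level. The sketch below assumes all reads come from the same strand as the reference cytosine and ignores incomplete conversion and sequencing errors; the function name and the example bases are hypothetical.

```python
def methylation_level(read_bases):
    """Estimate methylation at one reference cytosine.

    read_bases: string of the bases that bisulphite-treated reads show at
    that position. C = protected (methylated), T = converted (unmethylated).
    Returns a fraction between 0 and 1, or None if no informative reads.
    """
    methylated = read_bases.count("C")
    unmethylated = read_bases.count("T")
    informative = methylated + unmethylated
    if informative == 0:
        return None
    return methylated / informative

# Hypothetical site covered by eight reads: six show C, two show T.
print(methylation_level("CCCTCTCC"))  # 0.75
```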
Limitations of Next-Generation Sequencing
In spite of the advantages next-generation sequencing offers, there are a few limitations to this technology:
- Shorter read lengths compared to the Sanger method – This is a major drawback relative to Sanger sequencing. De novo assembly of a genome from short reads is difficult; hence this technology serves better as a genome “re-sequencing” tool.
- Repetitive DNA – Almost 50% of the human genome consists of repetitive DNA. Owing to the shorter read lengths, ambiguities in alignment and assembly arise in regions of repeats.
- Data Volume – Large volumes of data are generated, and analysis is time-consuming and expensive. Data analysis may be the rate-limiting step in next-generation sequencing.
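The interaction between read length and repeats can be seen in a toy experiment. The sketch below takes every possible error-free read of a given length from a made-up sequence containing a tandem repeat and asks what fraction of those reads occur only once, that is, could be placed unambiguously. The sequence and the function name `unique_fraction` are assumptions for illustration; real genomes and aligners are far more complex.

```python
from collections import Counter

def unique_fraction(genome, read_length):
    """Fraction of all error-free reads of a given length that occur
    exactly once in the genome (and so align unambiguously)."""
    reads = [genome[i:i + read_length]
             for i in range(len(genome) - read_length + 1)]
    counts = Counter(reads)
    unique = sum(1 for read in reads if counts[read] == 1)
    return unique / len(reads)

# Hypothetical 32-base "genome": unique flanks around a CAG tandem repeat.
genome = "GATTACA" + "CAG" * 6 + "TCGGATC"
for read_length in (4, 10, 25):
    print(read_length, round(unique_fraction(genome, read_length), 2))
```

Short reads that fall entirely inside the repeat are identical to one another, so their origin cannot be determined; reads long enough to span the repeat and reach a unique flank can be placed without ambiguity.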
Despite these limitations, next-generation sequencing has revolutionized research and medicine. It has made possible large-scale studies such as the Encyclopedia of DNA Elements (ENCODE) project, which aims to decipher the functional elements encoded in the human genome and has greatly advanced our understanding of it. In addition to the basic science, what is exciting is the clinical translation of these technologies for understanding human health and disease.
- The complete genome of an individual by massively parallel DNA sequencing. Nature (2008).
- Mapping the cancer genome. Francis Collins and Anna Barker. Scientific American (2007).
- An integrated encyclopedia of DNA elements in the human genome. The ENCODE Project Consortium. Nature (2012). doi:10.1038/nature11247