Two papers in Nature Biotechnology, both published online on July 1, 2012, highlight the unique value for de novo genome assembly provided by the PacBio® RS High Resolution Genetic Analyzer from Pacific Biosciences of California, Inc. (NASDAQ:PACB).
Due to the inherent limitations of commonly used short-read sequencing technologies, the genomes of very few species have been completely sequenced, or “finished.” PacBio’s single molecule, real-time (SMRT®) technology offers very long reads that reduce the number of contiguous sequences, or contigs, in an assembly, which simplifies and improves genome assembly. These multi-kilobase reads allow scientists to sequence through long repeat regions and to identify structural variation, both of which are common in genomes but cannot be resolved completely with short-read platforms. As a result, PacBio long reads can lead to final assemblies that match, and in some cases even exceed, the quality that previously counted as “finished,” approaching the gold standard of a perfect genome.
In the publication from Koren et al., titled “Hybrid error correction and de novo assembly of single-molecule sequencing reads,” the authors demonstrate a new pipeline for assembly of the parrot genome. Using PacBio long reads in combination with high-accuracy short reads and an updated version of Celera Assembler, they assembled for the first time regulatory regions of genes involved in vocal learning circuits. The resulting hybrid assembly represents the most complete assembled bird genome now available.
“Repetitive regions are the biggest impediment to all assembly algorithms and sequencing technologies as they introduce ambiguity in the reconstruction of the genome,” said Sergey Koren, Ph.D., Scientist of Bioinformatics at the National Biodefense Analysis and Countermeasures Center. “Using the long reads we have access to longer sequences, which increases the probability of spanning a repeat and leads to better assemblies at lower depths than short reads.”
A separate publication from Bashir et al., titled “A hybrid approach for the automated finishing of bacterial genomes,” describes combining contigs from second-generation sequencing technologies with PacBio sequence data for the cholera strain responsible for the 2010 Haitian outbreak. The authors show that their hybrid assembly resolved complex regions with several repeats and suggest that the approach offers a solution for “rapid identification and assembly of full microbial genomes.”