Newswise — Pacific Biosciences of California, Inc. (NASDAQ: PACB) announced that it has completed a de novo sequence assembly of the Escherichia coli O104:H4 strain responsible for the recent outbreak in Germany using its Single Molecule Real Time (SMRT™) technology, and sequenced 11 related bacterial strains (including six previously unsequenced strains of the same serotype) for comparative analyses. An international team of scientific experts on E. coli collaborated on the rapid sequencing project to provide more comprehensive information about the origins of the strain that gave rise to the deadly outbreak. The data were generated using an early version of chemistry and software in development at Pacific Biosciences for the next major PacBio RS product upgrade, planned for the fourth quarter of 2011.
The data provided to the public domain include a complete assembly of the German outbreak strain, alignment to assemblies from other outbreak isolates, and sequences for 11 related Enteroaggregative E. coli strains. The project demonstrates the ability to produce a PacBio-only de novo assembly for a complex microbial pathogen, and the power of rapid sequencing of multiple genomes with the PacBio RS to elucidate the evolutionary history of a pathogenic microbe. A summary of the project appears on the company’s website at http://blog.pacificbiosciences.com. The Pacific Biosciences scientific team, led by Chief Scientific Officer Eric Schadt, Ph.D., is collaborating with some of the world’s leading experts on E. coli and infectious diseases on this project. The collaborators include:
In Europe:
• Karen Angeliki Krogfelt, Ph.D., Professor, Head of Unit, Gastrointestinal Infections, Statens Serum Institut (SSI), Denmark
• Flemming Scheutz, Ph.D., Head of the WHO Collaborating Centre for Reference and Research on Escherichia and Klebsiella, SSI, Denmark
In the U.S.:
• James P. Nataro, M.D., Ph.D., Professor and Chair, Pediatrics, University of Virginia School of Medicine
• David A. Rasko, Ph.D., Assistant Professor, University of Maryland School of Medicine, Institute for Genome Sciences and Department of Microbiology and Immunology
• Nadia Boisen, Ph.D., Research Scientist, Department of Pediatrics, University of Virginia School of Medicine
• Matthew K. Waldor, M.D., Ph.D., Professor of Medicine at Harvard Medical School, Brigham and Women’s Hospital, and HHMI
“Using samples provided by our collaborators, we rapidly sequenced each strain using a standard PacBio RS protocol that took on average less than eight hours from sample preparation to sequencing results,” said Dr. Schadt. “The ability to sequence the outbreak strain with reads averaging 2,900 base pairs and our longest reads at over 7,800 bases, combined with our circular consensus sequencing to achieve high single molecule accuracy with a mode accuracy distribution of 99.9%, enabled us to complete a PacBio-only assembly without having to construct specialized fosmid libraries, perform PCR off the ends of contigs, or other such techniques that are required to get to similar assemblies with second generation DNA sequencing technologies.”
Dr. Krogfelt commented: “These high quality data will provide scientists with more information about the genomic features of this strain that could provide new markers for predicting the higher degree of pathogenicity we are seeing with this outbreak. A more comprehensive evolutionary view of this pathogen may also help identify markers for antibiotic drug resistance that could be used in the future should other related strains emerge. The complexity of this case proves that international collaborations and communications are important in the achievement of detailed scientific information.”
The data are available to the bioinformatics community at the PacBio developer’s network (DevNet) website (www.pacbiodevnet.com), which also offers a suite of open source tools and other resources designed for analyzing SMRT sequence data. The data have also been submitted to the National Center for Biotechnology Information (NCBI) SRA database.