"By organizing the efforts of 'massively parallel' undergrads, we can solve problems that would defeat other methods," says GEP program director Sarah Elgin of Washington University in St. Louis. "At the same time, students learn how to handle the messiness of real data, to evaluate different kinds of evidence, and to justify their conclusions."
The GEP is a collaboration between faculty at a growing number of institutions and the Biology Department and The Genome Institute at Washington University in St. Louis. The GEP’s goals are to introduce bioinformatics into the undergraduate curriculum and to integrate research experience into the academic year. With this classroom-based approach, many more students can access educational opportunities normally restricted to those who secure one of the small number of summer research spots available to undergraduates.
The GEP faculty and staff oversaw the project and drafted the paper, but each of the 940 students listed as co-authors performed original research and read and approved the manuscript before submission. Many students also provided important comments that were incorporated into the final version.
The GEP students tackled the investigation of the “dot” chromosome of Drosophila fruit flies. The dot chromosome gets its name from its tiny size; next to the other fruit fly chromosomes, it looks like a compact dot.
Scientists are interested in the dot chromosome because its DNA is tightly packaged in a form called heterochromatin, a state normally linked with relatively inactive genome regions that contain only a few rarely expressed genes. But despite being packed into heterochromatin, a large region of the dot chromosome carries a density of actively expressed genes similar to that of other, non-heterochromatic parts of the fruit fly genome. Non-heterochromatic DNA is known as euchromatin.
How has this unusual state affected the evolution of the dot chromosome genes? To investigate, the GEP team wanted to compare the dot chromosome to a euchromatic region from a different chromosome. But this exploration required high-quality genome sequences from several different Drosophila species, not just Drosophila melanogaster, the species in which the dot chromosome has been most intensively studied.
Draft genome sequences for other Drosophila species were already publicly available, but because the dot chromosome carries many repetitive sequences, the genome data was sometimes unreliable. That’s because repeat sequences cause trouble for the software that stitches together the fragments of raw sequence data — like a jigsaw puzzle with many pieces of the same color and shape, it’s hard to figure out which fragments belong where.
In this case, humans do a better job than computers. The GEP was able to correct errors in the draft genome assembly by breaking the work up into chunks and distributing it among hundreds of students. The students carefully examined each region they were assigned and paid attention to small differences in repeated sequences that gave them clues on how to put the puzzle together. In areas where there were gaps in the sequence, the students submitted requests for laboratory scientists at the Genome Institute to perform additional sequencing to cover these regions. "The students do a significantly better job at improving the sequence than the software does," says Elgin.
The team improved sequences from the dot chromosome and a euchromatic comparison region from three species of Drosophila that, together with D. melanogaster, are separated by 40 million years of evolution. To help them compare genes across the different genome sequences, the students used multiple types of evidence to predict the start, stop, and splice sites for each gene. These “punctuation marks” are critical to understanding how DNA is transcribed into RNA and translated into proteins. Start and stop sites tell the cellular machinery where to begin and end the translation of a sequence, and splice sites define where to chop out intervening sequences — introns — from the regions that code for proteins — known as exons.
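The role of these "punctuation marks" can be illustrated with a minimal Python sketch. It is not the GEP's annotation workflow: the toy sequence, the coordinates, and the function names `splice_exons` and `looks_like_complete_cds` are invented for illustration. It joins the exons of a made-up gene, then checks that the result begins with a start codon, ends with a stop codon, and stays in frame.

```python
# Toy illustration only: the sequence and coordinates are invented, not real Drosophila data.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice_exons(genomic: str, exons: list[tuple[int, int]]) -> str:
    """Join exon intervals (0-based, end-exclusive) into one coding sequence."""
    return "".join(genomic[start:end] for start, end in exons)


def looks_like_complete_cds(cds: str) -> bool:
    """Check for a start codon, a final stop codon, and no earlier in-frame stop."""
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return (
        codons[0] == "ATG"
        and codons[-1] in STOP_CODONS
        and not any(codon in STOP_CODONS for codon in codons[1:-1])
    )


# exon 1 (ATGGCC) + intron (GT...AG) + exon 2 (AAATAA)
genomic = "ATGGCC" + "GTAAGTTTTCAG" + "AAATAA"
exons = [(0, 6), (18, 24)]

cds = splice_exons(genomic, exons)
print(cds, looks_like_complete_cds(cds))
```

Shifting a predicted splice site by even one base changes the reading frame, which is one reason each prediction has to be weighed against several kinds of evidence.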
Each chunk of sequence was examined by at least two independent groups of students, so they could cross-check findings and fix errors. The end result was a high quality data set that allowed the team, led by GEP staff member Wilson Leung, to statistically compare the properties of the dot chromosome to the euchromatic region in all four species.
This comparison revealed that most of the distinctive properties of the D. melanogaster dot chromosome are conserved across species. Dot chromosome genes have longer introns and more exons than genes in the comparison region, as well as a higher density of repeat sequences. The accumulated repeats, mostly remnants of now-inactive transposable elements, can partly explain why dot chromosome genes have larger introns (the introns contain more repeats), though they don’t explain why the genes tend to have more coding exons.
Dot chromosome genes also showed fewer traces of the effects of natural selection. This agrees with theoretical predictions that natural selection should be less effective on heterochromatic genome regions.
The analysis also uncovered a tantalizing clue to one of the ways dot chromosome genes could remain active despite being stuck in a heterochromatic state. The researchers found that dot chromosome genes contain fewer of the “C” and “G” bases (of the famous A, T, C, G components of DNA) than do genes in the euchromatic region. Because Cs and Gs bind together more tightly than As and Ts, the DNA strands that make up dot chromosome genes are likely easier to unwind, which might allow better access to the DNA for the proteins that turn genes on and off. Further research will be needed to test this idea.
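The measurement behind this comparison, the fraction of bases that are C or G, is simple to compute. The following is a minimal Python sketch, not the analysis used in the paper; the function name `gc_fraction` and both toy sequences are invented for illustration.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of unambiguous bases (A, C, G, T) that are C or G."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return sum(base in "GC" for base in counted) / len(counted)


# Toy sequences, invented for illustration only (not real Drosophila data).
at_rich = "ATTATAACGTATTTAAAT"
gc_rich = "GCCGCGATGCCGGCTAGC"

print(f"AT-rich example: {gc_fraction(at_rich):.2f}")
print(f"GC-rich example: {gc_fraction(gc_rich):.2f}")
```

Ambiguous bases such as N are ignored here so that gaps in a draft sequence do not distort the ratio; that is a design choice of this sketch, not something stated in the study.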
The GEP students not only advanced science with their work, but they also learned about genetics and genomics in a hands-on way. This translated to greater educational benefits for the students.
"We think a lot of the benefit comes from asking students to weigh the evidence; sometimes it’s contradictory, sometimes one clue is more reliable than another, sometimes the students need to dig a bit deeper," says Elgin. "Basically we're teaching them to look carefully at data and be suspicious, be skeptical."
The GEP has previously measured the program’s educational performance and found that its students learn more about genes and genomes than students who did not participate in a research-based genomics course. The GEP students also self-report gains in their ability to analyze data and understand the research process similar to those reported by students who had spent a summer working in a research lab. Given enough time (on average, around 45 hours of class time), GEP student gains even exceeded those of summer research students.
“Faculty are sometimes skeptical that this kind of project will work for their students. But the GEP includes a diverse range of schools serving different types of students and the learning gains were similar across every category we tested. I believe any student can benefit,” says Elgin.
Citation: Leung, W. et al. (2015). Drosophila Muller F Elements Maintain a Distinct Set of Genomic Properties Over 40 Million Years of Evolution. G3: Genes|Genomes|Genetics 5(5):719-740. doi: 10.1534/g3.114.015966 http://www.g3journal.org/content/5/5/719.full
Funding sources: Howard Hughes Medical Institute (#52007051, under the Professors program), NSF IUSE #1431407, and Washington University in St. Louis.
Images: Photo files can also be found at: http://bit.ly/1OUYpax
About the Genetics Society of America (GSA)
Founded in 1931, the Genetics Society of America (GSA) is the professional scientific society for genetics researchers and educators. The Society’s more than 5,000 members worldwide work to deepen our understanding of the living world by advancing the field of genetics, from the molecular to the population level. GSA promotes research and fosters communication through a number of GSA-sponsored conferences including regular meetings that focus on particular model organisms. GSA publishes two peer-reviewed, peer-edited scholarly journals: GENETICS, which has published high quality original research across the breadth of the field since 1916, and G3: Genes|Genomes|Genetics, an open-access journal launched in 2011 to disseminate high quality foundational research in genetics and genomics. The Society also has a deep commitment to education and fostering the next generation of scholars in the field. For more information about GSA, please visit www.genetics-gsa.org.