Knowing the three-dimensional structures of different kinds of proteins, RNA molecules and other building blocks of the body is essential for understanding how those molecules work, what goes wrong in disease and how abnormalities might be fixed.

Unfortunately, it can take years and a lot of money to determine structure using the standard experimental methods of X-ray crystallography or nuclear magnetic resonance. In the case of RNA, it’s not even clear which molecules have 3-D shapes or definable functions in the first place.

Meanwhile, for half a century, attempts to calculate a molecule’s 3-D structure based on its genetic sequence proved extremely difficult.

In 2011, a research team including computational biologist Debora Marks took a leap forward by building a tool that successfully predicted portions of the 3-D structures of proteins using only their DNA sequences.

Now, Marks and colleagues have adapted the tool to do the same for RNA molecules.

They report in the April 14 issue of Cell that by comparing RNA sequences across thousands of species and weeding out false correlations, their algorithm, called evfold-RNA, can quickly and cheaply predict whether and how individual RNA molecules form three-dimensional shapes.

This in turn promises to help researchers sort out which of the hundreds of thousands of as-yet unstudied RNAs serve a useful function in the body.

“Over the past five to 10 years, there’s been a huge resurgence of interest in RNAs, including microRNAs and riboswitches, among others,” said Marks, assistant professor of systems biology at Harvard Medical School and senior author of the new paper. “As high-throughput measurements have come through of what’s in a cell, the research community has also identified a gazillion RNA transcripts, which may or may not have 3-D structure and function.”

“Since it’s impossible to characterize them all experimentally,” she said, “we hope our approach will help identify which RNAs are actually functional and determine the 3-D structures of those we already know are important.”

Using evfold-RNA, which the researchers are making available online, “Anyone with a laptop can get clues about what their RNA of interest is doing, what proteins it interacts with and how it interacts with them,” said Caleb Weinreb, a graduate student in the Marks lab and first author of the study.

“It’s a cool example of how computation alone can reveal a lot of new biology,” Weinreb said, “and of the new possibilities that come from having thousands of genome sequences publicly available online.”

The tool itself uses a new version of code developed in the lab by graduate student John Ingraham.

Already, the team, led by a third graduate student in the lab, Adam Riesselman, has used evfold-RNA to resolve a longstanding debate about a segment of the 3-D structure of HIV, whose genome is made of RNA rather than DNA.

Insights gained from evfold-RNA could also drive the design of vaccines, drugs and RNA tools; illuminate viral evolution; and allow researchers to probe how DNA mutations affect RNA structure and function.

Answers from evolution

RNAs—the molecules that help turn DNA instructions into proteins, switch genes on and off and perform other tasks—are single-stranded, as opposed to DNA’s double-stranded helix.

But some RNAs twist and fold back onto themselves to form 3-D shapes. This can happen when complementary bases from different parts of the strand pair with each other, when RNA’s phosphate backbone interacts with itself or when the RNA binds to a protein or to a larger complex such as the ribosome.

Evfold-RNA predicts these points of contact, known as the molecule’s tertiary contacts, by studying how RNA itself evolves across species.

The team began by selecting 160 different types of RNA. For each one, they gathered sequences from thousands of species—from E. coli to octopi—and aligned them so similar sections overlapped.

Then they entered the sequences into the tool they’d used to predict protein structures, which they’d tweaked to read the slightly different RNA alphabet.

The algorithm analyzed all the possible pairs of bases at the same time to identify those that were most “evolutionarily coupled.”

“For some pairs of RNA positions, each time we see one position change in a given species, there is a corresponding change in the other position,” explained Weinreb. “This suggests they’re evolving together.”
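To make the idea of positions changing together concrete, here is a minimal sketch that scores two columns of an alignment with mutual information. This is a simpler, pairwise stand-in chosen for illustration; it is not the statistical model evfold-RNA uses, and the function name and the four-sequence toy alignment are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        # Compare the observed pair frequency with what independence predicts.
        mi += p_ab * log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi


# Hypothetical toy alignment: one row per species, one column per position.
# Positions 0 and 3 always change together (G-C, C-G, A-U, U-A);
# position 1 never changes at all.
toy_alignment = [
    "GAAC",
    "CAAG",
    "AAGU",
    "UAGA",
]

print(mutual_information(toy_alignment, 0, 3))  # 2.0 bits: perfectly coupled
print(mutual_information(toy_alignment, 0, 1))  # 0.0 bits: no co-variation
```

With only four sequences, chance patterns can also score well, which is one reason real analyses draw on thousands of species.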

A clearer picture

Analyzing the whole network at once and ranking how often each pair changed together allowed the team to filter out false correlations that had muddied earlier studies. The researchers assert that the base pairs with the strongest so-called evolutionary coupling signals indicate where an RNA molecule’s tertiary contacts are.
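One way to see why looking at every pair at once helps is a toy chain in which position 0 is coupled to position 1, and 1 to 2, but 0 and 2 never interact directly. The sketch below uses partial correlations on continuous variables as an analogy only; it is not the evfold-RNA model, and the matrix values and function names are assumptions made for illustration.

```python
import numpy as np


def correlations(cov):
    """Ordinary pairwise correlations from a covariance matrix."""
    scale = np.sqrt(np.outer(np.diag(cov), np.diag(cov)))
    return cov / scale


def partial_correlations(cov):
    """Correlation of each pair after accounting for all other variables."""
    precision = np.linalg.inv(cov)
    scale = np.sqrt(np.outer(np.diag(precision), np.diag(precision)))
    partial = -precision / scale
    np.fill_diagonal(partial, 1.0)
    return partial


# Hypothetical direct couplings for a three-position chain 0 - 1 - 2.
# The zero entries mean positions 0 and 2 are not directly coupled.
direct_couplings = np.array([
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
])
cov = np.linalg.inv(direct_couplings)

# Pairwise view: positions 0 and 2 look correlated (about 0.33),
# purely because both are tied to position 1.
print(round(float(correlations(cov)[0, 2]), 3))

# Whole-network view: the indirect 0-2 signal drops to about zero,
# while the direct 0-1 coupling remains (about 0.5).
partial = partial_correlations(cov)
print(abs(round(float(partial[0, 2]), 3)), round(float(partial[0, 1]), 3))
```

The pairwise score flags a link between positions 0 and 2 that exists only through position 1; the joint calculation removes it while keeping the direct couplings.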

“It’s a huge assumption,” Marks readily admitted. “Just because bases co-evolve doesn’t necessarily mean they have to be close in three dimensions. But we found it’s much more true for proteins than we initially expected, so we thought it was worth looking for the same patterns with RNAs.”

That assumption gained support when the team validated its tertiary contact predictions against 22 RNAs with known 3-D structures.
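A check of this kind can be sketched as follows: rank the predicted pairs by score, then count how many of the top-ranked pairs are real contacts in the known structure. The article does not say which metric the team used, so this measure, the function name and the sample scores are all assumptions.

```python
def top_contact_precision(scores, known_contacts, top_n):
    """Fraction of the top_n highest-scoring pairs that are known contacts."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_n]
    # Treat (i, j) and (j, i) as the same contact.
    known = {tuple(sorted(pair)) for pair in known_contacts}
    hits = sum(1 for pair in ranked if tuple(sorted(pair)) in known)
    return hits / len(ranked) if ranked else 0.0


# Hypothetical coupling scores for pairs of positions in one RNA.
predicted_scores = {(0, 3): 2.0, (1, 2): 1.4, (0, 2): 0.9, (1, 3): 0.1}
# Hypothetical contacts taken from an experimentally solved structure.
known_contacts = [(3, 0), (1, 2)]

# Two of the three top-ranked pairs are true contacts.
print(top_contact_precision(predicted_scores, known_contacts, top_n=3))
```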

“It was quite a wild moment when we first saw that the structures matched,” said Marks, adding that evfold-RNA was a significant improvement over traditional methods, even allowing researchers to plug the results into standard software that will fold the RNA molecule into a 3-D model.

Surprises at each turn

The team was pleasantly surprised to find that the computational method they had developed for proteins also worked so well for RNAs.

“What surprised me was that RNA and proteins, which are thought to have very different properties and co-evolve in very different ways, are actually quite similar,” said Weinreb. “There appears to be some sort of universal logic that’s governing how all these things are co-evolving despite having totally different biochemistry.”

The team also gained new insights into the way RNAs and the proteins they bind to are evolutionarily intertwined.

The researchers look forward to learning more as the limited number of RNA sequences available in databases continues to rise.

“It’s pretty awesome when you take a step back and think about it, that we could see the 3-D structure of a tiny molecule by looking across thousands of species whose sequences have diverged across millions of years,” Weinreb said.

This study was funded by the National Institutes of Health (grant R01 GM106303), a Department of Energy fellowship (DE-FG02-97ER25308) and a National Science Foundation graduate research fellowship.