Newswise — It’s like looking for a needle in a haystack.

Scientists searching for the gene or gene combination that affects even one plant or animal characteristic must sort through massive amounts of data, according to associate professor Xijin Ge of the mathematics and statistics department at South Dakota State University.

“Biologists used to study one gene at a time, but now they can look at tens of thousands of genes at once,” Ge said. Just one experiment to analyze gene expression can produce one terabyte of sequence data. “That’s a little beyond many biologists’ comfort zone,” he said. Ge leads the bioinformatics research group, which provides the expertise that SDSU plant and animal scientists need to uncover how genes and proteins affect cell functions.

Setting up the experiments

Typically, scientists consult with Ge when planning their studies. After examining what they want to investigate, the researchers decide which techniques to use to obtain data and draw up a plan to analyze it.

“It’s critical to have the statistician and biologist working together,” noted plant science professor Fedora Sutton, who worked with Ge on identifying gene interactions that account for freeze resistance in winter wheat. “He is able to say, based on statistical rules and regulations, this is where this has to be.”

Using the same technique on one sample is not enough, Sutton pointed out. To obtain biological replicates, multiple samples must be grown under the same conditions and then analyzed. Ge explained that experiments must be designed to gather biological rather than technical replicates.
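The distinction can be sketched in a few lines of Python. This is a minimal illustration, not the group’s pipeline: the plant names and values are hypothetical, and it assumes that repeated readings of the same plant (technical replicates) are averaged so that each separately grown plant (biological replicate) counts as one independent observation.

```python
from statistics import mean

# Hypothetical data: each key is a separately grown plant (biological replicate);
# each list holds repeated instrument readings of that plant (technical replicates).
measurements = {
    "plant_1": [10.1, 10.3, 9.9],
    "plant_2": [12.0, 11.8],
    "plant_3": [10.9, 11.1, 11.0],
}


def collapse_technical(reps: dict[str, list[float]]) -> list[float]:
    """Average technical replicates so each biological replicate counts once."""
    return [mean(values) for values in reps.values()]


bio_values = collapse_technical(measurements)
print(len(bio_values))  # 3 independent observations, not 8 readings
```

Treating all eight readings as independent would overstate the evidence, because readings from the same plant share that plant’s biological variation.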

Once the technique to gather data is chosen and a plan of data analyses is created, Ge said, “we can figure out how many replicates are needed.”
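One common way to estimate that number is a textbook power calculation. The sketch below uses the normal approximation for comparing two groups; the function name and the default settings (5% significance level, 80% power) are illustrative assumptions, not the method Ge uses.

```python
import math
from statistics import NormalDist


def replicates_needed(effect_size: float, alpha: float = 0.05,
                      power: float = 0.8) -> int:
    """Replicates per group for a two-sided, two-group comparison.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2,
    where d (effect_size) is the expected difference in means divided
    by the standard deviation.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2
    return math.ceil(n)  # round up: replicates come in whole samples


print(replicates_needed(1.0))  # 16
print(replicates_needed(2.0))  # 4
```

A smaller expected effect, or a stricter significance level to account for testing thousands of genes at once, raises the required number of replicates.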

Analyzing megabytes of data

“Bioinformatics is an important tool to zoom in on the target gene networks,” said Xing-You Gu, who collaborated with Ge to identify genes that are associated with seed dormancy in weedy rice.

Weeds survive adverse environmental conditions because of strong seed dormancy, Gu explained. “To devise new weed management strategies, we need to understand the molecular genetic mechanisms of seed dormancy.”

Gu used a map-based cloning strategy, and then Ge applied bioinformatics tools, such as statistical tests and clustering, to find the candidate genes. This task involved looking at 30,000 to 40,000 genes, which can produce three to four million data points, according to Ge.
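As a rough illustration of the statistical-test step, the sketch below ranks genes by Welch’s t statistic, which compares mean expression between two groups without assuming equal variances. The gene names and values are hypothetical, the code computes only the statistic (not a p-value), and clustering is not shown.

```python
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic for two groups with possibly unequal variances."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se


# Hypothetical expression values (e.g. log2 scale): (dormant, non-dormant) seeds.
expression = {
    "gene_A": ([8.1, 8.3, 8.0], [5.2, 5.0, 5.4]),
    "gene_B": ([6.0, 6.1, 5.9], [6.0, 6.2, 5.8]),
    "gene_C": ([4.0, 4.2, 4.1], [7.1, 6.9, 7.0]),
}

# Largest absolute t first: the strongest candidates for follow-up.
ranked = sorted(expression, key=lambda g: abs(welch_t(*expression[g])),
                reverse=True)
print(ranked)  # ['gene_C', 'gene_A', 'gene_B']
```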

To determine which genes are responsible, Ge must first eliminate those data points that contain noise and then “focus on the reliable signals because we’re looking at so many genes.” Sometimes nearly half the data are eliminated.
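A minimal sketch of that kind of filtering follows. It assumes two common criteria that the article does not specify: genes whose average signal is near background, and genes that barely vary across samples, are dropped before testing. The thresholds and data are hypothetical.

```python
from statistics import mean, pstdev


def filter_reliable(expr: dict[str, list[float]], min_mean: float,
                    min_sd: float) -> dict[str, list[float]]:
    """Keep genes whose average signal and variability both clear the thresholds."""
    return {
        gene: values
        for gene, values in expr.items()
        if mean(values) >= min_mean and pstdev(values) >= min_sd
    }


expr = {
    "gene_A": [8.0, 9.5, 7.2, 10.1],  # strong, variable signal: kept
    "gene_B": [0.3, 0.1, 0.4, 0.2],   # near background: dropped
    "gene_C": [6.0, 6.0, 6.1, 6.0],   # flat across samples: dropped
}
kept = filter_reliable(expr, min_mean=1.0, min_sd=0.5)
print(sorted(kept))  # ['gene_A']
```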

Visualizing gene expression

Ge uses data-mining algorithms to find patterns of interest to the scientists. Typically, Ge’s analysis produces a visual representation of the statistically significant data. One of Sutton’s visuals was a heat map showing upregulated (increased) genes in red, downregulated (shut down) genes in green and unaffected genes in black. This allowed her to identify six genes as potential markers, which will help breeders develop more lines of freeze-resistant winter wheat.
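The color coding in such a heat map can be sketched as a simple rule on each gene’s log2 fold change (treated vs. control). The cutoff of 1.0, meaning a twofold change, and the expression values are assumptions for illustration; heat maps usually use a continuous color gradient rather than three fixed colors.

```python
import math


def log2_fold_change(treated: float, control: float) -> float:
    """log2 ratio of expression in treated vs. control samples."""
    return math.log2(treated / control)


def heatmap_color(lfc: float, threshold: float = 1.0) -> str:
    """Map a log2 fold change to the red/green/black scheme of the heat map."""
    if lfc >= threshold:
        return "red"    # upregulated
    if lfc <= -threshold:
        return "green"  # downregulated
    return "black"      # essentially unaffected


# Hypothetical expression levels: (cold-treated, control)
genes = {"gene_A": (400.0, 100.0), "gene_B": (25.0, 100.0),
         "gene_C": (110.0, 100.0)}
for gene, (treated, control) in genes.items():
    print(gene, heatmap_color(log2_fold_change(treated, control)))
```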

“We are trying to explain what’s going on in the cell,” Ge said. “We have to make the data tell a story.” After identifying the genes, the researchers “want to piece together the jigsaw puzzle and figure out the common characteristics of the affected genes,” Ge explained. This allows them to identify the subsystems, or pathways, that are regulated.
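One standard way to look for common characteristics among affected genes is an over-representation test: ask whether a pathway contains more of the selected genes than chance would predict. The sketch below computes a one-sided hypergeometric p-value; it is a generic technique offered for illustration, not necessarily the group’s method, and the gene counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(total_genes: int, pathway_genes: int,
                       selected: int, overlap: int) -> float:
    """One-sided hypergeometric test, P(X >= overlap).

    Probability that a random draw of `selected` genes from `total_genes`
    contains at least `overlap` of the `pathway_genes` genes in one pathway.
    """
    upper = min(pathway_genes, selected)
    hits = sum(
        comb(pathway_genes, k) * comb(total_genes - pathway_genes, selected - k)
        for k in range(overlap, upper + 1)
    )
    return hits / comb(total_genes, selected)


# Hypothetical: 200 selected genes out of 30,000, 8 of them in a 50-gene pathway.
# By chance alone, about 200 * 50 / 30000 = 0.33 would be expected in the pathway.
print(enrichment_p_value(30_000, 50, 200, 8))
```

A very small p-value suggests the pathway is over-represented among the selected genes, making it a candidate regulated subsystem.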

About SDSU Bioinformatics Research Group

The Bioinformatics Research Group, led by Dr. Xijin Ge, is devoted to using the tools of mathematics, computer science, and the biological sciences to explore the frontiers of the natural world. Research focuses on using, discovering and implementing statistical, machine learning and data mining algorithms to find patterns of interest within the mass of publicly available biological data. Members of our group are involved in studying evolutionary comparative genomics, text mining, and analysis of gene expression data.

About South Dakota State University

Founded in 1881, South Dakota State University is the state’s Morrill Act land-grant institution as well as its largest, most comprehensive school of higher education. SDSU confers degrees from eight different colleges representing more than 175 majors, minors and specializations. The institution also offers 29 master’s degree programs, 15 Ph.D. programs and two professional programs.

The work of the university is carried out on a residential campus in Brookings, at sites in Sioux Falls, Pierre and Rapid City, and through Cooperative Extension offices and Agricultural Experiment Station research sites across the state.
