Wladek Minor, PhD, of the Department of Molecular Physiology and Biological Physics, is working to overcome one of the greatest challenges in his field: the management of astounding amounts of raw data. Researchers, he says, are drowning in it. For example, his lab produces more than he can publish, meaning there’s no way for others to access it. “There is no way to publish! You cannot publish all these results,” Minor said. “My lab publishes between 10 and 15 papers per year. Can you publish more? No! I am telling anybody that he or she may come to my lab, I will give them our results and he or she may write a paper. I could stop doing any experiments today and I will publish 15 papers per year until I retire.”
There’s such data overload in the field that much of the older data – the raw X-ray diffraction results used to map out the submicroscopic – is in danger of being lost, destroyed or forgotten. But Minor is changing that, and he’s doing so with only $20,000 worth of hardware acquired with a National Institutes of Health grant.
Big Data to Knowledge Program
Minor was awarded the NIH grant – totaling $1.4 million over three years – effective June 1 as part of the NIH’s Big Data to Knowledge program, an effort to help scientists manage and access tremendous amounts of biomedical data. In the weeks since, he has already presented at a conference and built an online search engine offering easy access to the diffraction data.
“Most people think that if they will buy new equipment, a new toy, it will solve all their problems. And obviously it is not true,” Minor said. “What we propose is to build the system to keep all diffraction data, structural data. People are saying to do that would cost millions of dollars in equipment. And our request for equipment was $20,000. Why? Because we use the newest technology, and we are building computers by ourselves quite often. Not because it’s cheaper, but because we can create something which is better than what you can buy.”
It’s a practical attitude that’s embodied in the workings of Minor’s lab, a combination of high-tech supply management – everything is tracked electronically – and low-tech ingenuity, such as using jury-rigged equipment from Best Buy to cool computer servers. Minor takes pride in this enterprising, economical approach, and he notes that it has served his students and lab members well: Three of his former lab workers have gone on to land at Google.
A Search Engine for the Unseeable
The design of the online portal Minor has built for diffraction data is simple, streamlined, unintimidating. But it’s customizable for the needs of the user, depending on whether they’re in academic research or, say, the pharmaceutical industry. By serving up the diffraction information however the user needs, the site provides data useful for many different projects, which may speed up scientific research, and potentially may lead to the development of new drugs and disease treatments.
“Big Data to Knowledge [the BD2K NIH initiative] is about brain engagement. You have to engage your brain. How to do things, not how to buy equipment,” Minor said. “People say, ‘I will only push buttons, I will create results.’ For a university, that would be absolutely crazy, because anybody can buy something. We have to do things others cannot.”
The NIH grant is No. U01 HG008424.