Newswise — Genetic sequencing technology has generated a vast amount of biomedical data in the past ten years. Along with that, the technology has become cheaper, faster and more accurate. Medical experts are blending these improved sequencing methods with President Obama’s Precision Medicine Initiative in an effort to combine electronic medical records (EMRs) with individual genome data to ultimately select treatments that will work best for individual patients.

This month, the Penn Center for Precision Medicine Accelerator Fund awarded its first grants to eight research teams for personalized medicine projects across a gamut of clinical specialties, from lung cancer to infectious disease to knee surgery, each making use of “big data” in different ways.

One tool developed at Penn Medicine is proving invaluable for mining patient data. PennSeek, designed by Penn’s Data Analytics Center, allows researchers across a variety of departments, from ophthalmology to cardiology, to search unstructured or semi-structured electronic medical documents to better tailor treatment to individual needs. Unstructured data is not organized in a pre-defined way – it may take the form of handwritten notes added to a patient’s file by a member of their care team, for example. Users of PennSeek dig through this data to identify trends that could improve patient care, and they liken the tool to the “Google of EMRs.”

For example, in the lab of Dan Rader, MD, chair of Genetics, researchers are using PennSeek to analyze the reports of about 100,000 echocardiograms to better characterize the progression of aortic stenosis (a narrowing of the heart's aortic valve that obstructs blood flow from the heart to the aorta and the rest of the body). The information will be combined with data from cardiac catheterization reports and other clinical information for patients.

Aeron Small, a fourth-year medical student at Penn, took a year off from his medical studies to earn a master’s in Translational Research there, studying with Rader and his colleagues. “My interests are in aortic stenosis and natural language processing to describe cardiovascular phenotypes,” Small explains. He describes natural language processing, in this case, as searching text and patient case narratives in a way that’s akin to using the Control-F shortcut to search for text in a document.

Small works with Daniel Kiss, MD, a Penn cardiology fellow, and programmer Jesse Vlatsin, MBA, using PennSeek to map out what happens to patients with aortic stenosis and coronary artery disease. They use a set of targeted keywords relating to aortic valve disease within the echocardiography and catheterization lab reports to better identify symptoms that may be precursors to aortic stenosis.
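The keyword approach the team describes can be pictured with a short sketch. This is only an illustration of searching report text for a set of terms, not PennSeek's actual code; the keyword list, the report snippets and the function name find_keyword_hits are all invented for the example.

```python
import re

# Hypothetical keyword list; the team's actual search terms are not public here.
KEYWORDS = ["aortic stenosis", "aortic valve", "calcified"]


def find_keyword_hits(reports, keywords=KEYWORDS):
    """Return {report_id: sorted keywords found} for reports with at least one hit."""
    # Whole-word, case-insensitive match for each keyword.
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    hits = {}
    for report_id, text in reports.items():
        found = sorted(kw for kw, pat in patterns.items() if pat.search(text))
        if found:
            hits[report_id] = found
    return hits


# Invented report snippets, for illustration only.
reports = {
    "echo-001": "Moderate aortic stenosis with a calcified valve.",
    "echo-002": "Normal left ventricular function.",
    "cath-003": "Aortic valve area reduced.",
}

for report_id, found in find_keyword_hits(reports).items():
    print(report_id, found)
```

A plain keyword match like this would miss misspellings and negations such as “no evidence of aortic stenosis,” which is one reason real clinical text searches tend to be more involved.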

The team compared these results with a similar search that used diagnostic billing codes, drawing on a random sample of patients’ records that were manually reviewed. The comparison showed that PennSeek more accurately classifies disease status, in terms of how each patient's disease progresses from the time of diagnosis. This behind-the-scenes research is particularly important because diagnostic billing codes are often used to identify who is eligible to enter clinical trials.
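A comparison of this kind can be sketched in a few lines, assuming the manually reviewed records serve as the reference standard. The patient IDs and labels below are invented and do not reflect the team's data or results; accuracy is a hypothetical helper name.

```python
def accuracy(predicted, reference):
    """Fraction of patients in `reference` whose predicted label matches."""
    matches = sum(
        1 for patient_id, label in reference.items()
        if predicted.get(patient_id) == label
    )
    return matches / len(reference)


# Invented labels: True means the patient is classified as having the disease.
manual_review = {"p1": True, "p2": False, "p3": True, "p4": False}
text_search = {"p1": True, "p2": False, "p3": True, "p4": True}
billing_codes = {"p1": True, "p2": True, "p3": False, "p4": True}

print("text search:", accuracy(text_search, manual_review))
print("billing codes:", accuracy(billing_codes, manual_review))
```

In practice a study would likely report measures such as sensitivity and specificity as well, since overall accuracy can hide which kind of error a method makes.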

PennSeek is also useful for identifying patients for disease biomarker profiles. These profiles can help determine, for example, the characteristics of patients who progress to more serious forms of a disorder versus those who do not, in hopes of tailoring care for earlier intervention. So far, Small has identified several novel biomarkers associated with aortic stenosis.

PennSeek also provides a trove of information for the Ophthalmology department’s research on glaucoma genetics in a population that is disproportionately affected by the condition: African Americans. “We’re in the early days of applying a precision medicine approach to diseases of the eye,” says the department’s chair Joan O’Brien, MD. “Our hypothesis is that genetic variants influence the risk of glaucoma and the traits related to that risk. It is likely that glaucoma represents a wide spectrum of distinct genetic diseases.”

Glaucoma is the leading cause of irreversible blindness worldwide and affects African Americans more than other ethnic groups. However, studies have not previously concentrated on patients of African descent. With an $11.25 million grant from the National Eye Institute, Penn’s Scheie Eye Institute is recruiting more than 7,700 African Americans for this study. The patients will have the entire protein-encoding part of their genome sequenced. Then, using PennSeek, the investigators will match their genetic data with EMRs to look for relationships between genetic variants and each patient’s unique physiology.

PennSeek is also being used by physicians and other staff in several more departments, including Radiology, Pathology and Laboratory Medicine, Dermatology, and Rheumatology. Given the tool's wide array of applications, big data analysts will be using PennSeek to sift through this mother lode of records for a long, long time.