Newswise — LOS ANGELES (June 24, 2024) -- A machine learning tool developed by Cedars-Sinai investigators can answer questions about genes, drugs, and biochemical pathways associated with Alzheimer’s disease and other health conditions. Their findings were published today in the journal Bioinformatics.

The study detailed how the tool, a free and publicly available software platform, analyzes and compiles data and information, including new peer-reviewed studies, to answer researchers’ queries. The key to the tool’s success is a new large language model approach, said Jason H. Moore, PhD, professor and chair of the Department of Computational Biomedicine at Cedars-Sinai and senior and corresponding author of the study.

Large language models are a specific type of AI program that can distill large amounts of data, such as medical studies, books, articles and interviews, and use that data to create new content.

“The large language model approach we developed uses knowledge stored in a special database, called a knowledge graph, that specializes in capturing the relationships between entities such as drugs and genes,” Moore said.
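To make the idea of a knowledge graph concrete, the short Python sketch below stores facts as (subject, relation, object) triples and answers a question like “which drugs bind this protein?” by following links. It is a toy illustration, not KRAGEN’s code or data model: the class name `KnowledgeGraph`, the relation names, and the drugs “DrugA” and “DrugB” are hypothetical placeholders, and the drug links are invented for the example.

```python
from collections import defaultdict

# Hypothetical toy triples: (subject, relation, object).
# "DrugA" and "DrugB" are placeholder names; these links are NOT real findings.
TRIPLES = [
    ("DrugA", "binds", "APOE"),
    ("DrugB", "binds", "PTAU"),
    ("DrugB", "binds", "APOE"),
    ("APOE", "associated_with", "Alzheimer's disease"),
]


class KnowledgeGraph:
    """Minimal in-memory graph indexed by (relation, object) for reverse lookups."""

    def __init__(self, triples):
        self.by_rel_obj = defaultdict(set)
        for subj, rel, obj in triples:
            self.by_rel_obj[(rel, obj)].add(subj)

    def subjects(self, relation, obj):
        """Return the sorted subjects linked to `obj` by `relation`."""
        return sorted(self.by_rel_obj.get((relation, obj), set()))


kg = KnowledgeGraph(TRIPLES)
print(kg.subjects("binds", "APOE"))  # ['DrugA', 'DrugB']

# Drugs linked to both proteins: intersect the two neighbor sets.
both = set(kg.subjects("binds", "APOE")) & set(kg.subjects("binds", "PTAU"))
print(sorted(both))  # ['DrugB']
```

Because relationships are stored explicitly, an answer can be traced back to specific edges in the graph rather than to patterns a language model absorbed during training.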

Historically, the main challenge in using large language models to generate content has been ensuring the quality, accuracy and reliability of their responses.

The Cedars-Sinai approach addresses this challenge with graph of thoughts, a framework that lets investigators break a problem down into subproblems and represent the information generated by the large language model as a visual graph.
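A rough way to picture the graph-of-thoughts idea is a set of subproblem nodes with dependency edges, solved in an order that respects those edges, with each step receiving the answers it depends on. The sketch below assumes a hypothetical `fake_llm` function that simply echoes its prompt in place of a real model, and the decomposition in `THOUGHTS` is illustrative only; it is not the plan KRAGEN actually builds.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a language model call; it just echoes the prompt."""
    return f"answer({prompt})"


# Each thought (subproblem) maps to the set of thoughts it depends on.
# This decomposition is illustrative, not KRAGEN's actual reasoning graph.
THOUGHTS = {
    "find_targets": set(),
    "drugs_for_APOE": {"find_targets"},
    "drugs_for_PTAU": {"find_targets"},
    "combine": {"drugs_for_APOE", "drugs_for_PTAU"},
}


def solve(thoughts, llm):
    """Solve subproblems in dependency order, passing in upstream answers."""
    results = {}
    for node in TopologicalSorter(thoughts).static_order():
        # Gather answers from predecessor thoughts as context for this step.
        context = "; ".join(results[dep] for dep in sorted(thoughts[node]))
        results[node] = llm(f"{node} | {context}" if context else node)
    return results


results = solve(THOUGHTS, fake_llm)
print(results["combine"])
```

Keeping the reasoning as an explicit graph is what makes it possible to draw it as a picture: every intermediate answer is a node, and every edge shows which answers fed into which step.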

The Cedars-Sinai tool also incorporates retrieval augmented generation, or RAG, which supplements large language models with external data sources that provide relevant facts and context. Together, these techniques allow the tool to surface accurate, relevant information about a range of conditions and diseases, including Alzheimer’s disease, the focus of the study published in Bioinformatics.
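The core RAG step can be sketched in a few lines: score stored facts against a question, keep the most relevant ones, and place them in the prompt so the model answers from supplied evidence. This is a minimal sketch assuming simple word-overlap scoring; real systems, possibly including KRAGEN, typically use embedding-based or graph-based retrieval instead. The facts about “DrugA”, “DrugB” and “GeneX” are hypothetical placeholders, and the function names are invented for illustration.

```python
import re

# Placeholder facts; in practice these would come from an external source such
# as a knowledge graph. "DrugA", "DrugB" and "GeneX" are hypothetical names.
FACTS = [
    "DrugA binds APOE.",
    "DrugB binds PTAU.",
    "APOE is associated with Alzheimer's disease.",
    "GeneX is expressed in liver tissue.",
]


def tokens(text: str) -> set[str]:
    """Lowercase word tokens used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, facts: list[str], k: int = 2) -> list[str]:
    """Rank facts by word overlap with the query and keep the top k (ties keep input order)."""
    q = tokens(query)
    ranked = sorted(facts, key=lambda f: len(q & tokens(f)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, facts: list[str]) -> str:
    """Augment the question with retrieved facts before it is sent to a model."""
    context = "\n".join(f"- {f}" for f in retrieve(query, facts))
    return f"Answer using only these facts:\n{context}\n\nQuestion: {query}"


print(build_prompt("What drugs bind to APOE?", FACTS))
```

Grounding the prompt this way is meant to reduce unsupported answers, since the model is asked to respond from the retrieved facts rather than from memory alone.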

The open-source software, called Knowledge Retrieval Augmented Generation ENgine (KRAGEN), is publicly available on GitHub, a cloud-based platform that helps developers collaborate on and manage code. To date, the software has received more than 400 endorsements from users.

To demonstrate the tool’s usability, Moore and his team used KRAGEN to generate data on Alzheimer’s disease, including data on genes, drugs and other aspects of the condition. Investigators asked the tool questions such as “What drugs bind to the proteins APOE and PTAU?” and “Which are genes associated with Alzheimer’s disease?”

Instead of receiving a list of data points in response to their questions, investigators received a synthesized summary of the relevant information.

“This AI approach is a step toward fully automating the analysis of Alzheimer’s disease data by incorporating knowledge generated from previous biomedical research studies,” Moore said.

As a next step, investigators are working on ways to integrate KRAGEN into Cedars-Sinai’s AI software for automated machine learning analysis of complex biomedical data.

“This advance is a big step toward allowing users to issue spoken commands to perform analyses in minutes, that otherwise could take weeks or months,” said Craig Kwiatkowski, PharmD, senior vice president and chief information officer at Cedars-Sinai, who was not involved in the study. “It’s encouraging to see the potential of this tool to impact AI-driven programs at Cedars-Sinai.”

Other authors involved in the study include Nicholas Matsumoto, Jay Moran, Hyunjun Choi, Miguel E. Hernandez, Mythreye Venkatesan, and Paul Wang.

This work is supported in part by funds from the Center for AI Research and Education at Cedars-Sinai Medical Center and grants from the National Institutes of Health USA (U01 AG066833 and R01 LM010098).


Journal Link: Bioinformatics