Potentially deadly mathematical errors are prevalent among mobile applications used in clinical and emergency room settings, but a team of researchers at New Jersey Institute of Technology's Ying Wu College of Computing has found mathematically provable solutions that may save lives.
The applications, known as medical score calculators, can be downloaded by anyone and are popular among less experienced health care staff. But the apps are rife with errors, sometimes because of flawed source data from medical reference tables and other times due to bad implementations from developers who don’t understand the science.
Examples of scores used for early warning, in intensive care units and for triage include HEART (history, EKG, age, risk factors, troponin), PAS (pulmonary asthma score) and SOFA (sepsis-related organ failure assessment).
Computer science professor Iulian Neamtiu, who oversaw graduate students Sydur Rahaman and Raina Samuel, now at Google and Montclair State University, respectively, said they began finding such errors several years ago during wider work on event-based mobile applications.
“The barrier to entry for publishing an app is very low. Anyone with marginal programming skills can publish onto mobile app stores and call that a medical or health app. We also had a couple of preliminary papers where we looked at the claims these apps make. And we saw that the claims can be outrageous. We saw apps that are supposed to diagnose cancer, cure cancer, heal your DNA. So we knew that there's a there there, so to speak,” Neamtiu said.
Overall, the team found significant errors in 14 of 90 Android applications, and they expect to find similar errors in Apple's iOS apps and in web-based applications, Neamtiu explained. Beyond simply finding incorrect data, the team built novel software precisely for this investigation. “The computer science [aspect] is casting the problem in a mathematical mode or mathematical framework. So we know we have a problem. There are ages or age spans, or physiological parameters that are interpreted or handled incorrectly, and one of our main contributions is casting this in a way that it can be checked mathematically, hence rigorously,” he added. In mathematical terms, they treated the apps as guilty until proven innocent, checking them with a tool called an automated theorem prover.
“Essentially, we phrase the problem as saying the score should meet certain correctness criteria, and if it doesn't, then go ahead and find me an example. And if the automated theorem prover finds an example, essentially you’ve managed to poke a hole in the score.”
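To make the approach concrete, here is a minimal sketch of that idea using the off-the-shelf Z3 theorem prover through its Python API; the score table below is invented for illustration and is not one of the scores the team studied, nor is this the team's actual tool. The prover is asked for a counterexample: an age that the published bands never cover, or an age that two bands would count twice.

    # A toy score table with a deliberate mistake: ages above 80 are never
    # assigned any points. Z3 (pip install z3-solver) is asked to prove the
    # table wrong by producing a concrete uncovered age.
    from z3 import And, Int, Not, Or, Solver, sat

    age = Int("age")

    # Hypothetical age bands, written the way an app or paper might list them.
    bands = [
        And(age >= 0,  age <= 40),   # 0 points
        And(age >= 41, age <= 65),   # 1 point
        And(age >= 66, age <= 80),   # 2 points
    ]

    # Correctness criterion: every clinically plausible age falls in some band.
    # Negate it and ask for a model, i.e. an age covered by no band.
    s = Solver()
    s.add(age >= 0, age <= 120)
    s.add(Not(Or(*bands)))
    if s.check() == sat:
        print("Coverage hole found, e.g. age =", s.model()[age])  # an age > 80
    else:
        print("Every age from 0 to 120 is covered.")

    # The same trick flags double counting: is any age in two bands at once?
    for i in range(len(bands)):
        for j in range(i + 1, len(bands)):
            o = Solver()
            o.add(age >= 0, age <= 120, bands[i], bands[j])
            if o.check() == sat:
                print(f"Bands {i} and {j} overlap at age", o.model()[age])

If the solver answers “sat,” it hands back a concrete witness, exactly the kind of hole the team reports; if it answers “unsat,” the bands provably cover every age in the stated range.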
“And then we realized that these apps merely interpret something that MDs have been publishing for 15, 20 years. So let's look at those papers, and to our surprise, and shall I say disappointment, we found that the original sin is actually in the medical papers, because the medical papers had errors in them, and had these parameter ranges and patient ages that were just not covered. So those [scores] are from the medical literature, and those papers have been cited and used for 20 years,” Neamtiu noted. “They are implemented in emergency room systems. Yet they have errors, and these errors perpetuate. So every time a new system is constructed, a new paper is published, they reproduce these errors. That is another problem that has nothing to do with apps.”
The trio published their paper, “Diagnosing Medical Score Calculator Apps,” in Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies last year. Several offshoot papers were also written, and others are upcoming, including some that address skin surface area calculations, which are important because they are used to determine chemotherapy dosages, Neamtiu said.
Several application developers responded positively to the NJIT team’s research and made the necessary updates, although errors in a popular medical manual have not yet been corrected, Neamtiu observed. Looking forward, “Frankly, we were surprised that papers published in prestigious journals were just not held up to elementary mathematical scrutiny. So that's one line of work that I intend to pursue, for example with funding from the National Institutes of Health: finding these kinds of errors in the medical literature. I think the scrutiny needs to be tighter. There needs to be more mathematical rigor in there, because those errors are relatively easy to spot, so I'm surprised not only that they made it through peer review, but that these errors have perpetuated.
“We're taking all these calculators, anything that remotely has to do with fitness, health or medical calculations: medical scores, dosages, anything that is based on formulas or involves any sort of calculation, and just chipping away at the problem.
“Imagine that you have a monolith the size of a mountain, and those are medical errors. We're there with our pickaxes, with our jackhammers, and we’ve managed to have a positive impact. My goal is public health.”