Hannah Nelson


Machine learning systems can mitigate burden and boost EHR usability for disease phenotyping to support clinical research, according to a new study.


Machine learning can make EHR data easier to use and reduce the burden of disease phenotyping for clinical research, according to a recent Mount Sinai study published in the journal Patterns.

The machine learning-based algorithm identified patients with conditions such as dementia, sickle cell anemia, and multiple sclerosis as accurately as the standard set of disease phenotyping algorithms.

“There continues to be an explosion in the amount and types of data electronically stored in a patient’s medical record,” Benjamin S. Glicksberg, PhD, a senior author of the study, said in a press release. “Disentangling this complex web of data can be highly burdensome, thus slowing advancements in clinical research.”

“In this study, we created a new method for mining data from electronic health records with machine learning that is faster and less labor intensive than the industry standard,” continued Glicksberg, an assistant professor of genetics and genomic sciences and a member of the Hasso Plattner Institute for Digital Health at Mount Sinai (HPIMS).

Clinical research scientists currently use a standard set of disease phenotyping algorithms managed by a system called the Phenotype Knowledgebase (PheKB).

The study authors noted that while PheKB algorithms are effective, implementing one on a new dataset is time-consuming because it requires data in varying formats as well as specific laboratory or clinical information.

PheKB algorithms also have limited scalability since they are curated based on expert knowledge for one disease at a time, the researchers explained.

As of July 2020, public PheKB algorithms covered only 46 diseases or syndromes.

To develop a new algorithm for a disease, researchers must manually comb through EHR data for the data elements associated with that disease and then program an algorithm to identify patients whose records contain them.
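For readers who want a concrete picture of that manual process, here is a minimal Python sketch of a hand-written phenotyping rule. It is an illustration only, not a PheKB algorithm or the study's method: the `PatientRecord` structure, the placeholder codes and drug name, and the "code plus medication" rule are all assumptions invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Hypothetical, simplified view of one patient's EHR data."""
    patient_id: str
    diagnosis_codes: set[str] = field(default_factory=set)
    medications: set[str] = field(default_factory=set)


# Placeholder criteria picked by hand for a single disease. In practice
# these would be curated by clinical experts, not taken from this sketch.
DISEASE_CODES = {"CODE_A", "CODE_B"}
DISEASE_MEDICATIONS = {"drug_x"}


def matches_phenotype(record: PatientRecord) -> bool:
    """Flag a patient who has both a listed code and a listed medication."""
    has_code = bool(record.diagnosis_codes & DISEASE_CODES)
    has_medication = bool(record.medications & DISEASE_MEDICATIONS)
    return has_code and has_medication


def select_cohort(records: list[PatientRecord]) -> list[str]:
    """Return the IDs of all patients that satisfy the hand-written rule."""
    return [r.patient_id for r in records if matches_phenotype(r)]
```

Every new disease would need its own hand-picked criteria and its own rule, which is the scalability limit the researchers describe.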

The Mount Sinai researchers used machine learning to automate the disease phenotyping process and save clinical researchers time and effort.

The research team’s new method, Phe2vec, builds on the team’s earlier studies.

“Previously, we showed that unsupervised machine learning could be a highly efficient and effective strategy for mining electronic health records,” explained Riccardo Miotto, PhD, a former assistant professor at the HPIMS and a senior author of the study.

“The potential advantage of our approach is that it learns representations of diseases from the data itself,” Miotto continued. “Therefore, the machine does much of the work experts would normally do to define the combination of data elements from health records that best describes a particular disease.”

Glicksberg noted that the study’s promising results suggest the algorithm could be used for large-scale phenotyping of diseases in EHR data.

“With further testing and refinement, we hope that it could be used to automate many of the initial steps of clinical informatics research, thus allowing scientists to focus their efforts on downstream analyses like predictive modeling,” he said. “We hope that this will be a valuable tool that will facilitate further, and less biased, research in clinical informatics.”

The study authors said that they plan to analyze how phenotypes change over time. They also plan to embed other kinds of data, such as genetics and clinical imaging, into the framework for refined disease phenotyping.

Additionally, they intend to explore the use of the system to create reliable disease-specific control cohorts for observational studies.