
Hannah Nelson


Using machine learning to extract SDOH data from EHR clinical notes could aid in the development of clinical decision support systems, a study says.


Machine learning offers significant potential to extract social determinants of health (SDOH) data from EHR clinical notes, which may aid in the development of clinical decision support systems, according to a study published in JAMIA.

The researchers conducted a literature review of 82 publications focused on the extraction of SDOH data from EHR clinical notes.

Despite increased interest in capturing SDOH in EHRs, data is typically locked in unstructured clinical notes, the study authors explained.

Across the literature, the researchers observed two major steps common to SDOH extraction systems.

“The first step is gathering SDOH-related keywords to create lexicons for each SDOH category, and the second step is developing rule-based or supervised systems to locate clinical notes associated with SDOH categories or extract SDOH concepts,” they wrote.

Rule-based approaches require manual chart review while supervised systems leverage machine learning approaches, the researchers explained.
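
The two steps the researchers describe can be illustrated with a minimal Python sketch of a rule-based tagger. Everything in it is an assumption made for illustration: the keyword lists, the category names, and the helper functions `compile_lexicons` and `tag_note` are hypothetical and are not drawn from the study or from any system it reviewed.

```python
import re

# Hypothetical lexicons: illustrative keywords only, not taken from the study.
LEXICONS = {
    "housing": ["homeless", "shelter", "eviction"],
    "transport": ["no transportation", "missed the bus"],
    "social_isolation": ["lives alone", "no family support"],
}


def compile_lexicons(lexicons):
    """Step 1: turn each category's keyword list into one case-insensitive pattern."""
    return {
        category: re.compile(
            r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
            re.IGNORECASE,
        )
        for category, keywords in lexicons.items()
    }


def tag_note(note, patterns):
    """Step 2: return the sorted SDOH categories whose lexicon matches the note."""
    return sorted(c for c, p in patterns.items() if p.search(note))


PATTERNS = compile_lexicons(LEXICONS)
example = "Patient lives alone and was recently staying in a shelter."
print(tag_note(example, PATTERNS))  # ['housing', 'social_isolation']
```

A sketch like this ignores negation ("denies being homeless"), misspellings, and context, which is part of why rule-based systems depend on manual chart review to build and validate their lexicons.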

In total, 22 out of 82 publications used rule-based methods to identify SDOH in clinical notes. Health IT researchers leveraged rule-based systems more frequently for housing, transport, and social isolation.

On the other hand, researchers used machine learning techniques more frequently for smoking, alcohol, and substance use data extraction efforts.

Insufficient volumes of structured data for social support and homelessness may explain why rule-based systems were more common for these variables, the researchers suggested.

The study authors pointed out that integrating automated SDOH data extraction systems into EHRs may help reduce clinician burden.

“In clinical settings, providers report spending less time on patient care and more time on administrative burdens that are byproducts of data management in the EHR,” they wrote. “Manual screening of SDOH could potentially further complicate and delay the process for healthcare staff.”

“We believe that the NLP-based SDOH identification and the developed outcome analysis tools may offer an optimal solution that may minimize impact on current documentation routines while guiding providers to make better, informed and holistic clinical decisions,” they explained.

The researchers also noted that as SDOH categories grow in number and complexity with the industry’s focus on health equity, storing SDOH in a structured framework could become inefficient and require frequent maintenance.

“With increasing recognition of nonclinical factors that define patients’ health risks, needs, and outcomes, it becomes equally imperative that social and behavioral concepts are captured in order to be leveraged during clinical decision-making related to diagnosis and therapy planning,” they wrote.

“Devising novel ways in which such data can be extracted and leveraged with as little impact on current documentation routines of providers is an ideal solution,” the authors continued. “With the valuable knowledge of the relatively new literature in this area, researchers can leverage such reviews to steer their study in innovative ways.”

The researchers noted several opportunities for future NLP research that extracts less-studied SDOH such as child and sexual abuse, financial issues, transportation, neighborhood, social isolation, family problems, employment, education, food insecurity, and access to healthcare.

“Another interesting study would be to compare different aspects of NLP algorithms, such as system performance, amount of annotated data, type of NLP systems, and so forth with the difficulty of SDoH extraction,” the researchers added.