Hannah Nelson


Standardized EHR documentation practices for SDOH such as race and ethnicity could help prevent data quality issues that can lead to bias.


Data quality issues with social determinants of health (SDOH) signal the need for standardized SDOH EHR documentation practices to avoid bias and promote health equity, according to a literature review published in JAMIA.

Researchers conducted a review of 76 studies related to SDOH data quality.

The majority of articles that discussed race, ethnicity, or country-of-origin data (65 percent) examined data plausibility, which refers to data accuracy.

Accurate race/ethnicity data is key for clinical research, especially as the industry continues its focus on SDOH data and health equity.

However, researchers noted misclassification bias as a problem or a potential problem in more than half of the articles about race/ethnicity data plausibility. What’s more, several studies reported that implausible data and misclassification errors were more likely for certain groups.

Notably, 14 studies reported that Hispanic patients were more likely to be misclassified in terms of their ethnicity. Patient misclassification includes missing ethnicity information or misclassification into the “Other” category.

Several studies speculated that this may be due to the fluid nature of the definitions of race and ethnicity.

“The fluidity of these definitions leads patients to respond inconsistently to questions about their race/ethnicity, thus causing problems with data reliability,” the literature review noted. “Further, the fact that these categories are so broad and poorly defined leads to difficulties with data validity.”

Misclassification can have profound impacts on clinical research, the study authors noted.

“When patients from one racial or ethnic group are lost in another group or mistakenly categorized as ‘Other,’ subsequent analysis can cause those groups to be under-represented in research results,” they explained. “Misidentification of the race or ethnicity of groups of patients can inadvertently lead to the erasure of those groups from clinical research.”

Several studies speculated that variations in how healthcare organizations collect and record race/ethnicity information have impaired data quality.

“Consistently applied standards for SDOH data collection in the EHR would result in improved data quality, which in turn would lead to more robust research, care coordination, and population health management,” they noted.

The review authors noted that the integration of patient-facing health IT could also help mitigate race/ethnicity misclassification.

“It is possible that the increasing use of dynamic patient-facing data entry tools may allow people to inform and correct their own demographic information, thus helping to improve the quality of race, ethnicity, and country-of-origin data in the future,” the study authors suggested.

The quality of data elements is key in supporting interoperability for large-scale research, data analytics, and care coordination, the review authors emphasized.

“Privacy-preserving record linkage (PPRL) methods identify when records from different sources belong to the same entity while minimizing the exposure of sensitive personal information,” the review authors wrote. “These techniques often rely heavily on patient address along with name and date of birth. When there are errors or missing address data, linkage quality suffers.”
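To make the linkage problem concrete, here is a minimal sketch of one simple way to link records without exchanging raw identifiers: each site computes a keyed hash over normalized name, date of birth, and address, and only the resulting tokens are compared. This is an illustration under stated assumptions, not the method used in the reviewed studies; the key, the `link_token` helper, and the patient values are all hypothetical, and real PPRL tools generally use encodings that tolerate typos better than an exact hash does.

```python
import hashlib
import hmac

# Hypothetical shared key; a real deployment would manage this securely.
SECRET_KEY = b"shared-secret-for-illustration-only"


def normalize(value):
    """Lowercase and collapse whitespace; treat a missing value as empty."""
    return " ".join(value.lower().split()) if value else ""


def link_token(name, dob, address):
    """Return a keyed hash of the normalized identifiers (illustrative helper)."""
    payload = "|".join(normalize(v) for v in (name, dob, address))
    return hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()


# Made-up records for the same person held by three sites.
site_a = link_token("Maria Lopez", "1980-04-12", "12 Oak St")
site_b = link_token("maria  lopez", "1980-04-12", "12 OAK ST")  # formatting differs
site_c = link_token("Maria Lopez", "1980-04-12", None)          # address missing

same_ab = site_a == site_b  # formatting differences are normalized away
same_ac = site_a == site_c  # the missing address changes the token

print(same_ab, same_ac)
```

In this sketch the first two records link because normalization removes the formatting differences, while the record with no address produces a different token and fails to link, which is the failure mode the review authors describe.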

The review authors added that healthcare organizations originally collected demographic data such as race, ethnicity, insurance status, and address for administrative purposes. Repurposing administrative data for secondary, retrospective research can result in poor data quality.

“Given the increasing importance of social determinants in health equity research and intervention, it is crucial that healthcare institutions work to improve the quality and availability of these data,” the authors noted.

The review revealed several evidence-based solutions to mitigate data quality problems. The researchers grouped these recommendations into five main suggestions: avoid complete case analysis, impute data, rely on multiple sources, use validated software tools, and select addresses thoughtfully.
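A small sketch can show why complete case analysis is discouraged and how a second data source helps. The records, the `self_report` lookup, and the `fill_from_second_source` helper below are invented for illustration and do not come from the review; the example only contrasts dropping incomplete records with filling gaps from another source.

```python
# Made-up EHR records; None marks missing ethnicity.
ehr = [
    {"patient_id": 1, "ethnicity": "Hispanic"},
    {"patient_id": 2, "ethnicity": None},
    {"patient_id": 3, "ethnicity": "Not Hispanic"},
    {"patient_id": 4, "ethnicity": None},
]

# Hypothetical second source, e.g. values patients entered themselves.
self_report = {2: "Hispanic"}

# Complete case analysis: keep only records with no missing values.
complete_cases = [r for r in ehr if r["ethnicity"] is not None]


def fill_from_second_source(records, lookup):
    """Fill missing ethnicity from a second source when one is available."""
    filled = []
    for r in records:
        value = r["ethnicity"]
        if value is None:
            value = lookup.get(r["patient_id"])  # stays None if not found
        filled.append({**r, "ethnicity": value})
    return filled


merged = fill_from_second_source(ehr, self_report)
still_missing = [r["patient_id"] for r in merged if r["ethnicity"] is None]

print(len(complete_cases), "of", len(ehr), "records survive complete case analysis")
print("still missing after using the second source:", still_missing)
```

Here complete case analysis silently discards half of the toy cohort, including a Hispanic patient, while the second source recovers one of the two missing values. Statistical imputation, another of the review's suggestions, would be one option for values that remain missing.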