How to Preserve Data Quality in Healthcare When Working with Big Data Sets

A clinical registry is only as valuable and impactful as its data is trusted and usable.

High-quality data is the foundation of any clinical data registry, yet it does not develop on its own. In the world of healthcare, we have millions of unique data elements of different sizes, formats, and characteristics.

Before we get into the process of achieving high-quality data, let’s start with a brief definition.

What Is High-Quality Data in Healthcare?

High-quality data is precise, validated, and comprehensive. Data with these characteristics is a valuable asset that can meet the evolving requirements of varied healthcare stakeholders.

Acquiring data from various sources is the first step. The necessary and important next step is to take all the data blocks and build something grand, nimble, and valuable. 

Data Integrity Process in Healthcare

In order to preserve and extract data value, you need a solution that is centered around a multi-layered data integrity process. This is of paramount importance for registry activities.

Multi-layered data integrity processes enable comprehensive, persistent, and widespread data integrity protocols that protect registry data and reporting in serving the needs of diverse stakeholders.

There are five methods for perpetually protecting data integrity:

Data Review & Validation

Whenever connecting to a new data source, EHR system, or modifying an existing source, registries should engage in a thorough analysis to assess data integrity across several domains:

  • Completeness: Is the data element available and populated?
  • Concordance: Do the values of the data element agree with those of related variables?
  • Plausibility: Do the values of the data element make clinical sense?
  • Currency: How recent is the available data?
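
As a rough illustration, the four domains above can be expressed as simple per-record checks. This is a minimal sketch, not a registry's actual rule set: the field names (systolic_bp, diastolic_bp, visit_date), the numeric range, and the 365-day window are all assumptions chosen for the example and are not clinical standards.

```python
from datetime import date


def assess_record(record, today, max_age_days=365):
    """Return a pass/fail flag for each of the four data integrity domains.

    All field names and thresholds here are hypothetical.
    """
    sbp = record.get("systolic_bp")
    dbp = record.get("diastolic_bp")
    visit = record.get("visit_date")

    # Completeness: is every expected element available and populated?
    complete = all(value is not None for value in (sbp, dbp, visit))

    # Concordance: related variables should agree with each other
    # (systolic pressure is expected to exceed diastolic pressure).
    concordant = sbp is not None and dbp is not None and sbp > dbp

    # Plausibility: the value should fall in a clinically sensible range
    # (illustrative bounds only).
    plausible = sbp is not None and 50 <= sbp <= 300

    # Currency: the data should be recent enough to be useful.
    current = visit is not None and 0 <= (today - visit).days <= max_age_days

    return {
        "completeness": complete,
        "concordance": concordant,
        "plausibility": plausible,
        "currency": current,
    }


if __name__ == "__main__":
    sample = {"systolic_bp": 120, "diastolic_bp": 80, "visit_date": date(2024, 1, 10)}
    print(assess_record(sample, today=date(2024, 3, 1)))
```

In practice each domain would cover many data elements, and the thresholds would come from clinical experts rather than from code defaults.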

Comprehensive Validation

It is critical to preserve data quality throughout the life of a registry. The best way to accomplish this is through real-time and ongoing data validation checks that deliver the following components.

  1. Validation Rules: Each data integration interface should be configured to include robust and comprehensive validation rules that ensure only valid records enter the registry.
  2. Data Quality Reporting: Dedicated registry dashboards should provide registry participants with real-time information on their data quality across dimensions such as completeness, concordance, plausibility, and currency.
  3. Systems Monitoring and Alerting: Real-time monitoring and alerting tools should report any unexpected data events during data processing. Alerts should go to the right stakeholders for immediate action and resolution. For example, a practice might receive an alert that their most recent data included data elements outside of normal ranges.
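
One way to picture how these three components fit together is a small batch validator: rules decide which records enter the registry, per-rule failure rates feed a quality report, and any rate above a threshold raises an alert. The sketch below is illustrative only. The rule names, the fields (patient_id, heart_rate), the heart-rate range, and the 10% alert threshold are assumptions, not a description of any particular registry platform.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when the record passes


# Hypothetical rules for illustration.
RULES = [
    Rule("patient_id_present", lambda r: bool(r.get("patient_id"))),
    Rule(
        "heart_rate_in_range",
        lambda r: r.get("heart_rate") is not None and 20 <= r["heart_rate"] <= 250,
    ),
]


def validate_batch(records, rules=RULES, alert_threshold=0.10):
    """Apply validation rules, summarize failure rates, and flag alerts."""
    accepted, rejected = [], []
    failures = {rule.name: 0 for rule in rules}

    for record in records:
        failed = [rule.name for rule in rules if not rule.check(record)]
        for name in failed:
            failures[name] += 1
        # Validation rules: only records that pass every rule are accepted.
        (rejected if failed else accepted).append(record)

    total = len(records)
    # Data quality reporting: share of records failing each rule.
    failure_rates = {
        name: (count / total if total else 0.0) for name, count in failures.items()
    }
    # Monitoring and alerting: flag rules whose failure rate is unexpectedly high.
    alerts = [name for name, rate in failure_rates.items() if rate > alert_threshold]

    return {
        "accepted": accepted,
        "rejected": rejected,
        "failure_rates": failure_rates,
        "alerts": alerts,
    }


if __name__ == "__main__":
    batch = [
        {"patient_id": "A1", "heart_rate": 72},
        {"patient_id": "A2", "heart_rate": 400},
    ]
    print(validate_batch(batch)["alerts"])
```

A production system would route each alert to the responsible practice or team instead of returning a list, but the flow from rule to report to alert is the same.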

Using this multi-layered strategy helps ensure that all registry data is efficiently ingested, transformed, persisted, and analyzed while maintaining fidelity to the original clinical record.

Documentation and Transparency

A registry is only as valuable as it is trusted. Registry stakeholders need to feel confident interacting with their data and interpreting its results. Thus, all data transformations need to be fully documented and approved.

This includes documentation of data processing rules, data validation logic, and measure calculations. This information provides full transparency into how data is used in the registry and helps to increase trust in the clinical insights generated from the registry.  

Change Management

Registries will evolve and grow as new priorities emerge, from new data elements being added to changes in measure calculations. These updates increase the value of the registry, but they must not disrupt existing data protections in ways that could affect registry results or confuse users.

Any data change should go through a systematic change control process that transparently documents, tests, and approves the update, so that its impact is known and widely shared prior to implementation.

Data Logging and Auditing

Health data is sensitive and requires data protections to ensure it is responsibly handled. Furthermore, registry engagement will lead to numerous questions from registry stakeholders on how data was processed and handled. All data entering the registry must be fully traceable from the moment it is received to when it is displayed on reports.

This includes auditable logs of all activity related to data submission, data manipulation, and registry interactions. Each data element underlying the registry is tied to information on the date and time of receipt, data submission protocols, data quality results, data transformation steps, and the captured raw source data for comprehensive auditing and tracking.
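
As a minimal sketch of what such a log entry might hold, the example below records the source, the time of receipt, the processing step, the quality results, and a SHA-256 fingerprint of the raw payload so that the original submission can later be verified. The function names, the field names, and the one-JSON-line-per-entry layout are assumptions made for illustration. A real registry would also need access controls and tamper-evident storage, which are not shown here.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_audit_entry(raw_payload, source, step, quality_results, received_at=None):
    """Build one audit record for a processing step (hypothetical schema)."""
    received_at = received_at or datetime.now(timezone.utc)
    return {
        "source": source,                          # who submitted the data
        "received_at": received_at.isoformat(),    # date and time of receipt
        "step": step,                              # e.g. "ingest" or "transform"
        "quality_results": quality_results,        # outcome of validation checks
        # Fingerprint of the raw source data, so it can be matched to the
        # stored original during an audit.
        "raw_sha256": hashlib.sha256(raw_payload).hexdigest(),
    }


def append_entry(log_lines, entry):
    """Append the entry as one JSON line; existing lines are never modified."""
    log_lines.append(json.dumps(entry, sort_keys=True))


if __name__ == "__main__":
    log = []
    entry = make_audit_entry(
        raw_payload=b'{"heart_rate": 72}',
        source="practice-001",
        step="ingest",
        quality_results={"completeness": True},
    )
    append_entry(log, entry)
    print(log[0])
```

Keeping the log append-only, with one entry per step, is what allows a stakeholder question about a reported value to be traced back to the submission it came from.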

Dedicated Team

Data integrity is best protected when the above processes are controlled by data experts who are intimately familiar with the registry data, goals, and stakeholders. Projects should have a dedicated team of data scientists, data engineers, and data integration specialists who provide long-term, end-to-end registry support through their extensive knowledge of all aspects of the project.

Achieve High Data Quality with Your Registry

Data quality is essential to registries because it establishes trust and value. When your clinicians, researchers, and other stakeholders are confident in the data, they can use it to advance research and improve outcomes.