Two Programs Offer Different Approaches to Health Data Analysis

by Harley Geiger


Two large government programs are being designed to gather and analyze electronic health data. One will create a centralized database; the other will use a distributed query system. The two represent different approaches to health data analysis, with different implications for data security and for the privacy of the people whose records are analyzed.

The Warehouse Model

In October, the Office of Personnel Management (OPM) announced plans to create a centralized database containing copies of health claims data, including diagnoses, medical procedures and full identifying information, for several million Americans. To fill this database -- called the Health Claims Data Warehouse (or the Warehouse) -- OPM would set up data feeds to receive copies of claims from three major health insurance programs: the Federal Employees Health Benefits Program (FEHBP), the National Pre-Existing Condition Insurance Program and the Multi-State Option Plan. According to its announcement, OPM intends to use this information to actively manage the three health programs, to run cost-comparison and quality queries and to conduct other, unspecified research. OPM said it would de-identify the information before turning it over to researchers and other third parties.

FEHBP covers federal employees, retirees and some spouses and family members (the other two programs were created under the Patient Protection and Affordable Care Act of 2010 and are not limited to federal employees). OPM, acting as employer, is the plan sponsor for FEHBP, and the plan carriers are group health plans -- which are covered entities under HIPAA. In the past, OPM has requested information from the group health plans that administer patients' coverage, but patients' raw claims data remained with the FEHBP plan carriers. With the Warehouse, however, OPM seeks to compile its own repository of copies of patients' claims data, sent directly from the carriers.

The Center for Democracy & Technology submitted comments to OPM urging the agency to reconsider its database model. We are concerned that the program risks violating Americans' expectations of privacy in their health records. Most people expect their health plans to keep records of their medical claims, but few are aware that a government agency intends to aggregate copies of the same data for research and other purposes.

Collecting the information in one central location also increases the severity of a breach, because a single compromise of the database would expose a large volume of data at once. The claims data would be better left where it is -- with the health plans. OPM could instead require the plans to provide the information needed for its analyses. And, it turns out, there is precedent for this distributed model in the federal system: the Food and Drug Administration's Sentinel Initiative.

The Sentinel Model

FDA announced the Sentinel Initiative in 2008 as a means of conducting post-market safety surveillance of drugs and medical devices. The Sentinel system can currently access electronic health care data for 25 million people and is on track to reach the data of 100 million people in 2012. The health care data will come from dozens of data partners collaborating with FDA, including health care systems and insurance companies. From a privacy perspective, Sentinel's major benefit is that FDA does not receive identifiable patient data.

The patient data remain housed with the original data holders. FDA sends its safety-related questions to an independent Coordinating Center, which then works with the data partners to develop analytical programs to answer them. The data partners run the queries on their own systems, de-identify or aggregate the results and pass the answers back to the Coordinating Center. Through this distributed system, FDA can use electronic health data to evaluate safety issues for both targeted demographics and the general population in near real time.
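To make the flow just described more concrete, here is a minimal Python sketch of the distributed query pattern. It is not Sentinel's actual software. The class names (`Claim`, `DataPartner`, `CoordinatingCenter`), the record fields and the `min_cell` small-count suppression threshold are all hypothetical. They are chosen only to show that identifiable records stay with each data partner and that only aggregate counts travel to the coordinating center.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Claim:
    """One claim record held by a data partner (hypothetical fields)."""
    patient_id: str      # identifying field; never returned to the center
    age_group: str
    drug: str
    adverse_event: bool


# A query is a predicate the coordinating center sends out; partners run it locally.
Query = Callable[[Claim], bool]


class DataPartner:
    """Hypothetical data holder (e.g., a health plan) that keeps its claims in place."""

    def __init__(self, name: str, claims: list[Claim]) -> None:
        self.name = name
        self._claims = claims  # stays on the partner's own system

    def run_query(self, query: Query, min_cell: int = 1) -> dict[str, int]:
        """Run the query locally and return only counts per age group."""
        counts: dict[str, int] = {}
        for claim in self._claims:
            if query(claim):
                counts[claim.age_group] = counts.get(claim.age_group, 0) + 1
        # Assumed policy: drop any count below min_cell before it leaves the partner.
        return {group: n for group, n in counts.items() if n >= min_cell}


class CoordinatingCenter:
    """Hypothetical intermediary that distributes queries and combines answers."""

    def __init__(self, partners: list[DataPartner]) -> None:
        self.partners = partners

    def answer(self, query: Query, min_cell: int = 1) -> dict[str, int]:
        totals: dict[str, int] = {}
        for partner in self.partners:
            # Only aggregate counts cross this boundary, never Claim objects.
            for group, n in partner.run_query(query, min_cell).items():
                totals[group] = totals.get(group, 0) + n
        return totals


# Usage with made-up records
plan_a = DataPartner("plan_a", [
    Claim("a1", "65+", "drug_x", True),
    Claim("a2", "65+", "drug_x", False),
    Claim("a3", "18-64", "drug_x", True),
])
plan_b = DataPartner("plan_b", [
    Claim("b1", "65+", "drug_x", True),
    Claim("b2", "18-64", "drug_y", True),
])
center = CoordinatingCenter([plan_a, plan_b])
print(center.answer(lambda c: c.drug == "drug_x" and c.adverse_event))
# {'65+': 2, '18-64': 1}
```

Suppressing small counts at each partner lowers the risk of re-identifying individuals. It also means the combined totals can undercount (with `min_cell=2`, the query above returns nothing). Any real system would have to weigh that trade-off.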

Where Else Can Distributed Systems Be Used?

Distributed systems like Sentinel can leverage existing databases without creating new ones, minimize data transfers, reduce the number of individuals with access to identifiable data and mitigate the severity of a data breach. Importantly, keeping health claims data with the plans is more consistent with the public's privacy expectations than compiling copies in a separate, centralized database. With these advantages, it's worth exploring whether FDA's Sentinel model can be exported to other data analysis programs, like OPM's Warehouse.

There are some important differences between the two programs. Congress required FDA to use a distributed system, so it is unclear whether FDA would have chosen a distributed model had it been given freer rein. The Warehouse may also need to process a higher volume of queries than Sentinel. Nonetheless, Sentinel's success to date will hopefully encourage greater use of privacy-protective distributed systems and counter the instinct to construct a whole new database for each analytical need that arises.

Barbara Duck
I think the Office of Personnel Management developing its own formulas is a great idea; this way we could have some impartial algorithms not slanted by any one insurance company. I wrote about that this week. Also, on the FDA model, they had to outsource part of their Sentinel project to an insurance company, as they were not ready with the IT infrastructure required to do this independently. At any rate, it is a good opportunity for both to begin updating and using internal IT processes that are not generated by insurance companies.
