Background: Enforcement of the Health Insurance Portability and Accountability Act (HIPAA)
began in April 2003. Although designed to preserve health insurance availability when coverage
was lost, HIPAA also imposed sweeping protections of patient privacy.
These changes dramatically altered clinical research by placing sizeable regulatory burdens
on investigators under threat of severe and costly federal and civil penalties. This report describes
development of an algorithmic approach to clinical research database design based
upon a central key–shared data (CK-SD) model allowing researchers to easily analyze, distribute,
and publish clinical research without disclosure of HIPAA Protected Health Information
(PHI).
Methods: Three clinical database formats (small clinical trial, operating room performance,
and genetic microchip array datasets) were modeled using standard structured query language
(SQL)–compliant databases. The CK database was created to contain PHI data, whereas a
shareable SD database was generated in real-time containing relevant clinical outcome information
while protecting PHI items. Small (<100 records), medium (<50,000 records), and
large (>10^8 records) model databases were created, and the resultant data models were evaluated
in consultation with a HIPAA compliance officer.
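
The CK/SD separation described above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table and column names (ck_patient, sd_record, mrn, and so on) are hypothetical and do not come from the original study; the sketch only assumes that the SD database is regenerated on demand from the CK database and carries no patient identifier.

```python
import sqlite3

# Hypothetical CK table: holds identifiers (PHI) alongside clinical fields.
CK_SCHEMA = """
CREATE TABLE ck_patient (
    row_id       INTEGER PRIMARY KEY,
    patient_name TEXT NOT NULL,  -- PHI, stays in CK
    mrn          TEXT NOT NULL,  -- PHI, stays in CK
    treatment    TEXT NOT NULL,
    outcome      TEXT NOT NULL
);
"""

# Hypothetical SD table: clinical fields only, no patient identifier.
SD_SCHEMA = """
CREATE TABLE sd_record (
    treatment TEXT NOT NULL,
    outcome   TEXT NOT NULL
);
"""


def build_sd(ck: sqlite3.Connection) -> sqlite3.Connection:
    """Generate a fresh shareable database from the current CK contents."""
    sd = sqlite3.connect(":memory:")
    sd.executescript(SD_SCHEMA)
    # Only non-PHI columns are ever selected from the CK database.
    rows = ck.execute("SELECT treatment, outcome FROM ck_patient").fetchall()
    sd.executemany("INSERT INTO sd_record (treatment, outcome) VALUES (?, ?)", rows)
    sd.commit()
    return sd


# Illustrative (fabricated) rows, used only to exercise the sketch.
ck = sqlite3.connect(":memory:")
ck.executescript(CK_SCHEMA)
ck.executemany(
    "INSERT INTO ck_patient (patient_name, mrn, treatment, outcome) VALUES (?, ?, ?, ?)",
    [("Jane Doe", "000123", "A", "improved"), ("John Roe", "000456", "B", "unchanged")],
)
ck.commit()

sd = build_sd(ck)
sd_columns = [row[1] for row in sd.execute("PRAGMA table_info(sd_record)")]
```

Because the SD schema simply has no column for a name or record number, PHI cannot leak through it by accident; whether a given remaining field counts as PHI is a question for the compliance review, not something the code decides.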
Results: The SD database models complied fully with HIPAA regulations, and resulting
"shared" data could be distributed freely. Unique patient identifiers were not required for
treatment or outcome analysis. Age data were resolved to single-integer years, with patients
aged >89 years grouped into a single category. Admission, discharge, treatment, and follow-up dates were replaced
with enrollment year, and follow-up/outcome intervals were calculated, eliminating the original dates.
Two additional data fields identified as PHI (treating physician and facility) were replaced
with integer values, and the original data corresponding to these values were stored in the
CK database. Use of the algorithm at the time of database design did not increase cost or design
effort.
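
The field-level rules reported above (age top-coding, dates reduced to an enrollment year plus intervals, and integer codes for physician and facility) might be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the record layout, the function names, the use of 90 as the value standing for "aged >89 years", and the choice to measure intervals from the admission date are all hypothetical.

```python
from datetime import date

AGE_TOP_CODE = 90  # assumption: 90 stands for the "aged >89 years" group


def age_in_years(birth: date, on: date) -> int:
    """Whole years elapsed between birth and the reference date."""
    years = on.year - birth.year
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1
    return years


def code_for(value: str, key_table: dict) -> int:
    """Return the integer code for value, assigning the next one if new.

    key_table plays the role of the CK lookup and is never shared.
    """
    return key_table.setdefault(value, len(key_table) + 1)


def to_shared(rec: dict, physician_key: dict, facility_key: dict) -> dict:
    """Build one shareable (SD) record from one CK record."""
    admitted = rec["admission_date"]
    return {
        "age_years": min(age_in_years(rec["birth_date"], admitted), AGE_TOP_CODE),
        "enrollment_year": admitted.year,
        # Intervals replace the original dates (measured from admission here).
        "length_of_stay_days": (rec["discharge_date"] - admitted).days,
        "follow_up_days": (rec["follow_up_date"] - admitted).days,
        "physician_code": code_for(rec["physician"], physician_key),
        "facility_code": code_for(rec["facility"], facility_key),
        "outcome": rec["outcome"],
    }


# Illustrative (fabricated) CK record.
physician_key, facility_key = {}, {}
ck_record = {
    "birth_date": date(1910, 6, 15),
    "admission_date": date(2003, 5, 1),
    "discharge_date": date(2003, 5, 8),
    "follow_up_date": date(2003, 11, 1),
    "physician": "Dr. Example",
    "facility": "Example General Hospital",
    "outcome": "improved",
}
shared = to_shared(ck_record, physician_key, facility_key)
```

The two key dictionaries would be persisted in the CK database, so that an authorized investigator can map a code back to a physician or facility while recipients of the SD data cannot.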
Conclusions: The CK-SD model for clinical database design provides an algorithm for investigators
to create, maintain, and share clinical research data compliant with HIPAA regulations.
This model is applicable to new projects and large institutional datasets, and should
decrease the regulatory effort required to conduct clinical research. Application of the design
algorithm early in the clinical research enterprise does not increase cost or the effort of
data collection.