Abstract
Epilepsy is a chronic neurological condition that requires active self-management to reduce personal and population burden. The Managing Epilepsy Well Network, funded by the US Centers for Disease Control and Prevention, conducts research on epilepsy self-management. There is an urgent need to develop an integrated informatics platform to maximize the secondary use of existing Managing Epilepsy Well Network data. We have implemented multiple steps to develop an informatics platform, including: (a) a survey of existing outcome data, (b) identification of common data elements, and (c) an integrated database using an epilepsy domain ontology to reconcile data heterogeneity. The informatics platform enables assessment of epilepsy self-management samples by site and in aggregate to support data interpretations for clinical care and ongoing epilepsy self-management research. The Managing Epilepsy Well informatics platform is expected to help advance epilepsy self-management and improve health outcomes, and it has potential application in other thematic research networks.
Introduction
Epilepsy is a common serious neurological disorder that affects 65 million persons worldwide.1,2 People with epilepsy experience repeated seizures that manifest as physical or behavioral changes. Epilepsy is treated with anti-epileptic medications to reduce or eliminate seizures; however, 30 percent of people with epilepsy continue to experience seizures in spite of medications,3,4 and additional approaches are greatly needed. Epilepsy imposes substantial personal and societal burden. In the United States, the annual cost of epilepsy is estimated to be US $12.5 billion, with about 85 percent of costs attributed to lost productivity and disability. 5
Epilepsy self-management is being increasingly explored as a potential approach to complement other existing treatments. 6 Approaches that actively engage persons with epilepsy in taking care of their own health can potentially reduce the burden of epilepsy.7–12 In 2007, the Centers for Disease Control and Prevention (CDC) established the Managing Epilepsy Well (MEW) Network to develop, test, and disseminate epilepsy self-management interventions. 13 Between 2007 and 2014, the MEW Network grew incrementally to include six geographically distinct sites conducting epilepsy self-management research. This thematic research Network promotes collaboration on epilepsy self-management with a focus on gaps in knowledge or research related to public health practice. Sites are tasked with developing and implementing a coordinated, applied research agenda; conducting research activities that promote epilepsy self-management and quality of life (QOL); and identifying and collaborating with public health, mental health, and other services agencies.
The MEW Network has developed successful approaches for epilepsy self-management, including web-based epilepsy self-management10,14 and community-based programs that address common comorbidities, such as depression, in people with epilepsy.8,15 These approaches have made innovative use of electronic, wireless, and mobile devices (e-Tools) to help people with epilepsy overcome the many barriers to care and self-management, such as lack of transportation and the stigma associated with having seizures in public.13,16 In addition to using remotely delivered interventions, the MEW Network taps into the growing interest in patient-reported outcomes 17 similar to the Patient Reported Outcomes Measurement Information System (PROMIS) 18 program funded by the United States National Institutes of Health. Patient-reported outcomes that are highly relevant to people with epilepsy, such as QOL and patient-perceived depression severity, are a key component of many of the MEW Network initiatives and projects.
The MEW Network centers have collected a large amount of research data since 2007 that represents important information about people with epilepsy and self-management. However, most self-management studies are not large, and there is a significant need to integrate datasets to answer questions that could not otherwise be answered with small individual study sample sizes. An integrated analysis of this large dataset will enable: (a) comparative effectiveness studies of self-management approaches, (b) cross-cohort queries to test various hypotheses, and (c) generation of insights to develop future self-management programs. Many existing biomedical research networks are using informatics platforms to facilitate collaborative research and high performance distributed computing approaches for data management.19–21 In addition, secondary use of existing biomedical and health-care data has been identified as a priority by the President's Council of Advisors on Science and Technology (PCAST) report on health information technology. 22
However, data heterogeneity makes it difficult to integrate and query data across the Network. Reconciling data heterogeneity to develop data integration platforms has been an area of active research in computer science.23,24 Data integration systems often use a common terminology system, which is mapped to terms from disparate data sources, to support both data processing and user queries. 23 An ontology is an example of a common terminology that uses a formal knowledge representation language to model domain information for consistent and accurate interpretation by software applications.25,26 The National Center for Biomedical Ontology (NCBO), funded by the National Institutes of Health (NIH), lists more than 370 biomedical ontologies that have been developed to create common terminology systems in many biomedical and health-care domains.
In this article, we describe the development of an ontology-driven MEW Network informatics platform with an integrated database to support future data analysis, cohort queries, and research studies that may be implemented across the MEW Network sites. This informatics infrastructure will significantly enhance the secondary use of MEW research data and ensure maximized returns on investments made by funding agencies to support epilepsy self-management research.
Methods
MEW database workgroup
Recognizing the need to develop an informatics platform that could analyze the data collected from disparate sources, the MEW Network convened a workgroup tasked with developing an integrated database strategy and testing the strategy in a proof-of-concept exercise. The informatics platform development was initiated with seed funding from the Case Western Reserve University (CWRU) Prevention Research Center (PRC). The scope and functionality of the informatics platform were defined based on the results of a survey conducted in February 2013 with participation of all MEW Network sites. The survey collected data regarding the different categories of study data and the informatics approach used to manage the data in the Network. The survey used a questionnaire with seven query categories that required either a “yes/no” or a descriptive response. Completed surveys on 11 specific research studies were manually processed, and findings were assessed by the MEW workgroup.
MEW Network survey results
Most MEW research studies focused on comorbidities, treatment outcomes, and data required for predicting as well as evaluating the impact of self-management. Seizure frequency was collected in eight studies and emergency department data were collected in five. All studies collected data on seizure type, but only one collected data on seizure duration. Data on anti-epileptic drugs were collected in seven studies, and two collected data on psychotropic drugs. Other major categories of data included measures of QOL, cognition, depression, sleep quality, and general functioning.
An important survey finding was the large number of disparate metrics used to collect research data across the MEW Network. For example, four different QOL metrics were used, with three studies using the Quality of Life in Epilepsy–31 (QOLIE-31), 27 two using the QOLIE-10, 28 one using the Neuro-QOL,29,30 and one using internally developed Likert metrics. Similarly, 10 different cognitive measures and 5 depression measures were used. Heterogeneous measurement metrics represent a critical challenge to an integrated informatics infrastructure.
In addition to data heterogeneity, the survey found that only a few sites used a database management system (DBMS) to store and access data. Six of the studies used paper forms and nine used Microsoft Excel spreadsheets. Only one study used the Microsoft Access DBMS, and another study used REDCap (Research Electronic Data Capture) research software. 31 Based on survey results and a presentation on the epilepsy informatics platform developed for clinical data management, 32 the Network centers collectively agreed to participate in further work to integrate research data across participating sites. Figure 1 illustrates the architecture of the MEW Network informatics platform, which consists of the following three components:
A common terminological system modeled as a domain ontology using a set of epilepsy self-management common data elements;
A data processing and curation pipeline that can reconcile heterogeneity in datasets from different sites using the epilepsy domain ontology as reference; and
An integrated database to support: (a) data analysis across different studies, (b) cohort building, and (c) cross-cohort queries.

Figure 1. Proposed architecture of the Managing Epilepsy Well (MEW) informatics platform.
The MEW informatics platform re-uses resources from an existing clinical informatics platform developed for epilepsy called the Multi-modality Epilepsy Data Capture and Integration System (MEDCIS). 33 The MEDCIS platform was developed to enable multi-center data integration and querying using the Epilepsy and Seizure Ontology (EpSO) as the common terminology system. 34 We extended EpSO with epilepsy self-management terms to support the MEW informatics platform.
A common terminology for epilepsy self-management
A common terminology system is essential to develop an integrated database from disparate sources that will support user queries across different datasets. In the first step, we used the survey results, a literature review and group consensus to identify essential measurement constructs and metrics. The MEW informatics initiative included experts in epilepsy and mental health care, computer science, and public health. The workgroup members reviewed survey findings, the literature on data elements common to epilepsy self-management, and relevant large-scale epilepsy assessment and monitoring protocols.
To efficiently use our limited resources to model the large number of epilepsy self-management terms, we used an incremental three-tier phased approach to develop the terminology system. In the Tier 1 phase, a core set of data elements (Table 1) was identified, consisting of 16 variables that are frequently analyzed in epilepsy self-management research studies. The Tier 1 variables describe basic demographic/clinical characteristics, health status, selected comorbidity measures, employment status, and seizure information. Existing epilepsy terminology references were used to define the variables, such as the Institute of Medicine’s Report on “Epilepsy Across the Spectrum: Promoting Health and Understanding 2012,” 2 the recommended standards for epilepsy surveillance studies, 35 the National Institute of Neurological Disorders and Stroke (NINDS) Common Data Elements (CDE), 36 and the Behavioral Risk Factor Surveillance System (BRFSS) Questionnaire. 1
Table 1. Basic (Tier 1) elements in an integrated epilepsy self-management database.
GED: General Educational Development.
National Institute of Neurological Disorders and Stroke (NINDS) Common Data Elements (CDE).
Centers for Disease Control and Prevention (CDC). 1
Behavioral Risk Factor Surveillance System (BRFSS): Section 1.
BRFSS: Section 2.
Quality of Life in Epilepsy–10 (QOLIE-10). 28
Patient Health Questionnaire–9 (PHQ-9): Kroenke K, Spitzer RL and Williams JB. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med 2001; 16: 606–613.
Additional variables will be added in the Tier 2 phase, based on the specific requirements and Institutional Review Board (IRB) approvals of existing or new participating sites, while a comprehensive list of variables for epilepsy self-management will be constructed during the Tier 3 phase.
Standardization of data elements
A total of 42 data elements were used to describe the 16 Tier 1 variables, and these data elements were standardized and incorporated into the EpSO epilepsy domain ontology. 34 EpSO uses the well-known four-dimensional epilepsy classification schema to model concepts of seizures, location of seizures, etiology, and related medical conditions.37,38 In addition, EpSO models electroencephalography (EEG) and comprehensive drug information based on the US National Library of Medicine RxNorm standard. 39 EpSO concepts are mapped to the NINDS CDE, which represents nine categories of terms describing imaging, neurological exam, neuropsychology, seizures, and syndromes. 36 EpSO has been used successfully to support standardized patient data entry in the Ontology-driven Patient Information Capture system, 40 clinical text processing in the Epilepsy Data Extraction and Annotation system, 41 and cloud-based electrophysiological signal processing in the Cloudwave platform. 42 Hence, extending EpSO with epilepsy self-management terms will not only address data heterogeneity in the MEW informatics platform, but also support interoperability with other existing epilepsy informatics tools.
Figure 2 illustrates the modeling of the Tier 1 epilepsy self-management terms in EpSO, for example seizure count and the patient health questionnaire (PHQ). In addition to modeling the terms, EpSO annotates each class with additional information, for example PHQ item 7 (label) and Trouble concentrating on things, such as reading the newspaper or watching television (textual description). We plan to extend EpSO with additional terms during the Tier 2 and Tier 3 phases.

Figure 2. Screenshot of the Epilepsy and Seizure Ontology class hierarchy.
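To make the class-plus-annotation idea concrete, the following is a minimal sketch in Python rather than in an ontology language. It is not how EpSO is implemented: the `OntologyClass` type, the class names, and the parent/child link shown here are hypothetical stand-ins, and only the label and textual description for PHQ item 7 are taken from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class OntologyClass:
    """Toy stand-in for an ontology class with annotation properties."""
    name: str                                  # hypothetical class identifier
    label: str                                 # human-readable label
    description: str = ""                      # textual description annotation
    parent: Optional["OntologyClass"] = None   # assumed subclass-of link

    def ancestors(self) -> List[str]:
        """Return the names of all superclasses, nearest first."""
        names = []
        node = self.parent
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names


# Hypothetical hierarchy: the questionnaire as parent, one item as child.
phq = OntologyClass("PatientHealthQuestionnaire", "Patient Health Questionnaire")
phq_item_7 = OntologyClass(
    name="PHQItem7",
    label="PHQ item 7",
    description=("Trouble concentrating on things, such as reading the "
                 "newspaper or watching television"),
    parent=phq,
)
```

Keeping the label and description on the class itself means a data-entry form or query widget can display the same wording regardless of which study supplied the value.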
Data curation workflow for MEW Network datasets
In a proof-of-concept exercise, the MEW Network informatics platform was populated with research datasets from three studies: (a) the Web Epilepsy Awareness, Support, and Education (WebEase) project at Emory University, 10 (b) the Targeted Illness Management for Epilepsy and Mental Illness (TIME) project at CWRU, and (c) the pilot of the FOCUS on Epilepsy program (FOCUS) at the University of Michigan. 14 The WebEase study involved 148 individuals participating in an online epilepsy self-management study. The TIME study is an in-person, community-based intervention to improve mood and epilepsy outcomes in people with epilepsy and comorbid serious mental illnesses like schizophrenia, bipolar disorder and depression. The FOCUS study is testing a hybrid in-person and phone-based program that develops self-regulation skills in both adults with epilepsy and a key friend or family member who provides support.
We developed three data flows, one for each research study, as part of the data curation, mapping, and transformation layer to extract, transform, and load (ETL) data from each of the three research studies (Figure 1). The ETL workflows consisted of multiple data processing phases. The first step generates mappings between study-specific variables and the MEW common terminology system, which involves reconciling differences in both data values and interval values used to categorize the data elements. For example, the first three interval values for the education variable in TIME correspond to “Never attended school,” “grades 1 through 8,” and “grades 9 through 11.” The WebEase study used a single interval of “School from 1 through 11;” hence, education values of “1,” “2,” and “3” in TIME correspond to a value of “3” in WebEase, but the inverse is not true. To address this issue, mappings were defined to map value “1” in WebEase to value “3” in the integrated database.
Another example of data heterogeneity involves the use of value “1” in the FOCUS dataset in contrast to value “2” used in the MEW common data elements to represent “Hispanic or Latino” participants. There are no completely automated techniques to reconcile this “semantic heterogeneity” in data integration systems; 43 hence, the appropriate mappings were manually generated for the MEW datasets. In the next step, these mappings are used to extract and automatically transform each research dataset, using common terms that are mapped to EpSO.
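The manually generated mappings described above can be sketched as a lookup table applied during the transform step. This is an illustration under stated assumptions, not the workgroup's actual ETL code: the variable names (`education`, `ethnicity`), the `RECODE` structure, and the rule that unmapped variables pass through unchanged are all hypothetical. Only the two recodes shown (WebEase education "1" to "3", FOCUS "Hispanic or Latino" "1" to "2") come from the text.

```python
# Hypothetical recode tables keyed by (study, variable).
# Each inner dict maps a study-specific code to the integrated-database code.
RECODE = {
    ("WebEase", "education"): {"1": "3"},
    ("FOCUS", "ethnicity"): {"1": "2"},   # "Hispanic or Latino"
}


def harmonize(study, variable, value):
    """Translate one study-specific value into the common coding.

    Assumption: a variable with no recode table already uses the common
    coding and is passed through. A variable that has a table but no entry
    for the given value raises, so unmapped codes are never loaded silently.
    """
    table = RECODE.get((study, variable))
    if table is None:
        return value
    if value not in table:
        raise ValueError(f"no mapping for {study}.{variable}={value!r}")
    return table[value]


def harmonize_record(study, record):
    """Apply harmonize() to every variable of one participant record."""
    return {var: harmonize(study, var, val) for var, val in record.items()}
```

Failing on an unmapped code, rather than guessing, reflects the point made above that semantic heterogeneity cannot be reconciled fully automatically; a curator would extend the table by hand.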
The data processing step assigns each study participant a unique identifier and annotates the corresponding data value with provenance information describing the source study. Provenance metadata, which describes the source of data, 44 allows the MEW informatics platform to filter query results based on the research study. The ETL workflows were developed in a modular manner, which allows the MEW informatics platform to incrementally add new research datasets without disrupting user interactions over the deployed database. In future work, we propose to deploy the study-specific workflows as Web services, 45 which are Web applications that can be remotely invoked by the MEW informatics platform as part of a Service Oriented Architecture to populate the integrated database.
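One way to picture the identifier and provenance step is shown below. The `Observation` record, the identifier format, and the helper names are assumptions made for illustration; the text does not specify how identifiers are generated or how provenance is stored.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One harmonized data value annotated with its provenance."""
    participant_id: str   # unique identifier assigned at load time
    variable: str         # common data element name (hypothetical)
    value: str            # value in the common coding
    source_study: str     # provenance: the study the value came from


def load_study(study, records):
    """Assign each participant a fresh identifier and tag every value.

    `records` is a list of dicts, one per participant, already expressed
    in the common terminology.
    """
    observations = []
    for record in records:
        participant_id = f"{study}-{uuid.uuid4()}"  # assumed ID scheme
        for variable, value in record.items():
            observations.append(
                Observation(participant_id, variable, value, study)
            )
    return observations


def filter_by_study(observations, study):
    """Use the provenance tag to restrict results to one source study."""
    return [obs for obs in observations if obs.source_study == study]
```

Because each study is loaded by its own call, a new dataset can be appended to the existing observations without touching those already loaded, which mirrors the modular design described above.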
MEW Network database design and functionality
The integrated database uses a relational data model to store the data. The integrated database is currently hosted on a local protected computer drive and will be moved to a dedicated and secure MEW computing environment in the future to make it available to the MEW Network. We are currently developing an intuitive user interface to query the database that will re-use many resources of the existing Visual Aggregator and Explorer user interface deployed for the MEDCIS platform. The user interface will feature query widgets corresponding to study variables that can be composed to express queries, 46 a result display section with “live count” of patient cohorts, and visualization tools that can support statistical analysis.
We used the open source MySQL DBMS for the integrated database, together with the Structured Query Language (SQL), to compose and execute the queries described in this article. As the volume of the data increases with integration of additional MEW research datasets, we will migrate the database to the open source, high-performance, column-based HBase DBMS. 47 Two categories of queries were executed over the integrated database to retrieve: (a) values for seven variables describing demographic and clinical data that are related to QOL values for all participants across the three research studies and (b) values corresponding to all 42 common data elements divided into three sets for each study to describe the total sample represented by the integrated database. The query results were analyzed using standard statistical techniques.
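A cross-study query of the kind described above might look like the following sketch. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so that the example is self-contained; the `observation` table, its columns, the variable names, and the rows are all invented for illustration and are not MEW data or the actual database schema.

```python
import sqlite3

# In-memory database as a stand-in for the MySQL integrated database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE observation (
        participant_id TEXT NOT NULL,
        source_study   TEXT NOT NULL,   -- provenance of the value
        variable       TEXT NOT NULL,   -- common data element (hypothetical)
        value          REAL,
        PRIMARY KEY (participant_id, variable)
    )
    """
)

# Synthetic rows, for demonstration only.
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?, ?)",
    [
        ("p1", "WebEase", "qolie10_total", 20.0),
        ("p2", "TIME", "qolie10_total", 30.0),
        ("p3", "TIME", "qolie10_total", 34.0),
        ("p3", "TIME", "seizures_30d", 2.0),
    ],
)


def summarize(conn, variable, study=None):
    """Return (study, n, mean) per source study for one variable.

    Passing `study` restricts the result using the provenance column.
    """
    sql = ("SELECT source_study, COUNT(*), AVG(value) "
           "FROM observation WHERE variable = ?")
    params = [variable]
    if study is not None:
        sql += " AND source_study = ?"
        params.append(study)
    sql += " GROUP BY source_study ORDER BY source_study"
    return conn.execute(sql, params).fetchall()
```

The SQL itself is plain `SELECT ... GROUP BY` and should carry over to MySQL largely unchanged, although the parameter placeholder style depends on the database driver in use.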
Results
Sample description
Table 2 illustrates the Tier 1 demographic and clinical variables from the three MEW Network datasets in the proof-of-concept exercise. It is apparent that there are substantial differences across studies in demographic and clinical characteristics of participants with epilepsy. The WebEase study, which is delivered remotely via the Web, enrolled a predominantly White sample population with relatively high levels of employment and social support. In contrast, the TIME and FOCUS studies enrolled much larger proportions of minorities, with African-Americans comprising approximately one-fourth to one-third of the individual study samples. None of the studies enrolled more than 2 percent Asians. Income data, in line with the published literature, suggest that individuals with epilepsy are generally living under difficult financial circumstances. Seizure frequency varied widely, with nearly 10 seizures per month in the WebEase sample versus 2 or fewer seizures per month in the FOCUS and TIME samples. Depression scores suggested moderate to high levels of depression in the TIME and FOCUS studies (depression scores were not available for the WebEase study).
Table 2. MEW Network integrated database Tier 1 variables.
SD: standard deviation; QOLIE-10: 10-item Quality of Life in Epilepsy; PHQ-9: Patient Health Questionnaire 9-item version.
Lower scores indicate better quality of life. Higher scores indicate worse depression.
Seizure frequencies were standardized to a consistent time period (30 days). If a study reports only the cumulative number of seizures over the past 3 months instead of the average number of seizures per month, then the total number of reported seizures is averaged over the 3-month period to derive a 30-day frequency count.
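The standardization rule in this note amounts to dividing a cumulative count by the number of months it covers. A minimal sketch, assuming each reporting month is treated as one 30-day period (the function name and the input numbers in the usage line are illustrative only):

```python
def seizures_per_30_days(total_seizures, months_reported):
    """Convert a cumulative seizure count into a 30-day frequency.

    Assumes each reported month counts as one 30-day period, so a
    3-month total is simply divided by 3.
    """
    if months_reported <= 0:
        raise ValueError("months_reported must be positive")
    return total_seizures / months_reported


# Illustrative input: 6 seizures reported over the past 3 months.
example = seizures_per_30_days(6, 3)  # 2.0 seizures per 30 days
```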
Secondary use of integrated data
As a point of reference, and in order to maintain a practical focus on how the informatics platform might address future important and clinically relevant epilepsy self-management research, the MEW Network database workgroup reviewed descriptive data across the three studies and summarized some general observations. The data analytic approach was iterative in that the context of the study (inclusion criteria and sample focus) needed to be considered for clinical relevance and query formulation. For instance, it appeared that the WebEase sample had relatively greater social support and employment advantages in contrast to the TIME and FOCUS samples. Although the WebEase sample had a higher mean number of seizures, its group mean on QOL was not worse than those of the TIME and FOCUS samples. The cross-sectional nature of this preliminary analysis and missing elements in a number of Tier 1 domains preclude us from attributing causality to findings or making definitive conclusions. In order to more fully explore the potentially protective effects of social and occupational supports for people with epilepsy, it will be critical to collect as many shared data constructs as possible across the MEW Network, expand the aggregate sample size, and include longitudinal datasets.
Discussion
An incremental and multidisciplinary approach to developing an integrated informatics platform within an existing research network of multiple sites and research studies is feasible and potentially useful. In this pilot implementation, the proposed informatics platform demonstrates the ability to effectively manage and share datasets from three Network sites. Taken together, the integrated database could support research questions that go beyond individual studies or sites, and can harness the power of aggregate data to address important gaps in knowledge regarding epilepsy self-management.
Data-sharing framework for the MEW network and other thematic research groups
In addition to data harmonization and other challenges described earlier, the workgroup encountered issues that need to be addressed when planning an integrated data management strategy, such as privacy/data confidentiality issues and ongoing infrastructure needs. It is essential for the MEW Network to establish regulatory and ethical procedures for sharing data between sites that are well defined and conform to patient privacy standards. For the proof-of-concept exercise, the use of de-identified data was considered to be a nonhuman subject research activity by the local IRB and thus did not require informed consent. For ongoing studies, it is essential that study participants provide written informed consent for data inclusion. In some instances, a Data Use Agreement (DUA) is needed for data exchange across sites. Having available DUA templates with appropriate regulatory language can greatly reduce the burden for sites to participate in the integrated database. Appropriate aspects of this data-sharing agreement need to be incorporated into the informatics platform, such as access control and audit trails indicating usage by researchers that satisfy the requirements of the Health Insurance Portability and Accountability Act (HIPAA) and Health Information Technology for Economic and Clinical Health (HITECH) standards.
The MEW informatics platform needs to be underpinned by a highly scalable infrastructure that can address the challenges of both an increasing amount of data and complex analytical queries. The pilot implementation of the informatics platform was deployed on local computing resources with minimal cost. For future deployment of an informatics platform that supports an intuitive user interface and visualization software tools, emerging cloud computing resources with low initial cost also need to be considered.
Providing insights for future self-management strategies through data-driven research
An integrated informatics platform can serve as a valuable data resource for epilepsy research. Such a resource effectively shares data to give researchers the power needed to learn more about epilepsy and self-management, and has the potential to substantially contribute to epilepsy research, clinical care, and public health efforts. For researchers, the platform could enable possible estimation and projection of plausible effect sizes from treatments and strengths of association between variables for sample size planning. For clinicians, approximate prevalence levels of comorbid conditions such as depression can be ascertained, and patterns of comorbidity and outcomes in sub-samples such as minorities and those individuals at the ends of the age spectrum (e.g. young adults and elderly) can be assessed in relation to the general adult population with epilepsy. For program planners, representation of specific sub-populations with epilepsy could help inform future research priorities. For example, Asians with epilepsy were not well-represented in the MEW Network proof-of-concept exercise, and this may be a sub-group that needs additional study.
Finally, the informatics platform can serve as a resource for psychometric analysis of new measures, such as a standardized tool to assess epilepsy self-management. QOL and other self-reported measures can be compared and validated with rich external criteria, including legacy standardized measures.
Conclusion
In conclusion, secondary use of clinical data through data sharing and integration is an important objective for the biomedical community to advance scientific endeavor. An incremental approach to consolidate and standardize data collection across a US thematic research network has potential to advance epilepsy self-management, and may help patients and families dealing with epilepsy minimize complications and achieve better health outcomes.
Footnotes
Acknowledgements
The authors would like to acknowledge the helpful suggestions of Rosemarie Kobau from the Centers for Disease Control and Prevention (CDC) and the support of Dr Elaine Borawski of the Case Western Reserve University (CWRU) Prevention Center for Healthy Neighborhoods (PRCHN).
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Centers for Disease Control and Prevention (CDC) (grant number 3U48DP001930-04S3), and National Institute of Neurological Disorders and Stroke (NINDS) Prevention and Risk Identification of SUDEP (sudden unexplained death in epilepsy) Mortality (PRISM) Project (grant number P20NS076965). Additionally, this publication was made possible by the Clinical and Translational Science Collaborative of Cleveland (grant number UL1TR000439), from the National Center for Advancing Translational Sciences (NCATS) component of the National Institutes of Health (NIH) and NIH roadmap for Medical Research. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH.
