Abstract

Managing clinical research data is moving beyond spreadsheets on USB sticks. Large anaesthesia and intensive care unit clinical trials now often use sophisticated study-specific databases.1 Most of these databases, however, are not designed for long-term data curation that would facilitate access years to decades later.2–4 Such data could be lost. Our recent experience highlights this problem.
We were undertaking a project looking at de-identified historical data on thrombelastographic results for patients undergoing total hip replacement. Some of the original data were more than 20 years old and stored in Minitab Worksheet (.mtw) files (Minitab, State College, PA, USA). We could not open the files with the current version of Minitab (v18). One of us (PL-D) approached technical staff from Minitab Australia, who suspected the files were v12 or older but were unable to provide software to open them. They sought advice from Minitab developers in the USA, who in turn thought the files were v8 or older and ultimately found a senior developer from San Francisco who provided the oldest ‘legacy’ version they could find. Fortunately, we were able to open the files with that museum version.
The company were very helpful, and the technical experts enjoyed the challenge, but this process took a lot of correspondence and time. Further, we could ultimately have been unable to access the data, which would then have been lost. Does this matter? We think it does. First, easy (even open) access to research data enhances the impact, efficiency and effectiveness of research.4,5 This includes contributing to meta-analysis of the original research question or asking new, unanticipated questions. Second, many datasets that are not drawn from big-data sources are unique to time and place. While many could be replicated, none can be replaced if lost. Third, there are regulatory requirements about data curation set by government or other regulatory authorities, including funders.
In the State of Victoria,6 research data should be kept for a minimum of five years ‘after completion of research activity’. But there is variation, depending on the data.6 Clinical trial data must be kept for 15 years,6 but data from children must be kept for 15 years after they turn 18 years old. Therefore, data from neonates need to be kept for 33 years (18 years plus 15 years), as we are doing for one of our studies.7 Further, some research data should be retained permanently6 if the data are one or more of: from genetic research, controversial or of high public interest, costly or impossible to reproduce, describing an innovative technique for the first time, or of significant community or heritage value to the state or nation. There is variation across jurisdictions within Australia and beyond. We suspect many researchers are unaware of these rules and the variations.
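The retention arithmetic above can be sketched in a few lines of Python. This is only an illustration of the rules as summarised in this letter, not a statement of the Victorian requirements themselves; the function and parameter names (`retain_until`, `clinical_trial`, `birth_date`, `permanent`) are hypothetical, and the sketch assumes a participant counts as a child if they have not yet turned 18 when the research activity is completed.

```python
from datetime import date
from typing import Optional

# Periods as summarised in the text above (assumed, simplified).
GENERAL_YEARS = 5          # minimum after completion of research activity
CLINICAL_TRIAL_YEARS = 15  # clinical trial data
ADULT_AGE = 18             # children: 15 years after turning 18


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 Feb to 28 Feb if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def retain_until(
    completion: date,
    *,
    clinical_trial: bool = False,
    birth_date: Optional[date] = None,
    permanent: bool = False,
) -> Optional[date]:
    """Return the earliest disposal date, or None for permanent retention."""
    if permanent:
        # e.g. genetic research, or data that are impossible to reproduce
        return None
    if birth_date is not None and add_years(birth_date, ADULT_AGE) > completion:
        # Participant was still a child at completion:
        # keep for 15 years after the 18th birthday.
        return add_years(birth_date, ADULT_AGE + CLINICAL_TRIAL_YEARS)
    years = CLINICAL_TRIAL_YEARS if clinical_trial else GENERAL_YEARS
    return add_years(completion, years)


# A neonate born on 1 January 2020, in a trial completed on 1 June 2020:
print(retain_until(date(2020, 6, 1), clinical_trial=True,
                   birth_date=date(2020, 1, 1)))
```

For the neonate in the example, the disposal date falls 33 years after birth, matching the figure given above. Other jurisdictions would need different constants and possibly different rules.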
A recent editorial in Anaesthesia and Intensive Care described the importance of diligent data storage and sharing to enhance transparency and public benefit.8 Our experience exemplifies the potential problems associated with data storage methods. Possible solutions for data storage include archiving via a public repository (either local or international, domain specific or general) or an institutional repository. The process for this may involve submitting smaller datasets to repositories routinely as part of the manuscript submission process to journals (Table 1) and/or archiving larger, more complete datasets at key milestones or at the end of research projects. In addition to the site of data storage and the timing of data archiving, other considerations include data format, supporting documentation and data sensitivity (Table 1). Consistent with our recent experience, a study from the USA found that data availability declined by 17% per year after publication.2 Further, the researchers found the most frequent reason for unavailable data was that the data were lost or on inaccessible storage media. As others have noted,4,5 we think one way to maintain long-term data access is to use data repositories designed for long-term curation. In Australia and New Zealand, the most likely bodies to facilitate this kind of curation are universities and government organisations (Table 1). We encourage researchers, particularly the clinical trials networks, to explore options for long-term curation3,5,9 of their data, both current and historic.
Table 1. Curating data.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship and/or publication of this article.
ORCID iDs
Laurence Weinberg https://orcid.org/0000-0001-7403-7680
David A Story
