Abstract

Analytical instruments are among the most prolific generators of data, and as the throughput of modern instrumentation increases, the volume of data that science-based organizations must manage is growing ever faster. Over the past two years a number of data management systems have emerged to help organizations handle this growth. However, until recently no commercially available solution could address these issues over longer periods of time.
To enable organizations to preserve their valuable instrument data as long term records and gain maximum benefit from the knowledge held within, Thermo LabSystems has introduced the first of a new generation of data management systems, called eRecordManager™. It creates a common format for the capture, archival, mining and retrieval of analytical data, without relying on the original application and without losing any of the valuable information in the traces. This common format is based on XML and eRecordManager features a library of converters for approximately 150 different instrument data file types. In addition to all of the most commonly used formats, some of the file types handled are from systems that ran on computer platforms that are no longer produced, or that at least are being phased out in many organizations.
Following the acquisition of Galactic Industries Corp (now Thermo Galactic) in 2001, Thermo LabSystems used Galactic's technologies and expertise in handling data from a wide range of analytical instruments to develop eRecordManager, which helps organizations overcome the challenge of compliant data management. As its name suggests, eRecordManager is a solution for the management of electronic records. The ‘eRecord’ aspect refers to ‘Electronic Records,’ as defined by the FDA in its rule 21 CFR Part 11, which deals with the requirement for secure archiving and the ability to retrieve records in the future. The ‘Manager’ aspect refers to Knowledge Management and the ability to share all this information across the enterprise. In summary, eRecordManager is designed to meet the requirements for securely storing spectral and chromatographic data from multiple data formats.
COMPLIANCE
Many of the environments in which this type of system will be invaluable are, of course, heavily regulated. For this reason, eRecordManager is built on a well-proven security model with a number of functions to support compliance with regulations such as 21 CFR Part 11 and CROMERRR (the Cross-Media Electronic Reporting and Record-keeping Rule), the US EPA's equivalent rule on electronic records.
These functions include comprehensive server-based authentication using either Oracle or Windows NT/2000, meaning that even if eRecordManager's client is bypassed by a third-party application, the user still has to be authenticated by the eRecordManager server. This authentication includes password rules concerning expiration, reuse, complexity, etc. Each of the configuration entities (archive policies, etc.) is also subject to version control with a full audit trail and authorization, including electronic signing.
INTEGRATION WITH OTHER APPLICATIONS
Like many scalable systems capable of being deployed at the enterprise level, eRecordManager has software interfaces available to integrate with other computer systems. These may be instrument data systems for direct archive, ELNs (Electronic Laboratory Notebooks) for combining chromatograms and spectra with other information (text, numeric, chemical structures) or LIMS for tying results directly to the original instrument files that produced them.
eRecordManager is designed specifically with a component-based architecture making it straightforward to interface at any level in its n-tier architecture. Such integration can be performed by end user IT teams, or through consultancy services such as Pathfinder, Thermo LabSystems' global informatics services group. Such groups can also offer other services ranging from migration for obsolete file and media formats through to validation consultancy.
INSTRUMENT DATA FILES
Of the 150+ instrument data file formats that eRecordManager can interpret and archive, a number are generated by technology from other Thermo Electron businesses. These were the first file types that Thermo LabSystems presented at the launch of eRecordManager at Pittcon 2002. For example, eRecordManager interfaces with Thermo Nicolet's OMNIC™ and RESULT™ software, Thermo Spectronic's AMINCO•Bowman® Series 2 Luminescence Spectrometer and Thermo Finnigan's Xcalibur® software, all of which offer the user the ability to archive files for the long term to support compliance. Integration with Thermo Galactic's desktop spectroscopy software, GRAMS/AI™, is now also a reality. Using eRecordManager and GRAMS/AI in tandem, spectral data in any original format is available for advanced reprocessing, visualization and library searching without requiring the original instrument system and hardware to be installed.
DATA FORMAT CONVERSION AND XML
XML has become the industry standard platform-neutral format for data storage and exchange. Through a unique library of powerful file converters that automatically generate XML versions of the data, the archived information can be viewed on virtually any platform in the future, without using the original instrument software. This provides users with the ability to respond much more promptly and effectively to requests from regulatory or legal bodies.
The market's acceptance of XML suggests that it will predominate for many years to come, regardless of the evolution of operating systems and computer hardware. To satisfy long-term record-keeping requirements and to facilitate knowledge management, eRecordManager also archives the original raw data files from the instrument software, along with a normalized representation in XML.
Users with access to the eRecordManager archive can view the normalized version of the data from any computer. In addition, the XML or the original data files can be retrieved for use with additional applications. When storing analytical instrument data in XML, eRecordManager makes use of a public-domain XML schema known as Generalized Analytical Markup Language (GAML).
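To make the idea of a normalized XML representation concrete, the following minimal sketch writes a trace of XY points to XML and reads it back without any instrument software. The element and attribute names (`trace`, `xvalues`, `yvalues`, `units`) are hypothetical and chosen only for illustration; they are not the GAML schema, and this is not eRecordManager's converter code.

```python
import xml.etree.ElementTree as ET


def trace_to_xml(name, x_units, y_units, points):
    """Serialize a list of (x, y) pairs to a small XML document.

    Element names are illustrative assumptions, not GAML.
    """
    root = ET.Element("trace", {"name": name})
    xs = ET.SubElement(root, "xvalues", {"units": x_units})
    ys = ET.SubElement(root, "yvalues", {"units": y_units})
    # repr() of a float round-trips exactly through float()
    xs.text = " ".join(repr(float(x)) for x, _ in points)
    ys.text = " ".join(repr(float(y)) for _, y in points)
    return ET.tostring(root, encoding="unicode")


def xml_to_points(xml_text):
    """Recover the (x, y) pairs using only a generic XML parser."""
    root = ET.fromstring(xml_text)
    xs = [float(v) for v in (root.find("xvalues").text or "").split()]
    ys = [float(v) for v in (root.find("yvalues").text or "").split()]
    return list(zip(xs, ys))


spectrum = [(400.0, 0.12), (401.0, 0.15), (402.0, 0.11)]
document = trace_to_xml("UV scan", "nm", "absorbance", spectrum)
restored = xml_to_points(document)
```

Because the reader depends only on a standard XML parser, the same data remains accessible on any platform that can parse text.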
WORKFLOW
The movement of data into the eRecordManager system can be scheduled interactively or automatically:
Interactive: scheduling can be initiated by a user either from within eRecordManager itself or via an eRecordManager-enabled LIMS or other data system.
Automatic: scheduling can be controlled through an archival policy which sets criteria for archiving and deletion (e.g. based on time since creation, time since modification, time since last access, etc.).
This can be managed via the eRecordManager client application, or some other application integrated with the eRecordManager server such as a LIMS or a Web application. Alternatively, eRecordManager can have one or more archiving agents that automatically schedule archiving based on an Archive Policy (see the Archival Process section).
Once an eligible file has been identified, eRecordManager executes rules configured for each file type, which govern the related files that must be archived. Unlike a .doc file for Word, which is completely self-contained, most instrument data systems produce a number of files for each ‘run’ or ‘experiment’, such as the raw file, processed file, method, report and so on. Some systems go a stage further and actually group traces for samples, standards, blanks, QCs, etc., into a single sequence or workbook. eRecordManager understands these relationships for each type of data system, meaning that everything needed to completely recreate the entire analysis is stored together as one atomic unit. This is invaluable in the case of calibration bracketing for HPLC determinations.
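The per-file-type rules described above can be pictured as a lookup from a primary file's type to the companion files that must travel with it. The sketch below is an assumption-laden illustration: the rule table, the file extensions and the function name `collect_unit` are hypothetical, and real instrument systems relate their files in more varied ways than a shared base name.

```python
from pathlib import PurePath

# Hypothetical rule table: primary extension -> companion extensions
RELATED_EXTENSIONS = {
    ".raw": [".meth", ".rpt", ".proc"],
}


def collect_unit(primary, available):
    """Return the primary file plus any companions that share its base name.

    Matching is by file name only (case-insensitive); directories are ignored.
    """
    p = PurePath(primary)
    wanted = RELATED_EXTENSIONS.get(p.suffix.lower(), [])
    by_name = {PurePath(a).name.lower(): a for a in available}
    unit = [primary]
    for ext in wanted:
        candidate = (p.stem + ext).lower()
        if candidate in by_name:
            unit.append(by_name[candidate])
    return unit


files_on_disk = ["run01.raw", "run01.meth", "run01.rpt", "run02.raw", "notes.txt"]
unit = collect_unit("run01.raw", files_on_disk)
```

Everything in `unit` would then be archived, or rejected, together.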
The entire sample set is then moved to the server together with a 128-bit ‘message digest’ based on the widely used MD5 algorithm. This is a one-way hash algorithm specifically intended for digital signature applications, and the digest can be verified at any point in the future to confirm that the contents of the files have not been tampered with. This is essential for compliance with the US FDA's rule 21 CFR Part 11, as well as the US EPA's CROMERRR rule.
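As an illustration of how such a digest is computed and later checked, here is a minimal sketch using Python's standard `hashlib` module. It hashes one file at a time; how eRecordManager combines the files of a sample set into a digest is not described here, so the function names and the per-file approach are assumptions.

```python
import hashlib


def md5_digest(path, chunk_size=65536):
    """Return the MD5 digest of a file as 32 hex characters (128 bits)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large instrument files need not fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_digest):
    """Recompute the digest and compare it with the one stored at archive time."""
    return md5_digest(path) == expected_digest
```

Any change to the file contents, even a single byte, yields a different digest, so a mismatch at verification time signals that the archived copy is no longer identical to what was stored.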
Once at the server, an accurate copy of the data is made through the appropriate converter, and after extracting specified metadata the resultant XML file is then stored together with the original data using a suitable long term storage system. This may be optical or magnetic media, either directly attached to a host computer or distributed using a SAN (Storage Area Network).
For organization of the data, eRecordManager uses the concept of an unlimited hierarchy of folders that can be defined using filters on the underlying Oracle database. This means that a single piece of data can appear in multiple folders, although of course there is only one copy. This allows data to be grouped together simultaneously by (for instance) study, user, instrument, date of collection and so on. These filters can also be dynamic, in that they can be configured to prompt for specific criteria each time they are refreshed. This allows generic folders such as “Data created by user”, where the user name is prompted for on each refresh. For ad hoc searches, apart from the usual database-type queries, powerful tools are now becoming available from Oracle to enable data mining of the BLOBs (Binary Large Objects) containing XML. This is in addition to searching on the XY points in the case of spectral data such as MS, FT-IR, UV and NMR.
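The folder concept can be sketched as a set of saved filters over a metadata table. The example below uses an in-memory SQLite database purely as a stand-in for Oracle; the table layout, column names and folder definitions are hypothetical and serve only to show how one record can appear in several folders while being stored once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the Oracle metadata store
conn.execute(
    "CREATE TABLE records ("
    "id INTEGER PRIMARY KEY, analyst TEXT, instrument TEXT, study TEXT)"
)
conn.executemany(
    "INSERT INTO records (analyst, instrument, study) VALUES (?, ?, ?)",
    [
        ("alice", "HPLC-1", "S-100"),
        ("bob", "FTIR-2", "S-100"),
        ("alice", "FTIR-2", "S-200"),
    ],
)

# A folder is a filter, not a container. Parameters set to None are
# 'dynamic': they are supplied (prompted for) each time the folder refreshes.
FOLDERS = {
    "Study S-100": ("study = ?", ["S-100"]),
    "Data created by user": ("analyst = ?", None),
}


def refresh(folder, prompt_values=None):
    """Return the ids of the records currently matching a folder's filter."""
    where, params = FOLDERS[folder]
    if params is None:
        if prompt_values is None:
            raise ValueError("this folder prompts for its criteria")
        params = prompt_values
    # 'where' comes from trusted folder definitions; values are bound safely
    rows = conn.execute(f"SELECT id FROM records WHERE {where} ORDER BY id", params)
    return [row[0] for row in rows]
```

Here record 1 is returned both by `refresh("Study S-100")` and by `refresh("Data created by user", ["alice"])`, yet exists only once in the table.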
Once a particular set of data has been identified the user can retrieve either the original data files for use with the original application, or the XML representation for visualization, comparison and manipulation via eRecordManager itself. For more complex work, the data can be restored into widely used desktop data systems such as GRAMS/32 (Thermo Galactic). The GRAMS product supports a range of advanced processing and review operations on a wide range of data files, including eRecordManager's XML format.
ARCHIVAL PROCESS
An Archive Policy is a set of rules used to determine when files should be automatically archived. Files can be automatically archived after a user-defined interval. This interval can be measured from either the file creation date or last modified date. The Archive Policy can optionally be used to automatically delete files after they have been archived.
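A policy of this kind reduces to a small rule evaluated against each file's timestamps. The following sketch shows one way to express it; the class and field names are hypothetical and do not reflect eRecordManager's actual configuration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ArchivePolicy:
    interval: timedelta                 # user-defined waiting period
    measure_from: str = "created"       # "created" or "modified"
    delete_after_archive: bool = False  # optional clean-up of the source file


def is_eligible(policy, created, modified, now):
    """Return True once the configured interval has elapsed for a file."""
    if policy.measure_from == "created":
        reference = created
    elif policy.measure_from == "modified":
        reference = modified
    else:
        raise ValueError(f"unknown reference point: {policy.measure_from}")
    return now - reference >= policy.interval


policy = ArchivePolicy(interval=timedelta(days=30), measure_from="modified")
```

An archiving agent would periodically apply `is_eligible` to candidate files and, for those that qualify, archive them and then delete the originals only if `delete_after_archive` is set.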
Instrument data systems often produce a number of files for each analysis performed. Typical of these are raw files, processed files, method and sequence information, GLP information and so on. It is essential to treat all of these as a single inseparable unit and ensure they are archived together. eRecordManager understands the relationships for each of the supported instrument software formats, and can easily be configured for new or additional file types. For example, eRecordManager can be configured to capture a relevant Microsoft Word document along with the actual instrument data.
Once the instrument data files are gathered together they are moved to the server. There, the appropriate translator creates an XML copy that is then stored as a BLOB in the Oracle database.
Metadata (e.g. date/time created, owner, application type, sample name, component names, batch, LIMS ID, etc.) is extracted from the data and stored in the Oracle database based on standard system rules and configurable mapping policies.
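A configurable mapping policy can be thought of as a table that translates the field names found in a given instrument file into the column names used by the database. The sketch below assumes the file header has already been parsed into a dictionary; the header field names, column names and the function `map_metadata` are all hypothetical.

```python
# Hypothetical mapping policy for one file type:
# field name in the instrument file header -> database column name
MAPPING_POLICY = {
    "Sample Name": "sample_name",
    "Operator": "owner",
    "Acquired": "created",
    "LIMS Ref": "lims_id",
}


def map_metadata(header, mapping):
    """Select and rename the header fields named in the mapping policy.

    Fields absent from the header are skipped rather than invented.
    """
    return {
        column: header[field]
        for field, column in mapping.items()
        if field in header
    }


header = {"Sample Name": "QC-7", "Operator": "alice", "Detector": "UV 254 nm"}
row = map_metadata(header, MAPPING_POLICY)
```

Keeping the mapping in data rather than in code is what makes it configurable: supporting a new file type means adding a new table, not a new extraction routine.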
LONG-TERM STORAGE
Long-term availability of the data is paramount. For this reason, eRecordManager stores the complete ‘Record’, including the original data and a copy of the XML and metadata, in portable ZIP archive files. Each archive file is uniquely named, allowing separate companies, organizations, departments or labs to merge their eRecordManager archives without the worry of duplicate identifiers.
The choice of storage media can be regular magnetic drives, CD-R, DVD-RAM, etc., either directly attached to the server or configured as a Storage Area Network (SAN). As mass storage technologies change, eRecordManager is designed to allow easy migration of all archive files from one medium to another. This is an essential part of any archiving system where data is likely to be kept for many decades.
Each archived data unit can exist without relying on the Oracle database. The archived ZIP files are completely self-describing records of everything known about the data including raw data, processed data, metadata, names, dates, file paths, and associated files.
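The idea of a self-describing, uniquely named archive can be illustrated with Python's standard `zipfile` and `uuid` modules. This is a sketch under stated assumptions: the use of a random UUID as the archive name, the `metadata.json` member and the function names are illustrative choices, not eRecordManager's actual naming scheme or metadata format.

```python
import json
import os
import uuid
import zipfile


def write_record(archive_dir, files, metadata):
    """Write one record to a uniquely named ZIP file and return its path.

    files: dict mapping member name -> bytes (raw data, XML copy, etc.)
    metadata: dict stored alongside the data so the archive describes itself
    """
    path = os.path.join(archive_dir, f"{uuid.uuid4()}.zip")
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
        zf.writestr("metadata.json", json.dumps(metadata, indent=2))
    return path


def read_metadata(path):
    """Recover the metadata from the archive alone, with no database."""
    with zipfile.ZipFile(path) as zf:
        return json.loads(zf.read("metadata.json"))
```

Because each name is generated independently of any site or database, archives produced at different locations can be copied into one directory without collisions, and each can still be interpreted on its own.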
Using the extracted metadata there are a number of different ways that the data can be organized and retrieved. eRecordManager offers the ability to define any number of user-defined folders containing groupings of “filtered” data, organized, for instance, by date, application type, analyst, etc. Folders are displayed and manipulated using an Explorer-style user interface, which can be easily configured to show various fields from the Oracle database.
SUMMARY
The management of high volumes of data is one of the biggest issues facing the scientific community today. Traditional archiving and data management systems require the original data system in order to review and manipulate the real data (the XY data pairs). eRecordManager has been aptly described as a ‘Data Preservation System’, making extensive use of XML to ensure that data are available for publishing, mining, review and manipulation many years into the future, long after the original data systems and the computers they run on have become obsolete. With eRecordManager, there is no longer a requirement to retain legacy computer and software systems in order to access an archive of proprietary, binary data file formats.
