Abstract
The Mouse Tumor Biology Database (MTB) is designed to provide an electronic data storage, search, and analysis system for information on mouse models of human cancer. The MTB includes data on tumor frequency and latency; strain, germ-line, and somatic genetics; pathologic notations; and photomicrographs. The MTB collects data from the primary literature, other public databases, and direct submissions from the scientific community. The MTB is a community resource that provides integrated access to mouse tumor data from different scientific research areas and facilitates integration of molecular, genetic, and pathologic data. The current status of the MTB, its search capabilities, data types, and future enhancements are described in this article.
In the postmolecular/genomics scientific world, it is increasingly difficult for individual researchers to locate, identify, and synthesize the large and diverse body of scientific data that may be relevant to their research. This challenge is amplified by the advent of large-scale data-generating projects, such as the human genome sequencing project, 8 the ENCyclopedia Of DNA Elements (ENCODE) project, 21 genome-wide association studies (GWAS), 14 and the promise of the $1000 genome. 15 Concurrent studies in mice have produced a complete reference genome sequence, 7 the Mammalian Gene Collection, 20 and the RIKEN Mouse Encyclopedia sequence. 12 Recent initiatives are leveraging advances in sequencing technology to deeply sequence the genomes of 17 inbred lines of mice (http://www.sanger.ac.uk/resources/mouse/genomes/) and to characterize their transcriptomes. 17
The mouse has become the most common and important model system used in studying human diseases, and its use has generated large amounts of data, much of it disease related. The preeminence of mouse models reflects practical factors (economy of maintenance), genetic factors (the mouse has an exquisite genetic map and a fully sequenced genome), accessibility (the mouse can be studied at all life stages, including embryonic), and the availability of tools to manipulate its genome that are unmatched in any other mammalian system. 4,5,22 Mouse models take advantage of the similarity of physiology and genetics of humans and mice, established inbred strains, and a wide array of molecular tools to generate targeted (so-called knockouts) and conditional mutations to simulate human disease states. Many of these model systems are designed to investigate cancer. An indicator of how prevalent cancer models are and how much pathological data are derived from them comes from a simple PubMed search (http://www.ncbi.nlm.nih.gov/sites/entrez) for all references containing the terms “mouse,” “cancer,” “human,” and “pathology.” In total, 32,297 scientific articles included the search terms, with the annual number rising from 2,061 published in 2000 to 6,065 published in 2009. Collating published data on the strains, mutations, nomenclature, and pathological descriptions in the increasing number of mouse model systems, and interpreting these data to discover new insights and enable the design of new experiments, has become virtually impossible on an individual level. The Mouse Tumor Biology Database (MTB) was created to integrate tumor data generated by these models and provide the ability to query and analyze these data.
As such, the MTB gives pathologists a unique opportunity to search for and examine existing primary data, to study mouse tumor genetic and epidemiological data associated with pathology records, and to present published and unpublished photomicrographs, complete with detailed annotations, to the scientific community at large.
Mouse Tumor Biology Database Overview
The MTB was first made available to the public in 1998. 2,13 The goal of the MTB is to provide a centralized electronic resource to collect and integrate the many different types of data obtained from mouse cancer models in an easily searchable database and provide analysis tools that allow users to identify existing models and facilitate the development of new models. Data include incidence and latency of mouse tumors, pathology reports and images, and strain and somatic genetics. The MTB also includes cytogenetic images showing changes in tumor karyotype, spectral karyotyping (SKY), comparative genome hybridization (CGH), and quantitative trait loci (QTL) data, and it will soon include gene expression array data from the Gene Expression Omnibus (GEO) and the ArrayExpress Archive (ArrayExpress). All data are attributed to the original reference, a contributor citation, or the source Web site. The MTB uses multiple controlled vocabularies and standardized nomenclature to allow for integrated searches of data from different sources. Searches of the MTB are accomplished using Web-based query forms. Each query form uses terms specific for a primary data type in the MTB, such as tumor class, mouse strain, genetics, images, reference, and mouse homologs of human genes and associated data. Combined searches using terms from the strain, genetics, pathology image, and tumor search forms simultaneously are also available using the advanced search form.
Data Sources and Links in the MTB
The MTB is updated weekly and includes curated data from the scientific literature, data submissions from cancer researchers, data downloaded from public databases such as Pathbase (http://www.pathbase.net/), 18 and health surveillance data from production colonies at The Jackson Laboratory (JAX) and colonies of aging mice from the Jackson Aging Center. The MTB is part of the Mouse Genome Informatics (MGI) 9 resource and can be accessed from the MGI Web site (http://www.informatics.jax.org/). The use of standard gene nomenclature, controlled vocabularies for tumor and anatomical terms, and shared database infrastructure facilitates links between the MTB and other MGI databases, including the Mouse Genome Database (MGD), 1 the Gene Expression Database (GXD), 19 and the Gene Ontology Database (GO). 10 The MTB provides links to supplementary online resources described in Table 1. Additional resources, such as Entrez Gene (http://www.ncbi.nlm.nih.gov/gene), Online Mendelian Inheritance in Man (OMIM) (http://www.ncbi.nlm.nih.gov/omim), and Ensembl (http://www.ensembl.org/), can be accessed from reference links to MGI and associated gene detail pages. The MTB also maintains a list of over 200 mouse-specific antibodies, with guidance on their use for immunohistochemistry and links to positive control samples and images; the list is available in both HTML and Microsoft Excel formats. 16
Table 1. MTB-Provided Online Resource Links
Disease Data and Images in the MTB
The largest portion of data in the MTB, including pathologic notations and images, comes from expert curation of published scientific literature by the MTB biocurators. The MTB works directly with journals to obtain permissions to incorporate images from publications into the MTB. Although MTB biocurators have expertise in cancer biology and mouse genetics, they are not trained pathologists. As a result, the MTB staff encourages direct submission of annotated data and images from pathologists involved in cancer research. The MTB currently includes such data obtained from pathologists at JAX and from direct submission by cancer pathologists in the greater scientific community. Currently, the MTB contains 5,451 tumor pathology reports, which include 4,187 photomicrographs that were either submitted by 46 researchers from 32 institutions or obtained from journals. Pathology and image data are presented in the MTB as pathology records (Fig. 1) attached to specific tumor frequencies and can be accessed from the tumor frequency or the originating reference. Pathology-based annotations include information on the organ of origin and the affected organ, treatments, classification of the tumor, tumor frequency, mouse strain in which the tumor was observed or induced, and relevant genetic mutations (germ-line and somatic). Table 2 shows the current data content of the MTB.

Figure 1. Example of pathology report and associated image detail page. Clicking on thumbnails opens image detail page (arrow).
Table 2. Current Data Content for the Mouse Tumor Biology Database
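The annotation types listed for a pathology record can be pictured as a simple data structure. The following Python sketch is illustrative only: the class names, field names, and example values are assumptions made for this example and do not reflect the actual MTB schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PathologyImage:
    """One photomicrograph attached to a record (hypothetical fields)."""
    caption: str
    file_name: str


@dataclass
class PathologyRecord:
    """Annotation types named in the text; all field names are assumptions."""
    organ_of_origin: str
    organ_affected: str
    tumor_classification: str
    strain: str
    treatment: Optional[str] = None
    tumor_frequency_percent: Optional[float] = None
    germline_mutations: List[str] = field(default_factory=list)
    somatic_mutations: List[str] = field(default_factory=list)
    images: List[PathologyImage] = field(default_factory=list)
    reference: Optional[str] = None  # originating reference or contributor

    def image_count(self) -> int:
        """Number of photomicrographs attached to this record."""
        return len(self.images)


# Placeholder values for illustration; not real MTB data.
example = PathologyRecord(
    organ_of_origin="ovary",
    organ_affected="ovary",
    tumor_classification="hemangioma",
    strain="example-strain",
)
example.images.append(
    PathologyImage(caption="example caption", file_name="example.jpg")
)
print(example.image_count())
```

Grouping germ-line and somatic mutations into separate lists mirrors the distinction drawn in the text; a production schema would likely reference controlled vocabularies rather than free-text strings.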
Searching for Pathology Data in the MTB
Pathology descriptions and images in the MTB can be accessed using links from the tumor frequency record or by directly searching with the Pathology Image Search Form (Supplementary Data Figure S1; a supplemental appendix to this article is published electronically only at http://vet.sagepub.com/supplemental). For example, to search for data on ovarian hemangiomas, users would first select hemangioma from the list of tumor types and then select ovary as the organ affected (Fig. S1). This search returns 3 pathology reports with 10 thumbnail-size images (Supplementary Data Fig. S2). Clicking on an image (arrow in Fig. S2) opens a new window displaying relevant pathology descriptions, a higher-resolution version of the image, and tumor, strain, and reference details. In addition, links are provided to other MTB data associated with this photomicrograph (Fig. 2).

Figure 2. Pathology image detail page with associated data and links to additional tumor, strain, and reference data.
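The two-step selection described for the ovarian hemangioma example (tumor type, then organ affected) amounts to a conjunctive filter over pathology records. The sketch below illustrates that logic on a small in-memory list. The function name, record keys, and sample records are hypothetical; the MTB itself is queried through its Web-based forms, not through this code.

```python
from typing import Dict, List, Optional

Record = Dict[str, str]


def search_pathology(
    records: List[Record],
    tumor_type: Optional[str] = None,
    organ_affected: Optional[str] = None,
) -> List[Record]:
    """Return records matching every criterion that was supplied.

    A criterion left as None is ignored, as with an empty form field.
    Matching is case-insensitive and exact (an assumption of this sketch).
    """
    def matches(value: str, wanted: Optional[str]) -> bool:
        return wanted is None or value.lower() == wanted.lower()

    return [
        r for r in records
        if matches(r["tumor_type"], tumor_type)
        and matches(r["organ_affected"], organ_affected)
    ]


# Invented sample records for illustration; not real MTB data.
records = [
    {"id": "rec-1", "tumor_type": "hemangioma", "organ_affected": "ovary"},
    {"id": "rec-2", "tumor_type": "hemangioma", "organ_affected": "liver"},
    {"id": "rec-3", "tumor_type": "adenoma", "organ_affected": "ovary"},
]

hits = search_pathology(records, tumor_type="Hemangioma", organ_affected="Ovary")
print([r["id"] for r in hits])
```

Because each criterion is optional, the same function covers searches by tumor type alone or organ alone, which parallels how a form with several independent fields behaves.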
Submitting Data to MTB
A Web-based submission system is available to facilitate direct submission of data and images to the MTB. The pathology submission forms allow researchers to create records containing detailed information on strain, genetics, tumor diagnosis, treatment regimens, and pathology image descriptions and attach images to complement the descriptions. High-quality images can be entered as Zoomify images (http://www.zoomify.com/). Researchers register for an ID and password to create submissions in a private MTB database space. Within this space, data can be submitted immediately or partially entered and saved for completion at a later date, and previously submitted entries can be edited to correct or add data. Data are private until released by the submitter. Once submitted, data are reviewed by the MTB staff for obvious inconsistencies and then released for public view. Large image files, for example, whole slide images (WSI) from Aperio or Hamamatsu scanners, must be uploaded by FTP. The MTB provides a user support link on the MTB home page that allows users to arrange for alternate data submission methods or provide feedback to curators on existing MTB data records.
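The submission life cycle described above (private in-progress entry, submission, staff review, public release) can be sketched as a small state model. This is a minimal illustration, not the MTB implementation: the class, method, and state names are invented, and the rule that release requires both submission and a completed staff review is an assumption drawn from the order of steps in the text.

```python
from enum import Enum
from typing import Dict


class Status(Enum):
    IN_PROGRESS = "in progress"  # saved privately, not yet submitted
    SUBMITTED = "submitted"      # awaiting or undergoing staff review
    RELEASED = "released"        # publicly visible


class Submission:
    """Hypothetical model of one pathology submission."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self.status = Status.IN_PROGRESS
        self.fields: Dict[str, str] = {}
        self.reviewed = False

    def save(self, **fields: str) -> None:
        # Entries can be amended before and after submission.
        self.fields.update(fields)

    def submit(self) -> None:
        if self.status is not Status.IN_PROGRESS:
            raise ValueError("submission was already submitted")
        self.status = Status.SUBMITTED

    def mark_reviewed(self) -> None:
        # Staff check for obvious inconsistencies (assumed to follow submit).
        if self.status is Status.IN_PROGRESS:
            raise ValueError("nothing has been submitted for review")
        self.reviewed = True

    def release(self) -> None:
        # Assumption: release requires a submitted and reviewed entry.
        if self.status is not Status.SUBMITTED or not self.reviewed:
            raise ValueError("only reviewed submissions can be released")
        self.status = Status.RELEASED

    def visible_to(self, user: str) -> bool:
        # Private to the owner until released.
        return user == self.owner or self.status is Status.RELEASED


draft = Submission(owner="submitter-1")
draft.save(tumor_diagnosis="hemangioma")  # partial entry, saved for later
draft.submit()
draft.save(treatment="none")              # amended after submission
draft.mark_reviewed()
draft.release()
print(draft.visible_to("any-user"))
```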
MTB Enhances Availability of Tumor Data
The availability of the MTB as a vehicle to publish pathology images and diagnostic data significantly improves the scientific community’s access to these data. Because of publication costs, scientific journals restrict the number of photomicrographs published, which limits the amount of data publicly available for comparison and interpretation. The mechanisms for querying Supplementary Data for journal articles are severely restricted, and access may be limited and transient, depending on individual journal policies. Submitting unpublished data, or additions to previously published data, to the MTB enables significantly more data to be made available in a format that is integrated with other published data and easily searchable. In addition, the MTB offers a high degree of flexibility in the types of data that can be associated with the images.
Future Directions
The MTB will continue to increase the quantity and types of tumor-associated mouse data it includes and to enhance the representations of mouse tumors as models for human cancer. Currently planned developments include integration of relevant mouse tumor gene expression data from the Gene Expression Omnibus and ArrayExpress, integration of complex trait data from the Collaborative Cross, 6 and connection of large-scale human cancer genomics data with corresponding mouse cancer models. In addition, new graphical interfaces for viewing and querying tumor-associated genome-wide mutation data, QTLs, and sequence data are being developed.
Conclusions
The rapid growth in both the volume and variety of data in science today has made the development of new data-search and integration tools vitally important. The MTB is designed to collect, store, and integrate a wide variety of data on mouse tumor development and human cancer models and make these data freely available to the scientific community in an easily searchable form. A key component is the inclusion of mouse pathology diagnostic descriptions and images and supporting data on tumor incidence, genetics, treatment, and frequency. In conclusion, the MTB provides the most comprehensive source of mouse tumor data available, the highest level of data curation, and multiple search forms allowing data queries from many different scientific perspectives, for example, searches using gene, strain, or phenotype terms. The MTB also serves as a repository for data from the pathology community; for example, the MTB stores images from the Jackson Aging Center and Pathbase and is working with external scientists to store details on antibodies used to study mouse cancer models. The MTB enables mouse pathologists to present their tumor data to the scientific community in the wider context of genetic and molecular data, and it gives the community access to key pathology data connected to tumor diagnoses and outcomes that would otherwise be unavailable for scientific analysis.
Footnotes
The authors declared that they received no commercial financial support for their research and/or authorship of this article.
This work was supported in part by the National Institutes of Health (CA089713, CA034196, HG000330, and RR017436) and The Ellison Medical Foundation.
