Abstract
Sixteen longSAGE libraries from four different clinical stages of cervical intraepithelial neoplasia (CIN) have enabled us to identify novel cell-surface biomarkers indicative of CIN stage. By comparing gene expression profiles of cervical tissue at early and advanced stages of CIN, we identified several genes as novel genetic markers. We present fifty-six cell-surface gene products differentially expressed during progression of CIN. These cell-surface proteins are being examined to establish their capacity for optical contrast agent binding. Contrast agent visualization will allow real-time assessment of the physiological state of the disease process, bringing vast benefit to cancer care. The data discussed in this publication have been submitted to NCBI's Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) and are accessible through GEO Series accession number GSE6252.
Introduction
Clinical diagnosis of most cancers and their precursors is predominantly based on phenotypic markers such as the appearance of cell nuclei. Classification and staging of disease are determined by evaluation of gross structural features, such as the extent of local tumour invasion and the presence of disease in other organs. It is now established that cancer arises as a result of successive genetic changes altering cellular processes including growth, angiogenesis, senescence, and apoptosis (Hanahan and Weinberg, 2000). Additionally, many cancers appear to have active inflammation and wound healing mechanisms (Chang et al. 2004). Proteins taking part in these cellular mechanisms are often strong candidates for biomarkers and molecular targets.
Cervical cancer is usually the result of a human papillomavirus (HPV) infection which initiates neoplastic progression mainly through viral oncoproteins E6 and E7 within the cervical transformation zone at the squamous/columnar junction. The role of HPV in the pathogenesis of cervical cancer has been addressed in recent reviews (zur Hausen, 2002; Woodman et al. 2007). Many HPV types produce only productive lesions following infection and are not associated with human cancers. In such lesions, the expression of viral gene products is carefully regulated, with viral proteins being produced at defined times and at regulated levels as the infected cell migrates towards the epithelial surface. The events that lead to viral synthesis in the upper epithelial layers appear common to both the low- and high-risk HPV types. Virus-induced cancers most often arise at sites where productive infection cannot be suitably supported. Productive infection can be divided into distinct phases, with different viral proteins playing specific roles (Doorbar, 2006). Upon infection, normal cells gradually advance through stages of cervical intraepithelial neoplasia (CIN). In mild dysplasia (CINI), only a subset of the lower third of the epithelium appears dysplastic; in moderate dysplasia (CINII), the dysplastic cells involve about one-half of the thickness of the epithelium of the cervix; and in severe dysplasia (CINIII), or carcinoma-in-situ, the entire thickness of the epithelium is disordered but the abnormal cells have not yet spread below the surface. If carcinoma-in-situ is not treated, it will often grow into an invasive cervical cancer. High grade dysplasia is considered the most advanced dysplasia, with atypical changes in many of the cells and a very abnormal growth pattern of the glands; some of the glands are branching or budding. More than 50% of the cells have large, spotted nuclei and are frequently dividing, while the cellular cytoplasm is reduced and looks abnormal.
Cancer of the cervix was once one of the most common causes of cancer death for American women, but between 1955 and 1992 the number of cervical cancer deaths in the United States dropped by 74% due to the introduction of the Pap test (Papanicolaou and Traut, 1943). Death rates from cervical cancer continue to decline by nearly 4% per year. Even so, the American Cancer Society reports that in 2006, about 3,700 of the 9,710 women diagnosed with cervical cancer in the United States died from this disease. HPV infection causes changes in expression levels of a wide variety of genes (Yim and Park, 2006). These differences in gene expression between pre-invasive neoplastic and non-neoplastic tissue give clues to the molecular basis of cancer. Early detection of cervical cancer based on molecular characterization would be clinically advantageous; risk of neoplastic lesion progression could be predicted and response to therapy could be monitored in real time at a molecular level. Such monitoring depends on the ability to optically image the molecular features of cancer in vivo in real time (González et al. 1999; Rajadhyaksha et al. 1999; White et al. 1999; Huzaira et al. 2001; Langley et al. 2001; Selkin et al. 2001; Collier et al. 2002), and requires safe, molecular-specific contrast agents whose images can be monitored rapidly and non-invasively during their uptake and distribution. The analysis presented here evaluates serial analysis of gene expression (SAGE) libraries to identify novel, cell-surface gene products. Once highly differentially expressed SAGE tags have been mapped to their corresponding genes, the gene products become candidates for antibody testing and optical contrast agent development.
Contrast Agents and Optical Imaging
Short of prevention, improved early stage cancer diagnosis would provide the greatest benefit for cancer patients. Because proteins may regulate gene expression, ligand-binding properties, molecular structure and dynamics on a temporal basis, protein biomarkers have a significant impact in cancer detection and therapy as therapies are becoming targeted to specific signal transduction and metabolic pathways. For example, breast cancers respond to HERCEPTIN (trastuzumab) if the tumor over-expresses Her-2/neu (Baselga et al. 2004; Ross et al. 2004). In the same way, GLEEVEC (imatinib) is most effective against cancers carrying the bcr-Abl translocation (Druker, 2004) and targeted molecular cancer therapy is already used successfully for the eradication of acute leukaemia (Frater et al. 2003; Yee and Keating, 2003). These examples imply that it will be important to produce biomarkers for all stages of cancer. Reliable diagnostics such as DNA screening and immunocytochemical analysis of known cervical neoplasia biomarkers p16INK4A and minichromosome maintenance (MCM) proteins are not implemented in vivo. Real-time biomarkers of the physiological state of the disease process or markers representative of treatment efficacy will bring immeasurable benefit to cancer care in terms of individualized agent selection and dosing. Furthermore, series of agents could be tested to determine empirically the localization of cancer and/or the most effective therapy.
Routine clinical cancer detection employs non-specific contrast agents such as acetic acid which enhance the nuclear backscattering but are limited by small signal magnitude. The field of molecular imaging is rapidly developing imaging agents with high affinity and specificity for targeted biomarkers. These new agents allow for the possibility of disease detection earlier than is currently feasible (Weissleder, 2001; Jaffer and Weissleder, 2005). For example, cancer metastases missed by conventional anatomically based imaging methods may be detected in patients by molecular imaging (Harisinghani et al. 2003). Optical imaging of tissue can be carried out non-invasively in real time, giving high spatial resolution (< 1 μm lateral resolution). A number of optical techniques have been established including confocal microscopy (White et al. 1999; Collier et al. 2005), multispectral fluorescence imaging (Andersson-Engels et al. 1997; Ferris et al. 2001), reflectance spectroscopy with polarised and unpolarised light (Sokolov et al. 1999, 2002; Utzinger et al. 2001), multispectral reflectance imaging with polarised and unpolarised light (Ferris et al. 2001; Gurjar et al. 2001), and fluorescence spectroscopy (Gillenwater et al. 1998; Wagnières et al. 1998; Ramanujam, 2000; Sokolov et al. 2002). Together with emerging molecular tools (e.g. DNA screening, tissue proteomic and serum markers), biomarker imaging may soon be used for real-time screening, diagnosis, and detection of disease recurrence and progression (Rudin and Weissleder, 2003).
Contrast agents consist of a biomarker specific probe molecule, such as an antibody, conjugated to an optically suitable label. By topically applying molecular specific contrast agents to tissues, the scope of molecular changes that can be probed using optical imaging is significantly enhanced. Presently, contrast agents based on metal nanoparticles, organic fluorescent dyes, and quantum dots coupled to monoclonal antibodies against cancer specific biomarkers are being developed (Sokolov et al. 2003; Rahman et al. 2005).
SAGE Libraries and Tag Mapping
The SAGE technique is capable of producing a molecular representation of cervical tissue based on expressed genes. SAGE is not dependent on pre-existing databases of expressed genes and so provides an independent view of gene expression profiles within the mRNA populations (Velculescu et al. 1997). SAGE library construction is well documented in the literature (Velculescu et al. 1995 and 1997; Madden et al. 2000; Saha et al. 2002; Pleasance et al. 2003; Sander et al. 2005). Several recent gene expression profiles of in vitro HPV-infected cultured keratinocytes and from cervical carcinoma clinical samples have proposed changes in gene expression induced by HPV and in early cervical carcinomas (Thomas et al. 2001; Ruutu et al. 2002; Duffy et al. 2003; Pérez-Plasencia et al. 2005). Some studies have compared normal versus tumor-induced gene expression in cervical samples with the aim of identifying potential tumor markers of clinical value (Shim et al. 1998; Chen et al. 2003).
To identify genes expressed at dissimilar levels in preinvasive neoplastic and non-neoplastic untyped cervical tissue, we analysed sixteen longSAGE libraries: 4 from normal cervical tissue samples, 3 from mild dysplasia (CINI), 3 from moderate dysplasia (CINII), and 6 from severe dysplasia (CINIII), or carcinoma-in-situ. The CIN tissues are positive for MUC16. Raw numbers of longSAGE tags generated and library names are given in Tables 1 and 2. DiscoverySpace (Robertson et al. 2007), an in-house graphical software application backed by a relational database system designed to support SAGE gene expression analysis, was used to query data from over 25 publicly available data sources, as well as internal experimental results. Using DiscoverySpace, selected SAGE tag sequences were mapped to counterpart RefSeq (Pruitt et al. 2000, 2005) genes and confirmed using SAGE tag co-ordinates to establish gene identity through Ensembl (Hubbard et al. 2007; homo_sapiens_core_41_36c). Genes were manually curated (EntrezGene) to ascertain gene identity and gene product localisation. These cervical longSAGE libraries were created from the epithelium of cervical biopsy samples collected just prior to LEEP (Loop Electrosurgical Excision Procedure). Tissue samples were placed into RNAlater and frozen at –80 °C within 10 minutes of being excised from the patient. These longSAGE libraries (Shadeo et al. 2007) have been submitted to the NCBI Gene Expression Omnibus (GEO) repository.
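The tag-to-gene mapping step can be illustrated with a minimal sketch. It is not the DiscoverySpace implementation; it assumes the conventional longSAGE layout, in which a tag is the 21 bases starting at the 3'-most NlaIII site (CATG) of a transcript, and neither the enzyme nor the tag length is stated in this section. The function names and toy sequences are hypothetical.

```python
ANCHOR = "CATG"   # NlaIII recognition site (assumed anchoring enzyme)
TAG_LENGTH = 21   # anchor plus 17 downstream bases (assumed longSAGE tag length)


def extract_long_sage_tag(transcript):
    """Return the tag at the 3'-most anchor site, or None if no full tag fits."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)
    # This sketch simply skips sites too close to the 3' end of the sequence.
    if pos == -1 or pos + TAG_LENGTH > len(seq):
        return None
    return seq[pos:pos + TAG_LENGTH]


def build_tag_index(transcripts):
    """Map each virtual tag to the list of genes whose transcript produces it."""
    index = {}
    for gene, seq in transcripts.items():
        tag = extract_long_sage_tag(seq)
        if tag is not None:
            index.setdefault(tag, []).append(gene)
    return index


# Hypothetical toy transcripts, not real RefSeq sequences.
toy_transcripts = {
    "GENE_A": "AAACATGTTTTCATGACGTACGTACGTACGTAGGG",
    "GENE_B": "GGGGGGGGGG",  # no anchor site, so it yields no tag
}
tag_index = build_tag_index(toy_transcripts)
observed_tag = "CATGACGTACGTACGTACGTA"
mapped_genes = tag_index.get(observed_tag, [])
```

A tag that maps to more than one gene appears with several entries in the index, which is one reason a second confirmation step, such as the co-ordinate check through Ensembl described above, is useful.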
Table 1. Highly expressed genes with membrane-bound gene products up-regulated in cervical dysplasia stages CINI and CINIII. Gene expression up-regulated ≥2-fold relative to normal.
GEO Series Accession Number GSE6252. 1,101,702 total longSAGE tags in four normal libraries, GEO Alias N3, N1, N2, N4; 2,165,777 total longSAGE tags in six CINIII libraries, GEO Alias C1, C3, C2, C4, C5, C6; 785,642 total longSAGE tags in three CINI libraries, GEO Alias M1, M2, M3.
Mapping unable to be confirmed through ENSEMBL or BLAT.
Table 2. Highly expressed genes with membrane-bound gene products up-regulated in normal cervical tissue. Gene expression up-regulated ≥2-fold relative to CINI and/or CINIII.
GEO Series Accession Number GSE6252. 1,101,702 total longSAGE tags in four normal libraries, GEO Alias N3, N1, N2, N4; 2,165,777 total longSAGE tags in six CINIII libraries, GEO Alias C1, C3, C2, C4, C5, C6; 785,642 total longSAGE tags in three CINI libraries, GEO Alias M1, M2, M3.
Any protein differentially expressed in cancer tissue, compared to normal tissue, or any protein known to be involved in cancer development, has potential as a candidate cancer biomarker. Genes presenting properties which identify them as likely targets for cancer diagnosis or prognosis must be separated from thousands of other genes which may also possess clinical potential. Hundreds of potential candidates must be set aside in favour of gene products which offer the most promising characteristics. We focus on genes encoding membrane-associated proteins because membrane-bound proteins are most likely to be accessible to topical application of contrast agents and have a rapid time frame for contrast agent visualization. Genes expressing the greatest number of tags combined with high levels of differential expression between dysplastic and normal tissue are the most likely to be observed by contrast agents in vivo.
For optical imaging of tumors by topical application in vivo of contrast agents to be of practical use, a large number of contrast agent receptors are required. One of the standard methods to detect candidate biomarkers is to identify genes with amplified expression in cancer and/or normal tissues. We compared transcription profiles and retained the most highly expressed membrane-bound gene products whose differential expression level is greater than two-fold. Many cell-surface proteins can potentially be developed as targets for optical contrast agents. Using longSAGE, we have also identified the most highly differentially expressed transcripts between disease and normal tissue. Table 1 specifies those genes (with cell-surface gene products) up-regulated in the CINI and CINIII stages of dysplasia and Table 2 lists genes up-regulated in normal tissue. Short descriptions of these protein biomarkers are given in the appendix and annotations of protein structural information, if available, are included. Given that CINII is difficult to determine clinically, it was not included in these comparisons. Contrast agent visualization of the epidermal growth factor receptor (EGFR) using an anti-EGFR monoclonal antibody has already been successful (Rahman et al. 2005). More of these markers should prove amenable to contrast agent development and topical formulations consisting of a range of contrast agents could help adjust to individual patient differences in gene expression.
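The two-fold selection criterion can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: tag counts are pooled per stage and normalised to tags per million using the library totals reported with Tables 1 and 2, a floor of one tag in the reference stage (an assumption) avoids division by zero, and the gene names and counts are hypothetical.

```python
# Pooled longSAGE tag totals per stage, as reported with Tables 1 and 2.
LIBRARY_TOTALS = {"normal": 1_101_702, "CINI": 785_642, "CINIII": 2_165_777}


def tags_per_million(count, stage):
    """Normalise a raw tag count to the pooled library size of its stage."""
    return count * 1_000_000 / LIBRARY_TOTALS[stage]


def fold_up(counts, stage, reference="normal", floor=1):
    """Normalised expression in `stage` divided by that in `reference`.

    `floor` is an assumed minimum count for the reference stage so that
    genes with zero reference tags do not cause a division by zero.
    """
    numerator = tags_per_million(counts.get(stage, 0), stage)
    denominator = tags_per_million(max(counts.get(reference, 0), floor), reference)
    return numerator / denominator


def up_in_dysplasia(table, min_fold=2.0):
    """Genes up-regulated at least `min_fold` in CINI or CINIII versus normal."""
    hits = []
    for gene, counts in table.items():
        best = max(fold_up(counts, "CINI"), fold_up(counts, "CINIII"))
        if best >= min_fold:
            hits.append((gene, round(best, 1)))
    # Largest fold change first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical pooled counts for two made-up genes.
toy_counts = {
    "GENE_A": {"normal": 110, "CINI": 79, "CINIII": 2166},
    "GENE_B": {"normal": 220, "CINI": 157, "CINIII": 433},
}
candidates = up_in_dysplasia(toy_counts)
```

Swapping the roles of the stages (dysplasia as reference, normal as the numerator) gives the complementary list of genes up-regulated in normal tissue. Normalising to library size matters here because the CINIII pool is roughly twice the size of the normal pool, so raw counts alone would overstate up-regulation in CINIII.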
Cervical Intraepithelial Neoplasia Stage Biomarkers
It is possible to evaluate a marker for presence or absence, but to correlate a marker or array of markers to changes in cellular localization relative to other markers is probably the most interesting and beneficial in terms of dysplasia progression, environment, therapy selection, and follow-up. The known function of these genes grants some insight into the biology of cervical neoplasia. For instance, several of these cell surface markers are involved in transport and/or signaling. MUCX and CD74, upregulated in CINI and CINIII, have signaling gene products known to be associated with carcinomas. CD74 is also known to be a high affinity binding protein for macrophage migration-inhibitory factor (MIF) which is implicated in tumor cell growth and angiogenesis. TSPAN1, upregulated in CINIII almost 10-fold, also plays a role in cell motility and growth. See Appendix for gene-specific references.
Our analysis of cervical cancer longSAGE expression profiles directs attention to some genes with relatively equal distribution in CINI and CINIII, such as PIGR. Another marker, ANPEP, is present at significantly different expression levels in CINI and CINIII. This knowledge expands the possibilities for rapid visual discrimination between normal tissue and stages of dysplasia in vivo. As discussed earlier, these cell surface targets were found by identifying differentially expressed genes. More often than not, a highly expressed tag is not localised to the cell surface and, for these purposes, does not warrant further attention. However, a highly differentially expressed gene whose gene product is not membrane-bound is sometimes found to be part of a mechanism which affects the cell surface, and the gene product thereby becomes of potential use. TFF3 (trefoil factor 3), for instance, is up-regulated 13-fold in CINIII and 27-fold in CINI. Members of the trefoil family are characterised by having at least one copy of the trefoil motif, a 40-amino acid domain that contains three conserved disulphides. They are stable, secretory proteins whose functions are not defined but may protect the mucosa from insults, stabilize the mucus layer and affect healing of the epithelium. VANGL1 (Van Gogh-like protein 1) is an integral membrane protein which is serine/threonine phosphorylated and translocated to cytoplasmic vesicles in response to TFF3 stimulation (Kalabis et al. 2006). VANGL1 protein acts as a downstream effector of TFF3 signalling and regulates wound healing of intestinal epithelium. TFF3 is commonly expressed in hepatocellular carcinoma and its expression correlates with tumor grade (Khoury et al. 2005). TFF3 overexpression may be a critical process in mouse and human hepatocellular carcinogenesis (Okada et al. 2005).
The group of trefoil factor peptides (TFF1–3) are part of the protective mechanism operating in the intestinal mucosa and play a fundamental role in epithelial protection, repair, and restitution (Vieten et al. 2005). TFF3 and the essential tumor angiogenesis regulator VEGF exert potent pro-invasive activity through STAT3 signalling in human colorectal cancer cells (Rivat et al. 2005). That VANGL1 returned to cell membranes within 45 minutes of TFF3 stimulation (Kalabis et al. 2006) could explain the low VANGL1 tag counts, 1–3 tags per library, observed in the longSAGE libraries.
Conclusions
Molecular specific contrast agents may provide the ability to directly image the cancer process; but biomarker discovery can be a lengthy process as candidate markers suitable for the task-at-hand must be identified from among thousands of proteins. SAGE-identified biomarkers hold promise for recognition of the stages of neoplasia by proteomic patterns. Optical contrast agents bound to these membrane-bound protein biomarkers will serve as a complement to histopathology, thus allowing more effective determination of tumor borders and non-invasive observation of response to treatment at a molecular level. We present fifty-six cell-surface gene products differentially expressed during progression of cervical intraepithelial neoplasia. Differential gene expression of these biomarkers will allow individualized selection of therapeutic combinations that best target the entire disease-protein system and advance understanding of carcinogenesis.
Acknowledgments
This work was supported by a grant from the National Institutes of Health (NIH/NCI 1R01-CA103830–01) and by the British Columbia Cancer Foundation. MA Marra and SJM Jones are Scholars of the Michael Smith Foundation for Health Research and MA Marra is a Terry Fox Young Investigator of the National Cancer Institute of Canada. We also thank R Varhol, BCGSC, for tag mapping verification.
