Abstract
Objectives
To develop a web-based tool to draw attention to patients positive for human papillomavirus (HPV) who have a high risk of progression to cervical cancer, in order to increase compliance with follow-up examinations and facilitate good doctor–patient communication.
Methods
Records were retrospectively analysed from women who were positive for HPV on initial testing (before any treatment). Information concerning age, Papanicolaou (PAP) smear result and presence of 15 high-risk HPV genotypes was used in a support vector machine (SVM) model, to identify the patient features that maximally contributed to progression to high-risk cervical lesions.
Results
Data from 731 subjects were analysed. The maximum number of correct cancer predictions was seen when four features (PAP, HPV16, HPV52 and HPV35) were used, giving an accuracy of 74.41%. A web-based high-risk cervical lesion prediction application tool was developed using the SVM model results.
Conclusions
Use of the web-based prediction tool may help to increase patient compliance with physician advice, and may heighten awareness of the significance of regular follow-up HPV examinations for the prevention of cervical cancer, in Korean women predicted to have heightened risk of the disease.
Introduction
Cervical cancer is the second most common form of cancer in women worldwide. 1 The development of cervical cancer requires the persistent presence of human papillomavirus (HPV). 2 To date, ∼120 HPV types have been identified and classified according to oncogenicity into high-risk types (such as HPV16 and HPV18), and low-risk types (such as HPV6, HPV11 and HPV40).2,3 The most common high-risk types worldwide are HPV16 (∼57% of cases) and HPV18 (∼16% of cases); however, the proportions of invasive cervical cancers related to particular HPV types vary, depending on geographical location. 4 Not all HPV infections progress to cervical cancer; it has been reported that ∼90% of HPV infections resolve within 2 years of identification. 5 However, it remains unclear why HPV infections resolve in some cases and not in others; individual differences in susceptibility may be contributing factors. 6 At present, surgical treatments (such as conization or loop electrosurgical excision) are the main treatment options for patients with pre-cancerous conditions or early stage cancers.7,8
In the Republic of Korea, efforts by national and private health organizations have helped to increase the rate of cervical cancer screening to 52.5% in 2010: 9 a positive change that may, in due course, deliver a similar decline in cervical cancer death rates in the Republic as has been seen in the USA. 10 HPV DNA tests may be used as primary screening tests, in conjunction with Papanicolaou (PAP) smears; a number of studies have shown that HPV DNA tests have greater sensitivity for the detection of cervical intraepithelial neoplasia than PAP smear tests alone.11–13
Not all women undergo screening tests for cervical cancer, and not all women attend regular check-ups after the detection of abnormal epithelial lesions. In view of the long precancerous stage of cervical cancer (which can be ≥10 years), more frequent follow-up examinations can enable precancerous lesions to be detected at an earlier stage. In the Republic of Korea, irregular screening (due to cultural reluctance or resistance and poor comprehension of the need for regular screening or follow-up) leads to the development of cervical cancers that could have been avoided.
In doctor–patient communication, a patient’s need for information is particularly great when cancer is involved. Almost half of cancer patients reported that no information had been given to them on how to deal with their disease. 14 This knowledge gap can widen in time-constrained environments such as secondary and tertiary hospitals, where each patient encounter is hardly long enough to ensure patient satisfaction. This, in turn, can influence patient compliance with physician advice. 15
The goal of the present study was to develop an intuitive, web-based, visual application that can be used during patient–doctor encounters to quickly draw attention to those patients with a high risk of progression to cervical cancer. Use of the tool should help increase patient compliance with physician advice, and should also heighten awareness of the significance of regular follow-up HPV examinations. The application aims to visualize potential progression or regression of HPV infection, according to patient genotype and PAP test results, based on a support vector machine model.
Subjects and methods
Subjects
Records were retrospectively analysed from women who had undergone HPV DNA tests at Bucheon St Mary’s Hospital, Catholic University of Korea, Bucheon, Republic of Korea, during the period between 1 January 2006 and 31 December 2012; the women were HPV positive before any treatment was administered and their HPV genotype did not change during follow-up. If a cervical lesion suspicious for HPV infection was seen on colposcopic examination, HPV testing was performed. If the test was HPV negative and the lesion persisted in the immediate follow-up period, another HPV test was recommended. Subjects were followed up with colposcopic examination, PAP smear and HPV test every 3 months; all patients had an initial examination and more than one follow-up check. Subjects were included in the present study if they had undergone HPV DNA chip tests and had PAP smear or biopsy results available. Biopsies were performed if pathology worse than a low-grade squamous intraepithelial lesion was detected on PAP smear. A second HPV test was carried out in order to confirm the clearance of HPV infection after treatment for cervical intraepithelial neoplasia (CIN).
For each subject, the following were recorded: age at the first HPV test; HPV test, PAP smear and biopsy results; and duration of follow-up. For the purposes of the present study, a high-risk lesion was defined as high-grade squamous intraepithelial lesion (HSIL), atypical squamous cells that cannot exclude HSIL, carcinoma in situ, squamous cell carcinoma or adenosquamous carcinoma, atypical glandular cells-favour neoplasia, adenocarcinoma or neuroendocrine carcinoma on PAP smear, or cervical intraepithelial neoplasia (CIN) 2, CIN3, carcinoma in situ or invasive carcinoma on biopsy.
Due to the retrospective nature of the study, the need for ethical approval and for patient consent was waived by the Catholic Medical Centre Institutional Review Board.
HPV testing
The HPV tests were performed using the HPVDNAChip kit (BioMedLab, Seoul, Republic of Korea) as previously described. 16 Using a sterilized vaginal speculum, cervical specimens were collected during colposcopic examination by inserting a cytobrush attached to an HPVDNAChip Sampler (BioMedLab) into the endocervical and exocervical areas. After removal, the cytobrush was placed into transport medium (2 ml phosphate buffered saline) and immediately refrigerated at 4℃ for ≤72 h before analysis. DNA extracted in compliance with the manufacturer's instructions was immediately used or stored at 4℃; samples kept for >24 h were stored at −70℃ until analysis. The HPVDNAChip kit is able to identify 22 HPV genotypes: 15 high-risk types (HPV16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 66, 68 and 69) and seven low-risk types (HPV6, 11, 34, 40, 42, 43 and 44).
Support vector machine model
The support vector machine (SVM), developed by Vapnik, 17 has been used to model various biomedical situations, including protein–protein interaction predictions,18,19 gene expression data analysis 20 and decision support in the diagnosis of heart valve disease. 21 The SVM classifies data into two classes by constructing a separating hyperplane in a high-dimensional feature space, chosen to maximize the margin between the classes. For the present study, an SVM model was developed using LibSVM (http://weka.wikispaces.com/LibSVM) with the radial basis function as a kernel function. A total of 17 features were used to build the model: age, PAP result and the 15 high-risk HPV genotypes. Each subject in the study population was classified by the model as either regression (ω1) or progression to cancer (ω2). To identify features that deliver the highest classification performance of the model, a forward feature selection process was used. 22 The model focused on reducing the rates of false-positive and false-negative results.
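The forward feature selection process can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the reduced feature list and the toy scoring function are all hypothetical. In practice the score might be, for example, the cross-validated accuracy of a radial basis function SVM trained on the candidate subset; the study does not state how accuracy was estimated, so that is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

Step = Tuple[List[str], float]


def forward_select(
    features: Sequence[str],
    score: Callable[[List[str]], float],
) -> List[Step]:
    """Greedy forward selection.

    At each step, add the single remaining feature that gives the highest
    score when combined with the features already selected, and record the
    resulting subset and its score.
    """
    selected: List[str] = []
    remaining = list(features)
    history: List[Step] = []
    while remaining:
        best_feature = max(remaining, key=lambda f: score(selected + [f]))
        selected = selected + [best_feature]
        remaining.remove(best_feature)
        history.append((list(selected), score(selected)))
    return history


def best_subset(history: List[Step]) -> Step:
    """Return the step (subset, score) with the highest score."""
    return max(history, key=lambda step: step[1])


# Toy scorer with made-up weights, used only to exercise the selection loop.
# These numbers are NOT study results.
TOY_WEIGHTS = {"PAP": 0.2, "HPV16": 0.1, "HPV18": -0.02, "age": -0.05}


def toy_score(subset: List[str]) -> float:
    return 0.5 + sum(TOY_WEIGHTS[f] for f in subset)


history = forward_select(["age", "PAP", "HPV16", "HPV18"], toy_score)
for subset, value in history:
    print(subset, round(value, 2))
print("best:", best_subset(history)[0])
```

Because the search is greedy, it evaluates far fewer subsets than an exhaustive search over 17 features, at the cost of possibly missing a better combination that is not reachable by adding one feature at a time.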
Statistical analyses
Fisher’s exact test and χ2-test were used to analyse relationships between variables. For two-group comparisons, two sample t-tests (unpaired t-tests) were used for parametric analyses and Wilcoxon’s rank sum test for nonparametric analyses. For comparisons of more than two groups, analysis of variance was used for parametric analyses and the Kruskal–Wallis test for nonparametric analyses. A P-value < 0.05 was considered to be statistically significant. All statistical analyses were performed using SAS® software, version 9.2 (SAS Institute, Cary, NC, USA).
Results
A total of 731 subjects were included in the study: 75 (10.26%) were aged 10–29 years; 169 (23.12%) were aged 30–39 years; 290 (39.67%) were aged 40–49 years; 123 (16.83%) were aged 50–59 years; 74 (10.12%) were aged ≥60 years. A total of 310 subjects (42.41%) were diagnosed with high-risk lesions at the time of the initial HPV test. The remaining 421 subjects (57.59%) did not have high-risk lesions when initially tested for HPV; of these, 35 (4.79%) subsequently developed high-risk lesions and 386 (52.80%) had either regressive or persistent HPV infection, but did not develop high-risk lesions.
The 310 subjects diagnosed with high-risk lesions at the initial examination had a follow-up period of < 6 months. Of the remaining 421 subjects, 251 (59.62%) had a follow-up period of < 6 months, 102 (24.23%) had a follow-up period of ≥ 6 months but < 2 years and 68 (16.15%) had a follow-up period of ≥ 2 years.
Relationship between age and high-risk lesions
Of the 197 subjects aged ≥50 years, 114 (57.87%) progressed to high-risk lesions, compared with 231 out of 534 subjects aged <50 years (43.26%). This difference was statistically significant (P = 0.0004, χ2-test). Logistic regression analysis gave an odds ratio of 1.023; however, this may not indicate a linear increase with age.
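The age comparison above can be checked from the reported counts with a short Python sketch. It computes an uncorrected Pearson χ2 statistic for the 2 × 2 table; whether the original SAS analysis applied a continuity correction is not stated, so this is an assumption, and the function name is illustrative only.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability of a chi-square
    # variable x equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Counts taken from the text: 114 of 197 subjects aged >=50 years and
# 231 of 534 subjects aged <50 years had high-risk lesions.
statistic, p_value = chi_square_2x2(114, 197 - 114, 231, 534 - 231)
print(f"chi-square = {statistic:.2f}, P = {p_value:.5f}")
```

Under these assumptions the statistic is roughly 12.3 on one degree of freedom, which is in line with the reported P-value.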
PAP smear and biopsy results
Papanicolaou (PAP) smear and biopsy results in subjects positive for human papillomavirus.
High-risk lesion on PAP smear: high-grade squamous intraepithelial lesion (HSIL), atypical squamous cells that cannot exclude HSIL, carcinoma in situ, squamous cell carcinoma, adenosquamous carcinoma, atypical glandular cells-favour neoplasia, adenocarcinoma or neuroendocrine carcinoma.
High-risk lesion on biopsy: cervical intraepithelial neoplasia (CIN) 2, CIN3, carcinoma in situ or invasive carcinoma.
Relationship between genotype and high-risk lesions
The seven most frequently occurring genotypes, in decreasing order of occurrence, were: HPV16 (n = 207; 28.32%); HPV52 (n = 103; 14.09%); HPV58 (n = 90; 12.31%); HPV51 (n = 65; 8.89%); HPV56 (n = 57; 7.80%); HPV18 (n = 51; 6.98%); HPV35 (n = 40; 5.47%). A total of 594 subjects (81.26%) were infected with a single genotype; 137 subjects (18.74%) were infected with multiple genotypes. HPV infection regressed in 312 subjects (42.68%) and persisted in 419 subjects (57.32%).
The seven most frequently occurring genotypes in subjects (n = 731) positive for human papillomavirus (HPV), classified according to the development of high-risk lesions.
Data presented as n(%) of subjects.
Group I, subjects with benign lesions at initial HPV test who subsequently developed high-risk lesions; group II, subjects with regressive or persistent HPV infection who did not develop high-risk lesions; group III, subjects diagnosed with high-risk lesions at initial HPV test.
P < 0.05; Fisher’s exact test.
Specificity, sensitivity and negative and positive predictive values for the development of high-risk lesions for the seven most frequently occurring genotypes in subjects positive for human papillomavirus (HPV).
Data presented as n subjects affected/total n subjects (%).
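For reference, the four measures reported for each genotype follow from a 2 × 2 cross-tabulation of genotype status against lesion outcome. The Python sketch below shows the standard definitions; the counts in the usage example are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard measures from a 2 x 2 table.

    tp: genotype positive, high-risk lesion developed
    fp: genotype positive, no high-risk lesion
    fn: genotype negative, high-risk lesion developed
    tn: genotype negative, no high-risk lesion
    """
    return {
        "sensitivity": tp / (tp + fn),  # affected subjects who carry the genotype
        "specificity": tn / (tn + fp),  # unaffected subjects who do not carry it
        "ppv": tp / (tp + fp),          # carriers who are affected
        "npv": tn / (tn + fn),          # non-carriers who are unaffected
    }


# Hypothetical counts, for illustration only.
metrics = diagnostic_metrics(tp=30, fp=10, fn=20, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```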
SVM model and high-risk cervical lesion prediction tool
Support vector machine model construction using age, Papanicolaou smear result (PAP) and 15 high-risk genotypes in subjects positive for human papillomavirus (HPV): at each selection step a further feature is added and the predictive accuracy of the steps up to that point is calculated.
FSS, forward selection step; AF, added feature.
A web-based high-risk cervical lesion prediction application was developed using the SVM model; this is accessible at http://147.46.70.245:8181/HPV_Prediction/. The application interactively displays a chart that shows the probability of progression to high-risk lesions depending on feature values selected by the user. The high-risk lesion-based SVM model uses a set of four classification features: PAP smear result, HPV16, HPV35 and HPV52. Each genotype is recorded as negative (1) or positive (2). The application supports Ajax-based client–server interaction, and the Google® Web Toolkit (http://www.gwtproject.org) was used to develop the user interface. The charting library HighCharts (http://www.highcharts.com) was used to produce the interactive charts.
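The input encoding described above can be sketched in Python as follows. Only the negative (1) / positive (2) coding of genotypes comes from the text; the feature order, the function names and the numeric coding of the PAP result (passed in here as an already-encoded integer) are assumptions made for illustration.

```python
from typing import Iterable, List

# Assumed ordering of the four classification features.
FEATURE_ORDER = ("PAP", "HPV16", "HPV35", "HPV52")


def encode_genotype(positive: bool) -> int:
    """Genotype coding stated in the text: negative = 1, positive = 2."""
    return 2 if positive else 1


def build_feature_vector(pap_code: int, positive_genotypes: Iterable[str]) -> List[int]:
    """Assemble the four-feature input vector for one subject.

    pap_code is a hypothetical integer code for the PAP smear category;
    the coding scheme used by the application is not given in the text.
    """
    positives = set(positive_genotypes)
    vector = [pap_code]
    for genotype in FEATURE_ORDER[1:]:
        vector.append(encode_genotype(genotype in positives))
    return vector


# Example: a subject positive for HPV16 and HPV52 with a hypothetical PAP code of 3.
vector = build_feature_vector(3, {"HPV16", "HPV52"})
print(dict(zip(FEATURE_ORDER, vector)))
```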
Discussion
In the present study the SVM model identified four features (PAP smear result, HPV16, HPV35 and HPV52) that best classified subjects using available data (age, HPV genotype and PAP smear result). As further datasets become available, the model and web application have the potential to dynamically reflect any changes by adjusting the classification features.
In clinical practice it is very difficult to enforce strict adherence to follow-up examinations, and subjects in the present study were no exception to this. In addition, PAP results are subjective in nature. The current application focused on developing a visual consultation tool from available data to help facilitate patient understanding during the doctor–patient encounter, and emphasize the potential for cervical high-risk lesion development.
Age has a known relationship with cervical cancer development, but is not a sole indicator. 23 Data on the social and health care settings of subjects, such as sexual behaviour and the absence of screening, would have made the role of age more meaningful in the present study.
It is known that the longer the HPV infection has been present, the higher the chance of progression to a high-risk lesion. 2 In practice it is difficult to know when a patient first became infected with HPV, and therefore it is also difficult to know how long the infection has lasted. In the present study, data on follow-up periods from the time when subjects were first tested and found to be HPV positive were available, but were likely to have marginal value in predicting progression, and so were not included in the SVM model.
Individual susceptibility factors (such as host-genome variability, lifestyle, and infection with Chlamydia trachomatis or HIV) 2 have been shown to be strong determinants in relation to the development of cervical high-risk lesions. 2 Due to the limitations of outpatient settings in the Republic of Korea, information about host genome variability and lifestyle was rarely documented in the subject records used for the present study. Co-infection with Chlamydia trachomatis was recorded in six subjects in the present study; however, since very few people are required to undergo Chlamydia tests within the current national health programme, such data are of little statistical value and were not analysed further. No subjects in the present study were co-infected with HIV. The effect of co-infection in the prediction of cancer progression requires further study and analysis.
The prediction application tool developed in the present study is a systematic attempt to display the risk of progression to cervical high-risk lesions, with the goal of increasing informed compliance from patients, with respect to regular follow-up examinations. This tool is not a comprehensive source of information for the patient but is rather a means to increase patient awareness about high-risk HPV genotypes. It must be used in conjunction with expert medical advice, derived from clinical experience and up-to-date knowledge of cervical cancer.
Footnotes
Declaration of conflicting interest
The authors declare that there are no conflicts of interest.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
