
Abstract

This work examines the feasibility and implementation of information service-oriented architecture (ISOA) on an emergent literature domain of human papillomavirus, head and neck cancer, and imaging. From this work, we examine the impact of cancer informatics and generate a full set of summarizing clinical pearls. Additionally, we describe how such an ISOA creates potential benefits in informatics education, enhancing utility for creating enduring digital content in this clinical domain.

Keywords

literature management, human papillomavirus, head and neck cancer, imaging, cancer informatics

Introduction

Worldwide, head and neck squamous cell carcinoma (HNSCC) is the sixth most common type of cancer.¹ As last reported in 2010, there are an estimated 600,000 cases of HNSCC each year around the globe.² Unfortunately, only 40–50% of these patients will survive more than five years.³ Based on epidemiological projections, head and neck human papillomavirus-positive (HPV+) cancers are estimated to comprise over 25% of all known HNSCC.⁴ These data suggest that approximately 150,000 new cases of HPV+ head and neck cancer occur around the world each year. Currently, a subset of HNSCC, oropharyngeal squamous cell carcinoma, accounts for approximately 10% of all HNSCC worldwide.²

In the United States alone, HPV+ cancers represent around 63% of all oropharyngeal cancers.⁵ However, some reports also speculate that the HPV-associated share of HNSCC could be as high as 90% in some demographics.² There are approximately 40,250 new cases of cancer of the oral cavity and pharynx diagnosed each year, as reported by the American Cancer Society in 2012.⁶ Additionally, according to the North American Association of Central Cancer Registries and from data collected by the National Cancer Institute's SEER program and the CDC's National Cancer Registries during the 1999–2008 period, the incidence rates of HPV+ oropharyngeal cancer have increased dramatically by 4.4% per year, particularly among white males.⁶

In recent publications, it has been postulated that the overall decreasing incidence of oral cavity/pharyngeal cancer may reflect reductions in smoking and alcohol consumption among all race/sex groups. But decreases in smoking-related HNSCC are offset by the trend of rising HPV+ cancers among other demographics, which may reflect changes in sexual practices since the mid-1960s, particularly among white males.⁷ Mounting evidence suggests increased radiosensitivity of HPV+ HNSCC.⁸ Because of the changing demographics of HPV+ HNSCC, some clinical investigators, including our group, have now proposed that it is feasible to lower the radiation dose for patients with HPV+ disease.

A phase II trial is currently underway that seeks to identify patients with HPV+ oropharyngeal cancer. Early evidence suggests that patients with HPV-associated HNSCC may not require the full radiation dose given in a standard regimen of Intensity-Modulated Radiation Therapy (IMRT).⁹ In this study, 46 patients had a complete clinical response with a lower-than-standard radiation dose, at 54 Gy,⁹ with standard treatment doses being 66–70 Gy.

A reduction in radiation dose could potentially reduce late-term morbidity of radiation therapy in some cases and could change the standard of treatment for patients with HPV+ HNSCC.⁸ Nevertheless, it is still too early to determine the correct dose to apply in radiation therapy or the appropriate imaging follow-up just from a single small study. Typical numbers for a phase II trial would be between 100 and 300 patients.¹⁰ Because of the small number of participants in the ECOG trial (N = 46), it appears that studying this population may be challenging. In an age where there is shrinking funding for clinical trials and research, we also believe that coordination, planning, and informatics support could help achieve greater effect in future clinical trials.

Recognizing challenges for informatics for HPV and head and neck cancer

We are inspired by several targets that, despite the attempts of some evidence-based services, remain unmet in the status quo.¹¹ These unmet challenges in literature management and informatics include the failure to recognize an information need, physicians not being prepared for where a particular field may be heading and a preference for choosing the most convenient clinical paths rather than most appropriate choices. Users are further hindered by the excessive time and effort required to find answers to appropriate questions from resources, the difficulty of navigating overwhelming bodies of literature, and the inadequacy of current literature search technologies.

Our goal is to construct a literature informatics management education system (LIMES) for implementation in an academic teaching hospital. Organization and knowledge collection, such as what we are attempting to do with our LIMES, are unfortunately often relegated to “Cinderella” roles, assumed to be automatic to perform and often difficult to measure. Thus, we have chosen to focus on a small but emergent literature set: the use and role of imaging for HPV head and neck cancers, which is the primary interest and expertise of our institution. The proposed LIMES not only archives the articles, but also employs a newly recommended technique to systematically summarize and categorize content in a digital repository that is semantically mineable.

Another compounding challenge is the organization of information across different departmental clinical teams (eg, diagnostic radiology, radiation therapy and otorhinolaryngology). Establishing communication between different departments can currently take years. However, communication among different specialties could potentially benefit from informatics tools. The idea is to lay down what is considered known and what is not known in the current published literature on HPV-HNSCC imaging and to have multiple disciplines represented in the literature management process. Because of physician time constraints, we also seek to share the burden of search, summarization, categorization, and organization with medical trainees.

Unfortunately, to our knowledge, the study/discussion of data collection by individuals is rarely, if ever, described in published literature. Yet, the process of reading and reviewing in research, as well as learning how best to do these things, is often very cumbersome. For example, Ely has described how there appears to be a trend of increasing time demands on clinicians and less time to find literature resources.¹² Coumou and Meijman have described systematic methods by which primary care physicians search for answers to clinical questions.¹³ They describe how physicians often first choose to consult colleagues whom they know well.¹³ This practice of asking colleagues has not changed much in recent years, despite the proliferation of electronic publications made available worldwide.

Physicians in the past had limited options for finding answers, sometimes reduced to only the use of an intermediary person such as a librarian.¹³ In this work, we propose, and have tested, a structured approach that generates clinical roadmaps and potential “pearls” from abstract information found in HPV head and neck cancer literature, using what we have called information service-oriented architecture (ISOA). Essentially, we use an ISOA to break down the application functions of a large organization into a set of interoperable services. In our case, trainees, ie, medical school or premedical school students, will evaluate and structure the elements into our content management system (CMS) prior to review by attending physicians. We conjecture that this process will benefit trainee education.

Background on systems that reduce content with literature

Much of the focus of automated content extraction in literature management centers on Natural Language Processing (NLP).¹⁴,¹⁵ NLP methods are most effective when they have a wide body of open literature to train on. In medicine, a successful NLP-based program was able to examine electronic clinical notes in women with early-stage incident breast cancers and identify whether and when recurrences were diagnosed. This was performed on clinical notes from 1,472 patients in which the NLP-based system correctly identified 92% of recurrences.¹⁶ The area of medical informatics that implements NLP has strong roots in the domains of knowledge engineering and management. Knowledge engineering has been defined as the process of building tools for programming automation, including advances in machine learning, data analysis, rule induction, and analytic learning algorithms.¹⁷

In our ISOA, our focus has been on collection from trainees and oversight by the attending physician. We have considered four primary concerns: (1) content management, (2) classification, (3) summarization, and (4) evidence-based tools. However, because of the numerous efforts toward, and the ubiquity of, proposed automated tools, we aim to incorporate automated tools once they can achieve consistent results that approach 90% precision and recall.¹⁸
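To make the 90% threshold concrete, precision and recall can be computed from counts of correct and incorrect extractions. The following is a minimal sketch only: the function names and the example counts are hypothetical and are not taken from this study or from any tool we evaluated.

```python
from typing import Tuple


def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> Tuple[float, float]:
    """Return (precision, recall) from confusion counts.

    precision = TP / (TP + FP): share of extracted items that are correct.
    recall    = TP / (TP + FN): share of correct items that were extracted.
    """
    extracted = true_pos + false_pos
    relevant = true_pos + false_neg
    precision = true_pos / extracted if extracted else 0.0
    recall = true_pos / relevant if relevant else 0.0
    return precision, recall


def meets_threshold(true_pos: int, false_pos: int, false_neg: int,
                    threshold: float = 0.90) -> bool:
    """True only when BOTH precision and recall reach the threshold."""
    precision, recall = precision_recall(true_pos, false_pos, false_neg)
    return precision >= threshold and recall >= threshold


# Hypothetical counts for an automated summarizer checked against expert review.
print(precision_recall(90, 10, 10))
print(meets_threshold(95, 5, 20))
```

Requiring both measures to pass reflects the wording above (precision and recall); a tool that is precise but misses many relevant items would not qualify under this reading.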

CMSs are technologies that support the collection and management of published content. For literature, CMSs are used to store Adobe Systems Portable Document Format (PDF) files and potentially share these files. This includes both free and commercial packages such as Endnote, RefManager, Mendeley, Qiqqa, and Quosa, among many other reference management packages.¹⁹ The first two, Endnote and RefManager, have roots in desktop software and are purchasable by those who have financial resources. Mendeley, Qiqqa, and Quosa are supported by premium options, such as space and sharing requirements, and/or are advertisement driven, and focus on the social sharing and citing of articles. All the above systems are superb for the management of content (PDFs) and highlighting. These tools are designed mainly for managing large libraries of resources, but currently lack the summarizing and classification/indexing tools, within and across articles, that we seek to achieve in our LIMES.

In informatics, classification can usually be referred to as indexing.²⁰ Medline indexers are employees who are required to have at least a bachelor's degree in a biomedical science.²¹ A few university centers and the National Library of Medicine, among others, have put efforts into NLP tools for indexing. Hliaoutakis, in 2009, was able to demonstrate algorithmic improvements over the MMTX indexing package in speed and accuracy, using a C-value/NC-value method with the AMTex package.²² However, both these software programs rely heavily on publicly available packages, such as Metamap, Medline and other important UMLS libraries.¹⁵ Moreover, the NLM has been tasked with improving, and continues to improve, its own indexing tools²³ on a daily basis and, to our knowledge, continues to surpass many other indexing engines in the medical domain.

Text summarization (TS) methodology includes work by Luhn going as far back as 1958.²⁴ More recently, TS has been applied successfully to the medical field.²³,²⁵,²⁶ Some additional current research in biomedical domains includes investigative work using abstractive-oriented and extractive paradigms.²⁶

Presently, evidence-based clinical services such as “UpToDate” are all that attending physicians have available to them for current evidence-based reviews. These services provide clinical topical reviews written by recognized physicians who provide evidence-based reviews with recommendations. These services are purchasable.

Improving and sharing information in the clinical environment is vital for enhancing patient care. There continues to be daily proliferation of new journal articles and Internet digests that describe emergent health issues. Unfortunately, physicians and researchers have limited ways to ensure that they remain up to speed on this changing information. In the past, this was strictly done by browsing abstracts. However, current methods of updating their knowledge include Google searching and/or the use of evidence-based services. All three methods (browsing abstracts, Google searching, and evidence-based services), while having some benefits for searching, also have some pitfalls.

We have provided a limited review of the literature on automation, as it in large part motivates our design. If one could create a mineable LIMES service, then other clinical investigators could build upon that work. However, we are focused on the ISOA as a method to provide a systematic support for literature management. Automation, while having the potential of extreme benefit, has only partial or limited efficiency in terms of language processing; extracting medical problems from narrative text still continues to evolve. Regardless, human experts still have to rank and rate automation services as well as provide “training” data even to evaluate the efficacy of automation. We believe that the development and use of this ISOA can assist in the education of our medical trainees.

Methods

HPV-HNSCC is an emerging field that was brought to our attention by several of our attending physicians. Dr. Alleman is the Bob G. Eaton Chair in Radiological Sciences, is board-certified by both the American Board of Radiology and American Board of Otolaryngology, and serves on several national Radiation Therapy Oncology Group (RTOG) trials. Dr. Chance Matthiesen is an Assistant Professor of Medicine and Radiation Oncologist at the University of Oklahoma Health Sciences Center and serves as a scientific reviewer for various national journals. He presently serves on the RTOG Head and Neck Steering Committee. Gynecological cancer is a prime concern in Oklahoma, and the University of Oklahoma has five board-certified gynecologic oncologists and is currently a National Cancer Institute-endorsed clinical trial site.

To obtain the medical articles for our research, we performed a search in PubMed using “HPV head and neck cancer” as our search terms, which gave us 2187 articles. Because we wanted to focus on the imaging of HPV+ HNSCC, we added “imaging” to our original search criteria and ended up with an initial set of 22 articles that matched our search. Of these 22 articles, only two had to be discarded: one was not available in English, and the other focused on HPV+ head and neck cancer in animals. The 20 remaining articles were used to initiate our research in late 2013. Note, however, that as of May 2014, performing the same search produces 37 results that could be potentially useful for a second study.
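The search and screening steps above can be written down in a reproducible form. The sketch below only builds query URLs for the NCBI E-utilities ESearch endpoint and redoes the screening arithmetic; it makes no network call. It is an assumption for illustration: the text above does not state that the search was run through this interface, and the helper name `build_pubmed_query` is hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (the programmatic counterpart of a PubMed search).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(terms, retmax=50):
    """Return an ESearch URL for the given search terms (no request is sent)."""
    params = {
        "db": "pubmed",            # search the PubMed database
        "term": " ".join(terms),   # terms are combined into one query string
        "retmode": "json",         # ask for a JSON response
        "retmax": retmax,          # maximum number of IDs to return
    }
    return ESEARCH_URL + "?" + urlencode(params)


# Broad search, then the narrowed search with "imaging" added.
broad_url = build_pubmed_query(["HPV head and neck cancer"])
narrow_url = build_pubmed_query(["HPV head and neck cancer", "imaging"])

# Screening arithmetic reported above: 22 retrieved, 2 excluded
# (one not available in English, one animal study).
retrieved, excluded = 22, 2
included = retrieved - excluded

print(narrow_url)
print(included)
```

Because PubMed is updated continuously, rerunning such a query later returns a different count, as the May 2014 figure above illustrates; recording the query string and the search date alongside the results keeps the set reproducible.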

Table 1

Approximate incidence of HPV+/- HNSCC/oropharyngeal cancer both worldwide and in the US as estimated from publication reports.²

Worldwide cases of HNSCC (2010)	600,000
HPV+ HNSCC	150,000
HPV- HNSCC	450,000
U.S. cases of oropharyngeal cancer (2012)	40,250
HPV+ oropharyngeal cancer	25,358
HPV- oropharyngeal cancer	14,892
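The entries in Table 1 follow from the totals and HPV+ proportions quoted in the Introduction (about 25% of HNSCC worldwide and about 63% of US oropharyngeal cancers). The short sketch below reproduces that arithmetic; the function name is hypothetical, the proportions are the approximate values cited above, and integer arithmetic is used so that rounding is explicit.

```python
def split_by_hpv_status(total_cases: int, hpv_positive_percent: int):
    """Split a case count into (HPV+, HPV-) using a whole-number percentage.

    Uses integer arithmetic with round-half-up to avoid floating-point surprises.
    """
    hpv_positive = (total_cases * hpv_positive_percent + 50) // 100
    hpv_negative = total_cases - hpv_positive
    return hpv_positive, hpv_negative


# Worldwide HNSCC (2010): 600,000 cases, roughly 25% HPV+.
worldwide = split_by_hpv_status(600_000, 25)

# US oral cavity/pharynx cancer (2012): 40,250 cases, roughly 63% HPV+.
united_states = split_by_hpv_status(40_250, 63)

print(worldwide)      # (150000, 450000)
print(united_states)  # (25358, 14892)
```

These are approximations derived from published proportions, not observed counts, so the apparent precision of figures such as 25,358 should not be over-read.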

Our first goal was to find appropriate trainees because they were essential to the task. Medical students are extremely busy studying for board examinations and other tests, and there is very little time to work on research projects. Our team felt that, as time was limited, the LIMES system would be easier to implement with less-formalized training and could be performed more on-the-job. For example, learning MeSH terms and controlled vocabulary would likely go beyond the time commitments possible for physicians to manage and potentially for the trainees to perform. Our early estimate was a time commitment of about 10–20 hours per attending physician and about 20–40 hours for students to perform this work. Both may seem to be small numbers for researchers, but in reality they had to fit into an academic hospital schedule of practicing clinical physicians.

For this project, we employed one medical student and one pre-medical student. We assessed their ability to understand a selected set of 20 medical articles about HPV-HNSCC imaging. To evaluate the performance of the ISOA, we developed a simple instrument, as shown in Table 2. We defined an “item” as the content of one cell of a spreadsheet (it represents one block of text). Additionally, topic accuracy (multi) applies when a single “item” has multiple topics.

Table 2

Evaluation instrument used by physician mentors.

I. Item Questions
1. Item Information retrieval	Was the Information in the item* appropriate to extract/retrieve from the article? (note: does not cover feasibility/practicality – see #2 below) Uninformative 1 2 3 4 5 Very Informative
2. Item Practicality	Is the item applicable to current clinical situations? Not Currently Practical 1 2 3 4 5 Very Practical
3. Item extraction	Did the ‘curator’ reduce the material to an appropriate level? (note: consider the curator's derivation, inference, deduction) Inaccurate 1 2 3 4 5 Very Accurate
4. Item Accuracy	Does the description of the reduced text still represent the author's original idea (actual text)? (note: precision and representation stand in) Inaccurate 1 2 3 4 5 Very accurate
5. Item educational Value	Rank the educational value or benefit of this item: Not Educational 1 2 3 4 5 Very Educational
II. Topic Questions (Topic Subtopic as assigned by curator)
6. Category Accuracy	How accurately was the article Categorized (as the main topic) Inaccurate 1 2 3 4 5 Very accurate
7. Subcategory Accuracy	How accurately was the topic placed in the subcategory Inaccurate 1 2 3 4 5 Very accurate
8. Category Multi Accuracy	If item was placed in multiple categories (more than one topic) was this accurately done? (Yes, No, NA) If no, elaborate
9. Category Introduction (adding new category)	Did ‘trainee’ add a new topic to the knowledge tree? (Yes, no) If so, rank its value to the knowledge tree below: Unimportant 1 2 3 4 5 Very Important
10. Categorization Education Value	Is this topic useful for the education of a trainee? Uninformative 1 2 3 4 5 Very Informative
III. Article Questions (Article Value-relative to the bold)
11. Article Importance	Rank the overall level of the article Importance to the literature Top 20% 21–40%, middle 41–60%, 61–80%, Bottom 81–100%
12. Educational Value for this Article	Rank the overall level of the article's educational benefit to the student level of Importance Unimportant 1 2 3 4 5 Very Important

Note: The 12 questions are broken down into three different groupings: (1) item-level questions, (2) topic-level questions, and (3) article-level questions.

The trainees each created a chart for each of the 20 articles. Within each tab, the trainee placed information extracted from each article that they deemed to be of significance, as well as rewriting the text they extracted from the article into simpler terms and then categorizing this information using our Clinical Classification Schema (Fig. 1). The attending physicians “graded” the work of the trainees using these spreadsheets and the information provided in Table 2. This table represents the 12 specific questions the attending physicians used to “grade” the trainees’ work. The “items” that the table refers to are specific cells within the spreadsheets that relate to the 12 questions the attending physicians used for their grading.
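One way to picture a single graded spreadsheet row is as a small record holding the extracted text, the trainee's simplified rewrite, its classification, and the physician's scores. The sketch below is illustrative only: the class and field names are hypothetical, the example text is invented, and the set of 1–5 scored questions (1–7 and 10) is inferred from Table 2.

```python
from dataclasses import dataclass, field
from typing import Dict

# Questions from Table 2 that are always scored on a 1-5 scale at the item/topic level.
LIKERT_QUESTIONS = (1, 2, 3, 4, 5, 6, 7, 10)


@dataclass
class CuratedItem:
    """One spreadsheet row produced by a trainee and graded by a physician."""
    article: str            # source article label, e.g. first author and year
    extracted_text: str     # text taken directly from the article
    simplified_text: str    # trainee's rewrite in simpler terms
    category: str           # topic from the Clinical Classification Schema
    subcategory: str        # subtopic from the same schema
    scores: Dict[int, int] = field(default_factory=dict)

    def grade(self, question: int, score: int) -> None:
        """Record a physician's 1-5 score for one instrument question."""
        if question not in LIKERT_QUESTIONS:
            raise ValueError(f"question {question} is not scored on the 1-5 scale")
        if not 1 <= score <= 5:
            raise ValueError("score must be between 1 and 5")
        self.scores[question] = score


# Invented example row; the category label is illustrative.
item = CuratedItem(
    article="Example 2012",
    extracted_text="(verbatim passage from the article)",
    simplified_text="(trainee's plain-language summary)",
    category="Role of imaging",
    subcategory="Lymph node appearance",
)
item.grade(question=3, score=4)   # item extraction
item.grade(question=6, score=5)   # category accuracy
print(item.scores)
```

Keeping the verbatim text next to its simplified form is what allows a grader to answer questions 3 and 4, which compare the reduction against the author's original wording.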

Figure 1.

Our Clinical Classification Schema, which was used by trainees to classify the items they extracted from the 20 articles into the proper place.

The 12 questions are broken down into three different categories: (1) item-level questions, (2) topic-level questions, and (3) article-level questions. Item questions refer to the specific items trainees put into the cells. This could be the actual text that was extracted directly from the article or the condensed version of that same text.

Topic questions relate to each row of text the trainee entered, at the classification level. The goal of this level was to classify the articles correctly. The attending physicians graded the trainees in five classification areas: classification accuracy, sub-classification accuracy, multiple classification accuracy, classification introduction (adding a new classification to our Clinical Classification Schema), and classification educational value. The final category of questions is the article questions. These questions relate to each article, specifically determining its importance toward improving the knowledge of HPV-HNSCC literature and its educational benefit for trainees. Finally, for the entire project, attending physicians would then select the most relevant clinical pearls²⁷ among the abstracted text, considering both instructional and informatics value. The goal of this work was to present clinical “pearls” that could be used to help future clinicians/researchers follow the field.

Results

Our clinical “Pearls” represent the data that were extracted from the 20 original articles and found to be important by the trainees. The attending physicians, when reviewing the articles, narrowed these “Pearls” into a smaller expert opinion “Pearl” list. This list is shown in Table 3. These “expert Pearls” represent key information that is deemed helpful by physicians for other physicians toward the demographics, role of imaging, and key therapeutic information.

Table 3

Clinical pearls generated from this LIMES study.

RADIOLOGIST (AND FORMER OTORHINOLARYNGOLOGY SURGEON) CLINICAL PEARLS	RADIATION ONCOLOGIST CLINICAL PEARLS
ROLE of IMAGING	ROLE OF IMAGING
HPV-positivity is associated with cystic or necrotic lymph node metastases on imaging, and imaging may allow for determination of HPV status and prognosis (Corey 2012).	Older techniques such as ultrasound and CT are becoming less popular in favor of the more sensitive and specific PET/CT scan (Corey 2012) & (Hamoir 2012). The proper incorporation of imaging is essential to ensure accurate staging (Corey 2012).
Cystic lymph node metastases are frequently associated with HPV-positive squamous cell carcinoma with primary tumor located in Waldeyer's ring. These lymph nodes have a characteristic appearance on imaging that distinguishes them from solid metastases with necrotic degeneration. This information is especially helpful when searching for the primary tumor when the cystic lymph node is the only presenting symptom (Goldenberg 2008). Cystic lymph nodes are associated with HPV head and neck squamous cell carcinoma and have a distinct appearance on imaging. Contrast-enhanced PET/CT offers the greatest accuracy in the N staging of head and neck squamous cell carcinoma.	MRI can also be incorporated, but its true benefit in all cases is debatable (Hamoir 2012). Imaging consistently shows bulky, large, and multiloculated lymph nodes in the neck (Goldenberg 2008). Most of these tumors are centered in the oropharynx, or base of tongue and tonsil, but can also be found in the oral cavity, nasopharynx, and larynx (Corey 2012) & (Strojan 2013), although with reduced frequency in the larynx and nasopharynx.
HPV-positive tumors tended to be well demarcated, while HPV-negative tumors demonstrated ill-defined borders with increased propensity to invade surrounding muscle. The index of suspicion for HPV-positive squamous cell carcinoma of the oropharynx should be high in the setting of a cystic neck mass (Cantrell 2013).
PET/CT has no proven benefit over CT alone in detecting residual disease in locally advanced HNSCC in unselected patients. There is, however, a benefit in high-risk patients, such as those with HPV-negative disease, positive tobacco history, and nonoropharyngeal cancer (Moeller 2009).
PET/CT and CT are both useful in predicting disease-specific survival in high-risk HNSCC patients. These modalities are less helpful in low-risk patients and patients with distant metastases (Moeller 2010).
Contralateral neck metastases in the setting of a small primary should raise the index of suspicion for a synchronous tumor. PET scanning is helpful in identification of primary tumors as well as searching for other metastases (Roeser 2010).
Head and neck squamous cell carcinoma may best be addressed with consideration to different etiologies and mechanisms of carcinogenesis, as well as using novel techniques such as molecular imaging to guide therapy and evaluate response. HPV-positivity in particular is an important prognostic biomarker, predicting disease behavior and response to therapy (Pryor 2011).
Biomarkers provide an effective way to stratify patients to different treatment modalities. Patients with aggressive biologic features such as aneuploidy, high serum VEGF, and infiltrating histologic pattern may best be treated with chemoradiation, while those with less-aggressive biologic features may best be treated with surgery (Wolf 2007).
A negative post-treatment PET/CT scan may identify patients who require less-intensive surveillance for recurrence in HNSCC. HPV-positivity increases the accuracy of this finding (Zhang 2011).
Hypoxia imaging may provide a way to stratify patients into those who may require hypoxia modification strategies to improve tumor control (Pryor 2011).
PET scanning shows increased accuracy in the assessment of response to chemoradiotherapy in the setting of locally advanced HNSCC (Pryor 2011).
Techniques such as FLT-PET which indicate cellular proliferation may be helpful in determining response early in treatment (Pryor 2011).
DEMOGRAPHICS	DEMOGRAPHICS
HPV-positivity is associated with cystic or necrotic lymph node metastases on imaging, and imaging may allow for determination of HPV status and prognosis (Corey 2012).	Patients being treated for such HPV-positive disease originating in the oropharynx often have a better response to therapy, which is suggestive of improved outcomes (Chen 2013).
Second primary cancers in the setting of HNSCC are an important consideration in prognosis as well as treatment. Their management may affect the treatment of the HNSCC and should be addressed first (Myers 2010).	There are many different presentations of head and neck cancer and other rare entities that must be considered in the differential diagnosis (Corey 2012).
Cancers with an unknown primary present a difficult scenario regarding workup and treatment. Treatment should be tailored to the individual, with different regimens indicated based on extent of disease and HPV-positivity (Strojan 2013).	A careful history and physical examination eliciting an absence of smoking exposure and younger patient age are immediately suggestive of HPV infection (Chen 2013).
THERAPEUTICS	THERAPEUTICS
HPV-positive head and neck cancers regress in size more rapidly during the early phase of treatment when compared to HPV-negative head and neck cancers. This suggests that the traditional dose of 70 Gy may be too high in this population, and individualized treatment plans may help to decrease unnecessary toxicity (Chen 2013). HPV-associated HNSCC is more responsive to therapy and less likely to develop subsequent malignancies than HPV-negative HNSCC. It typically presents in younger, non-smoking individuals at the tonsil or at the base of tongue. Primary tumors are typically small, with large cystic lymph nodes. Histologically, they are non-keratinizing and basaloid (Pryor 2011).	The response to therapy for HPV-positive tumors in these locations appears to be more variable than when primary in the oropharynx (this also RELATES TO IMAGING) (Corey 2012). Further understanding of these tumors and perceived increased response to treatment could potentially open the avenue to de-escalate treatment therapy, potentially reducing treatment toxicity and long-term morbidity for survivors (Chen 2013). All HPV-positive cancers appear to be more responsive to chemoradiation than non-HPV cancers, although a range of responses is noted (response to IMRT, Chen 2013).

Our experts’ gradings of the two trainees’ work are shown in Figure 2. As seen, our two trainees scored an average of 4 (top 20%) on the eight questions that were numerically based. Questions 8 and 9 were qualitative, not quantitative, so there was no statistical information to collect. Since questions 11 and 12 were answered once per article, their statistical data are not of value.

Figure 2.

The means and standard deviations of the scores for the eight numerically represented questions. Note that these values are aggregated across all physicians and trainees.
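The aggregation behind a figure of this kind can be sketched in a few lines. The scores below are invented placeholders, not the study's data, and the use of the sample standard deviation is an assumption, since the text does not state which form was used.

```python
from statistics import mean, stdev


def summarize_scores(scores_by_question):
    """Return {question: (mean, sample standard deviation)} for 1-5 scores.

    Questions with fewer than two scores are skipped because the sample
    standard deviation is undefined for a single value.
    """
    return {
        question: (mean(scores), stdev(scores))
        for question, scores in scores_by_question.items()
        if len(scores) >= 2
    }


# Placeholder scores pooled across physicians and trainees (NOT study data).
pooled = {
    1: [4, 5, 3, 4],   # item information retrieval
    3: [4, 4, 4, 4],   # item extraction
    6: [5],            # too few scores: skipped
}

summary = summarize_scores(pooled)
for question, (avg, sd) in sorted(summary.items()):
    print(f"Q{question}: mean={avg:.2f}, sd={sd:.2f}")
```

Pooling across graders, as the caption describes, hides any systematic difference between the two physicians; summarizing per grader as well would expose it.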

Figure 3 illustrates trends from several of the questions, revealing visible relationships between trainees 1 and 2. We noted that there appeared, on average, to be reasonable agreement at the item level; agreement becomes much weaker at the classification level, where assessment appears to be less consistent.

Figure 3.

Selected trend values between the two trainees on item information retrieval, item practicality, item education value and classification education value.

Finally, because of the low number of trainees and physician mentors, only one of the instrument questions appeared to have a low statistical correlation value of 0.2254.

Figure 4.

Item extraction (instrument question 3) showed a correlation value of 0.22564 between the two physician experts’ “grades”.
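A correlation between two graders' scores on the same items can be computed as follows. This sketch assumes Pearson's product-moment coefficient, which the text does not specify, and the grade lists are invented placeholders rather than the study's data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations about the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one grader gives constant scores")
    return cov / sqrt(var_x * var_y)


# Placeholder grades from two physicians on the same six items (NOT study data).
physician_a = [4, 5, 3, 4, 2, 5]
physician_b = [3, 4, 4, 5, 3, 4]
print(round(pearson_r(physician_a, physician_b), 3))
```

With 1–5 scores clustered near 4, small disagreements can pull the coefficient down sharply, so a low value on few graders and items is weak evidence either way.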

Discussion

This work focuses on informatics by working with practicing academic clinicians (Drs. Alleman from Diagnostic Radiology and Matthiesen from Radiation Oncology) who are managing real medical trainees to curate and create a resource on HPV and head and neck cancer. This work is less concerned with the significantly more complex task of reducing the textual extraction to automation tools. Instead, we have created what we termed as an ISOA. This ISOA relies on a small group of early career professionals and trainees, overseen by senior attending physicians, to collect, manage, and organize this information. We believe that our ISOA is a feasible solution, as we have shown it to work successfully within the confines of a clinical environment. Additionally, it is mainly resourced by medical students and attending physician experts who do not require extensive technical training. However, based on the feedback from our attending physicians, we have already considered future steps.

Our physician reviewers provided these actual responses prior to the entire team finally assembling to reflect and summarize our results as a community. Dr. Alleman shared the following: the area of HPV and HNSCC and imaging “is a very dynamic body of literature that is diversified between radiology, head and neck, radiation therapy, and with ties to gynecological cancer, epidemiology, and even immunology. So, this is a diverse and dynamic body of literature that is difficult for a single individual to assimilate”. The main benefit, according to Dr. Alleman, is exposure and training for informatics provided to medical students and pre-medical students.

According to our Radiation Oncologist expert, Dr. Matthiesen, “The student summarizations were accurate and of use, but the selection of information to include in the summary needs work. However, the project showed the students often struggled to grasp the main concepts of many articles. Occasionally they appeared to get lost in the details of the manuscripts. Students also struggled to categorize information properly. I would attribute this likely to lack of general knowledge of the disease and lack of exposure. Because of this, more advanced radiation oncology residents or fellows might be better suited as test subjects in a future project for the categorization portion of the ISOA.”

In summary, we have concluded that initiating a repository with pre-medical students and medical students is feasible at the “item” level. Summary accuracy for each item is consistent and accurate enough to be useable by our attending physicians. However, more experienced students, or additional preparation or training, may be needed prior to the second phase. Perhaps, a second phase could include integration with medical residents/fellows who can also benefit from this exercise. In the long run, we hope to deploy an automated NLP system.

We sought to engage critical thinkers on the topic of HPV and head and neck cancer and prioritized efforts to recruit these physicians to participate in the mentorship of students. However, physician time is understandably focused on the direct care of patients, and this fact limited the number of mentors who could participate in this project. We do believe that an ISOA could be designed to fit even the hectic schedules of physicians because of its simplicity and the large number of potential users. Additionally, we hope such a tool can improve the communication of information between different departments, such as Radiation Oncology and Radiology. This work has the potential to be extended beyond a single institution and to provide a training method for future generations of medical professionals in this emerging issue. We are limited by our focus on emergent literature, and so may not necessarily provide a method that is targeted for research involving large numbers of publications, but we feel that it is timely and necessary to plan and implement a multicenter trial. Because of the low overall prevalence of HPV-HNSCC case studies, such tools may help make clinical trial design more efficient. Both physicians and our team agree that the creation of enduring content is one of the biggest benefits, as it is a snapshot in time that can be built upon by future informatics and knowledge workers, whether they expand it directly in their medical practice or organize these data as training material for future automated natural language processing systems.

Author Contributions

Conceived the concepts: DHW. Analyzed the data: DHW, ALF. Wrote the first draft of the manuscript: DHW. Contributed to the writing of the manuscript: DHW, CLM, AMA, ALF, TCG. Agree with manuscript results and conclusions: DHW, CLM, AMA, ALF, TCG. Jointly developed the structure and arguments for the paper: DHW, CLM, AMA, ALF, TCG. Made critical revisions and approved final version: DHW, CLM, AMA, ALF, TCG. All authors reviewed and approved of the final manuscript.

Footnotes

Acknowledgments

We thank Shari C. Clifton, Prasanna Vaduvathiriyan, and Phill Jo for their assistance in the search for the referenced articles used in this publication.

References

1. Oncology ASOC. Clinical Cancer Advances 2013: Head and Neck Cancers. 2013.

2. Ramqvist, Dalianis. Oropharyngeal cancer epidemic and human papillomavirus. Emerg Infect Dis. 2010; 16(11): 1671–77.

3. Leemans C.R., Braakhuis B.J.M., Brakenhoff R.H. The molecular biology of head and neck cancer. Nat Rev Cancer. 2011; 11: 9–22.

4. Joseph A.W., D'Souza. Epidemiology of human papillomavirus-related head and neck cancer. Otolaryngologic Clinics of North America. 2012; 45(4): 739–64.

5. Centers for Disease Control and Prevention. “HPV-Associated Oropharyngeal Cancer Rates by Race and Ethnicity,” 2013. http://www.cdc.gov/cancer/hpv/statistics/headneck.htm. Accessed May 22, 2014.

6. American Cancer Society. “Cancer Facts & Figures 2012,” 2012. http://www.cancer.org/acs/groups/content/@epidemiologysurveilance/documents/document/acspc-031941.pdf. Accessed May 22, 2014.

7. Brown L.M., Check D.P., Devesa S.S. Oropharyngeal cancer incidence trends: diminishing racial disparities. Cancer Causes Control. 2011; 22: 753–63.

8. Chen A.M., Li, Beckett L.A. Differential response rates to irradiation among patients with human papillomavirus positive and negative oropharyngeal cancer. Laryngoscope. 2013; 123(1): 152–7.

9. ECOG 1308. A phase II trial of induction chemotherapy followed by cetuximab with low dose versus standard dose IMRT in patients with HPV-associated resectable squamous cell carcinoma of the oropharynx (OP). J Clin Oncol. 2012. http://meetinglibrary.asco.org/content/97623-114. Accessed May 22, 2014.

10. Administration USFaD. Inside Clinical Trials: Testing Medical Products in People. 2013; Available at http://www.fda.gov/drugs/resourcesforyou/consumers/ucm143531.htm, 2014

11. Basow D.S. Use of Evidence-based Resources by Clinicians Improves Patient Outcomes. Wolters Kluwer Health. 2014. http://www.wolterskluwerhealth.com/News/Documents/White%20Papers/Evidence-based%20Resources%20to%20Improve%20Patient%20Outcomes.pdf. Accessed May 22, 2014.

12. Ely J.W., Osheroff J.A., Chambliss M.L., Ebell M.H., Rosenbaum M.E. Answering physicians’ clinical questions: obstacles and potential solutions. J Am Med Inform Assoc. 2005; 12(2): 217–24.

13. Coumou H.C., Meijman F.J. How do primary care physicians seek answers to clinical questions? A literature review. J Med Libr Assoc. 2006; 94(1): 55–60.

14. Meystre S.M., Haug P.J. Comparing natural language processing tools to extract medical problems from narrative text. AMIA. Annual Symposium Proceedings/AMIA Symposium. Washington, DC: AMIA Symposium, 2005: 525–9.

15. Aronson A.R., Mork, Lang F.M., Rogers, Jimeno-Yepes, Sticco J.C. The NLM Indexing Initiative: Current Status and Role in Improving Access to Biomedical Information. Bethesda, MD: U.S. National Library of Medicine, 2012.

16. Carrell D.S., Halgrim, Tran D.T. Using natural language processing to improve efficiency of manual chart abstraction in research: the case of breast cancer recurrence. Am J Epidemiol. 2014; 179(6): 749–58.

17. Chen Hsinchun. Medical Informatics: Knowledge Management and Data Mining in Biomedicine. New York, NY: Springer, 2005.

18. Powers D.M.W. Evaluation: from precision, recall and F-measure to roc, informedness, markedness & correlation. J Mach Learn Technol. 2011; 2(1): 37–63.

19. Wikipedia. Comparison of Reference Management Software. 2014; Available at http://en.wikipedia.org/wiki/Comparison_of_reference_management_software, 2014.

20. Lancaster, Wilfrid Frederick. Indexing and Abstracting in Theory and Practice. Champaign, IL: Library Association, 1998.

21. Medicine USNLo. Bibliographic Services Division Index. 2014; Available at http://www.nlm.nih.gov/bsd/indexfaq.html, 2014.

22. Hliaoutakis, Zervanou, Petrakis E.G.M. The AMTEx approach in the medical document indexing and retrieval application. Data Knowl Eng. 2009; 68(3): 380–92.

23. Jimeno-Yepes A.J., Plaza, Mork J.G., Aronson A.R., Diaz. MeSH indexing based on automatically generated summaries. BMC Bioinformatics. 2013; 14: 208.

24. Luhn H.P. The Automatic Creation of Literature Abstracts. Riverton, NJ: MIT Press, 1958.

25. Workman T.E., Fiszman, Hurdle J.F. Text summarization as a decision support aid. BMC Med Inform Decis Mak. 2012; 12: 41.

26. Lloret, Romá-Ferri, Palomar. COMPENDIUM: a text summarization system for generating abstracts of research papers. In: Muñoz, Montoyo, Métais, eds. Natural Language Processing and Information Systems. Vol 6716. Berlin, Heidelberg: Springer; 2011: 3–14.

27. Lorin M.I., Palazzi D.L., Turner T.L., Ward M.A. What is a clinical pearl and what is its role in medical education? Med Teach. 2008; 30(9-10): 870–4.