
Abstract

We have developed a method for extracting the number of trial participants from abstracts describing randomized controlled trials (RCTs); the number of trial participants may indicate the reliability of a trial. The method relies on statistical natural language processing. The number of interest was identified by binary supervised classification using a support vector machine algorithm. The method was tested on 223 abstracts in which the number of trial participants had been identified manually to serve as a gold standard. Automatic extraction resulted in 2 false-positive and 19 false-negative classifications. The algorithm extracted the number of trial participants with an accuracy of 97% and an F-measure of 0.84. The algorithm may improve the selection of relevant articles for question answering, and hence may assist in decision-making.
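The evaluation measures named in the abstract can be illustrated with a short Python sketch. It shows how accuracy and the F-measure are derived from the four cells of a binary confusion matrix. The class `ConfusionCounts`, the function names, and the counts in the example are all hypothetical; the abstract reports only the false-positive and false-negative totals, so the toy counts below are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical container)."""
    tp: int  # candidates correctly labelled as the participant number
    fp: int  # candidates wrongly labelled as the participant number
    fn: int  # participant numbers that the classifier missed
    tn: int  # other numbers correctly rejected


def accuracy(c: ConfusionCounts) -> float:
    """Share of all decisions that were correct, true negatives included."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: ConfusionCounts) -> float:
    """Share of positive predictions that were correct."""
    predicted = c.tp + c.fp
    return c.tp / predicted if predicted else 0.0


def recall(c: ConfusionCounts) -> float:
    """Share of actual positives that were found."""
    actual = c.tp + c.fn
    return c.tp / actual if actual else 0.0


def f_measure(c: ConfusionCounts, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    p, r = precision(c), recall(c)
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


# Toy counts chosen for illustration only; NOT taken from the study.
toy = ConfusionCounts(tp=8, fp=2, fn=2, tn=88)
print(f"accuracy  = {accuracy(toy):.2f}")
print(f"precision = {precision(toy):.2f}")
print(f"recall    = {recall(toy):.2f}")
print(f"F-measure = {f_measure(toy):.2f}")
```

Accuracy counts true negatives and the F-measure does not, so the two can diverge when most of the numbers in an abstract are not the participant count. This may be one reason a high accuracy can sit alongside a lower F-measure.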



A method of extracting the number of trial participants from abstracts describing randomized controlled trials
