Semi-supervised learning approaches for predicting semantic characteristics of lung nodules

Abstract

Several research studies have shown that interpretation performance varies greatly among radiologists. The Lung Image Database Consortium (LIDC) dataset is one specific example. Although it was created to serve as an international research resource for the development and evaluation of computer-aided diagnosis (CAD) algorithms, only 80 of the 149 distinct nodules detected by up to four different radiologists were nodules on which at least three radiologists agreed, on average, with respect to seven semantic characteristics (lobulation, malignancy, margin, sphericity, spiculation, subtlety, and texture).

In this paper, we propose two semi-supervised learning approaches that automatically predict semantic characteristics of lung nodules from low-level image features, with the ultimate goal of reducing radiologists' interpretation variability. The nodules on which at least three radiologists agree serve as the labeled data, and all other nodules serve as unlabeled data for the proposed approaches. Both approaches have their roots in the ensemble technique DECORATE and use decision trees to build the ensemble of classifiers. We show that, on the LIDC data, the proposed semi-supervised approaches improve prediction accuracy by 50% on average compared with traditional supervised classification approaches.
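The general idea of learning from a small labeled set plus a larger unlabeled set can be illustrated with a minimal self-training sketch. This is not the method proposed in the paper: it substitutes scikit-learn's bagging for DECORATE (which builds diversity from artificially generated examples), and the function names, the 0.9 confidence threshold, the ensemble size, and the round limit are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier


def fit_ensemble(X, y, seed=0):
    """Fit a bagged ensemble of decision trees (a stand-in for DECORATE)."""
    ensemble = BaggingClassifier(
        DecisionTreeClassifier(random_state=seed),
        n_estimators=25,  # assumed ensemble size
        random_state=seed,
    )
    return ensemble.fit(X, y)


def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=10, seed=0):
    """Grow the labeled set with confidently predicted unlabeled samples.

    X_lab, y_lab: features and labels of the agreed-upon (labeled) nodules.
    X_unlab: features of the remaining (unlabeled) nodules.
    threshold: hypothetical minimum ensemble probability for a pseudo-label.
    """
    X_lab = np.asarray(X_lab, dtype=float)
    y_lab = np.asarray(y_lab)
    X_unlab = np.asarray(X_unlab, dtype=float)

    model = fit_ensemble(X_lab, y_lab, seed)
    for _ in range(max_rounds):
        if len(X_unlab) == 0:
            break
        proba = model.predict_proba(X_unlab)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break  # nothing left that the ensemble is sure about
        pseudo_labels = model.classes_[proba.argmax(axis=1)]
        # Move confident samples, with their pseudo-labels, into the labeled set.
        X_lab = np.vstack([X_lab, X_unlab[confident]])
        y_lab = np.concatenate([y_lab, pseudo_labels[confident]])
        X_unlab = X_unlab[~confident]
        model = fit_ensemble(X_lab, y_lab, seed)
    return model, X_lab, y_lab
```

In this sketch, each row of X_lab would hold the low-level image features of one nodule and y_lab the agreed rating for a single semantic characteristic, so one model would be trained per characteristic.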
