Abstract
Microarray technologies make it possible to observe the expression levels of thousands of genes. Analysis of the gene expression data arising from these experiments provides insight into disease subtypes and gene functions. Gene expression data are characterized by a large number of genes and only a few samples. Traditional supervised classifiers require adequate labeled data for prediction, so the limited number of samples makes the prediction of disease subtypes a difficult task. Hence, we investigate the potential of semi-supervised learning to delineate tissue samples using only a few labeled samples. The available labeled samples are exploited to guide the clustering of the unlabeled samples. We built a classification system that integrates feature selection techniques with a semi-supervised fuzzy c-means algorithm. The system was evaluated on publicly available gene expression datasets, and the results showed that a few labeled tissue samples can assist in the accurate prediction of disease subtypes.
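To illustrate how a few labeled samples can guide the clustering of unlabeled ones, the following is a minimal sketch of one simple semi-supervised variant of fuzzy c-means. It is not the system described in this article: the function name `semi_supervised_fcm`, its parameters, and the clamping strategy (labeled samples keep a fixed one-hot membership, and cluster centers are initialized from the labeled class means) are assumptions made for illustration. The feature selection step is omitted, and every class is assumed to have at least one labeled sample.

```python
import numpy as np


def semi_supervised_fcm(X, labels, n_clusters, m=2.0, n_iter=100, tol=1e-6):
    """Illustrative semi-supervised fuzzy c-means (hypothetical helper).

    X      : array of shape (n_samples, n_features)
    labels : integer array of shape (n_samples,); -1 marks an unlabeled sample
    m      : fuzzifier, must be > 1
    Returns the membership matrix U (n_samples, n_clusters) and the centers.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    labeled = labels >= 0

    # Assumption: each class has at least one labeled sample, so the
    # labeled class means can seed the cluster centers.
    centers = np.stack([X[labels == k].mean(axis=0) for k in range(n_clusters)])

    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero

        # Standard fuzzy c-means membership update: u_ik is proportional
        # to d_ik^(-2/(m-1)), normalized so each row sums to one.
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)

        # Supervision: labeled samples are clamped to their known class.
        U[labeled] = np.eye(n_clusters)[labels[labeled]]

        # Center update weighted by memberships raised to the fuzzifier.
        W = U ** m
        new_centers = (W.T @ X) / W.sum(axis=0)[:, None]

        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break

    return U, centers
```

Calling `U.argmax(axis=1)` on the returned membership matrix gives a hard class assignment for each sample. In practice the input matrix would first be reduced to a subset of informative genes, since distances computed over thousands of genes tend to be dominated by noise.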
