Abstract
High-dimensional microarray data presents a major challenge for accurate disease classification due to feature redundancy and limited sample sizes. We hypothesize that modeling multivariate gene relationships using a graph-based structure can enhance the effectiveness of gene selection without relying on labelled data. This paper proposes Feature Graph-based Unsupervised Gene Selection (FGUGS), a novel unsupervised filter method that constructs a threshold-free directed graph to represent pairwise gene similarities and selects a compact subset of genes by identifying high in-degree nodes, thereby minimizing redundancy. FGUGS eliminates the need for manual parameter tuning, enhances interpretability through graph analysis, and scales efficiently with the number of genes while preserving structural relationships. Experimental results on four benchmark datasets show that FGUGS outperforms both traditional and recent state-of-the-art gene selection methods, achieving up to a 20% improvement in classification accuracy and demonstrating strong clustering performance. FGUGS provides a reproducible and scalable solution for biomedical data analysis, particularly in scenarios where class labels are scarce or unavailable.
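As a rough illustration of the idea summarized above, the following minimal sketch builds a directed gene graph without a similarity threshold and ranks genes by in-degree. The abstract does not state how FGUGS defines similarity or edge direction, so everything specific here is an assumption made for illustration only: absolute Pearson correlation as the similarity, one outgoing edge from each gene to its most similar gene, and the hypothetical function name `select_genes_by_in_degree`. It should not be read as the authors' implementation.

```python
import numpy as np


def select_genes_by_in_degree(X, n_selected):
    """Rank genes by in-degree in a nearest-neighbour directed graph.

    X is a (samples x genes) expression matrix. Returns the indices of the
    n_selected highest in-degree genes and the full in-degree vector.
    Assumption: no gene is constant across samples (a constant column would
    produce NaN correlations and should be filtered out beforehand).
    """
    n_genes = X.shape[1]

    # Assumed similarity: absolute Pearson correlation between gene columns.
    sim = np.abs(np.corrcoef(X, rowvar=False))

    # Exclude self-similarity so that no gene points to itself.
    np.fill_diagonal(sim, -np.inf)

    # Assumed threshold-free rule: each gene has exactly one outgoing edge,
    # directed to the gene it is most similar to.
    target = np.argmax(sim, axis=1)

    # In-degree of a gene = number of genes that point to it.
    in_degree = np.bincount(target, minlength=n_genes)

    # Highest in-degree first; a stable sort keeps ties in index order.
    order = np.argsort(-in_degree, kind="stable")
    return order[:n_selected], in_degree
```

Under these assumptions, a gene that many others point to acts as a representative of a group of mutually similar genes, so keeping only high in-degree genes tends to drop redundant ones. The dense correlation matrix used here costs memory quadratic in the number of genes; it is chosen for brevity and says nothing about the scaling behaviour reported for FGUGS.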
