Application of perturbed datasets to evaluate genomic biomarkers identified by microarray analysis

Abstract

An important application of microarray analysis is the identification of a subset of differentially expressed genes as biomarkers. Microarray experiments are often performed with few replicates, so the data come from a limited sample size. To reduce the impact of small sample size on gene marker identification, we used a new approach to validate candidate gene markers. In this study, candidate genes were first identified from the original microarray data based on statistical significance (p-value) and biological significance (fold-change). Multiple new, perturbed datasets were then generated from the original dataset by introducing artificial errors at different levels. The performance of the candidate genes in the perturbed datasets was subsequently evaluated using t-tests at various statistical significance levels. Based on the stability of the candidate genes in the perturbed datasets at different error levels, a subset of candidate genes can be selected as potential target biomarkers. Perturbed artificial datasets thus provide a new opportunity to validate candidate target genes identified by microarray experiments with a limited sample size.
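The perturbation-and-retest idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the error model, so the code assumes multiplicative Gaussian noise whose standard deviation equals the error level, and it defines stability as the fraction of perturbed datasets in which a gene remains significant. The function names, the expression values, and the error levels are all hypothetical, and a pooled-variance Student t-test is assumed.

```python
import math
import random
from statistics import mean, variance


def two_sided_p(t, df, steps=400):
    """Two-sided p-value of Student's t distribution (standard library only).

    The substitution x = sqrt(df) * tan(theta) turns the tail area into an
    integral of cos(theta)**(df - 1) over a bounded interval, which is then
    evaluated with Simpson's rule (steps must be even).
    """
    theta = math.atan(abs(t) / math.sqrt(df))
    width = math.pi / 2 - theta
    if width <= 0:
        return 0.0
    h = width / steps

    def f(x):
        return math.cos(x) ** (df - 1)

    total = f(theta) + f(math.pi / 2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(theta + i * h)
    scale = math.gamma((df + 1) / 2) / (math.sqrt(math.pi) * math.gamma(df / 2))
    return min(1.0, 2 * scale * total * h / 3)


def t_test_pvalue(a, b):
    """Pooled-variance (Student) two-sample t-test, two-sided p-value."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    diff = mean(a) - mean(b)
    if se == 0:
        return 1.0 if diff == 0 else 0.0
    return two_sided_p(diff / se, df)


def perturb(values, error_level, rng):
    """ASSUMED error model: each value is scaled by 1 + N(0, error_level)."""
    return [v * (1.0 + rng.gauss(0.0, error_level)) for v in values]


def stability(control, treated, error_level, alpha, n_datasets=200, seed=0):
    """Fraction of perturbed datasets in which the gene has p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_datasets):
        p = t_test_pvalue(perturb(control, error_level, rng),
                          perturb(treated, error_level, rng))
        hits += p < alpha
    return hits / n_datasets


# Hypothetical expression values for two candidate genes (4 replicates each).
genes = {
    "gene_strong": ([8.0, 8.1, 7.9, 8.05], [10.0, 10.2, 9.9, 10.1]),
    "gene_weak": ([8.0, 8.3, 7.8, 8.1], [8.5, 8.9, 8.4, 8.6]),
}

for name, (control, treated) in genes.items():
    for level in (0.01, 0.05, 0.10):
        score = stability(control, treated, level, alpha=0.05)
        print(f"{name} error={level:.2f} stability={score:.2f}")
```

Under these assumptions, a gene whose stability stays high as the error level increases would be a stronger biomarker candidate than one that loses significance under small perturbations. The error model, the number of perturbed datasets, and the stability cutoff would all need to be chosen for the actual data.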

Keywords

Perturbed datasets; microarray analysis; gene expression profiling; biomarkers
