Abstract
In previous papers, we used one-step-ahead predictors to compute recognition scores for genomic sequences. The genomic sequences are coded as distances between successive bases. The recognition scores were then used as inputs to a hierarchical decision system. Because the relevance of these scores may depend on prediction quality, prediction performance must be assessed in a framework based on the predictability of the analyzed time series. The aim of this paper is to determine which predictors are most suitable for genomic sequence identification. We analyze linear predictors (such as the linear combiner), neural predictors (of RBF or MLP type), and neuro-fuzzy predictors (based on the Yamakawa model). Several methods are used to assess the predictability of the time series, including the Hurst exponent, the autocorrelation function, and the eta metric. All predictors were tested and compared for prediction quality using sequences from the HIV-1 genome. The mean square prediction error (MSPE), the direction test, and the Theil coefficient were used as prediction performance measures. The results obtained with the different predictors are compared and discussed.
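As an illustration of the three performance measures named above, the following minimal Python sketch computes them for a one-step-ahead forecast. The abstract does not give the exact formulas, so the definitions here are assumptions: the direction test is taken as the fraction of steps where the predicted change (relative to the previous actual value) has the same sign as the actual change, and the Theil coefficient is taken in its common "U" form relative to a naive no-change predictor. The function names and sample values are hypothetical.

```python
import math


def mspe(actual, predicted):
    """Mean square prediction error over all samples."""
    n = len(actual)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n


def direction_test(actual, predicted):
    """Fraction of steps whose predicted direction of change is correct.

    Assumed definition: compare the sign of actual[t] - actual[t-1]
    with the sign of predicted[t] - actual[t-1].
    """
    hits = 0
    for t in range(1, len(actual)):
        actual_change = actual[t] - actual[t - 1]
        predicted_change = predicted[t] - actual[t - 1]
        same_sign = actual_change * predicted_change > 0
        both_flat = actual_change == 0 and predicted_change == 0
        if same_sign or both_flat:
            hits += 1
    return hits / (len(actual) - 1)


def theil_u(actual, predicted):
    """Theil coefficient relative to the naive no-change predictor.

    Assumed variant: values below 1 mean the predictor beats the naive
    forecast predicted[t] = actual[t-1]; a value of 1 means it matches it.
    The denominator is zero if the actual series is constant.
    """
    n = len(actual)
    num = sum((actual[t] - predicted[t]) ** 2 for t in range(1, n))
    den = sum((actual[t] - actual[t - 1]) ** 2 for t in range(1, n))
    return math.sqrt(num / den)


# Hypothetical inter-base distance series and one-step-ahead predictions.
actual = [3, 5, 4, 6, 6]
predicted = [3, 4, 5, 5, 7]

print(mspe(actual, predicted))
print(direction_test(actual, predicted))
print(theil_u(actual, predicted))
```

Under these assumed definitions, a lower MSPE and Theil coefficient and a higher direction-test value indicate a better predictor. Other variants of the Theil coefficient exist, so the paper's own definition should be preferred where it differs.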
