Analysis of the predictability of time series obtained from genomic sequences by using several predictors

Abstract

In previous papers, we used one-step-ahead predictors to compute recognition scores for genomic sequences, which are coded as distances between successive bases. The recognition scores were then used as inputs to a hierarchical decision system. Because the relevance of these scores may depend on prediction quality, prediction performance needs to be assessed against the predictability of the analyzed time series. The aim of this paper is to determine which predictors are most suitable for genomic sequence identification. We analyze linear predictors (such as the linear combiner), neural predictors (of RBF or MLP type), and neuro-fuzzy predictors (based on the Yamakawa model). Several methods are used to assess the predictability of the time series, including the Hurst exponent, the autocorrelation function, and the eta metric. All predictors were tested on sequences from the HIV-1 genome and compared in terms of prediction quality. The mean square prediction error (MSPE), the direction test, and the Theil coefficient were used as prediction performance measures. The results obtained with the different predictors are compared and discussed.
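As a rough illustration of the pipeline the abstract describes, the sketch below codes a sequence as a distance series and scores a one-step-ahead predictor with the three performance measures named above. Several points are assumptions and not taken from the paper: the coding is read as the gap between consecutive occurrences of one chosen base, the predictor is a simple two-point mean standing in for the linear, neural, and neuro-fuzzy models, the Theil coefficient is taken in its U1 form, and all function names and the toy sequence are hypothetical.

```python
from math import sqrt


def distance_series(sequence, base):
    """Gaps between consecutive occurrences of `base` (assumed coding)."""
    positions = [i for i, symbol in enumerate(sequence) if symbol == base]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


def two_point_mean_forecasts(series):
    """Stand-in one-step-ahead predictor: mean of the two previous values."""
    return [(series[t - 1] + series[t - 2]) / 2 for t in range(2, len(series))]


def mspe(actual, predicted):
    """Mean square prediction error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def sign(value):
    """Return -1, 0, or 1 according to the sign of `value`."""
    return (value > 0) - (value < 0)


def direction_hit_rate(previous, actual, predicted):
    """Share of steps where the predicted move has the sign of the real move."""
    hits = sum(
        sign(a - prev) == sign(p - prev)
        for prev, a, p in zip(previous, actual, predicted)
    )
    return hits / len(actual)


def theil_u1(actual, predicted):
    """Theil's U1 (assumed variant): 0 is a perfect fit, 1 is the worst case."""
    def rms(values):
        return sqrt(sum(v * v for v in values) / len(values))

    return sqrt(mspe(actual, predicted)) / (rms(actual) + rms(predicted))


if __name__ == "__main__":
    series = distance_series("ACGTAACGATTACA", "A")  # toy sequence, not HIV-1 data
    predicted = two_point_mean_forecasts(series)
    actual = series[2:]       # values the predictor tries to match
    previous = series[1:-1]   # last observed value before each forecast
    print("distance series:", series)
    print("MSPE:", mspe(actual, predicted))
    print("direction hit rate:", direction_hit_rate(previous, actual, predicted))
    print("Theil U1:", theil_u1(actual, predicted))
```

Swapping `two_point_mean_forecasts` for any other one-step-ahead model leaves the three measures unchanged, so different predictor families can be scored on the same footing.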

Keywords

Distance series, genomic sequences, predictability, prediction performances, recognition scores
