Abstract
Abstract
Identifying a word (pattern) in a long sequence of letters is not an easy task. To achieve this objective, several models have been proposed under the assumption that the sequence of letters is described by a Markov chain. The Markovian hypothesis imposes restrictions on the distribution of the sojourn time in a state, which has geometric distribution in a discrete process. This is the main drawback when applying Markov chains to real problems. By contrast, semi-Markov processes are generalized. In semi-Markov processes, the sojourn time in a state can be governed by any distribution function. The goal of this article is to compute the first hitting time (position) of a word (pattern) in a semi-Markov sequence. To achieve this objective, we use the auxiliary prefix and backward chain. To give an example of the applications of the proposed model, the model is tested in a bacteriophage DNA sequence that is lacking the enzyme SmaI. We compute the probability that a word occurs for the first time after n nucleotides in a DNA sequence. The corresponding probability distribution, the mean waiting position, the variance, and rate of the occurrence of the word are obtained.
Get full access to this article
View all access options for this article.
