Parametric Modelling Of Genomic Sequences Distance

Abstract

Genomic data models involve a large number of positions exhibiting categorical responses, and a large part of these positions provides no statistical information. For this reason, a reduction in complexity, attained by focusing on the functional part of sequences (genes), is helpful for appraising the statistical intersite dependence. The problem of homogeneity among groups of genomic sequences is considered here, incorporating the available (heuristic) evidence of diversity in categorical data models. The proposed fully operational parametric statistical model is flexible enough to be used across several organisms and adaptable to intersite dependence. Properties of the proposed inference procedures are studied, and an illustrative real data example is thoroughly explored.

Keywords

Amino acid; Asymptotic distribution; Maximum likelihood estimation; Categorical data; Genome; Nucleotide; Statistical genetics
