Microsatellites, tandem arrays of 1–6 bp sequences, are abundant in almost
all eukaryotic genomes. Their distribution is widely accepted to be uneven
and non-random along the length of the chromosomes.
The Arabidopsis thaliana genome is dominated by mononucleotide repeats,
(A)$_{n}$ being the most abundant motif. In total, 39
microsatellite loci extend to more than 100 bp in length. Of these, 8 loci
have no gene in their proximity. (AG)$_{n}$ is the most abundant motif among
these longer repeats. The non-random distribution of microsatellites is
reflected in the occurrence of microsatellite clusters in the genome. In
total, 3400 microsatellite clusters have been
identified in the Arabidopsis genome. Chromosome 2, which is 19.7 Mb
long, harbors 550 clusters accommodating 29% of all the microsatellites
present on this chromosome. Further, 409 of the 6239 genes on chromosome 2 are
associated with 323 microsatellite clusters. Motifs like (AGG)$_{n}$ and
(ACT)$_{n}$ show preferential accommodation in clusters that overlap with
genes. Among all the
microsatellite clusters that overlap with genes, 80% overlap in such a way
that the cluster ends beyond the 3'-end of the gene or starts before its
5'-end. Genes with diverse functions show
association with the clusters. However, not all members of a gene family show
similar associations.
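
The ideas summarized above (detecting 1–6 bp tandem repeats, grouping nearby repeats into clusters, and checking how a cluster overlaps a gene) can be illustrated with a minimal Python sketch. It is not the method used in the study: the function names, the minimum repeat count, the gap threshold that defines a cluster, and the half-open coordinate convention are all assumptions made for illustration, and strand is ignored, so the two partial-overlap cases (cluster starting before the 5'-end or ending beyond the 3'-end) are reported together.

```python
import re

# Hypothetical thresholds chosen for illustration; not taken from the study.
MIN_REPEATS = 3   # minimum number of motif copies to call a microsatellite
MAX_GAP = 10      # maximum gap (bp) between repeats merged into one cluster


def find_microsatellites(seq, min_repeats=MIN_REPEATS):
    """Return (start, end, motif) for tandem repeats of 1-6 bp motifs.

    Coordinates are 0-based and half-open. The lazy quantifier makes the
    regex try the shortest motif first, so AAAAAA is reported as (A)n.
    """
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq)]


def cluster_repeats(repeats, max_gap=MAX_GAP):
    """Merge repeats separated by at most max_gap bp into clusters.

    Returns (start, end, number_of_repeats) for each cluster.
    """
    clusters = []
    for start, end, _motif in sorted(repeats):
        if clusters and start - clusters[-1][1] <= max_gap:
            prev_start, prev_end, count = clusters[-1]
            clusters[-1] = (prev_start, max(prev_end, end), count + 1)
        else:
            clusters.append((start, end, 1))
    return clusters


def classify_overlap(cluster, gene):
    """Classify how a cluster (start, end) overlaps a gene (start, end)."""
    c_start, c_end = cluster
    g_start, g_end = gene
    if c_end <= g_start or c_start >= g_end:
        return "no overlap"
    if c_start >= g_start and c_end <= g_end:
        return "within gene"
    # Cluster crosses at least one gene boundary (strand is not considered).
    return "extends past gene boundary"


# Toy sequence containing an (A)8 run and an (AG)5 run two bases apart.
toy_seq = "TTGC" + "A" * 8 + "CT" + "AG" * 5 + "TC"
toy_repeats = find_microsatellites(toy_seq)
toy_clusters = cluster_repeats(toy_repeats)
```

On the toy sequence, the two repeats lie close enough to be merged into a single cluster; passing that cluster's coordinates and a hypothetical gene interval to `classify_overlap` shows whether the cluster sits inside the gene or crosses one of its ends.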