Critical Phenomena in Sequence Matching
Arratia, Richard ; Waterman, Michael S.
Ann. Probab., Tome 13 (1985) no. 4, p. 1236-1249 / Harvested from Project Euclid
We give a generalization of the result of Erdos and Renyi on the length $R_n$ of the longest head run in the first $n$ tosses of a coin. Consider two independent sequences, $X_1 X_2\cdots X_m$ and $Y_1Y_2\cdots Y_n$. Suppose that $X_1, X_2,\cdots$ are i.i.d. $\mu$, and $Y_1, Y_2,\cdots$ are i.i.d. $\nu$, where $\mu$ and $\nu$ are possibly different distributions on a common finite alphabet $S$. Let $p \equiv P(X_1 = Y_1) \in (0, 1)$. The length of the longest matching consecutive subsequence is $M_{m,n} \equiv \max \{k: X_{i+r} = Y_{j+r}$ for $r = 1$ to $k$, for some $0 \leq i \leq m - k, 0 \leq j \leq n - k\}$. For $m$ and $n \rightarrow \infty$ with $\log(m)/\log(mn) \rightarrow \lambda \in (0,1)$, our result is that there is a constant $K \equiv K(\mu, \nu, \lambda) \in (0, 1\rbrack$ such that $P(\lim M_{m,n}/\log_{1/p}(mn) = K) = 1$. The proof uses large deviation methods. The constant $K$ is determined from a variational formula involving the Kullback-Liebler distance or relative entropy. A simple necessary and sufficient condition for $K = 1$ is given. For the case $m = n (\lambda = 1/2)$ and $\mu = \nu, K = 1$. The set of $(\mu, \nu, \lambda)$ for which $K = 1$ has nonempty interior. The boundary of this set is the location of a phase transition. The results generalize to more than two sequences and to Markov chains. A strong law of large numbers is given for the proportion of letters within the longest matching word; the limiting proportion exhibits critical behavior, similar to that of $K$.
Publié le : 1985-11-14
Classification:  Entropy,  Kullback-Liebler distance,  large deviations,  sequence matching,  60J10,  68G10,  94A17
@article{1176992808,
     author = {Arratia, Richard and Waterman, Michael S.},
     title = {Critical Phenomena in Sequence Matching},
     journal = {Ann. Probab.},
     volume = {13},
     number = {4},
     year = {1985},
     pages = { 1236-1249},
     language = {en},
     url = {http://dml.mathdoc.fr/item/1176992808}
}
Arratia, Richard; Waterman, Michael S. Critical Phenomena in Sequence Matching. Ann. Probab., Tome 13 (1985) no. 4, pp.  1236-1249. http://gdmltest.u-ga.fr/item/1176992808/