We show in general how the substitution matrix and gap penalty function for local sequence alignments can be chosen such that the score statistic grows at a logarithmic rate when the two sequences are unrelated. The method used is the construction of a mixture distribution under which sequences with large scores are generated with uniformly higher likelihood. This distribution is also used for importance sampling of the $p$-value of the score. An upper bound on this $p$-value is computed and compared with the simulated value.
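As a rough illustration of the importance-sampling idea mentioned in the abstract, the sketch below estimates a small tail probability for a toy score. It is not the paper's mixture construction over alignments: it swaps in plain exponential tilting for a single gapless segment of fixed length, and the scoring scheme (+1 for a match, -1 for a mismatch, match probability 0.25) and all function names are assumptions made for this example only.

```python
import math
import random


def exact_tail(n, t, p_match=0.25):
    """Exact P(S >= t), where S = 2K - n and K ~ Binomial(n, p_match)."""
    k_min = max(math.ceil((t + n) / 2), 0)
    return sum(
        math.comb(n, k) * p_match**k * (1 - p_match) ** (n - k)
        for k in range(k_min, n + 1)
    )


def importance_sampling_tail(n, t, p_match=0.25, n_samples=20000, seed=0):
    """Estimate P(S >= t) by sampling from an exponentially tilted law.

    Assumes -n < t < n so that the tilted match probability lies in (0, 1).
    """
    rng = random.Random(seed)
    # Tilt so that the mean score under the sampling law equals t.
    q_match = (1 + t / n) / 2
    theta = 0.5 * math.log(q_match * (1 - p_match) / ((1 - q_match) * p_match))
    # log M(theta), with M(theta) = E[exp(theta * X)] under the original law.
    log_m = math.log(p_match * math.exp(theta) + (1 - p_match) * math.exp(-theta))
    total = 0.0
    for _ in range(n_samples):
        matches = sum(1 for _ in range(n) if rng.random() < q_match)
        score = 2 * matches - n
        if score >= t:
            # Likelihood ratio (original / tilted) of the sampled segment.
            total += math.exp(-theta * score + n * log_m)
    return total / n_samples


if __name__ == "__main__":
    print("exact    :", exact_tail(30, 10))
    print("estimate :", importance_sampling_tail(30, 10))
```

Sampling from the tilted law makes high-scoring segments common, and the likelihood-ratio weight corrects for the change of distribution, so a rare event can be estimated with far fewer draws than direct simulation would need.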
@article{1068128974,
author = {Chan, Hock Peng},
title = {Upper bounds and importance sampling of p-values for DNA and protein sequence alignments},
journal = {Bernoulli},
volume = {9},
number = {3},
year = {2003},
pages = {183--199},
language = {en},
url = {http://dml.mathdoc.fr/item/1068128974}
}
Chan, Hock Peng. Upper bounds and importance sampling of p-values for DNA and protein sequence alignments. Bernoulli, Volume 9 (2003) no. 3, pp. 183-199. http://gdmltest.u-ga.fr/item/1068128974/