A Poisson Approximation for Sequence Comparisons with Insertions and Deletions
Neuhauser, Claudia
Ann. Statist., Volume 22 (1994) no. 1, pp. 1603-1629 / Harvested from Project Euclid
We construct a statistical test for a sequence alignment problem which enables us to decide whether two given sequences are related. Such a test can be used in DNA and protein sequence comparisons. It is based on a comparison of two long sequences of i.i.d. letters taken from a finite alphabet. The test statistic typically employed is the length of the longest matching region between the two sequences in which a certain number of insertions and deletions but no mismatches are allowed. We give a distributional result which enables one to compute $P$-values, and hence to decide whether or not the two sequences are related. Its proof utilizes the Chen-Stein method for Poisson approximation. The test is based on a greedy algorithm that searches for the longest matching region. We show that this algorithm finds the longest matching region with probability approaching 1 as the lengths of the two sequences go to infinity.
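As a rough illustration of the test statistic described in the abstract, the following sketch computes the length of the longest matching region between two sequences when up to `k` insertions and deletions, but no mismatches, are allowed. It is a plain dynamic program, not the greedy algorithm analyzed in the paper. The function name `longest_match_with_indels`, the parameter `k`, and the choice to measure length as the number of matched letters are all assumptions made for this example.

```python
def longest_match_with_indels(a, b, k):
    """Largest number of matched letters in any local alignment of a and b
    that uses at most k insertions/deletions and no mismatches.

    Illustrative O(len(a) * len(b) * k) dynamic program; this is an
    assumption-laden sketch, not the paper's greedy search.
    """
    m = len(b)
    # prev[j][d]: best match count over alignments ending at (i-1, j)
    # that use at most d indels.
    prev = [[0] * (k + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [[0] * (k + 1) for _ in range(m + 1)]
        for j in range(1, m + 1):
            for d in range(k + 1):
                score = 0  # an alignment may start fresh at any cell
                if a[i - 1] == b[j - 1]:
                    # extend along the diagonal with one more match
                    score = prev[j - 1][d] + 1
                if d > 0:
                    # spend one indel: skip a letter of a or a letter of b
                    score = max(score, prev[j][d - 1], cur[j - 1][d - 1])
                cur[j][d] = score
                best = max(best, score)
        prev = cur
    return best
```

For example, with `a = "ACGTACGT"` and `b = "ACGACGT"`, no indels gives the common block `ACGT`, while allowing a single deletion of the first `T` in `a` lets all seven letters of `b` match.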
Published: 1994-09-14
Classification: Chen-Stein method, sequence matching, Poisson approximation, DNA sequences, greedy algorithm, 62F05, 92D20
@article{1176325645,
     author = {Neuhauser, Claudia},
     title = {A Poisson Approximation for Sequence Comparisons with Insertions and Deletions},
     journal = {Ann. Statist.},
     volume = {22},
     number = {1},
     year = {1994},
     pages = {1603-1629},
     language = {en},
     url = {http://dml.mathdoc.fr/item/1176325645}
}
Neuhauser, Claudia. A Poisson Approximation for Sequence Comparisons with Insertions and Deletions. Ann. Statist., Volume 22 (1994) no. 1, pp. 1603-1629. http://gdmltest.u-ga.fr/item/1176325645/