The EM algorithm and its implementation for the estimation of frequencies of SNP-haplotypes
Polańska, Joanna
International Journal of Applied Mathematics and Computer Science, Volume 13 (2003), pp. 419-429 / Harvested from The Polish Digital Mathematics Library

Haplotype analysis is becoming increasingly important in the study of complex genetic diseases. Various algorithms and specialized computer software have been developed to statistically estimate haplotype frequencies from marker phenotypes in unrelated individuals. However, there are currently very few empirical reports on how well these methods recover haplotype frequencies. One of the most widely used methods of haplotype reconstruction is the Maximum Likelihood method, employing the Expectation-Maximization (EM) algorithm. The aim of this study is to explore the variability of the EM estimates of haplotype frequencies for real data. We analyzed haplotypes at the BLM, WRN, RECQL and ATM genes with 8-14 biallelic markers per gene in 300 individuals. We also re-analyzed the data presented by Mano et al. (2002). We studied the convergence speed, the shape of the log-likelihood hypersurface, and the existence of local maxima, as well as their relations with heterozygosity, linkage disequilibrium and departures from Hardy-Weinberg equilibrium. Our study contributes to determining practical values for algorithm sensitivities.
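To make the estimation step mentioned in the abstract concrete, the following is a minimal sketch of an EM iteration for haplotype frequencies from unphased genotypes at biallelic markers. It is not the implementation studied in the paper. The function names `compatible_pairs` and `em_haplotype_frequencies`, the genotype coding (per-marker count of allele 1, so 0, 1 or 2), the uniform starting point, and the stopping rule are all assumptions made for illustration, and Hardy-Weinberg proportions are assumed in the E-step.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple with one entry per marker, the count (0, 1 or 2) of allele 1.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # alleles at the homozygous markers
    pairs = []
    # Fix the phase of the first heterozygous marker so each pair is listed once.
    for bits in product((0, 1), repeat=max(len(het) - 1, 0)):
        h1, h2 = base[:], base[:]
        for site, b in zip(het, (0,) + bits):
            h1[site], h2[site] = b, 1 - b
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def em_haplotype_frequencies(genotypes, max_iter=1000, tol=1e-10):
    """Estimate haplotype frequencies by EM (hypothetical helper, HWE assumed)."""
    expansions = [compatible_pairs(g) for g in genotypes]
    haplotypes = sorted({h for pairs in expansions for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start (assumption)
    n_chrom = 2 * len(genotypes)
    for _ in range(max_iter):
        counts = defaultdict(float)
        for pairs in expansions:
            # E-step: posterior probability of each phase resolution.
            weights = [(1.0 if a == b else 2.0) * freq[a] * freq[b] for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes.
        new_freq = {h: counts[h] / n_chrom for h in haplotypes}
        change = max(abs(new_freq[h] - freq[h]) for h in haplotypes)
        freq = new_freq
        if change < tol:
            break
    return freq
```

Because EM can converge to a local maximum of the likelihood, a sketch like this would normally be run from several starting points; the single uniform start here is only for brevity.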

Published: 2003-01-01
EUDML-ID : urn:eudml:doc:207655
@article{bwmeta1.element.bwnjournal-article-amcv13i3p419bwm,
     author = {Pola\'nska, Joanna},
     title = {The EM algorithm and its implementation for the estimation of frequencies of SNP-haplotypes},
     journal = {International Journal of Applied Mathematics and Computer Science},
     volume = {13},
     year = {2003},
     pages = {419-429},
     zbl = {1035.62115},
     language = {en},
     url = {http://dml.mathdoc.fr/item/bwmeta1.element.bwnjournal-article-amcv13i3p419bwm}
}
Polańska, Joanna. The EM algorithm and its implementation for the estimation of frequencies of SNP-haplotypes. International Journal of Applied Mathematics and Computer Science, Volume 13 (2003) pp. 419-429. http://gdmltest.u-ga.fr/item/bwmeta1.element.bwnjournal-article-amcv13i3p419bwm/

[000] Bonnen P.E., Story M.D., Ashorn C.L., Buchholz T.A., Weil M.M. and Nelson D.L. (2000): Haplotypes at ATM identify coding-sequence variation and indicate a region of extensive linkage disequilibrium. — Am. J. Hum. Genet., Vol. 67, No. 6, pp. 1437–1451.

[001] Chiano M.N. and Clayton D.G. (1998): Fine genetic mapping using haplotype analysis and the missing data problem. — Ann. Hum. Genet., Vol. 62, Pt. 1, pp. 55–60.

[002] Clark A.G. (1990): Inference of haplotypes from PCR-amplified samples of diploid populations. — Mol. Biol. Evol., Vol. 7, No. 2, pp. 111–122.

[003] Clark V.J., Metheny N., Dean M. and Peterson R.J. (2001): Statistical estimation and pedigree analysis of CCR2-CCR5 haplotypes. — Hum. Genet., Vol. 108, No. 6, pp. 484–493.

[004] Dempster A.P., Laird N.M. and Rubin D.B. (1977): Maximum likelihood from incomplete data via the EM algorithm. — J. R. Stat. Soc. B, Vol. 39, No. 1, pp. 1–38. | Zbl 0364.62022

[005] Excoffier L. and Slatkin M. (1995): Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. — Mol. Biol. Evol., Vol. 12, No. 5, pp. 921–927.

[006] Fallin D. and Schork N.J. (2000): Accuracy of haplotype frequency estimation for biallelic loci, via the Expectation-Maximization algorithm for unphased diploid genotype data. — Am. J. Hum. Genet., Vol. 67, No. 4, pp. 947–959.

[007] Ghosh S. and Majumder P.P. (2000): Mapping a quantitative trait locus via the EM algorithm and Bayesian classification. — Genet. Epidemiol., Vol. 19, No. 2, pp. 97–126.

[008] Hawley M.E. and Kidd K.K. (1995): HAPLO: A program using the EM algorithm to estimate the frequencies of multi-site haplotypes. — J. Heredity, Vol. 86, No. 5, pp. 409–411.

[009] Hudson R.R. and Kaplan N.L. (1985): Statistical properties of the number of recombination events in the history of a sample of DNA sequence. — Genetics, Vol. 111, No. 1, pp. 147–164.

[010] Kalinowski S.T. and Hedrick P.W. (2001): Estimation of linkage disequilibrium for loci with multiple alleles: Basic approach and an application using data from bighorn sheep. — Heredity, Vol. 87, Pt. 6, pp. 698–708.

[011] Lin S., Cutler D.J., Zwick M.E. and Chakravarti A. (2002): Haplotype inference in random population samples. — Am. J. Hum. Genet., Vol. 71, No. 5, pp. 1129–1137.

[012] Long J.C., Williams R.C. and Urbanek M. (1995): An E-M algorithm and testing strategy for multiple-locus haplotypes. — Am. J. Hum. Genet., Vol. 56, No. 3, pp. 799–810.

[013] Mano S., Yasuda N., Tamiya G., Inoko H., Gojobori T. and Imanishi T. (2002): Phase space structure of haplotype frequency estimation by the EM algorithm. — Proc. Waterfront Symp. Human Genome Science, WASH 2002, Tokyo, Japan.

[014] McKeigue P.M. (2000): Efficiency of estimation of haplotype frequencies: Use of marker phenotypes of unrelated individuals versus counting of phase-known gametes. — Am. J. Hum. Genet., Vol. 67, No. 6, pp. 1626–1627.

[015] McLachlan G.J. and Krishnan T. (1997): The EM algorithm and extensions. — New York: Wiley. | Zbl 0882.62012

[016] Meng X. and van Dyk D. (1997): The EM algorithm — An old folk-song sung to a fast new tune. — J. R. Statist. Soc. B, Vol. 59, No. 3, pp. 511–567. | Zbl 1090.62518

[017] Niu T., Qin Z.S., Xu X. and Liu J.S. (2002): Bayesian haplotype inference for multiple linked Single-Nucleotide Polymorphisms. — Am. J. Hum. Genet., Vol. 70, No. 1, pp. 157–169.

[018] Patil N., Berno A.J., Hinds D.A., Barrett W.A., Doshi J.M., Hacker C.R., Kautzer C.R., Lee D.H., Marjoribanks C., McDonough D.P., et al. (2001): Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. — Science, Vol. 294, No. 5547, pp. 1719–1723.

[019] Qin Z.S., Niu T. and Liu J.S. (2002): Partition-Ligation-Expectation-Maximization algorithm for haplotype inference with Single-Nucleotide Polymorphism. — Am. J. Hum. Genet., Vol. 71, No. 5, pp. 1242–1247.

[020] Rohde K. and Fuerst R. (2001): Haplotyping and estimation of haplotype frequencies for closely linked biallelic multilocus genetic phenotypes including nuclear family information. — Hum. Mutat., Vol. 17, No. 4, pp. 289–295.

[021] Schneider S., Roessli D. and Excoffier L. (2000): Arlequin 2.001: A software for population genetics data analysis. — Genetics and Biometry Laboratory, University of Geneva, Switzerland.

[022] Single R.M., Meyer D., Hollenbach J.A., Nelson M.P., Noble J.A., Erlich H.A. and Thomson G. (2002): Haplotype frequency estimation in patient populations: the effect of departures from Hardy-Weinberg proportions and collapsing over a locus in the HLA region. — Genet. Epidemiol., Vol. 22, No. 2, pp. 186–195.

[023] Slatkin M. and Excoffier L. (1996): Testing for linkage disequilibrium in genotypic data using the Expectation-Maximization algorithm. — Heredity, Vol. 76, Pt. 4, pp. 377–383.

[024] Stephens M., Smith N.J. and Donnelly P. (2001a): A new statistical method for haplotype reconstruction from population data. — Am. J. Hum. Genet., Vol. 68, No. 4, pp. 978–989.

[025] Stephens M., Smith N.J. and Donnelly P. (2001b): Reply to Zhang et al. — Am. J. Hum. Genet., Vol. 69, No. 4, pp. 912–914.

[026] Tishkoff S.A., Pakstis A.J., Ruano G. and Kidd K.K. (2000): The accuracy of statistical methods for estimation of haplotype frequencies: An example from the CD4 locus. — Am. J. Hum. Genet., Vol. 67, No. 2, pp. 518–522.

[027] Trikka D., Fang Z., Renwick A., Jones S.H., Chakraborty R., Kimmel M. and Nelson D.L. (2002): Complex SNP-based haplotypes in three human helicases: implication for cancer association studies. — Genome Res., Vol. 12, No. 4, pp. 627–639.

[028] Wang N., Akey J.M., Zhang K., Chakraborty R. and Jin L. (2002): Distribution of recombination crossovers and the origin of haplotype blocks: The interplay of population history, recombination, and mutation. — Am. J. Hum. Genet., Vol. 71, No. 5, pp. 1227–1234.

[029] Wu C.F.J. (1983): On the convergence properties of the EM algorithm. — Ann. Stat., Vol. 11, No. 1, pp. 95–103. | Zbl 0517.62035

[030] Xu C.F., Lewis K., Cantone K.L., Khan P., Donnelly C., White N., Crocker N., Boyd P.R., Zaykin D.V. and Purvis I.J. (2002): Effectiveness of computational methods in haplotype prediction. — Hum. Genet., Vol. 110, No. 2, pp. 148–156.

[031] Zhang S., Pakstis A.J., Kidd K.K. and Zhao H. (2001): Comparison of two methods for haplotype reconstruction and haplotype frequency estimation from population data. — Am. J. Hum. Genet., Vol. 69, No. 4, pp. 906–912.