The Robinson-Foulds (RF) distance is the most popular method of evaluating the dissimilarity between phylogenetic trees. In this paper, we define and explore in detail properties of the Matching Cluster (MC) distance, which can be regarded as a refinement of the RF metric for rooted trees. Similarly to RF, MC operates on clusters of compared trees, but the distance evaluation is more complex. Using the graph theoretic approach based on a minimum-weight perfect matching in bipartite graphs, the values of similarity between clusters are transformed to the final MC-score of the dissimilarity of trees. The analyzed properties give insight into the structure of the metric space generated by MC, its relations with the Matching Split (MS) distance of unrooted trees and asymptotic behavior of the expected distance between binary n-leaf trees selected uniformly in both MC and MS (Θ(n^{3/2})).
@article{bwmeta1.element.bwnjournal-article-amcv23z3p669bwm, author = {Damian Bogdanowicz and Krzysztof Giaro}, title = {On a matching distance between rooted phylogenetic trees}, journal = {International Journal of Applied Mathematics and Computer Science}, volume = {23}, year = {2013}, pages = {669-684}, zbl = {1294.05066}, language = {en}, url = {http://dml.mathdoc.fr/item/bwmeta1.element.bwnjournal-article-amcv23z3p669bwm} }
Damian Bogdanowicz; Krzysztof Giaro. On a matching distance between rooted phylogenetic trees. International Journal of Applied Mathematics and Computer Science, Tome 23 (2013) pp. 669-684. http://gdmltest.u-ga.fr/item/bwmeta1.element.bwnjournal-article-amcv23z3p669bwm/
[000] Alberich, R., Cardona, G., Rosselló, F. and Valiente, G. (2009). An algebraic metric for phylogenetic trees, Applied Mathematics Letters 22(9): 1320-1324. | Zbl 1173.05314
[001] Aldous, D.J. (1991). The continuum random tree II: An overview, in M.T. Barlow and N. H. Bingham (Eds.), Stochastic Analysis, Cambridge University Press, Cambridge, pp. 23-70. | Zbl 0791.60008
[002] Bansal, M., Burleigh, J.G., Eulenstein, O. and Fernández-Baca, D. (2010). Robinson-Foulds supertrees, Algorithms for Molecular Biology 5(1): 18.
[003] Bansal, M.S., Dong, J. and Fernández-Baca, D. (2011). Comparing and aggregating partially resolved trees, Theoretical Computer Science 412(48): 6634-6652. | Zbl 1227.92040
[004] Biedrzycki, R. and Arabas, J. (2012). KIS: An automated attribute induction method for classification of DNA sequences, International Journal of Applied Mathematics and Computer Science 22(3): 711-721, DOI: 10.2478/v10006-012-0053-2. | Zbl 1330.92092
[005] Bininda-Emonds, O.R.P., Cardillo, M., Jones, K.E., MacPhee, R. D.E., Beck, R.M.D., Grenyer, R., Price, S.A., Vos, R.A., Gittleman, J.L. and Purvis, A. (2007). The delayed rise of present-day mammals, Nature 446(7135): 507-512.
[006] Blum, M.G.B., François, O. and Janson, S. (2006). The mean, variance and limiting distribution of two statistics sensitive to phylogenetic tree balance, The Annals of Applied Probability 16(4): 2195-2214. | Zbl 1124.05025
[007] Boc, A., Philippe, H. and Makarenkov, V. (2010). Inferring and validating horizontal gene transfer events using bipartition dissimilarity, Systematic Biology 59(2): 195-211.
[008] Bogdanowicz, D. and Giaro, K. (2012). Matching split distance for unrooted binary phylogenetic trees, IEEE/ACM Transactions on Computational Biology and Bioinformatics 9(1): 150-160.
[009] Bogdanowicz, D., Giaro, K. and Wróbel, B. (2012). TreeCmp: Comparison of trees in polynomial time, Evolutionary Bioinformatics 8: 475-487.
[010] Bolikowski, L. and Gambin, A. (2007). New metrics for phylogenies, Fundamenta Informaticae 78(2): 199-216. | Zbl 1116.92048
[011] Boorman, S.A. and Olivier, D.C. (1973). Metrics on spaces of finite trees, Journal of Mathematical Psychology 10(1): 26-59. | Zbl 0271.92011
[012] Bordewich, M. and Semple, C. (2005). On the computational complexity of the rooted subtree prune and regraft distance, Annals of Combinatorics 8(4): 409-423. | Zbl 1059.05035
[013] Brinkmeyer, M., Griebel, T. and Böcker, S. (2011). Polynomial supertree methods revisited, Advances in Bioinformatics 2011: 524182. | Zbl 1311.92132
[014] Bryant, D. (1997). Building Trees, Hunting for Trees, and Comparing Trees-Theory and Methods in Phylogenetic Analysis, Ph.D. thesis, University of Canterbury, Christchurch.
[015] Cardona, G., Llabrés, M., Rosselló, F. and Valiente, G. (2009a). Metrics for phylogenetic networks I: Generalizations of the Robinson-Foulds metric, IEEE/ACM Transactions on Computational Biology and Bioinformatics 6(1): 46-61.
[016] Cardona, G., Llabrés, M., Rosselló, F. and Valiente, G. (2009b). Metrics for phylogenetic networks II: Nodal and triplets metrics, IEEE/ACM Transactions on Computational Biology and Bioinformatics 6(3): 454-469.
[017] Cardona, G., Rossello, F. and Valiente, G. (2009). Comparison of tree-child phylogenetic networks, IEEE/ACM Transactions on Computational Biology and Bioinformatics 6(4): 552-569.
[018] Cardona, G., Llabrés, M., Rosselló, F. and Valiente, G. (2010). Nodal distances for rooted phylogenetic trees, Journal of Mathematical Biology 61(2): 253-276. | Zbl 1202.92060
[019] Critchlow, D.E., Pearl, D.K. and Qian, C. (1996). The triples distance for rooted bifurcating phylogenetic trees, Systematic Biology 45(3): 323-334.
[020] Darlu, P. and Guénoche, A. (2011). TreeOfTrees method to evaluate the congruence between gene trees, Journal of Classification 28(3): 390-403. | Zbl 06562761
[021] Davies, T.J., Barraclough, T.G., Chase, M.W., Soltis, P.S., Soltis, D.E. and Savolainen, V. (2004). Darwin's abominable mystery: Insights from a supertree of the angiosperms, Proceedings of the National Academy of Sciences of the United States of America 101(7): 1904-1909.
[022] Felsenstein, J. (2003). Inferring Phylogenies, Sinauer Associates, Sunderland, MA.
[023] Frąckiewicz, M. and Palus, H. (2011). KHM clustering technique as a segmentation method for endoscopic colour images, International Journal of Applied Mathematics and Computer Science 21(1): 203-209, DOI: 10.2478/v10006-011-0015-0.
[024] Gabow, H.N. and Tarjan, R.E. (1989). Faster scaling algorithms for network problems, SIAM Journal on Computing 18(5): 1013-1036. | Zbl 0679.68079
[025] Górecki, P. and Eulenstein, O. (2012). A Robinson-Foulds measure to compare unrooted trees with rooted trees, in L. Bleris, I. Mandoiu, R. Schwartz and J. Wang (Eds.), Bioinformatics Research and Applications, Lecture Notes in Computer Science, Vol. 7292, Springer, Berlin/Heidelberg, pp. 115-126.
[026] Gusfield, D. (1991). Efficient algorithms for inferring evolutionary trees, Networks 21(1): 19-28. | Zbl 0719.92015
[027] Hayes, M., Walenstein, A. and Lakhotia, A. (2009). Evaluation of malware phylogeny modelling systems using automated variant generation, Journal in Computer Virology 5(4): 335-343.
[028] Hillis, D.M., Heath, T.A. and John, K.S. (2005). Analysis and visualization of tree space, Systematic Biology 54(3): 471-482.
[029] Kennedy, M., Page, R. D.M. and Prum, R. (2002). Seabird supertrees: Combining partial estimates of procellariiform phylogeny, The Auk 119(1): 88-108.
[030] Lin, Y., Rajan, V. and Moret, B.M.E. (2012). A metric for phylogenetic trees based on matching, IEEE/ACM Transactions on Computational Biology and Bioinformatics 9(4): 1014-1022.
[031] Ma, B., Li, M. and Zhang, L. (1998). On reconstructing species trees from gene trees in term of duplications and losses, Proceedings of the 2nd Annual International Conference on Computational Molecular Biology, RECOMB'98, New York, NY, USA, pp. 182-191.
[032] McKenzie, A. and Steel, M. (2000). Distributions of cherries for two models of trees, Mathematical Biosciences 164(1): 81-92. | Zbl 0947.92021
[033] Munzner, T., Guimbretière, F., Tasiran, S., Zhang, L. and Zhou, Y. (2003). TreeJuxtaposer: Scalable tree comparison using focus+context with guaranteed visibility, ACM Transactions on Graphics 22(3): 453-462.
[034] Nguyen, N., Mirarab, S. and Warnow, T. (2012). MRL and SuperFine+MRL: New supertree methods, Algorithms for Molecular Biology 7(1): 3.
[035] Nye, T.M., Liò, P. and Gilks, W.R. (2006). A novel algorithm and web-based tool for comparing two alternative phylogenetic trees, Bioinformatics 22(1): 117-119.
[036] Orlin, J.B. and Ahuja, R.K. (1992). New scaling algorithms for the assignment and minimum mean cycle problems, Mathematical Programming 54(1-3): 41-56. | Zbl 0764.90059
[037] Penny, D., Watson, E.E. and Steel, M.A. (1993). Trees from languages and genes are very similar, Systematic Biology 42(3): 382-384.
[038] Pompei, S., Loreto, V. and Tria, F. (2011). On the accuracy of language trees, PLoS ONE 6(6): e20109.
[039] Restrepo, G., Héber, M. and Llanos, E.J. (2007). Three dissimilarity measures to contrast dendrograms, Journal of Chemical Information and Modeling 47(3): 761-770.
[040] Robinson, D.F. and Foulds, L.R. (1981). Comparison of phylogenetic trees, Mathematical Biosciences 53(1-2): 131-147. | Zbl 0451.92006
[041] Sackin, M.J. (1972). “Good” and “bad” phenograms, Systematic Zoology 21(2): 225-226.
[042] Semple, C. and Steel, M. (2003). Phylogenetics, Oxford University Press, Oxford.
[043] Shao, K.-T. and Sokal, R.R. (1990). Tree balance, Systematic Zoology 39(3): 266-276.
[044] Steel, M. A. and Penny, D. (1993). Distributions of tree comparison metrics-some new results, Systematic Biology 42(2): 126-141.
[045] Stockham, C., Wang, L.-S. and Warnow, T. (2002). Statistically based postprocessing of phylogenetic analysis by clustering, Bioinformatics 18(suppl 1): S285-S293.
[046] Suri, R. and Warnow, T. (2010). Spruce, Website, http://www.cs.utexas.edu/~phylo/software/spruce/.
[047] Swenson, M.S., Suri, R., Linder, C.R. and Warnow, T. (2011). An experimental study of Quartets MaxCut and other supertree methods, Algorithms for Molecular Biology 6(1): 7.
[048] Swofford, D. (2002). PAUP*. Phylogenetic Analysis Using Parsimony (*and other methods). Version 4, Sinauer Associates, Sunderland, MA.
[049] Wang, J.T., Shan, H., Shasha, D. and Piel, W.H. (2005). Fast structural search in phylogenetic databases, Evolutionary Bioinformatics Online 1: 37-46.
[050] Williams, W.T. and Clifford, H.T. (1971). On the comparison of two classifications of the same set of elements, Taxon 20(4): 519-522.