We present a method for determining the context-dependent denotation of simple object-denoting mathematical expressions in mathematical documents. Our approach relies on estimating the similarity between the linguistic context within which the given expression occurs and a set of terms from a flat domain taxonomy of mathematical concepts; the symbol is assigned, as its interpretation, the one of 7 head concepts that dominates the set of terms with the highest similarity scores to the symbol’s context. The taxonomy we used was constructed semi-automatically by combining structural and lexical information from the Cambridge Mathematics Thesaurus and the Mathematics Subject Classification. The context information taken into account in the statistical similarity calculation includes lexical features of the discourse immediately adjacent to the given expression as well as of the global discourse. In particular, as part of the latter we include the lexical context of structurally similar expressions throughout the document and that of the symbol’s declaration statement, if one can be found in the document. Our approach has been evaluated on a gold standard manually annotated by experts, achieving 66% precision.
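The selection step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the three head concepts and their terms are hypothetical (the abstract does not list the 7 head concepts), the names `TAXONOMY`, `interpret`, and `top_k` are invented, and a plain bag-of-words cosine similarity stands in for the statistical similarity measure used by the authors.

```python
import math
from collections import Counter

# Hypothetical flat taxonomy: head concept -> dominated terms.
# These entries are illustrative only, not the paper's taxonomy.
TAXONOMY = {
    "number": ["prime number", "integer", "rational number"],
    "function": ["continuous function", "mapping", "polynomial function"],
    "set": ["open set", "subset", "finite set"],
}


def bag(text):
    """Lower-cased bag of words for a piece of text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bags of words (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def interpret(local_context, global_context="", top_k=3):
    """Return the head concept dominating the top_k most similar terms.

    local_context: words adjacent to the expression.
    global_context: e.g. the declaration statement or the contexts of
    structurally similar expressions elsewhere in the document.
    Returns None when no taxonomy term shares a word with the context.
    """
    ctx = bag(local_context + " " + global_context)
    scored = [
        (cosine(ctx, bag(term)), head)
        for head, terms in TAXONOMY.items()
        for term in terms
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    totals = Counter()
    for score, head in scored[:top_k]:
        totals[head] += score  # accumulate similarity per head concept
    if not totals or max(totals.values()) == 0:
        return None
    return max(totals, key=totals.get)


if __name__ == "__main__":
    print(interpret("let f be a continuous function on the interval",
                    "the mapping f is a polynomial"))
```

Summing the scores of the top-ranked terms per head concept is only one possible way to decide which head concept "dominates" the best-matching terms; the abstract does not specify the aggregation.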
@article{702605, title = {Using Discourse Context to Interpret Object-Denoting Mathematical Expressions}, booktitle = {Towards a Digital Mathematics Library. Bertinoro, Italy, July 20-21st, 2011}, series = {GDML\_Books}, publisher = {Masaryk University Press}, address = {Brno, Czech Republic}, year = {2011}, pages = {85-101}, url = {http://dml.mathdoc.fr/item/702605} }
Wolska, Magdalena; Grigore, Mihai; Kohlhase, Michael. Using Discourse Context to Interpret Object-Denoting Mathematical Expressions, in Towards a Digital Mathematics Library. Bertinoro, Italy, July 20-21st, 2011, GDML_Books, (2011), pp. 85-101. http://gdmltest.u-ga.fr/item/702605/