We study the problem of learning regular tree languages from text. We show that the framework of function distinguishability, as introduced by the author in [Theoret. Comput. Sci. 290 (2003) 1679-1711], can be generalized from the case of string languages towards tree languages. This provides a large source of identifiable classes of regular tree languages. Each of these classes can be characterized in various ways. Moreover, we present a generic inference algorithm with polynomial update time and prove its correctness. In this way, we generalize previous works of Angluin, Sakakibara and ourselves. Moreover, we show that this way all regular tree languages can be approximately identified.
@article{ITA_2007__41_4_351_0, author = {Fernau, Henning}, title = {Learning tree languages from text}, journal = {RAIRO - Theoretical Informatics and Applications - Informatique Th\'eorique et Applications}, volume = {41}, year = {2007}, pages = {351-374}, doi = {10.1051/ita:2007030}, mrnumber = {2377968}, zbl = {pre05301986}, zbl = {1144.68031}, language = {en}, url = {http://dml.mathdoc.fr/item/ITA_2007__41_4_351_0} }
Fernau, Henning. Learning tree languages from text. RAIRO - Theoretical Informatics and Applications - Informatique Théorique et Applications, Tome 41 (2007) pp. 351-374. doi : 10.1051/ita:2007030. http://gdmltest.u-ga.fr/item/ITA_2007__41_4_351_0/
[1] Forming grammars for structured documents: an application of grammatical inference, in Proceedings of the Second International Colloquium on Grammatical Inference (ICGI-94): Grammatical Inference and Applications, edited by R.C. Carrasco and J. Oncina. Lect. Notes Artif. Int. 862 (1994) 153-167.
, and ,[2] Inductive inference of formal languages from positive data. Inform. Comput. 45 (1980) 117-135. | Zbl 0459.68051
,[3] Inference of reversible languages. J. ACM 29 (1982) 741-765. | Zbl 0485.68066
,[4] Learning regular sets from queries and counterexamples. Inform. Comput. 75 (1987) 87-106. | Zbl 0636.68112
,[5] GIFT: grammatical inference for terms, in Conférence d'Apprentissage, Palaiseau, May 1999. English version: Late breaking paper of International Conference on Inductive Logic Programming; French journal version: Apprentissage de programmes logiques par inférence grammaticale. Revue d'Intelligence Artificielle 14 (2001) 375-396.
and ,[6] Identification of reversible dependency tree languages, in Proc. 3rd Workshop on Learning Languages in Logic LLL'01, edited by L. Popelínský and M. Nepil, 11-22. Technical report FIMU-RS-2001-08, FI MU Brno, Czech Republic, see http://www.fi.muni.cz/ilpnet2/LLL2001/#proceedings, September 2001.
and ,[7] Learning tree languages from positive examples and membership queries, in Algorithmic Learning Theory, 15th International Conference, ALT, edited by S. Ben-David, J. Case and A. Maruoka. Lect. Notes Comput. Sci. 3244 (2004) 440-453. | Zbl 1110.68390
and ,[8] Theory-guided induction of logic programs by inference of regular languages. in Proc. of the 13th International Conference on Machine Learning, Morgan Kaufmann, (1996) 46-53.
,[9] Stochastic inference of regular tree languages. Mach. Learn. 44 (2001) 185-197. | Zbl 0983.68106
, and ,[10] A similarity between probabilistic tree languages: Application to XML document families. Pattern Recogn. 36 (2003) 2197-2199. | Zbl 1047.68550
and ,[11] Query based learning of XPath fragments, in Proceedings of Dagstuhl Seminar on Machine Learning for the Semantic Web (05071), Dagstuhl, Germany, (2005).
and ,[12] The use of grammatical inference for designing programming languages. Comm. ACM 16 (1972) 83-90. | Zbl 0248.68037
, and ,[13] D. Maulsby, B.A. Myers and A. Turransky, Eds. Watch What I Do: Programming by Demonstration. MIT Press, 1993. Available online at http://www.acypher.com/wwid/WWIDToC.html
, , ,[14] Query by browsing. in Proceedings of IDS'94: The 2nd International Workshop on User Interfaces to Databases, edited by P. Sawyer, Springer (1994) 236-248. HTML version: http://www.comp.lancs.ac.uk/computing/users/dixa/papers/QbB-IDS94/
and ,[15] Grammatical Picture Generation - A Tree-Based Approach. Texts in Theoretical Computer Science. An EATCS Series. Springer, (2006). More information can be found on the web page http://www.cs.umu.se/drewes/picgen/index.html | Zbl 1085.68177
,[16] Learning a regular tree language from a teacher. In Proc. Developments in Language Theory DLT 2003, edited by Z. Ésik and Z. Fülöp. Lect. Notes Comput. Sci. 2710 (2003) 279-291. | Zbl 1037.68082
and ,[17] Query learning of regular tree languages: How to avoid dead states. Theor. Comput. Syst. (2006). To appear. | MR 2284836 | Zbl 1107.68049
and ,[18] Learning context-free languages from their structured sentences. SIGACT News, 15 (1983) 24-35. | Zbl 0528.68051
,[19] -gram extensions of terminal distinguishable languages. In International Conference on Pattern Recognition (ICPR 2000). IEEE/IAPR 2 (2000) 125-128.
,[20] Learning XML grammars. in Machine Learning and Data Mining in Pattern Recognition MLDM'01, edited by P. Perner. Lect. Notes Artif. Int. 2123 (2001) 73-87. | Zbl 0997.68607
,[21] Learning tree languages from text. in Computational Learning Theory COLT 2002, edited by J. Kivinen and R.H. Sloan. Lect. Notes Artif. Int. 2375 (2002) 153-168. | Zbl 1050.68060
,[22] Identification of function distinguishable languages. Theoret. Comput. Sci. 290 (2003) 1679-1711. | Zbl 1051.68092
,[23] Identifying terminal distinguishable languages. Ann. Math. Artif. Intell. 40 (2004) 263-281. | Zbl 1081.68035
,[24] Grammar induction: An invitation to formal language theorists. GRAMMARS 7 (2004) 45-55.
and ,[25] Algorithms for learning function distinguishable regular languages. in Structural, Syntactic, and Statistical Pattern Recognition SSPR and SPR 2002, edited by T. Caelli, A. Amin, R.P.W. Duin, M. Kamel and D. de Ridder. Lect. Notes Comput. Sci. 2396 (2002) 64-72. | Zbl 1073.68656
and ,[26] Permutations and control sets for learning non-regular language families. in Grammatical Inference: Algorithms and Applications, 5th International Colloquium (ICGI 2000), edited by A.L. Oliveira. Lect. Notes Artif. Int., 1891 (2000) 75-88. | Zbl 0974.68089
and ,[27] Consistent identification in the limit of any of the classes of -valued is NP-hard. in Logical Aspects of Computational Linguistics LACL'01, edited by P. de Groote, G. Morrill and C. Retoré. Lect. Notes Artif. Int. 2099 (2001) 125-138. | Zbl 0990.68080
,[28] Inference of tree automata from sample set of trees. Int. J. Comput. Inform. Sci. 13 (1984) 177-196. | Zbl 0566.68064
and ,[29] Learning -testable tree sets from positive data. Technical Report DSIC/II/46/1993, Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, http://www.dsic.upv.es/users/tlcc/tlcc.html (1993).
,[30] Inference of recognizable tree sets. Technical Report DSIC-II/47/93, Departamento de Sistemas Informáticos y Computación (1993).
and ,[31] XTRACT: learning document type descriptors from XML document collections. Data Min. Knowl. Discov. 7 (2003) 23-56.
, , , and ,[32] Language identification in the limit. Inform. Comput. 10 (1967) 447-474. | Zbl 0259.68032
,[33] Syntactic Pattern Recognition; An Introduction. Addison-Wesley (1978). | MR 495282 | Zbl 0383.68065
and ,[34] Data-driven inductive inference of finite-state automata. Int. J. Pattern Recogn. 8 (1994) 305-322.
,[35] A bibliographical study of grammatical inference. Pattern Recogn. 38 (2005) 1332-1348.
,[36] Current trends in grammatical inference. in Advances in Pattern Recognition, Joint IAPR International Workshops SSPR+SPR'2000, edited by F.J. Ferri et al. Lect. Notes Comput. Sci. 1876 (2000) 28-31. | Zbl 0996.68569
,[37] Induction of integrated view for XML data with heterogenous DTDs. In Proceedings of the 10th International Conference on Information and Knowledge Management CIKM, ACM, (2001) 151-158.
and ,[38] Learnable Classes of Categorial Grammars. Ph.D., CSLI (1998). | MR 1638140 | Zbl 0937.68130
,[39] Shapes, shocks, and deformations, I. Int. J. Comput. Vision 15 (1995) 189-224.
, and ,[40] A method for inferring context-free grammars. Inform. Comput. 31 (1976) 129-146. | Zbl 0333.68065
and ,[41] Inference of -testable tree languages, in Proc. IAPR Workshop on Structural and Syntactical Pattern Recognition. World Scientific (1992) 109-120.
,[42] How to invent characterizable methods for regular languages. in 4th Workshop on Algorithmic Learning Theory ALT'93, edited by K.P. Jantke et al. Lect. Notes Artif. Int. 744 (1993) 209-222. | Zbl 0925.68365
,[43] The inference of tree languages from finite samples: an algebraic approach. Theoret. Comput. Sci. 129 (1994) 337-367. | Zbl 0822.68055
and ,[44] Learning approximately regular languages with reversible languages. Theoret. Comput. Sci. 174 (1997) 251-257. | Zbl 0908.68146
and ,[45] On the Myhill-Nerode theorem for trees. EATCS Bull. 47 (1992) 170-173. | Zbl 0757.68083
,[46] Your Wish is My Command: Programming by Example. Morgan Kaufmann (2001).
, editor.[47] Error correcting tree language inference. Pattern Recogn. Lett. 23 (2002) 1-12. | Zbl 0993.68057
and ,[48] Syntactic pattern recognition by error correcting analysis on tree automata, in Advances in Pattern Recognition, Joint IAPR International Workshops SSPR+SPR'2000, edited by F.J. Ferri et al. Lect. Notes Comput. Sci. 1876 (2000) 133-142. | Zbl 0996.68532
and ,[49] Error correcting analysis for tree languages. Int. J. Pattern Recogn. 14 (2000) 357-368.
, and ,[50] Inference of reversible tree languages. IEEE T. Syst. Man Cy. 34 (2004) 1658-1665.
, and ,[51] Error-correcting tree automata for syntactic pattern recognition. IEEE T. Comput. 27 (1978) 1040-1053. | Zbl 0392.68076
and ,[52] Learning of ordered tree languages with height-bounded variables using queries. in Algorithmic Learning Theory, 15th International Conference, ALT, edited by S. Ben-David, J. Case and A. Maruoka. Lect. Notes Comput. Sci. 3244 (2004) 425-439. | Zbl 1110.68402
and ,[53] Automata, logic, and XML. In Computer Science Logic; 16th International Workshop, CSL 2002, edited by J. Bradfield. Lect. Notes Comput. Sci. 2471 (2002) 2-26. | Zbl 1020.68044
,[54] Inference of regular grammars via skeletons. IEEE T. Syst. Man Cy. 17 (1987) 982-992.
and ,[55] Inference of even linear grammars and its application to picture description languages. Pattern Recogn. 21 (1988) 55-62. | Zbl 0653.68077
and ,[56] Probabilistic -testable tree languages. in Grammatical Inference: Algorithms and Applications, 5th International Colloquium (ICGI 2000), edited by A.L. Oliveira. Lect. Notes Artif. Int. 1891 (2000) 221-228. | Zbl 0974.68514
, and ,[57] Stochastic -testable tree languages and applications, in Grammatical Inference: Algorithms and Applications, 6th International Colloquium, ICGI 2002, edited by P. Adriaans, H. Fernau and M. van Zaanen. Lect. Notes Artif. Int. 2484 (2002) 199-212. | Zbl 1028.68608
, and ,[58] Handbook of Formal Languages, Volume III. Springer, Berlin (1997). | Zbl 0866.68057
and , eds.[59] Learning context-free grammars from structural data in polynomial time. Theoret. Comput. Sci. 76 (1990) 223-242. | Zbl 0704.68067
,[60] Efficient learning of context-free grammars from positive structural examples. Inform. Comput. 97 (1992) 23-60. | Zbl 0764.68146
,[61] Learning context-free grammars from partially structured examples, in Grammatical Inference: Algorithms and Applications, 5th International Colloquium (ICGI 2000), edited by A.L. Oliveira. Lect. Notes Artif. Int. 1891 (2000) 229-240. | Zbl 0974.68530
and ,[62] Learning a subclass of linear languages from positive structural information. in Proceedings of the Fourth International Colloquium on Grammatical Inference (ICGI-98), edited by V. Honavar and G. Slutski. Lect. Notes Artif. Int. 1433 (1998) 162-174.
and ,[63] Shock graphs and shape matching. Int. J. Comput. Vision 30 (1999) 1-24.
, , and ,[64] Developing Spoken Dialog Systems using Grammatical Inference. Ph.D. Thesis, The University of Newcastle (AUS) (2005).
,[65] The Boisdale algorithm - an induction method for a subclass of unification grammar from positive data, in Grammatical Inference: Algorithms and Applications; 7th International Colloquium ICGI, edited by G. Paliouras and Y. Sakakibara. Lect. Notes Artif. Int. 3264 (2004) 235-247. | Zbl 1111.68481
and ,[66] A note on grammatical inference of slender context-free languages, in Proceedings of the Third International Colloquium on Grammatical Inference (ICGI-96): Learning Syntax from Sentences, edited by L. Miclet and C. de la Higuera. Lect. Notes Artif. Int. 1147 (1996) 117-125.
and ,[67] Error-correcting parsers for formal languages. IEEE T. Comput. 27 (1978) 605-616. | Zbl 0379.68050
and ,[68] Tree -grammar models for natural language modelling and parsing, in Structural, Syntactic, and Statistical Pattern Recognition SSPR and SPR 2002, edited by T. Caelli, A. Amin, R.P.W. Duin, M. Kamel and D. de Ridder. Lect. Notes Comput. Sci. 2396 (2002) 56-63. | Zbl 1073.68842
, , and ,[69] Parsing with probabilistic strictly locally testable tree languages. IEEE T. Pattern Anal., 27 (2005) 1040-1050.
, and ,[70] Grammars with generalized contextfree rules and their tree automata. in Proceedings of CLIN '99; Selected Papers (1999) 223-233, see http://www-uilots.let.uu.nl/publications/clin1999/papers.html
,[71] Inductive inference of context-free languages based on context-free expressions. Int. J. Comput. Math. 24 (1988) 115-140. | Zbl 0658.68095
,[72] On learning systolic languages. in Proceedings of the 3rd Workshop on Algorithmic Learning Theory (ALT '92), edited by K.P. Jantke, S. Doshita, K. Furukawa and T. Nishida. Lect. Notes Artif. Int. 743 (1992) 41-52. | Zbl 0925.68368
,[73] Polynomial-time identification of very simple grammars from positive data. Theoret. Comput. Sci. 298 (2003) 179-206. | Zbl 1038.68074
,