Correcting spelling errors by modelling their causes
Deorowicz, Sebastian ; Ciura, Marcin
International Journal of Applied Mathematics and Computer Science, Tome 15 (2005), p. 275-285 / Harvested from The Polish Digital Mathematics Library

This paper accounts for a new technique of correcting isolated words in typed texts. A language-dependent set of string substitutions reflects the surface form of errors that result from vocabulary incompetence, misspellings, or mistypings. Candidate corrections are formed by applying the substitutions to text words absent from the computer lexicon. A minimal acyclic deterministic finite automaton storing the lexicon allows quick rejection of nonsense corrections, while costs associated with the substitutions serve to rank the remaining ones. A comparison of the correction lists generated by several spellcheckers for two corpora of English spelling errors shows that our technique suggests the right words more accurately than the others.

Publié le : 2005-01-01
EUDML-ID : urn:eudml:doc:207742
@article{bwmeta1.element.bwnjournal-article-amcv15i2p275bwm,
     author = {Deorowicz, Sebastian and Ciura, Marcin},
     title = {Correcting spelling errors by modelling their causes},
     journal = {International Journal of Applied Mathematics and Computer Science},
     volume = {15},
     year = {2005},
     pages = {275-285},
     language = {en},
     url = {http://dml.mathdoc.fr/item/bwmeta1.element.bwnjournal-article-amcv15i2p275bwm}
}
Deorowicz, Sebastian; Ciura, Marcin. Correcting spelling errors by modelling their causes. International Journal of Applied Mathematics and Computer Science, Tome 15 (2005) pp. 275-285. http://gdmltest.u-ga.fr/item/bwmeta1.element.bwnjournal-article-amcv15i2p275bwm/

[000] Atkinson K. (2002): Spell checker test kernel results. - Available at http://aspell.net/test

[001] Atkinson K. (2003): GNU Aspell. - Available at http://aspell.sourceforge.net/

[002] Baeza-Yates R. and Navarro G. (1998): Fast approximate string matching in a dictionary. - Proc. 5th Int. Symp. String Processing and Information Retrieval, SPIRE'98,Santa Crus de la Sierra, Bolivia, pp. 14-22.

[003] Berkel B. van and Smedt K.D. (1988): Triphone analysis: A combined method for the correction of orthographical and typographical errors. -Proc. 2nd Conf. Applied Natural Language Processing, Austin, pp. 77-83.

[004] Brill E. and Moore R.C. (2000): An improved error model for noisy channel spelling correction. -Proc. 38th Annual Meeting of the Association for Computational Linguistics, Hong Kong, pp. 286-293.

[005] Carrasco R. and Forcada M. (2002): Incremental construction and maintenance of minimal finite-state automata. - Comput. Linguistics, Vol. 28, No. 2, pp. 207-216. | Zbl 1232.68080

[006] Church K.W. and Gale W.A. (1991): Probability scoring for spelling correction. -Statist. Comput., Vol. 1, No. 1, pp. 93-103.

[007] Ciura M.G. and Deorowicz S. (2001): How to squeeze a lexicon. -Soft. Pract. Exper., Vol. 31, No. 11, pp. 1077-1090. | Zbl 0987.68782

[008] Czech Z.J., Havas G. and Majewski B.H. (1997): Perfect hashing. -Theoret. Comput. Sci., Vol. 182, No. 1-2, pp. 1-143.

[009] Daciuk J., Mihov S., Watson B.W. and Watson R.E. (2000): Incremental construction of minimal acyclic finite-state automata. -Comput. Linguistics, Vol. 26, No. 1, pp. 3-16. | Zbl 1232.68081

[010] Damerau F.J. (1964): A technique for computer detection and correction of spelling errors. -Comm. ACM, Vol. 7, No. 3, pp. 171-176.

[011] Damerau F.J. (1990): Evaluating computer-generated domain-vocabularies. -Inf. Process. Manag., Vol. 26, No. 6, pp. 791-801.

[012] Damerau F.J. and Mays E. (1989): An examination of undetected typing errors. -Inf. Process. Manag., Vol. 25, No. 6, pp. 659-664.

[013] Darragh J.J., Cleary J.G. and Witten I.H. (1993): Bonsai: A compact representation of trees. -Softw. Pract. Exper., Vol. 23, No. 3, pp. 277-291.

[014] Gadd T.N. (1990): PHONIX: The algorithm. -Program: Automat. Library Inf. Syst.,Vol. 24, No. 4, pp. 363-366.

[015] Grudin J. (1983): Error patterns in skilled and novice transcription typing, In: Cognitive Aspects of Skilled Typewriting, (W.E. Copper, Ed.). -New York: Springer-Verlag, pp. 121-143.

[016] Hodge V.J. and Austin J. (2003): A comparison of standard spell checking algorithms and a novel binary neural approach. - IEEE Trans. Knowl. Data Eng., Vol. 15, No. 5, pp. 1073-1081.

[017] Jassem W. (1983): The Phonology of Modern English. -Warsaw: Polish Scientific Publishers.

[018] Knuth D.E. (1973): The Art of Computer Programming (Vol. 3. Sorting and Searching Algorithms). - Reading, MA: Addison-Wesley. | Zbl 0302.68010

[019] Kuenning G.H. (2003): International ispell. - Available at http://www.cs.hmc.edu/~geoff/ispell.html.

[020] Kukich K. (1988): Variations on a back-propagation name recognition net. - Proc. Advanced Technology Conference, U.S. Postal Service, Wash. D.C., USA, pp. 722-735.

[021] Kukich K. (1992): Techniques for automatically correcting words in text. - ACM Comput. Surveys, Vol. 24, No. 4, pp. 377-439.

[022] Levenshtein V.I. (1966): Binary codes capable of correcting deletions, insertions and reversals. - Soviet Physics Doklady, Vol. 10, pp. 707-710.

[023] Maly K. (1976): Compressed tries. - Comm. ACM, Vol. 19, No. 7, pp. 409-415. | Zbl 0329.68037

[024] Mateescu D. (2003): English phonetics and phonological theory. - University of Bucharest, Romania. Available at http://www.unibuc.ro/eBooks/filologie/mateescu

[025] Merriam-Webster (2002): A dictionary of prefixes, suffixes, and combining forms from Webster's third new international dictionary, unabridged. - Merriam-Webster. Available at http://www.spellingbee.com/pre_suf_comb.pdf

[026] Mihov S. and Schulz K.U. (2004): Fast approximate search in large dictionaries. - Comput. Linguistics, Vol. 30, No. 4, pp. 451-477. | Zbl 1234.68424

[027] Mitton R. (1987): Spelling checkers, spelling correctors, and the misspellings of poor spellers. - Inf. Process. Manag., Vol. 23, No. 5, pp. 495-505.

[028] Mitton R. (1996): Spellchecking by computer. - J. Simplif. Spell. Soc., Vol. 20. No. 1, pp. 4-11.

[029] Morrison D.R. (1968): PATRICIA-Practical Algorithm to Retrieve Information Coded in Alphanumeric. - J. ACM, Vol. 15, No. 4, pp. 514-534.

[030] Odell M.K. and Russell R.C. (1918): U.S. Patent Numbers, 1,261,167 (1918) and 1,435,663 (1922). - U.S. Patent Office, Washington, D.C.

[031] Oflazer K. (1996): Error-tolerant finite-state recognition with applications to morphological analysis and spelling correction. - Comput. Linguistics, Vol. 22, No. 1, pp. 73-89.

[032] Peterson J.L. (1986): A note on undetected typing errors. - Comm. ACM, Vol. 29, No. 7, pp. 633-637.

[033] Philips L. (1990): Hanging on the metaphone. - Comput. Lang. Mag., Vol. 7, No. 12, pp. 38-44.

[034] Philips L. (2000): The double metaphone search algorithm. - CC++ Users Journal, Vol. 18, No. 6.

[035] Pollock J.J. and Zamora A. (1983): Collection and characterization of spelling errors in scientific and scholary text. - J. Amer. Soc. Inf. Sci., Vol. 34, No. 1, pp. 51-58.

[036] Pollock J.J. and Zamora A. (1984): Automatic spelling correction in scientific and scholarly text. - Comm. ACM, Vol. 27, No. 4, pp. 358-368.

[037] Savary A. (2001): Typographical nearest-neighbor search in a finite-state lexicon and its application to spelling correction. - Lect. Notes Comput. Sci., Vol. 2494, pp. 251-260. | Zbl 1015.68555

[038] Schulz K.U. and Mihov S. (2002): Fast string correction with Levenshtein-automata. - Int. J. Docum. Anal. Recognit., Vol. 5, No. 1, pp. 67-85. | Zbl 1039.68137

[039] Toutanova K. and Moore R.C. (2002): Pronunciation modelling for improved spelling correction. - Proc. 40th Annual Meeting of the Association for Computational Linguistics, Hong Kong, pp. 144-151.

[040] Wagner R.A. (1974): Order-n correction for regular languages. - Comm. ACM, Vol. 17, No. 5, pp. 265-268. | Zbl 0276.68011

[041] Wikipedia (2003): Wikipedia: List of common misspellings. - Available at http://en2.wikipedia.org/wiki/Wikipedia:List_of_common_misspellings

[042] Yannakoudakis E.J. and Fawthrop D. (1983a): An intelligent spelling corrector. - Inf. Process. Manag., Vol. 19, No. 12, pp. 101-108.

[043] Yannakoudakis E.J. and Fawthrop D. (1983b): The rules of spelling errors. - Inf. Process. Manag., Vol. 19, No. 2, pp. 87-99.