L'attribuzione dei testi gramsciani: metodi e modelli matematici
Basile, Chiara ; Benedetto, Dario ; Caglioti, Emanuele ; Degli Esposti, Mirko
La Matematica nella Società e nella Cultura. Rivista dell'Unione Matematica Italiana, Volume 3 (2010), pp. 235-269 / Harvested from Biblioteca Digitale Italiana di Matematica

In questo lavoro illustriamo un metodo matematico per affrontare i problemi di attribuzione di autore, sviluppato in vista della nuova "Edizione Nazionale degli scritti di Antonio Gramsci". Il metodo è basato su alcune importanti idee della matematica moderna, che offrono interessanti prospettive nell'analisi dei testi.

In this paper we present a mathematical approach to authorship attribution, developed for the new "Edizione Nazionale degli scritti di Antonio Gramsci". The method builds on some important ideas of modern mathematics, which offer interesting perspectives on the analysis of texts.

Published: 2010-08-01
@article{RIUMI_2010_1_3_2_235_0,
     author = {Chiara Basile and Dario Benedetto and Emanuele Caglioti and Mirko Degli Esposti},
     title = {L'attribuzione dei testi gramsciani: metodi e modelli matematici},
     journal = {La Matematica nella Societ\`a e nella Cultura. Rivista dell'Unione Matematica Italiana},
     volume = {3},
     year = {2010},
     pages = {235-269},
     zbl = {1250.94026},
     mrnumber = {2767070},
     language = {it},
     url = {http://dml.mathdoc.fr/item/RIUMI_2010_1_3_2_235_0}
}
Basile, Chiara; Benedetto, Dario; Caglioti, Emanuele; Degli Esposti, Mirko. L'attribuzione dei testi gramsciani: metodi e modelli matematici. La Matematica nella Società e nella Cultura. Rivista dell'Unione Matematica Italiana, Volume 3 (2010), pp. 235-269. http://gdmltest.u-ga.fr/item/RIUMI_2010_1_3_2_235_0/

[1] Basile, C. - Benedetto, D. - Caglioti, E. - Degli Esposti, M., An example of mathematical authorship attribution, Journal of Mathematical Physics, 49, 1-20 (2008). | MR 2484342 | Zbl 1159.81302

[2] Benedetto, D. - Caglioti, E. - Loreto, V., Language Trees and Zipping, Phys. Rev. Lett. 88, n. 4, 048702-1, 048702-4 (2002).

[3] Bennett, W. R., Scientific and engineering problem-solving with the computer, Prentice-Hall, Inc., Englewood Cliffs, New Jersey (1976).

[4] Cavalli-Sforza, L. L. - Menozzi, P. - Piazza, A., Storia e geografia dei geni umani, Milano, Adelphi 2000.

[5] Clement, R. - Sharp, D., Ngram and Bayesian Classification of Documents for Topic and Authorship, Lit. Ling. Comp. 18, n. 4, 423 (2003).

[6] De Morgan, A., in Memoirs of Augustus de Morgan by his wife Sophia Elizabeth de Morgan with Selections from his Letters, (Longman's Green and Co., London, 1851/1882).

[7] Grassberger, P., Data compression and entropy estimates by non-sequential recursive pair substitution, ArXiv:physics/0207023

[8] Grieve, J. W., Quantitative Authorship Attribution: a History and an Evaluation of Techniques. http://hdl.handle.net/1892/2055, Lit. Ling. Comp. 22, 251 (2007).

[9] Juola, P., Cross-entropy and linguistic typology, Proceedings of New Methods in Language Processing 3, Sydney, 1998.

[10] Juola, P., Authorship Attribution, Foundations and Trends in Information Retrieval, vol. 1, no. 3, 233-334 (2006).

[11] Khmelev, D. V. - Kukushkina, O. V. - Polikarpov, A. A., Using literal and grammatical statistics for authorship attribution, Problemy Peredachi Informatsii, 37 (2), 2000, pp. 96-108, translated into English in Problems of Information Transmission, 37 (2001) 172-184. | MR 2099901 | Zbl 1008.62118

[12] Keselj, V. - Peng, F. - Cercone, N. - Thomas, C., N-gram-based Author Profiles for Authorship Attribution, Proceedings of the Conference Pacific Association for Computational Linguistics, PACLING'03, Dalhousie University, Halifax, Nova Scotia, Canada, August 2003, pp. 255-264.

[13] Keselj, V. - Cercone, N., CNG Method with Weighted Voting Ad-hoc Authorship Attribution Competition (AAAC), June 2004. Part of ALLC/ACH 2004 conference.

[14] Khmelev, D. V. - Tweedie, F. J., Using Markov Chains for Identification of Writers, Lit. Ling. Comp. 16, 3: 299-307 (2001).

[15] Markov, A. A., Primer statisticheskogo issledovanija nad tekstom "Evgenija Onegina" illjustrirujuschij svjaz' ispytanij v tsep. (An example of statistical study on the text of "Eugene Onegin" illustrating the linking of events to a chain.), Izvestija Imp. Akademii nauk VI, 153-162 (1913).

[16] Markov, A. A., Ob odnom primeneni statisticheskogo metoda. (On some application of statistical method), Izvestija Imp. Akademii nauk serija VI, 4: 239-42 (1916).

[17] Mendenhall, T. C., The characteristic curves of composition, Science, vol. IX, 237-249 (1887).

[18] Pierce, J. R., La Teoria dell'Informazione, Milano, Mondadori, 1963.

[19] Puglisi, A. - Benedetto, D. - Caglioti, E. - Loreto, V. - Vulpiani, A., Data compression and learning in time sequences analysis, Phys. D 180, no. 1-2, 92-107 (2003). | MR 1984306 | Zbl 1094.68567

[20] Shannon, C. E., A Mathematical Theory of Communication, The Bell System Technical Journal 27, 1948, p. 623. | MR 26286

[21] Teahan, W. J., Text classification and segmentation using minimum cross-entropy, Proceedings of the International Conference on Content-based Multimedia Information Access (RIAO 2000), pages 943-961. C.I.D.-C.A.S.I.S, Paris, 2000.

[22] Witten, I. H. - Moffat, A. - Bell, T. C., Managing Gigabytes, second edition, Morgan Kaufmann Publishers, 1999.

[23] Wyner, A. D., Typical sequences and all that: Entropy, Pattern Matching and Data Compression, 1994 Shannon Lecture, IEEE Information Theory Society Newsletter, July 1995.

[24] Ziv, J. - Lempel, A., A universal algorithm for sequential data compression, IEEE Transactions on Information Theory, IT-23 no. 3, pp. 337-343 (1977). | MR 530215 | Zbl 0379.94010

[25] Ziv, J. - Merhav, N., A measure of relative entropy between individual sequences with application to universal classification, IEEE Transactions on Information Theory, 39 (4), 1993, pp. 1270-1279. | MR 1267157 | Zbl 0801.94004