The digitization of papers born in the print-only era is vital for the health of the mathematical record. Many large-scale retrodigitization projects are underway and, at this point, probably more than half of the historical mathematical literature has been digitized. Many smaller journals and books remain to be done. This paper gives a framework within which these may also be completed. It uses the digitization of the Canadian Journal of Mathematics (53,000 pages), completed as a one-man project over a few months, as the working example. The project described herein may serve as a model for similar efforts and also points to some interesting problems yet to be solved.
@article{702534, title = {Small Scale Retrodigitization}, booktitle = {Towards Digital Mathematics Library. Birmingham, United Kingdom, July 27th, 2008}, series = {GDML\_Books}, publisher = {Masaryk University}, address = {Brno}, year = {2008}, pages = {103-113}, zbl = {1170.68486}, url = {http://dml.mathdoc.fr/item/702534} }
Doob, Michael. Small Scale Retrodigitization, in Towards Digital Mathematics Library. Birmingham, United Kingdom, July 27th, 2008, GDML_Books, (2008), pp. 103-113. http://gdmltest.u-ga.fr/item/702534/
The home web site for ArXiv is http://arxiv.org/ and is hosted by the Cornell University Library. The history of ArXiv is given in the article at http://en.wikipedia.org/wiki/ArXiv.
http://www.ceic.math.ca/Publications/retro_bestpractices.pdf.
, has had some encouraging results using Perl scripts developed by his working group at Cornell. His software has only been circulated informally. | Zbl 0527.16007
The project location is http://code.google.com/p/tesseract-ocr.
Described at http://en.wikipedia.org/wiki/Tesseract_(software) and announced at http://google-code-updates.blogspot.com/2006/08/announcing-tesseract-ocr.html.
The main site is at http://www.imagemagick.org/script/index.php.
A full description of this project is at http://minidml.mathdoc.fr/.
NUMDAM
See http://en.wikipedia.org/wiki/OCRopus.
The home page for this software is http://www.pdfhacks.com/pdftk/.
Documentation for the hyperref package can be found both at http://en.wikibooks.org/wiki/LaTeX/Packages/Hyperref and at http://www.tug.org/applications/hyperref/.
http://www-sop.inria.fr/apics/tralics specifically translates LaTeX to XML.
http://www.unicode.org/charts contains a list of the standard character names.
Automatic reference linking in distributed digital libraries, CVPRW 2003, Conference on Computer Vision and Pattern Recognition Workshop, paper #26, Volume 3 (Workshop on Document Image Analysis and Retrieval), 5 pp. (2003).
Measuring Journals, Notices of the AMS, 1049–1053 (2006). | Zbl 1142.00304