Text retrieval using Latent Semantic Indexing (LSI) with truncated Singular Value Decomposition (SVD) has been intensively studied in recent years. However, the expensive complexity involved in computing truncated SVD constitutes a major drawback of the LSI method. In this paper, we demonstrate how matrix rank approximation can influence the effectiveness of information retrieval systems. Besides, we present an implementation of the LSI method based on an eigenvalue analysis for rank approximation without computing truncated SVD, along with its computational details. Significant improvements in computational time while maintaining retrieval accuracy are observed over the tested document collections.
@article{bwmeta1.element.bwnjournal-article-amcv16i4p551bwm, author = {Kumar, Cherukuri and Srinivas, Suripeddi}, title = {Latent Semantic Indexing using eigenvalue analysis for efficient information retrieval}, journal = {International Journal of Applied Mathematics and Computer Science}, volume = {16}, year = {2006}, pages = {551-558}, zbl = {1122.68047}, language = {en}, url = {http://dml.mathdoc.fr/item/bwmeta1.element.bwnjournal-article-amcv16i4p551bwm} }
Kumar, Cherukuri; Srinivas, Suripeddi. Latent Semantic Indexing using eigenvalue analysis for efficient information retrieval. International Journal of Applied Mathematics and Computer Science, Tome 16 (2006) pp. 551-558. http://gdmltest.u-ga.fr/item/bwmeta1.element.bwnjournal-article-amcv16i4p551bwm/
[000] April K. and Pottenger W.M. (2006): A framework for understanding latent semantic indexing performance. - J. Inf. Process. Manag., Vol. 42, No. 1, pp. 56-73.
[001] Aswani Kumar Ch., Gupta A., Batool M. and Trehan S. (2005): An information retrieval model based on latent semantic indexing with intelligent preprocessing. - J. Inf. Knowl. Manag., Vol. 4, No. 4, pp. 1-7.
[002] Aswani Kumar Ch. and Srinivas S. (2006): On the effect of rank approximation on information retrieval. - Proc. Int. Conf. s Systemics Cybernetics and Informatics, Hyderabad, India, pp. 876-880.
[003] Balinski J. and Danilowicz C. (2005): Ranking method based on inter document distances. - Inf. Process. Manag., Vol. 41, No. 4,pp. 759-775. | Zbl 1080.68583
[004] Bass D. and Behrens C. (2003): Distributed LSI: Scalable concept based information retrieval with high semantic resolution. - Proc. 2003 Text Mining Workshop, San Francisco, CA, USA, pp. 72-82.
[005] Bast H. and Weber I. (2005): Insights from viewing ranked retrieval as rank aggregation. - Proc. Workshop s Challenges in Web Information Retrieval and Integration, WIRI05, Tokyo, Japan, pp. 232-239.
[006] Berry M.W. (1992): Large scale singular value computations. - Int. J. Super Comput. Appli., Vol. 6, No. 1, pp. 13-49.
[007] Berry M.W. and Dumasis S.T. (1995): Using linear algebra for intelligent information retrieval. - SIAM Rev., Vol. 37, No. 4, pp. 573-995.
[008] Berry M.W., Drmac Z. and Jessup E.R. (1999): Matrices, vector spaces, and information retrieval. - SIAM Rev., Vol. 41, No. 2, pp. 335-362. | Zbl 0924.68069
[009] Berry M.W. and Shakhina A.P. (2005): Computing sparse reduced-rank approximation to sparse matrices. - ACM Trans. Math. Software, 2005, Vol. 31, No. 2, pp. 252-269. | Zbl 1070.65539
[010] Cheng X.Z. and Lafferty J.(2006): A risk minimization framework for information retrieval. - Inf. Process. Manag., Vol. 42, No. 1, pp. 31-55. | Zbl 1101.68544
[011] Deerwester S. (1990): Indexing by latent semantic analysis. - J. Ameri. Soci. Inf. Sci., Vol. 41, No. 6, pp. 391-407.
[012] Fan J., Ravi K., Littman M.L. and Santosh V. (1999): Efficient singular value decomposition via document samplings. - Tech. Rep. CS-1999-5, Dept. Computer Science, Duke University, North Carolina.
[013] Gao J. and Zhang J. (2003): Sparsification strategies in latent semantic indexing. - Proc. 2003 Mining Workshop, San Francisco, CA, USA, pp. 93-103.
[014] Gao J. and Zhang J. (2005): Clustered SVD strategies in latent semantic indexing. - Inf. Process. Manag., Vol. 41, No. 5, pp. 1051-1063. | Zbl 1101.68538
[015] Golub G.H. and Van Loan C.F. (1996): Matrix Computations. -Baltimore: The John Hopkins University Press.
[016] Hua Y. (2000): Searching beyond SVD for rank reduction. - Proc. IEEE Workshop s Sensor Array and Multi Channel Signal Processing, Cambridge, MA, USA, pp. 395-397.
[017] Husbands P., Simon H. and Ding C. (2001): On the use of singular value decomposition for text retrieval. - SIAM Comput. Inf. Retrieval, pp. 145-156. | Zbl 0995.68045
[018] Husbands P. and Ding C. (2005): Term norm distribution and its effects on latent semantic indexing. - Inf. Process. Manag., Vol. 41, No. 4, pp. 777-787.
[019] Kontostathis A. and Pottenger W.M. (2002a): Detecting patterns in the latent semantic indexing term-term matrix. - Proc. Workshop s Foundations of Data Mining and Discovery, IEEE Int. Conf. Data Mining, (ICDM 02), Maeybaski, Japan.
[020] Kontostathis A. and Pottenger W.M. (2002b): A mathematical view of latent semantic indexing: Tracing term co-occurrences. - Tech. Report, LU-CSE-02-006, Lehigh University.
[021] Landauer T.K., Foltz P.W. and Laham D. (1998): Introduction to latent semantic analysis. - Discourse Processes, Vol. 25, pp. 259-284.
[022] Park H. and Elden L. (2003): Matrix rank reduction for data analysis and feature extraction. - Tech. Rep., Dept. Computer Science and Engineering, University of Minnesota.
[023] Praks P., Dvorsky J., Snasel V. and Cernohorsky J.D. (2003): On SVD free latent semantic indexing for image retrieval for application in a hard real time environment. - Proc. IEEE Int. Conf.Industrial Technology, Maribor, Slovenia, pp. 466-471.
[024] Schisler M.L. (2004): Latent semantic indexing: using eigenvalues and the singular value decomposition in lexicographical search techniques. - Tech. Rep., University of Missouri.
[025] Tang J.T. (2003): Application of principle direction divisive partitioning and SVD in information retrieval. - Masters Proj. Rep., Dept. Computer Science, University of Kentucky, Lexington, KY.
[026] Yates R.B. and Neto B.R. (1999): Modern Information Retrieval. -New Delhi: Pearson Education.
[027] Ye Y.Q. (2000): Comparing matrix methods in text-based information retrieval. - Tech. Rep., School of Mathematical Sciences, Peking University.
[028] Zang Z., Zha H. and Simon H. (2002): Low rank approximations with sparse factors: Basic algorithms and error analysis. - SIAM J. Matrix Anal. Appl., Vol. 23, No. 3, pp. 706-727. | Zbl 1003.65041