Latent Semantic Indexing (LSI) has been widely used in information retrieval because it handles polysemy and synonymy effectively. However, LSI is computationally intensive because of the complexity of the singular value decomposition and the filtering operations it involves. This paper presents MR-LSI, a MapReduce-based distributed LSI algorithm for scalable information retrieval. The performance of MR-LSI is evaluated first in a small-scale experimental cluster environment and subsequently in large-scale simulation environments. Partitioning the dataset into smaller subsets and optimizing the partitioned subsets across a cluster of computing nodes significantly reduces the overhead of the MR-LSI algorithm while maintaining a high level of accuracy in retrieving documents of user interest. A genetic-algorithm-based load balancing scheme is designed to optimize the performance of MR-LSI in heterogeneous computing environments in which the computing nodes have varied resources.
Published: 2014-06-27
Classification: Information retrieval, latent semantic indexing, MapReduce, load balancing, genetic algorithms
@article{cai995,
     author = {Yang Liu and Maozhen Li and Mukhtaj Khan and Man Qi},
     title = {A MapReduce Based Distributed LSI for Scalable Information Retrieval},
     journal = {Computing and Informatics},
     volume = {33},
     number = {1},
     year = {2014},
     language = {en},
     url = {http://dml.mathdoc.fr/item/cai995}
}
Yang Liu (School of Electrical Engineering and Information, Sichuan University); Maozhen Li (School of Engineering and Design, Brunel University, Uxbridge, UB8 3PH); Mukhtaj Khan (School of Engineering and Design, Brunel University, Uxbridge, UB8 3PH); Man Qi (Department of Computing, Canterbury Christ Church University, Canterbury, Kent, CT1 1QU). A MapReduce Based Distributed LSI for Scalable Information Retrieval. Computing and Informatics, Volume 33 (2014), no. 1. http://gdmltest.u-ga.fr/item/cai995/