LATENT SEMANTIC INDEXING (LSI) AND ITS APPLICATIONS IN CHINESE TEXT PROCESSING

周水庚,关佶红,胡运发
DOI: https://doi.org/10.3969/j.issn.1000-1220.2001.02.031
2001-01-01
Abstract: Information retrieval is essentially semantic retrieval. However, most classic information retrieval systems represent the contents of documents and queries with a set of index terms, which can lead to poor retrieval performance. Latent semantic indexing (LSI) is a new algebraic model for information retrieval that maps document and query vectors into a lower-dimensional space by singular value decomposition. This considerably reduces the inherent vagueness of retrieval based on keyword sets and highlights the semantic associations among documents. Both theoretical analyses and experimental results show that LSI can improve retrieval performance significantly. This paper introduces the fundamental principles of LSI and explores its applications in Chinese text processing, including Chinese text retrieval, classification, and clustering.
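
The mapping described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a raw term-count matrix, uses NumPy's SVD, and folds the query in with the textbook formula q_hat = q^T U_k S_k^{-1}. The function names (`lsi_fit`, `lsi_fold_in`, `cosine`), the toy vocabulary, and the choice k = 2 are all hypothetical.

```python
import numpy as np


def lsi_fit(term_doc, k):
    """Truncated SVD of a term-by-document matrix, keeping k latent dimensions."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    doc_vecs = Vt[:k, :].T  # one row per document, in the latent space
    return U_k, s_k, doc_vecs


def lsi_fold_in(query, U_k, s_k):
    """Project a term-space vector into the latent space: q^T U_k S_k^{-1}."""
    return (query @ U_k) / s_k


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# Toy data (hypothetical): rows are terms, columns are documents d0..d3.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
], dtype=float)

U_k, s_k, doc_vecs = lsi_fit(A, k=2)

# Query containing only the term "car".
q = np.zeros(len(terms))
q[terms.index("car")] = 1.0
q_hat = lsi_fold_in(q, U_k, s_k)

# Similarity of the query to each document, before and after the mapping.
keyword_sims = [cosine(q, A[:, j]) for j in range(A.shape[1])]
latent_sims = [cosine(q_hat, doc_vecs[j]) for j in range(A.shape[1])]
print("keyword space:", np.round(keyword_sims, 3))
print("latent space: ", np.round(latent_sims, 3))
```

In this toy matrix the query "car" shares no term with d1 ("automobile", "engine"), so keyword matching scores that pair as zero, while in the latent space the two end up close because both terms co-occur with "engine". The term weighting and the value of k would need to be chosen for a real collection.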