

Term norm distribution and its effects on Latent Semantic Indexing

Journal: Information Processing & Management (ISSN 0306-4573)
Publisher: Elsevier
Volume: 41
Issue: 4
DOI: 10.1016/j.ipm.2004.03.006
Keywords
  • Information Retrieval
  • LSI
  • TREC

Abstract

Latent Semantic Indexing (LSI) uses the singular value decomposition to reduce noisy dimensions and improve the performance of text retrieval systems. Preliminary results have shown modest improvements in retrieval accuracy and recall, but these studies have mainly used small collections. In this paper we investigate text retrieval on a larger document collection (TREC) and focus on the distribution of word norms (magnitudes). Our results indicate that word representations in LSI space are inadequate for large collections. We emphasize the query expansion interpretation of LSI and propose an LSI term normalization that achieves better performance on larger collections.
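The query expansion reading of LSI can be made concrete with a small sketch. If U_k holds the first k left singular vectors of a term-by-document matrix A, then scoring documents with q^T A_k gives the same result as expanding the query to q' = U_k U_k^T q and scoring q' against A. The diagonal entry of U_k U_k^T for term i is the squared length of term i's row in U_k, so a term with a small norm in the reduced space keeps little of its own weight after expansion. The sketch below assumes a small dense matrix stored as Python lists. It takes term vectors as rows of U_k S_k, which is one common LSI convention and not necessarily the paper's. To stay within the standard library, it computes U_k from the eigenvectors of A A^T with a plain Jacobi solver. All function names and the toy data are hypothetical. The `normalize=True` option, which rescales rows of U_k to unit length, is an illustrative normalization and not the scheme proposed in the paper.

```python
import math


def jacobi_eigh(sym, max_sweeps=100, tol=1e-18):
    """Eigenvalues/eigenvectors of a small symmetric matrix (cyclic Jacobi).

    Returns (values, vectors) sorted by descending eigenvalue; vectors[c] is
    the c-th eigenvector as a list.
    """
    n = len(sym)
    a = [row[:] for row in sym]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the rotated a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for r in range(n):  # A <- A P (columns p and q)
                    arp, arq = a[r][p], a[r][q]
                    a[r][p], a[r][q] = c * arp - s * arq, s * arp + c * arq
                for r in range(n):  # A <- P^T A (rows p and q)
                    apr, aqr = a[p][r], a[q][r]
                    a[p][r], a[q][r] = c * apr - s * aqr, s * apr + c * aqr
                for r in range(n):  # accumulate eigenvectors: V <- V P
                    vrp, vrq = v[r][p], v[r][q]
                    v[r][p], v[r][q] = c * vrp - s * vrq, s * vrp + c * vrq
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    return [a[i][i] for i in order], [[v[r][i] for r in range(n)] for i in order]


def lsi_term_space(a, k):
    """Hypothetical helper: rank-k LSI term factors of an m x n matrix `a`.

    Eigenvectors of A A^T are the left singular vectors U of A, and its
    eigenvalues are the squared singular values. Returns (u_k, sigma), where
    u_k[i] is term i's row of U_k and sigma holds the top-k singular values.
    """
    m, n = len(a), len(a[0])
    aat = [[sum(a[i][d] * a[j][d] for d in range(n)) for j in range(m)] for i in range(m)]
    vals, vecs = jacobi_eigh(aat)
    sigma = [math.sqrt(max(val, 0.0)) for val in vals[:k]]
    u_k = [[vecs[c][i] for c in range(k)] for i in range(m)]
    return u_k, sigma


def term_norms(u_k, sigma):
    """Length of each term vector, taken here as a row of U_k S_k (assumed convention)."""
    return [math.sqrt(sum((u * s) ** 2 for u, s in zip(row, sigma))) for row in u_k]


def expansion_matrix(u_k, normalize=False):
    """Term-to-term expansion weights W = U_k U_k^T.

    Unnormalized, W projects onto the rank-k term space: (W q)^T A equals
    q^T A_k, and W[i][i] is the squared norm of term i's row of U_k.
    normalize=True (hypothetical, not the paper's scheme) rescales each nonzero
    row of U_k to unit length first, giving those terms self-weight 1.
    """
    rows = u_k
    if normalize:
        rows = []
        for row in u_k:
            length = math.sqrt(sum(x * x for x in row))
            rows.append([x / length for x in row] if length > 0 else list(row))
    return [[sum(x * y for x, y in zip(ri, rj)) for rj in rows] for ri in rows]


def expand_query(w, q):
    """Expanded query q' = W q, expressed in the original term space."""
    return [sum(wij * qj for wij, qj in zip(row, q)) for row in w]


# Toy 4-term x 3-document matrix (invented counts, not data from the paper).
A = [
    [2.0, 0.0, 1.0],  # car
    [1.0, 0.0, 1.0],  # automobile
    [0.0, 3.0, 0.0],  # flower
    [0.0, 1.0, 0.0],  # petal
]
u_k, sigma = lsi_term_space(A, k=2)
norms = term_norms(u_k, sigma)
q = [0.0, 1.0, 0.0, 0.0]  # one-term query: "automobile"
plain = expand_query(expansion_matrix(u_k), q)
normed = expand_query(expansion_matrix(u_k, normalize=True), q)
print("term norms:", [round(x, 3) for x in norms])
print("expanded query:", [round(x, 3) for x in plain])
print("normalized expansion:", [round(x, 3) for x in normed])
```

On this toy matrix, the two retained dimensions separate car/automobile from flower/petal. With the unnormalized W, expanding the query "automobile" gives "car" a weight of about 0.447, which is more than the 0.276 that "automobile" itself receives, because "automobile" has the smaller norm in the rank-2 space. With `normalize=True`, both terms receive a weight of 1. Building A A^T squares the condition number of A. That is acceptable for a toy example, but a collection the size of TREC would call for a sparse truncated SVD routine instead.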
