Title: An Overview of the Vector Space Model for Text-based Information Retrieval
Abstract: The vector space model (VSM) for information retrieval represents each document in a text-based collection as a real-valued vector. Under this model, we construct a weighted term-document matrix that encodes word-based information from the document set. We then factor the term-document matrix using the singular value decomposition, which yields key information about the matrix, such as its rank and an orthonormal basis for its column space. We discuss VSM-based interpretations of these standard matrix components and briefly survey VSM updating techniques. Finally, we review examples and present results generated from original code showing the VSM in action.
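As a rough illustration of the first step described in the abstract, the following minimal sketch builds a weighted term-document matrix from a toy document set. It is not the original code referred to above. The whitespace tokenizer, the tf-idf weighting scheme, the function name `term_document_matrix`, and the sample documents are all assumptions chosen for illustration; the abstract does not specify which weighting is used.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Return (terms, A), where A[i][j] is the weight of term i in document j.

    Assumption: weights are raw term frequency times log(N / document frequency),
    one common tf-idf variant. Other weightings fit the same matrix layout.
    """
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    tokenized = [doc.lower().split() for doc in docs]
    # Rows of the matrix correspond to the sorted vocabulary.
    terms = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Per-document term counts; a Counter returns 0 for absent terms.
    counts = [Counter(toks) for toks in tokenized]
    A = [
        [counts[j][t] * math.log(n_docs / df[t]) for j in range(n_docs)]
        for t in terms
    ]
    return terms, A


# Hypothetical toy corpus: each column of A is one document vector.
docs = ["the cat sat", "the dog sat", "the cat chased the dog"]
terms, A = term_document_matrix(docs)
for term, row in zip(terms, A):
    print(f"{term:>7}", [round(w, 3) for w in row])
```

Each column of `A` is the vector for one document. Under this particular weighting, a term that appears in every document receives weight zero, since it does not help distinguish documents. The resulting matrix is the kind of object that would then be factored with the singular value decomposition.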