Bilkent University
Department of Computer Engineering
CS 590/690 SEMINAR

 

Semantic-Aware Sparse Indexing: Enhancing Traditional Retrieval with Contextual Term Importance

 

Ekrem Polat
Master's Student
(Supervisor: Prof. Dr. Özgür Ulusoy)
Computer Engineering Department
Bilkent University

Abstract: Sparse retrieval methods, particularly BM25, have long been favored for their efficiency and scalability in information retrieval systems. However, traditional sparse indexing relies heavily on term frequency as a proxy for a term's importance within a document, an assumption that often fails to capture the term's true semantic relevance: frequency does not necessarily reflect contextual importance or contribution to the overall meaning of the document. In this work, we introduce a novel sparse indexing framework that integrates semantic information into the indexing process through embedding-based representations. By leveraging contextual signals captured in the embedding space, our approach computes a semantic relevance score for each term, so that the index reflects not just occurrence but meaning. This shift from frequency-based to context-aware indexing enhances the expressiveness of sparse representations while preserving their inherent efficiency. The embedding-driven perspective offers a promising direction for scalable, semantically informed information retrieval that bridges traditional sparse techniques and modern contextual understanding.
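
As a rough illustration of the idea, the sketch below stores a semantic weight in each posting in place of a raw term count. The abstract does not say how the semantic relevance score is computed, so everything here is an assumption made for illustration: the weight is taken to be the cosine similarity between a term's vector and the centroid of the document's distinct term vectors, the vectors in TOY_EMBEDDINGS are hand-made static values rather than contextual embeddings, and the function names are hypothetical.

```python
import math

# Hypothetical hand-made 2-d vectors; a real system would use a learned
# (ideally contextual) embedding model instead.
TOY_EMBEDDINGS = {
    "the": [1.0, 0.0],
    "sparse": [0.0, 1.0],
    "retrieval": [0.1, 1.0],
    "index": [0.2, 1.0],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def semantic_term_weights(tokens, embeddings):
    """Assumed scoring rule: similarity of each distinct term to the
    centroid of the document's distinct term vectors, clipped at zero.
    How often a term occurs does not enter the weight at all."""
    terms = sorted({t for t in tokens if t in embeddings})
    if not terms:
        return {}
    vectors = [embeddings[t] for t in terms]
    centroid = [sum(column) / len(vectors) for column in zip(*vectors)]
    return {t: max(0.0, cosine(embeddings[t], centroid)) for t in terms}


def build_index(docs, embeddings):
    """Inverted index mapping term -> {doc_id: semantic weight}."""
    inverted = {}
    for doc_id, tokens in docs.items():
        for term, weight in semantic_term_weights(tokens, embeddings).items():
            inverted.setdefault(term, {})[doc_id] = weight
    return inverted


docs = {"d1": ["the", "the", "the", "sparse", "retrieval", "index"]}
inverted_index = build_index(docs, TOY_EMBEDDINGS)
for term in sorted(inverted_index):
    print(term, round(inverted_index[term]["d1"], 3))
```

In this toy document "the" occurs three times but ends up with a lower weight than "retrieval", which occurs once. Because the index keeps the usual term-to-postings layout, it can still be queried like a conventional sparse index.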

 

DATE: April 14, Thursday @ 13:50
PLACE: EA 502