We suggest that instead of implementing custom index structures and query evaluation algorithms, IR researchers should simply store document representations in a column-oriented relational database and implement ranking models using SQL. This is particularly advantageous for rapid prototyping, since researchers can explore new scoring functions and features by issuing SQL queries, without needing to write imperative code. We demonstrate the feasibility of this approach by implementing conjunctive BM25 on two modern column stores. Experiments on a web collection show that a retrieval engine built in this manner achieves effectiveness and efficiency on par with custom-built retrieval engines, while providing many additional advantages, including cleaner query semantics, a simpler architecture, built-in support for error analysis, and the ability to exploit advances in database technology "for free".
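To illustrate the idea of expressing a ranking model in SQL, the following is a minimal sketch of conjunctive BM25 as a single query. It is not the authors' implementation: it uses Python's built-in `sqlite3` (a row store, chosen only so the example runs anywhere) in place of a column store, and the table names (`docs`, `dict`, `terms`), the whitespace tokenizer, the `py_ln` helper, the parameters `k1 = 1.2` and `b = 0.75`, and the particular IDF variant are all assumptions made for this example.

```python
import math
import sqlite3

K1, B = 1.2, 0.75  # commonly used BM25 defaults; assumed, not taken from the paper


def build_index(docs):
    """Load {docid: text} into three hypothetical tables: docs, dict, terms."""
    conn = sqlite3.connect(":memory:")
    # Register a natural-log helper so the query does not depend on
    # SQLite having been compiled with its optional math functions.
    conn.create_function("py_ln", 1, math.log)
    conn.executescript("""
        CREATE TABLE docs  (docid INTEGER PRIMARY KEY, len INTEGER);
        CREATE TABLE dict  (termid INTEGER PRIMARY KEY, term TEXT UNIQUE, df INTEGER);
        CREATE TABLE terms (termid INTEGER, docid INTEGER, tf INTEGER);
    """)
    term_ids, df = {}, {}
    for docid, text in docs.items():
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        conn.execute("INSERT INTO docs VALUES (?, ?)", (docid, len(tokens)))
        counts = {}
        for tok in tokens:
            counts[tok] = counts.get(tok, 0) + 1
        for tok, tf in counts.items():
            tid = term_ids.setdefault(tok, len(term_ids) + 1)
            df[tid] = df.get(tid, 0) + 1
            conn.execute("INSERT INTO terms VALUES (?, ?, ?)", (tid, docid, tf))
    conn.executemany(
        "INSERT INTO dict VALUES (?, ?, ?)",
        [(tid, tok, df[tid]) for tok, tid in term_ids.items()],
    )
    return conn


def search(conn, query, k=10):
    """Return up to k (docid, score) pairs for documents containing ALL query terms."""
    q_terms = sorted(set(query.lower().split()))
    if not q_terms:
        return []
    names = ", ".join(f":t{i}" for i in range(len(q_terms)))
    sql = f"""
        WITH stats  AS (SELECT COUNT(*) AS n, AVG(len) AS avglen FROM docs),
             qterms AS (SELECT termid, df FROM dict WHERE term IN ({names}))
        SELECT t.docid,
               SUM(py_ln((s.n - q.df + 0.5) / (q.df + 0.5) + 1.0)
                   * t.tf * (:k1 + 1.0)
                   / (t.tf + :k1 * (1.0 - :b + :b * d.len / s.avglen))) AS score
        FROM terms t
        JOIN qterms q ON q.termid = t.termid
        JOIN docs   d ON d.docid  = t.docid
        CROSS JOIN stats s
        GROUP BY t.docid
        -- conjunctive semantics: keep only documents matching every query term
        HAVING COUNT(DISTINCT t.termid) = :nq
        ORDER BY score DESC, t.docid
        LIMIT :k
    """
    params = {"k1": K1, "b": B, "nq": len(q_terms), "k": k}
    params.update({f"t{i}": term for i, term in enumerate(q_terms)})
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = build_index({
        1: "column stores for ir prototyping",
        2: "column stores are fast",
        3: "relational databases and sql",
    })
    for docid, score in search(conn, "column stores"):
        print(docid, round(score, 4))
```

In this sketch, trying a different scoring function amounts to editing the `SUM(...)` expression, and the `HAVING` clause is the only place where the conjunctive semantics are encoded. The same query text should be portable to a column store with minor changes, though that is not verified here.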
Additional Metadata
Keywords Relational Databases, BM25
Theme Information (theme 2)
Publisher ACM
Project Web Archives Retrieval Tools
Conference Annual ACM SIGIR Conference
Citation
Mühleisen, H.F., Samar, T., Lin, J.J.P., & de Vries, A.P. (2014). Old dogs are great at new tricks: Column stores for IR prototyping. In SIGIR. ACM.