Old dogs are great at new tricks: column stores for IR prototyping
Presented at the Annual ACM SIGIR Conference, Gold Coast, QLD, Australia
We suggest that, instead of implementing custom index structures and query evaluation algorithms, IR researchers simply store document representations in a column-oriented relational database and implement ranking models using SQL. This is particularly advantageous for rapid prototyping, since researchers can explore new scoring functions and features by issuing SQL queries, without writing imperative code. We demonstrate the feasibility of this approach by implementing conjunctive BM25 on two modern column stores. Experiments on a web collection show that a retrieval engine built in this manner achieves effectiveness and efficiency on par with custom-built retrieval engines, while providing many additional advantages, including cleaner query semantics, a simpler architecture, built-in support for error analysis, and the ability to exploit advances in database technology "for free".
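The idea of expressing conjunctive BM25 as a single SQL query can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the two-table schema (`docs`, `terms`), the helper names `build_index` and `search`, whitespace tokenization, the parameter values k1 = 1.2 and b = 0.75, and the non-negative IDF variant ln(1 + (N - df + 0.5) / (df + 0.5)). It also runs on SQLite from the Python standard library, which is a row store, purely so the example is self-contained; the same query shape would be issued against a column store.

```python
import math
import sqlite3

K1, B = 1.2, 0.75  # commonly used BM25 parameters (assumed, not from the paper)


def build_index(documents):
    """Load {docid: text} into a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    # Register ln() ourselves, since SQLite math functions are a build option.
    conn.create_function("ln", 1, math.log)
    conn.execute("CREATE TABLE docs (docid INTEGER PRIMARY KEY, len INTEGER)")
    conn.execute("CREATE TABLE terms (term TEXT, docid INTEGER, tf INTEGER)")
    for docid, text in documents.items():
        tokens = text.lower().split()
        counts = {}
        for token in tokens:
            counts[token] = counts.get(token, 0) + 1
        conn.execute("INSERT INTO docs VALUES (?, ?)", (docid, len(tokens)))
        conn.executemany(
            "INSERT INTO terms VALUES (?, ?, ?)",
            [(term, docid, tf) for term, tf in counts.items()],
        )
    return conn


def search(conn, query, limit=10):
    """Return (docid, score) pairs for documents containing ALL query terms."""
    q = sorted(set(query.lower().split()))
    if not q:
        return []
    placeholders = ",".join("?" * len(q))
    sql = f"""
        WITH stats AS (
            SELECT COUNT(*) AS n, AVG(len) AS avglen FROM docs
        ),
        dict AS (
            -- document frequency of each query term
            SELECT term, COUNT(*) AS df FROM terms
            WHERE term IN ({placeholders}) GROUP BY term
        )
        SELECT t.docid,
               SUM(ln(1.0 + (s.n - dict.df + 0.5) / (dict.df + 0.5))
                   * t.tf * (? + 1.0)
                   / (t.tf + ? * (1.0 - ? + ? * d.len / s.avglen))) AS score
        FROM terms AS t
        JOIN dict ON dict.term = t.term
        JOIN docs AS d ON d.docid = t.docid
        CROSS JOIN stats AS s
        GROUP BY t.docid
        HAVING COUNT(*) = ?          -- conjunctive: every query term matched
        ORDER BY score DESC, t.docid
        LIMIT ?
    """
    params = [*q, K1, K1, B, B, len(q), limit]
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    index = build_index({
        1: "column stores for ir prototyping",
        2: "column stores",
        3: "ranking models for ir",
    })
    for docid, score in search(index, "column stores"):
        print(docid, round(score, 4))
```

In this sketch, changing the ranking model means editing only the scoring expression inside `SUM(...)`, and dropping the `HAVING` clause turns the conjunctive query into a disjunctive one.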
|Relational Databases, BM25|
|Information (theme 2)|
|Web Archives Retrieval Tools|
|Annual ACM SIGIR Conference|
Mühleisen, H.F., Samar, T., Lin, J.J.P., & de Vries, A.P. (2014). Old dogs are great at new tricks: column stores for IR prototyping. In SIGIR. ACM.