Efficient and Flexible Information Retrieval Using MonetDB/X100
Today's large-scale IR systems are not implemented using general-purpose database systems, as the latter tend to be significantly less efficient than custom-built IR engines. This paper demonstrates how recent developments in hardware-conscious database architecture may however satisfy IR needs. The advantage is flexibility of experimentation, as implementing a retrieval system on top of a DBMS boils down to relational query formulation, rather than system programming. We demonstrate in the context of the TeraByte TREC efficiency task that our experimental MonetDB/X100 database system provides highly competitive results both regarding precision and speed. We analyze the two innovations in MonetDB/X100 that most contributed to this successful application of DB technology in IR, namely vectorized in-cache processing and the use of two new light-weight compression schemes that work between the RAM and CPU cache memory levels.
|G. Weikum, J. Hellerstein, M. Stonebraker|
|Ambient Multimedia Databases|
|Biennial Conference on Innovative Data Systems Research|
Héman, S, Zukowski, M, de Vries, A.P, & Boncz, P.A. (2007). Efficient and Flexible Information Retrieval Using MonetDB/X100. In G Weikum, J Hellerstein, & M Stonebraker (Eds.), Proceedings of the Biennial Conference on Innovative Data Systems Research (CIDR) (pp. 96–101).