In information retrieval, functions that rank documents by their estimated relevance to a query typically treat query terms as independent. However, it is often the joint presence of query terms that interests the user, and this is overlooked when terms are matched independently. One feature that can express the relatedness of co-occurring terms is their proximity in text. In past research, models trained on the proximity information in a collection have outperformed models that are not estimated on data. We analyzed how the distance between co-occurring query terms in text can be used to estimate document relevance, and used this analysis to extend a unigram ranking function with a proximity model that accumulates the scores of all co-occurring term combinations. This proximity model is more practical than existing models: it requires no co-occurrence statistics, it obviates the need to tune additional parameters, and its retrieval speed is close to that of competing models. We show that this approach is more robust than existing models on both Web and newswire corpora, and that on average it performs at least as well as existing proximity models across collections.
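To make the idea of accumulating proximity contributions concrete, the sketch below scores a document as a unigram baseline plus a sum over every occurrence pair of two distinct query terms, with closer pairs contributing more. This is a minimal illustration, not the paper's exact cumulative proximity expansion: the reciprocal-squared distance kernel, the function name, and the precomputed unigram scores are assumptions made for this example.

```python
from collections import defaultdict


def proximity_score(doc_terms, query_terms, unigram_scores):
    """Illustrative ranking: unigram baseline plus accumulated proximity
    contributions for every pair of co-occurring distinct query terms.
    The 1/d^2 kernel is an assumption for this sketch only."""
    # Record the positions of each query term in the document.
    positions = defaultdict(list)
    for pos, term in enumerate(doc_terms):
        if term in query_terms:
            positions[term].append(pos)

    # Accumulate a contribution for each occurrence pair of two distinct
    # query terms; smaller distances yield larger contributions.
    prox = 0.0
    terms = list(positions)
    for i, t1 in enumerate(terms):
        for t2 in terms[i + 1:]:
            for p1 in positions[t1]:
                for p2 in positions[t2]:
                    prox += 1.0 / (abs(p1 - p2) ** 2)

    # Combine with the unigram baseline; note that no collection-level
    # co-occurrence statistics or tuned weights are used here.
    return sum(unigram_scores.get(t, 0.0) for t in query_terms) + prox


if __name__ == "__main__":
    doc = "cumulative proximity expansions rank documents by term distance".split()
    print(proximity_score(doc, {"proximity", "distance"},
                          {"proximity": 1.2, "distance": 0.8}))
```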
doi.org/10.1007/s10791-014-9243-x
Information Retrieval
Human-Centered Data Analytics

Vuurens, J., & de Vries, A. (2014). Distance matters! Cumulative proximity expansions for ranking documents. Information Retrieval Journal, 17(4), 380–406. doi:10.1007/s10791-014-9243-x