Modern relevance models consider a wide range of criteria to identify the documents that are expected to satisfy the user's information need. As the dimensionality of the underlying relevance spaces grows, so does the need for sophisticated score combination and estimation schemes. In this paper, we investigate the use of copulas, a model family from the domain of robust statistics, for the formal estimation of the probability of relevance in high-dimensional spaces. Our experiments are based on the MSLR-WEB10K and MSLR-WEB30K datasets, two annotated, publicly available samples of hundreds of thousands of real Web search impressions, and suggest that copulas can significantly outperform linear combination models for high-dimensional problems. Our models perform on par with state-of-the-art machine learning approaches.
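The contrast between copula-based and linear score combination can be illustrated with a minimal sketch. It is not the authors' implementation. The Clayton family, the parameter value theta=2.0, the smoothed empirical CDF, and all function names are assumptions chosen for illustration, and using the copula's CDF value directly as a relevance score is only one simple option.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return a function mapping a raw score to a value strictly inside (0, 1)."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        # Smoothed fraction of observations <= x; the +1 / +2 offsets
        # keep the result away from exactly 0 and 1.
        return (bisect_right(ordered, x) + 1) / (n + 2)

    return cdf


def clayton_copula(us, theta=2.0):
    """Clayton copula CDF: (sum(u_i^-theta) - d + 1)^(-1/theta), theta > 0.

    `us` holds marginal CDF values in (0, 1], one per relevance criterion.
    """
    if theta <= 0:
        raise ValueError("theta must be positive for the Clayton copula")
    d = len(us)
    s = sum(u ** -theta for u in us) - d + 1
    return s ** (-1.0 / theta)


def linear_combination(us, weights=None):
    """Baseline: weighted average of the same marginal values."""
    if weights is None:
        weights = [1.0 / len(us)] * len(us)
    return sum(w * u for w, u in zip(weights, us))


if __name__ == "__main__":
    # Hypothetical raw scores for two criteria over a tiny collection.
    criterion_a = [0.1, 0.4, 0.35, 0.8, 0.9]
    criterion_b = [3.0, 1.0, 7.0, 6.5, 2.0]
    cdf_a, cdf_b = empirical_cdf(criterion_a), empirical_cdf(criterion_b)

    for a, b in zip(criterion_a, criterion_b):
        us = [cdf_a(a), cdf_b(b)]
        print(round(clayton_copula(us), 3), round(linear_combination(us), 3))
```

Unlike the weighted average, the copula value never exceeds the smallest marginal, so a document that scores poorly on one criterion cannot be fully compensated by a high score on another. How strongly the criteria are coupled is controlled by theta, which in practice would be fitted to training data.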
Additional Metadata
THEME Information (theme 2)
Publisher ACM
Persistent URL dx.doi.org/10.1145/2661829.2661925
Conference ACM Conference on Information and Knowledge Management
Citation
Eickhoff, C., & de Vries, A.P. (2014). Modelling Complex Relevance Spaces with Copulas. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (pp. 1831–1834). ACM. doi:10.1145/2661829.2661925