Modern relevance models consider a wide range of criteria to identify the documents that are expected to satisfy the user's information need. As the dimensionality of the underlying relevance spaces grows, so does the need for sophisticated score combination and estimation schemes. In this paper, we investigate the use of copulas, a model family from the domain of robust statistics, for the formal estimation of the probability of relevance in high-dimensional spaces. Our experiments are based on the MSLR-WEB10K and WEB30K datasets, two annotated, publicly available samples of hundreds of thousands of real Web search impressions, and suggest that copulas can significantly outperform linear combination models for high-dimensional problems. Our models achieve performance on par with that of state-of-the-art machine learning approaches.
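To make the contrast between copula-based and linear score combination concrete, the following is a minimal sketch and not the estimation procedure used in the paper. It assumes a Clayton copula with a fixed parameter theta = 2.0, rescaled empirical CDFs as marginals, and a hypothetical four-document toy collection with two relevance criteria; all function and variable names are illustrative.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Build a rescaled empirical CDF that maps a raw score into (0, 1)."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        # Number of observations <= x; the 0.5 floor keeps the result above 0,
        # and dividing by n + 1 keeps it below 1.
        rank = bisect_right(ordered, x)
        return max(rank, 0.5) / (n + 1)

    return cdf


def clayton_copula(u, theta=2.0):
    """Clayton copula C(u_1, ..., u_d) for theta > 0 and every u_i in (0, 1].

    The copula family and the value of theta are assumptions of this sketch.
    """
    d = len(u)
    return (sum(ui ** -theta for ui in u) - d + 1) ** (-1.0 / theta)


def linear_combination(u, weights):
    """Weighted-sum baseline over the same normalized scores."""
    return sum(w * ui for w, ui in zip(weights, u))


# Hypothetical toy collection: one row per document, one column per criterion.
docs = [
    (0.2, 10.0),
    (0.9, 3.0),
    (0.6, 7.0),
    (0.4, 1.0),
]

# Fit one marginal CDF per criterion, then map every document into the unit cube.
margins = [empirical_cdf(column) for column in zip(*docs)]
unit_docs = [[cdf(x) for cdf, x in zip(margins, doc)] for doc in docs]

# Print both fused scores per document for comparison.
for doc, u in zip(docs, unit_docs):
    copula_score = clayton_copula(u)
    linear_score = linear_combination(u, [0.5, 0.5])
    print(doc, round(copula_score, 3), round(linear_score, 3))
```

Unlike the weighted sum, the copula score depends on how the criteria behave jointly: a document that is weak on any one criterion is pulled down strongly, which is one way dependence between relevance dimensions can enter the fused score.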
Information (theme 2)
ACM
dx.doi.org/10.1145/2661829.2661925
ACM Conference on Information and Knowledge Management
Human-centered Data Analysis

Eickhoff, C., & de Vries, A.P. (2014). Modelling Complex Relevance Spaces with Copulas. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (pp. 1831–1834). ACM. doi:10.1145/2661829.2661925