2011-03-01
Result Diversification Based on Query-Specific Cluster Ranking
Publication
Journal of the American Society for Information Science and Technology, Volume 62, Issue 3, p. 550-571
Result diversification is a retrieval strategy for dealing
with ambiguous or multi-faceted queries by providing
documents that cover as many facets of the query as
possible. We propose a result diversification framework
based on query-specific clustering and cluster ranking, in
which diversification is restricted to documents belonging
to clusters that potentially contain a high percentage
of relevant documents. Empirical results show that
the proposed framework improves the performance of
several existing diversification methods. The framework
also gives rise to a simple yet effective cluster-based
approach to result diversification that selects documents
from different clusters to be included in a ranked
list in a round-robin fashion. We describe a set of experiments
aimed at thoroughly analyzing the behavior of the
two main components of the proposed diversification
framework, ranking and selecting clusters for diversification.
Both components have a crucial impact on
the overall performance of our framework, but ranking
clusters plays a more important role than selecting clusters.
We also examine properties that clusters should
have in order for our diversification framework to be
effective. Most relevant documents should be contained
in a small number of high-quality clusters, while there
should be no dominantly large clusters. Also, documents
from these high-quality clusters should have diverse
content. These properties are strongly correlated with
the overall performance of the proposed diversification
framework.
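The round-robin selection mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the clusters have already been ranked by some cluster-ranking method and that the documents inside each cluster are already ordered by retrieval score. The function name `round_robin_diversify` and the parameters `top_k_clusters` and `list_size` are hypothetical names introduced for this example.

```python
from itertools import zip_longest

# Sentinel marking "this cluster has run out of documents".
_EXHAUSTED = object()


def round_robin_diversify(ranked_clusters, top_k_clusters, list_size):
    """Build a diversified list by cycling over the top-ranked clusters.

    ranked_clusters: list of clusters, best cluster first; each cluster is a
        list of document ids, best document first (assumed input format).
    top_k_clusters: how many of the top-ranked clusters to draw from.
    list_size: maximum length of the returned ranked list.
    """
    if list_size <= 0:
        return []

    # Restrict diversification to the clusters judged most likely to
    # contain relevant documents.
    selected = ranked_clusters[:top_k_clusters]

    result = []
    seen = set()
    # Each "tier" holds the i-th document of every selected cluster.
    for tier in zip_longest(*selected, fillvalue=_EXHAUSTED):
        for doc in tier:
            if doc is _EXHAUSTED or doc in seen:
                continue
            seen.add(doc)
            result.append(doc)
            if len(result) == list_size:
                return result
    return result


clusters = [["d1", "d4", "d7"], ["d2", "d5"], ["d3"], ["d9"]]
print(round_robin_diversify(clusters, top_k_clusters=3, list_size=5))
```

In this toy input the fourth cluster is never consulted because only the top three clusters are selected, which mirrors the idea of restricting diversification to clusters expected to hold most of the relevant documents.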
Additional Metadata | |
---|---|
Journal | Journal of the American Society for Information Science and Technology |
Organisation | Human-Centered Data Analytics |
He, J., Meij, E., & de Rijke, M. (2011). Result Diversification Based on Query-Specific Cluster Ranking. Journal of the American Society for Information Science and Technology, 62(3), 550–571.