Combining implicit and explicit topic representations for result diversification

He, Jiyin; Hollink, Vera; de Vries, Arjen

J. He (Jiyin), V. Hollink (Vera) and A.P. de Vries (Arjen)

2012-08-01

Presented at the Annual ACM SIGIR Conference, Portland

Result diversification deals with ambiguous or multi-faceted queries by providing documents that cover as many subtopics of a query as possible. Various approaches to subtopic modeling have been proposed. Subtopics have been extracted internally, e.g., from retrieved documents, and externally, e.g., from Web resources such as query logs. Internally modeled subtopics are often implicitly represented, e.g., as latent topics, while externally modeled subtopics are often explicitly represented, e.g., as reformulated queries. We propose a framework that: i) combines both implicitly and explicitly represented subtopics; and ii) allows flexible combination of multiple external resources in a transparent and unified manner. Specifically, we use a random-walk-based approach to estimate the similarities of the explicit subtopics mined from a number of heterogeneous resources: click logs, anchor text, and web n-grams. We then use these similarities to regularize the latent topics extracted from the top-ranked documents, i.e., the internal (implicit) subtopics. Empirical results show that regularization with explicit subtopics extracted from the right resource leads to improved diversification results, indicating that the proposed regularization with (explicit) external resources forms better (implicit) topic models. Click logs and anchor text are shown to be more effective resources than web n-grams under the current experimental settings. Combining resources does not always lead to better results, but yields robust performance. This robustness is important for two reasons: it cannot be predicted which resources will be most effective for a given query, and it is not yet known how to reliably determine the optimal model parameters for building implicit topic models.
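
To make the random-walk step more concrete, the following is a minimal sketch of how similarities between explicit subtopics might be estimated from a click-log-style graph. It is not the authors' implementation: the abstract only states that a random walk based approach is used, so the restart variant, the `restart` and `iterations` parameters, the function names, and the toy edge list (query reformulations linked to made-up URLs) are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_transitions(edges):
    """Turn an undirected weighted edge list into row-normalised transition probabilities."""
    weights = defaultdict(dict)
    for u, v, w in edges:
        weights[u][v] = weights[u].get(v, 0.0) + w
        weights[v][u] = weights[v].get(u, 0.0) + w
    return {
        node: {nbr: w / sum(nbrs.values()) for nbr, w in nbrs.items()}
        for node, nbrs in weights.items()
    }


def random_walk_with_restart(transitions, start, restart=0.15, iterations=50):
    """Iterate a walk that jumps back to `start` with probability `restart` at each step."""
    dist = {node: 0.0 for node in transitions}
    dist[start] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in transitions}
        for node, mass in dist.items():
            for nbr, p in transitions[node].items():
                nxt[nbr] += (1.0 - restart) * mass * p
        nxt[start] += restart
        dist = nxt
    return dist


# Hypothetical click-log edges: (explicit subtopic, clicked URL, click count).
edges = [
    ("q:jaguar car", "url:cars.example", 3),
    ("q:jaguar price", "url:cars.example", 2),
    ("q:jaguar price", "url:wiki.example", 1),
    ("q:jaguar animal", "url:wiki.example", 1),
    ("q:jaguar animal", "url:zoo.example", 4),
    ("q:jaguar habitat", "url:zoo.example", 1),
]

transitions = build_transitions(edges)
scores = random_walk_with_restart(transitions, "q:jaguar car")

# Visiting probabilities of the other subtopics act as similarity scores.
for node in sorted(scores, key=scores.get, reverse=True):
    if node.startswith("q:") and node != "q:jaguar car":
        print(f"{node}: {scores[node]:.4f}")
```

In this toy graph, subtopics that share clicked URLs with the starting subtopic receive more probability mass than those reachable only through longer paths. Scores of this kind could then serve as the pairwise similarities used to regularize the latent topics, although how the paper does so is not specified in the abstract.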

Additional Metadata
Keywords	Multi-source, Subtopics, Result diversification, Random walk
THEME	Information (theme 2)
Publisher	ACM
Conference	Annual ACM SIGIR Conference
Organisation	Human-Centered Data Analytics
Citation	He, J., Hollink, V., & de Vries, A. (2012). Combining implicit and explicit topic representations for result diversification. In Proceedings of ACM SIGIR Conference on Research and Development in Information Retrieval 2012 (35) (pp. 851–860). ACM.

Free Full Text (Final Version, 603kb)
