Topic modelling of clickthrough data in image search

Multimedia Tools and Applications

Abstract

In this paper we explore the benefits of latent variable modelling of clickthrough data in the domain of image retrieval. Clicks in image search logs are regarded as implicit relevance judgements that express both user intent and important relations between selected documents. We posit that clickthrough data contains hidden topics and can be used to infer a lower-dimensional latent space that can subsequently be employed to improve various aspects of the retrieval system. We use a subset of a clickthrough corpus from the image search portal of a news agency to evaluate several popular latent variable models in terms of their ability to model topics underlying queries. We demonstrate that latent variable modelling reveals underlying structure in clickthrough data, and our results show that computing document similarities in the latent space improves retrieval effectiveness compared to computing similarities in the original query space. These results are compared with baselines using visual and textual features. We show performance substantially better than the visual baseline, which indicates that content-based image retrieval systems that do not exploit query logs could improve recall and precision by taking this historical data into account.
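
The contrast between similarities in the original query space and in a latent space can be illustrated with a minimal sketch. It is not the authors' method: the click matrix is invented toy data, the function names are hypothetical, and a truncated SVD (in the spirit of latent semantic analysis) stands in for the latent variable models the paper evaluates.

```python
import numpy as np

# Hypothetical toy click matrix (assumption, not data from the paper):
# rows are images, columns are queries, entries are click counts.
clicks = np.array([
    [3, 0, 0, 0],   # image 0: clicked only for query 0
    [2, 2, 0, 0],   # image 1: clicked for queries 0 and 1
    [0, 3, 0, 0],   # image 2: clicked only for query 1
    [0, 0, 4, 3],   # image 3: clicked for queries 2 and 3
    [0, 0, 3, 4],   # image 4: clicked for queries 2 and 3
], dtype=float)


def cosine_similarities(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)  # guard against zero rows
    return unit @ unit.T


def latent_embedding(matrix, k):
    """Coordinates of each row in the top-k singular directions (truncated SVD)."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


# Similarities computed directly on click counts (original query space).
sim_query_space = cosine_similarities(clicks)

# Similarities computed after projecting images into a 2-dimensional latent space.
sim_latent_space = cosine_similarities(latent_embedding(clicks, k=2))

print("query space,  images 0 and 2:", sim_query_space[0, 2])
print("latent space, images 0 and 2:", sim_latent_space[0, 2])
```

In this toy data, images 0 and 2 share no query, so their similarity in the query space is zero. Both are linked through image 1, which was clicked for the same two queries, and the latent projection places them on the same topic direction, while images 3 and 4 fall on a separate one.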

Notes

  1. Conversely, short-term or intra-query learning refers to learning based on relevance feedback judgements for the current query only.

  2. The Likert scale is the rating scale used to record user-item preferences in collaborative filtering [14].

  3. http://www.belga.be

  4. http://www.kyb.tuebingen.mpg.de/bs/people/pgehler/code/index.html

  5. http://psiexp.ss.uci.edu/research/programs_data/toolbox.htm

References

  1. Baeza-Yates R, Tiberi A (2007) Extracting semantic relations from query logs. In: Proceedings of ACM KDD’07. ACM, New York, NY, USA, pp 76–85. doi:10.1145/1281192.1281204

  2. Berry MW, Browne M (2005) Email surveillance using non-negative matrix factorization. Comput Math Organ Theory 11(3):249–264. doi:10.1007/s10588-005-5380-5. URL: http://www.springerlink.com/content/p474382p18457228/

  3. Beyer K, Goldstein J, Ramakrishnan R, Shaft U (1999) When is “nearest neighbor” meaningful? In: Int. conf. on database theory, pp 217–235

  4. Bingham E, Kaban A, Fortelius M (2009) The aspect Bernoulli model: multiple causes of presences and absences. Pattern Anal Appl 12(1):55–78. doi:10.1007/s10044-007-0096-4

  5. Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022

  6. Craswell N, Szummer M (2007) Random walks on the click graph. In: Proceedings of ACM SIGIR’07, pp 239–246

  7. Deerwester S, Dumais S, Landauer T, Furnas G, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Inf Sci 41(6):391–407

  8. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B Stat Methodol 39(1):1–38. doi:10.2307/2984875

  9. Gaussier E, Goutte C (2005) Relation between PLSA and NMF and implications. In: Proceedings of ACM SIGIR’05, pp 601–602. doi:10.1145/1076034.1076148

  10. van Gemert J, Geusebroek JM, Veenman C, Snoek C, Smeulders A (2006) Robust scene categorisation by learning image statistics in context. In: Proceedings of SLAM’06, p 105

  11. Griffiths TL, Steyvers M (2004) Finding scientific topics. Proc Natl Acad Sci U S A 101(Suppl 1): 5228–5235. doi:10.1073/pnas.0307752101

  12. He X, King O, Ma WY, Li M, Zhang HJ (2003) Learning a semantic space from user’s relevance feedback for image retrieval. IEEE Trans Circuits Syst Video Technol 13(1):39–48. doi:10.1109/TCSVT.2002.808087

  13. Heisterkamp D (2002) Building a latent-semantic index of an image database from patterns of relevance feedback. In: Proceedings of the 16th international conference on pattern recognition, pp 134–137. citeseer.ist.psu.edu/heisterkamp02building.html

  14. Herlocker JL, Konstan JA, Borchers A, Riedl J (1999) An algorithmic framework for performing collaborative filtering. In: Proceedings of ACM SIGIR’99, pp 230–237. doi:10.1145/312624.312682

  15. Hiemstra D, Rode H, van Os R, Flokstra J (2006) PFTijah: text search in an XML database system. In: Proceedings of OSIR’06, pp 12–17. http://doc.utwente.nl/66798/

  16. Hofmann T (1999) Probabilistic latent semantic analysis. In: Proceedings of uncertainty in artificial intelligence. citeseer.ist.psu.edu/hofmann99probabilistic.html

  17. Jansen BJ (2009) Understanding user-web interactions via web analytics. In: Synthesis lectures on information concepts, retrieval, and services, Morgan & Claypool

  18. Jansen BJ, Spink A, Saracevic T (1999) The use of relevance feedback on the web: implications for web IR system design. In: Proceedings of WebNet’99, pp 500–555

  19. Joachims T (2003) Evaluating retrieval performance using clickthrough data. In: Franke J, Nakhaeizadeh G, Renz I (eds) Text mining, pp 79–96

  20. Joachims T, Granka L, Pang B, Hembrooke H, Gay G (2005) Accurately interpreting clickthrough data as implicit feedback. In: Proceedings of ACM SIGIR’05, pp 154–161

  21. Kelly D, Teevan J (2003) Implicit feedback for inferring user preference: a bibliography. SIGIR Forum 37(2):18–28

  22. Koren Y (2009) Collaborative filtering with temporal dynamics. In: Proceedings of ACM SIGKDD’09, pp 447–456. doi:10.1145/1557019.1557072

  23. Kraaij W (2004) Variations on language modeling for information retrieval. PhD thesis, Centre for Telematics and Information Technology, University of Twente

  24. Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401(6755):788–791. doi:10.1038/44565

  25. Lin C, Xue GR, Zeng HJ, Yu Y (2005) Using probabilistic latent semantic analysis for personalized web search. In: Proceedings of the 7th Asia-Pacific web conference. LNCS, vol 3399, pp 707–717. http://dblp.uni-trier.de/db/conf/apweb/apweb2005.html#LinXZY05

  26. Macdonald C, Ounis I (2009) Usefulness of quality click-through data for training. In: Proceedings of the workshop on web search click data, pp 75–79. doi:10.1145/1507509.1507521

  27. Müller H, Pun T, Squire D (2004) Learning from user behavior in image retrieval: application of market basket analysis. Int J Comput Vis 56(1–2):65–77

  28. Poblete B, Bustos B, Mendoza M, Barrios JM (2010) Visual-semantic graphs: using queries to reduce the semantic gap in web image retrieval. In: Proceedings of CIKM’10, 26–30 October, Toronto, Canada. ACM Press, New York, NY

  29. Porteous I, Newman D, Ihler A, Asuncion A, Smyth P, Welling M (2008) Fast collapsed Gibbs sampling for latent Dirichlet allocation. In: Proceedings of ACM SIGKDD’08. ACM, New York, NY, USA, pp 569–577. doi:10.1145/1401890.1401960

  30. Smith G, Ashman H (2009) Evaluating implicit judgements from image search interactions. In: Proceedings of WebSci’09: society on-line. http://journal.webscience.org/148/

  31. Steyvers M, Griffiths T (2005) Probabilistic topic models. In: Latent semantic analysis: a road to meaning. Lawrence Erlbaum

  32. Szekely E, Bruno E, Marchand-Maillet S (2010) High-dimensional multimodal distribution embedding. In: IEEE ICDM 2010 workshop on visual analytics and knowledge discovery (VAKD’10), Sydney, Australia

  33. Tsikrika T, Diou C, de Vries AP, Delopoulos A (2009) Image annotation using clickthrough data. In: Proceedings of CIVR’09

Acknowledgements

This research was funded by the Swiss National Science Foundation (SNF) through IM2 (Interactive Multimedia Information Management) and by EU-FP7-ICT.1.5 NoE PetaMedia. The authors would also like to thank the Belga News Agency for the use of the query logs.

Author information

Corresponding author

Correspondence to Donn Morrison.

About this article

Cite this article

Morrison, D., Tsikrika, T., Hollink, V. et al. Topic modelling of clickthrough data in image search. Multimed Tools Appl 66, 493–515 (2013). https://doi.org/10.1007/s11042-012-1038-8

  • DOI: https://doi.org/10.1007/s11042-012-1038-8
