State-of-the-art content sharing platforms often require users to assign tags to pieces of media in order to make them easily retrievable. Since this task is sometimes perceived as tedious or boring, annotations can be sparse. Commenting, on the other hand, is a frequently used means of expressing user opinion towards shared media items. This work makes use of time series analyses in order to infer potential tags and indexing terms for audio-visual content from user comments. In this way, we mitigate the vocabulary gap between queries and document descriptors. Additionally, we show how large-scale encyclopaedias such as Wikipedia can aid the task of tag prediction by serving as surrogates for high-coverage natural language vocabulary lists. Our evaluation is conducted on a corpus of several million real-world user comments from the popular video sharing platform YouTube, and demonstrates significant improvements in retrieval performance.
P. Serdyukov, P. Braslavski, S.O. Kuznetsov (Sergei), J. Kamps, S.M. Ruger, E. Agichtein, I. Segalovich, E. Yilmaz (Emine)
European Conference on Information Retrieval
Human-Centered Data Analytics

Eickhoff, C., Li, W., & de Vries, A. (2013). Exploiting User Comments for Audio-Visual Content Indexing and Retrieval. In P. Serdyukov, P. Braslavski, S. Kuznetsov, J. Kamps, S. M. Ruger, E. Agichtein, … E. Yilmaz (Eds.), Advances in Information Retrieval - 35th European Conference on IR Research, ECIR 2013, Moscow, Russia, March 24-27, 2013. (pp. 38–49). Springer.