Recently, various crowdsourcing initiatives have shown that targeted efforts by user communities can result in massive amounts of tags. For example, the Netherlands Institute for Sound and Vision collected a large number of tags with the video labeling game \emph{Waisda?}. To successfully utilize these tags, a better understanding of their characteristics is required. The goal of this paper is twofold: (i) to investigate the vocabulary that users employ when describing videos and compare it to the vocabularies used by professionals; and (ii) to establish which aspects of the video are typically described and what types of tags are used to do so. We report on an analysis of the tags collected with \emph{Waisda?}. With respect to the first goal, we compare the tags with a typical domain thesaurus used by professionals, as well as with a more general vocabulary. With respect to the second goal, we compare the tags to the video subtitles to determine how many tags are derived from the audio signal. In addition, we perform a qualitative study in which a sample of tags is interpreted in terms of an existing annotation classification framework. The results suggest that the tags complement the metadata provided by professional cataloguers, that they describe both the audio and the visual aspects of the video, and that users primarily describe objects in the video using general descriptions.

Gligorov, R., Hildebrand, M., van Ossenbruggen, J., Schreiber, G., & Aroyo, L. (2011). On the Role of User-generated Metadata in Audio Visual Collections. In Proceedings of the International Conference on Knowledge Capture 2011 (pp. 145–151). ACM Press.