Automatic image annotation using supervised learning is performed by concept classifiers trained on labelled example images. This work proposes the use of clickthrough data collected from search logs as a source for the automatic generation of concept training data, thus avoiding the expensive effort of manual annotation. We investigate and evaluate this approach using a collection of 97,628 photographic images. The results indicate that the contribution of search-log-based training data is positive despite its inherent noise; in particular, the combination of manual and automatically generated training data outperforms the use of manual data alone. It is therefore possible to use clickthrough data to perform large-scale image annotation with little manual annotation effort or, depending on performance, using only the automatically generated training data. An extensive presentation of the experimental results and the accompanying data can be accessed at http://olympus.ee.auth.gr/~diou/civr2009/.
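
As a rough illustration of the idea, the following minimal sketch shows one way clicked images could be turned into candidate positive training examples for a concept. It is not the method evaluated in the paper: the log format (pairs of query string and clicked image id), the function name positives_from_clicks, the exact-token matching rule, and the min_clicks threshold are all assumptions made for this example.

```python
from collections import Counter


def positives_from_clicks(click_log, concept_terms, min_clicks=1):
    """Return image ids clicked at least `min_clicks` times for matching queries.

    Hypothetical helper: `click_log` is assumed to be an iterable of
    (query, image_id) pairs, and a query is assumed to match the concept
    when one of its whitespace-separated tokens equals a concept term.
    """
    terms = {t.lower() for t in concept_terms}
    counts = Counter()
    for query, image_id in click_log:
        # Count a click only if the query shares a token with the concept terms.
        if terms & set(query.lower().split()):
            counts[image_id] += 1
    # Requiring several clicks is one simple way to reduce label noise.
    return sorted(img for img, n in counts.items() if n >= min_clicks)


# Toy log, invented for illustration only.
click_log = [
    ("beach holiday", "img1"),
    ("sandy beach", "img1"),
    ("beach", "img2"),
    ("city skyline", "img3"),
]

beach_positives = positives_from_clicks(click_log, ["beach"])
beach_positives_strict = positives_from_clicks(click_log, ["beach"], min_clicks=2)
```

The resulting image ids could then be used alone or merged with manually labelled examples before training a concept classifier; raising min_clicks trades training set size for cleaner labels.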
International Journal on Multimedia Tools and Applications
Image Indexing and reTrievAL in the Large Scale
Human-Centered Data Analytics

Tsikrika, T., Diou, C., de Vries, A., & Delopoulos, A. (2010). Reliability and effectiveness of clickthrough data for automatic image annotation. International Journal on Multimedia Tools and Applications, 55(1), 27–52.