2004
Combining multiple representations on the TRECVID search task
Publication
Presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing, Montreal, Quebec, Canada
This paper presents a preliminary analysis of the evaluation results obtained on the TRECVID 2003 search task. In particular, we study how combining multiple representations affects retrieval: multiple representations of the video content (speech and visual) and of the user information need (multiple visual examples). From our multi-modal retrieval experiments we draw the following working hypothesis: even though the ASR run is usually better than the visual run, matching against both modalities ensures robustness against choosing the wrong content representation. For the same reason, using multiple visual examples to represent the user information need is preferable to using a single designated example.
Additional Metadata | |
---|---|
Publisher | IEEE Signal Processing Society |
Conference | IEEE International Conference on Acoustics, Speech, and Signal Processing |
Organisation | Database Architectures |
Citation | de Vries, A., Westerveld, T., & Ianeva, T. (2004). Combining multiple representations on the TRECVID search task. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing 2004 (ICASSP) (pp. 1–4). IEEE Signal Processing Society. |