Describing Images using Inferred Visual Dependency Representations

Elliott, Desmond; de Vries, Arjen

The Visual Dependency Representation (VDR) is an explicit model of the spatial relationships between objects in an image. In this paper we present an approach to training a VDR Parsing Model without the extensive human supervision used in previous work. Our approach is to find the objects mentioned in a given description using a state-of-the-art object detector, and to use successful detections to produce training data. The description of an unseen image is produced by first predicting its VDR over automatically detected objects, and then generating the text with a template-based generation model using the predicted VDR. The performance of our approach is comparable to a state-of-the-art multimodal deep neural network in images depicting actions.
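The pipeline outlined in the abstract (match detected objects to the words of a description, relate the objects spatially, then fill a template) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Detection` class, the relation labels, the centre-based geometric rule in `spatial_relation`, and the sentence template are hypothetical. In particular, the paper trains a VDR Parsing Model, whereas this sketch swaps in a hand-written rule, and the detections are hard-coded rather than produced by an object detector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """A detected object: label plus box (x0, y0, x1, y1); y grows downward."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def centre(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def match_mentions(description, detections):
    """Keep only detections whose label occurs as a word in the description."""
    words = set(description.lower().replace(".", "").split())
    return [d for d in detections if d.label in words]


def spatial_relation(subject, obj):
    """Hypothetical rule: choose a relation label by comparing box centres."""
    (sx, sy), (ox, oy) = subject.centre, obj.centre
    if abs(sx - ox) >= abs(sy - oy):
        return "beside"
    return "above" if sy < oy else "below"


def describe(subject, obj):
    """Fill a fixed template from one (subject, relation, object) triple."""
    return f"A {subject.label} is {spatial_relation(subject, obj)} a {obj.label}."


# Hard-coded stand-ins for object detector output (assumed values).
detections = [
    Detection("person", 40, 20, 90, 120),
    Detection("bicycle", 30, 110, 110, 200),
    Detection("tree", 300, 0, 380, 200),
]
mentioned = match_mentions("A person riding a bicycle.", detections)
sentence = describe(mentioned[0], mentioned[1])
print(sentence)
```

The `tree` detection is dropped because the description never mentions it, which mirrors the idea of using only successful detections of mentioned objects as training data.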

Additional Metadata
THEME	Information (theme 2)
Publisher	Association for Computational Linguistics
Conference	Association for Computational Linguistics
Organisation	Human-Centered Data Analytics
Citation	Elliott, D., & de Vries, A. (2015). Describing Images using Inferred Visual Dependency Representations. In Proceedings of Association for Computational Linguistics 2015 (ACL 0). Association for Computational Linguistics.

