Capturing contentiousness: Constructing the contentious terms in context corpus

Brate, Ryan; Nesterov, Andrei; Vogelmann, Valentin; van Ossenbruggen, Jacco; Hollink, Laura; van Erp, Marieke

doi:10.1145/3460210.3493553

2021-12-02

Presented at the 11th ACM International Conference on Knowledge Capture, K-CAP 2021 (December 2021), Virtual, Online

Recent initiatives by cultural heritage institutions to address outdated and offensive language in their collections demonstrate the need for a better understanding of when terms are problematic or contentious. This paper presents an annotated dataset of 2,715 unique samples of terms in context, drawn from a historical newspaper archive, collating 21,800 annotations of contentiousness from expert and crowd workers. We describe the contents of the corpus by analysing inter-rater agreement and the differences between experts and crowd workers. In addition, we demonstrate the potential of the corpus for the automated detection of contentiousness. We show that a simple classifier applied to the embedding representation of a target word outperforms a baseline in predicting contentiousness. We find that both the term itself and its context play a role in whether a term is considered contentious.
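The abstract reports that a simple classifier over the embedding of a target word beats a baseline, but it does not name the classifier, the embedding model or the baseline. The sketch below is only an illustration under stated assumptions: it uses logistic regression as the "simple classifier" and a majority-class predictor as the baseline. The 2-dimensional vectors and the names `TRAIN`, `TEST` and `evaluate` are hypothetical stand-ins, not data or code from the corpus.

```python
import math
from collections import Counter

# Hypothetical toy vectors standing in for the embedding of a target word in
# context (real embeddings have hundreds of dimensions). These are NOT corpus data.
# Label 1 = contentious, 0 = not contentious.
TRAIN = [
    ([0.9, 0.1], 1), ([0.8, 0.3], 1), ([0.7, 0.2], 1),
    ([0.2, 0.9], 0), ([0.1, 0.7], 0), ([0.3, 0.8], 0), ([0.25, 0.6], 0),
]
TEST = [
    ([0.95, 0.05], 1), ([0.85, 0.2], 1),
    ([0.1, 0.95], 0), ([0.2, 0.85], 0),
]


def majority_label(labels):
    """Assumed baseline: always predict the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(model, x):
    """Probability that embedding x is contentious under a logistic model."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logreg(data, lr=1.0, epochs=2000):
    """Fit logistic regression with full-batch gradient descent on log-loss."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            err = score((w, b), x) - y  # d(log-loss)/d(logit)
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def evaluate(train, test):
    """Return (baseline accuracy, classifier accuracy) on held-out items."""
    labels = [y for _, y in test]
    base = majority_label([y for _, y in train])
    model = train_logreg(train)
    clf_preds = [1 if score(model, x) >= 0.5 else 0 for x, _ in test]
    return accuracy([base] * len(test), labels), accuracy(clf_preds, labels)


if __name__ == "__main__":
    base_acc, clf_acc = evaluate(TRAIN, TEST)
    print(f"majority baseline: {base_acc:.2f}")
    print(f"classifier:        {clf_acc:.2f}")
```

The majority-class baseline is used here because it is the usual floor for a binary labelling task; comparing against it shows whether the embedding carries any signal beyond the label distribution. A real experiment would also need the paper's own embedding model, data split and evaluation metric, none of which are given on this page.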

Additional Metadata
Keywords	Datasets, Bias, Crowdsourcing, Knowledge capture
Persistent URL	doi.org/10.1145/3460210.3493553
Project	Culturally aware AI
Conference	11th ACM International Conference on Knowledge Capture, K-CAP 2021
Grant	This work was funded by the Netherlands Organisation for Scientific Research (NWO); grant ID KIVI.2019.005 - Culturally aware AI (AI:CULT)
Organisation	Human-Centered Data Analytics
Citation	Brate, R., Nesterov, A., Vogelmann, V., van Ossenbruggen, J., Hollink, L., & van Erp, M. (2021). Capturing contentiousness: Constructing the contentious terms in context corpus. In Proceedings of the International Conference on Knowledge Capture (pp. 17–24). doi:10.1145/3460210.3493553
