Utilizing a transparency-driven environment toward trusted automatic genre classification: A case study in journalism history
With the growing abundance of unlabeled data in real-world tasks, researchers have to rely on the predictions of black-box computational models. However, it is often overlooked that these models may score high on accuracy for the wrong reasons. In this paper, we present a practical analysis of the impact of making models transparent through various forms of presentation. For this purpose, we developed an environment that empowers non-computer scientists to become practicing data scientists in their own research field. We demonstrate how the understanding of journalism historians gradually increases through a real-world case study on automatic genre classification of newspaper articles. This study is a first step towards the trusted and responsible use of machine learning pipelines.
|Genre classification, Journalism history, Machine learning, Transparency|
|News Genres: Advancing Media History by Transparent Automatic Genre Classification|
|IEEE International Conference on e-Science|
|Organisation|Human-centered Data Analysis|
Bilgin, A., Hollink, L., van Ossenbruggen, J. R., Tjong Kim Sang, E., Smeenk, K., Harbers, F., & Broersma, M. (2018). Utilizing a transparency-driven environment toward trusted automatic genre classification: A case study in journalism history. In IEEE 14th International Conference on eScience, e-Science 2018 (pp. 486–496). doi:10.1109/eScience.2018.00137