Towards Transparent Linguistic Analysis of Dutch Newspaper Article Genres using Machine Learning

Bilgin, Aysenur; Tjong Kim Sang, Erik; Smeenk, Kim; Klaver, Tom; Hollink, Laura; van Ossenbruggen, Jacco; Harbers, Frank; Broersma, Marcel

2019-01-31

The systematic study of genre in newspapers sheds light on the development of journalistic discourse. The genre conventions that can be discerned in a newspaper text signal the underlying discursive norms and practices of journalism as a profession. Historical newspapers are increasingly available through digital newspaper archives (in the Netherlands, through Delpher.nl), which opens up opportunities for large-scale empirical research. However, these archives do not contain the fine-grained genre information that such research requires. We therefore use machine learning to assign genre labels to newspaper articles automatically.

Machine learning can substantially improve the outcomes of existing research by providing larger amounts of enriched data. However, the decision-making process of a machine learning pipeline needs to be verified. Our previous findings (Bilgin et al., 2018) show that accuracy scores alone are not enough to assess the performance of such pipelines, and that making an informed choice between pipelines not only enables an optimal study of the historical development of genre but also increases the trustworthiness of the results. This work shows that a transparent approach driven by model interpretability enables a fair comparison of machine learning pipelines as well as validation of their underlying decision-making criteria. These criteria are presented as important features, providing insight into the interactions between genre-related linguistic features and bag-of-words features.
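The abstract does not state which classifiers or features the pipelines use, so the following is only a minimal, hypothetical illustration of the general idea. It merges bag-of-words features with a few genre-related linguistic cues into one feature space, trains a multiclass perceptron (a swapped-in technique chosen for brevity, not the authors' method), and reads off the highest-weighted features per genre as "important features". The feature names, the Dutch pronoun list, the genre labels, and the helpers `linguistic_features`, `featurize`, `train_perceptron`, `predict`, and `top_features` are all assumptions.

```python
import re
from collections import Counter, defaultdict

# Hypothetical Dutch first-person pronoun list (an assumption, not the authors' lexicon).
FIRST_PERSON = {"ik", "mij", "me", "mijn", "wij", "we", "ons", "onze"}


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens; punctuation is dropped."""
    return re.findall(r"\w+", text.lower())


def linguistic_features(text: str) -> dict[str, float]:
    """Hypothetical genre-related cues, each normalized by the token count."""
    tokens = tokenize(text)
    n = max(len(tokens), 1)
    return {
        "ling:quote_marks": text.count('"') / n,
        "ling:first_person": sum(t in FIRST_PERSON for t in tokens) / n,
        "ling:question_marks": text.count("?") / n,
        "ling:exclamation_marks": text.count("!") / n,
    }


def bag_of_words(text: str) -> dict[str, float]:
    """Relative token frequencies, so they share a per-token scale with the cues above."""
    tokens = tokenize(text)
    n = max(len(tokens), 1)
    return {"bow:" + t: c / n for t, c in Counter(tokens).items()}


def featurize(text: str) -> dict[str, float]:
    """One feature space; the 'bow:'/'ling:' prefixes keep the two groups apart."""
    feats = bag_of_words(text)
    feats.update(linguistic_features(text))
    return feats


def score(weights: dict[str, float], feats: dict[str, float]) -> float:
    """Dot product between a genre's weight vector and a document's features."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def train_perceptron(docs: list[str], labels: list[str], epochs: int = 10):
    """Multiclass perceptron: one weight vector per genre label."""
    genres = sorted(set(labels))
    weights = {g: defaultdict(float) for g in genres}
    data = [(featurize(d), y) for d, y in zip(docs, labels)]
    for _ in range(epochs):
        for feats, gold in data:
            pred = max(genres, key=lambda g: score(weights[g], feats))
            if pred != gold:
                # Move the gold genre toward this document and the wrong guess away.
                for f, v in feats.items():
                    weights[gold][f] += v
                    weights[pred][f] -= v
    return weights


def predict(weights, text: str) -> str:
    feats = featurize(text)
    return max(weights, key=lambda g: score(weights[g], feats))


def top_features(weights, genre: str, k: int = 5) -> list[tuple[str, float]]:
    """Highest-weighted features for one genre, read as its 'important features'."""
    return sorted(weights[genre].items(), key=lambda kv: kv[1], reverse=True)[:k]
```

Calling `top_features(train_perceptron(docs, labels), "opinion")` on labelled articles lists the features that push a document hardest toward a (hypothetical) "opinion" label, and the prefixes show at a glance whether that decision rests on topical words (`bow:`) or on linguistic cues (`ling:`). Weights are only roughly comparable here because all features are per-token rates; a real pipeline would likely use a stronger classifier and a dedicated interpretability method.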

Additional Metadata
Project	News Genres: Advancing Media History by Transparent Automatic Genre Classification
Organisation	Human-Centered Data Analytics
Citation	Bilgin, A., Tjong Kim Sang, E., Smeenk, K., Klaver, T., Hollink, L., van Ossenbruggen, J., Harbers, F., & Broersma, M. (2019). Towards Transparent Linguistic Analysis of Dutch Newspaper Article Genres using Machine Learning. In Computational Linguistics.

See Also
inProceedings Utilizing a transparency-driven environment toward trusted automatic genre classification: A case study in journalism history A. Bilgin (Aysenur), L. Hollink (Laura), J.R. van Ossenbruggen (Jacco), E. Tjong Kim Sang (Erik), K. Smeenk (Kim), F. Harbers (Frank) and M. Broersma (Marcel)
techReport Utilizing a transparency-driven environment toward trusted automatic genre classification: A case study in journalism history A. Bilgin (Aysenur), L. Hollink (Laura), J.R. van Ossenbruggen (Jacco), E. Tjong Kim Sang (Erik), K. Smeenk (Kim), F. Harbers (Frank) and M. Broersma (Marcel)
