An interpretable method for automated classification of spoken transcripts and written text
We investigate the differences between spoken language (in the form of radio show transcripts) and written language (Wikipedia articles) in the context of text classification. We present a novel, interpretable method for text classification, a linear classifier over a large set of n-gram features, and apply it to a newly generated data set with sentences originating either from spoken transcripts or written text. Our classifier reaches an accuracy within 0.02 of that of a commonly used classifier (DistilBERT) based on deep neural networks (DNNs). Moreover, our classifier has an integrated measure of confidence for assessing the reliability of a given classification. An online tool is provided for demonstrating our classifier, particularly its interpretable nature, which is a crucial feature in classification tasks involving high-stakes decision-making. We also study the capability of DistilBERT to carry out fill-in-the-blank tasks in either spoken or written text, and find it to perform similarly in both cases. Our main conclusion is that, with careful improvements, the performance gap between classical methods and DNN-based methods may be reduced significantly, such that the choice of classification method comes down to the need (if any) for interpretability.
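The abstract does not specify the authors' exact implementation, but the general setup it describes, a linear classifier over n-gram features whose per-feature weights are directly inspectable, and whose predicted-class probability can serve as a confidence measure, can be sketched as follows. All data, model choices (scikit-learn's CountVectorizer and LogisticRegression), and the probability-based confidence are illustrative assumptions, not the paper's method.

```python
# Illustrative sketch only: a linear classifier over word n-grams.
# The training sentences below are invented toy stand-ins for the
# spoken-transcript vs. written-text data described in the abstract.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "um so yeah we were talking about that",
    "you know it was like really interesting",
    "the city was founded in the twelfth century",
    "the species is native to southern Europe",
]
labels = ["spoken", "spoken", "written", "written"]

# Unigram and bigram counts feed a linear model; each n-gram gets one
# weight, so the decision can be traced back to individual features.
clf = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)

sentence = ["um it was like founded you know"]
pred = clf.predict(sentence)[0]
# Predicted-class probability as a simple confidence measure (an
# assumption; the paper's integrated confidence may differ).
conf = clf.predict_proba(sentence).max()
print(pred, round(float(conf), 2))
```

In a linear model of this kind, interpretability comes from reading off the learned weight of each n-gram feature, while the class probability gives a rough per-sentence reliability signal.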