Supervised machine learning tasks require human-labeled data. Crowdsourcing allows the labeling process to scale up, but the quality of the labels obtained can vary. To address this limitation, we propose methods for predicting label quality based on worker trajectories, i.e., on the sequences of documents workers explore during their crowdsourcing tasks. Trajectories represent a lightweight and non-intrusive form of worker behavior signal. We base our analysis on previously collected datasets comprising thousands of assessment records, including workers’ trajectories, workers’ assessments, and experts’ assessments. We model these behavior sequences as embeddings to make them easier to process and compare. Then, we: (1) use supervised methods to predict worker performance against a given ground truth; (2) perform an unsupervised analysis to provide insight into crowdsourcing quality when no gold standard is available. We test several supervised approaches, all of which outperform the baseline we propose. We also identify significant differences between trajectory clusters in terms of assessments and worker performance. Trajectory-based analysis is a promising direction for non-intrusive worker performance evaluation.
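The pipeline the abstract outlines (embed worker trajectories, then run both supervised prediction and unsupervised clustering on the embeddings) can be illustrated with a minimal sketch. Everything below is assumed for illustration: the toy trajectories, the agrees_with_experts labels, and the choice of bag-of-documents counts plus SVD as the embedding are stand-ins, not the paper's actual data or models.

```python
# Minimal sketch of a trajectory-based quality-prediction pipeline.
# All data, variable names, and model choices are illustrative assumptions,
# not the authors' actual implementation.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Hypothetical data: each trajectory is the ordered list of document IDs a
# worker visited; labels mark whether the worker's assessments matched experts.
trajectories = [
    ["doc3", "doc1", "doc1", "doc7"],
    ["doc2", "doc2", "doc5"],
    ["doc1", "doc4", "doc6", "doc6", "doc3"],
    ["doc7", "doc7", "doc2"],
]
agrees_with_experts = np.array([1, 0, 1, 0])  # toy ground truth

# (a) Embed trajectories: bag-of-documents counts compressed with SVD.
# The paper embeds behavior sequences; this is one simple stand-in.
vectorizer = CountVectorizer(analyzer=lambda traj: traj)
counts = vectorizer.fit_transform(trajectories)
embeddings = TruncatedSVD(n_components=2, random_state=0).fit_transform(counts)

# (b) Supervised: predict worker performance from the embeddings,
# using expert assessments as ground truth.
clf = LogisticRegression()
scores = cross_val_score(clf, embeddings, agrees_with_experts, cv=2)
print("cross-validated accuracy:", scores.mean())

# (c) Unsupervised: cluster trajectories when no gold standard exists, then
# compare assessments and worker performance across the resulting clusters.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
print("trajectory clusters:", clusters)
```

In practice, a sequence-aware embedding (one that reflects the order in which documents are visited) would likely be preferable to the order-insensitive counts used here; the sketch only fixes the shape of the pipeline, not its components.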

DOI: doi.org/10.1007/978-3-031-34444-2_6
Series: Lecture Notes in Computer Science
Project: The eye of the beholder: Transparent pipelines for assessing online information quality
Venue: 23rd International Conference on Web Engineering, ICWE 2023
Research group: Human-Centered Data Analytics

Ceolin, D., Roitero, K., & Guo, F. (2023). Predicting crowd workers performance: An information quality case. In International Conference on Web Engineering (ICWE 2023) (pp. 75–90). https://doi.org/10.1007/978-3-031-34444-2_6