Labeled data is a prerequisite for successfully applying machine learning techniques to a wide range of problems. Recently, crowd-sourcing has been shown to provide effective solutions to many labeling tasks. However, tasks in specialist domains are difficult to map to Human Intelligence Tasks (HITs) that can be solved adequately by "the crowd". The question addressed in this paper is whether these specialist tasks can be cast in such a way that accurate results can still be obtained through crowd-sourcing. We study a case where the goal is to identify fish species in images extracted from videos taken by underwater cameras, a task that typically requires profound domain knowledge in marine biology and hence would be difficult, if not impossible, for the crowd. We show that by carefully converting the recognition task to a visual similarity comparison task, the crowd achieves agreement with the experts comparable to the agreement achieved among experts. Further, non-expert users can learn and improve their performance during the labeling process, e.g., from the system feedback.
Information (theme 2), Life Sciences (theme 5)
Conference on Open Research Areas in Information Retrieval
Human-centered Data Analysis

He, J., van Ossenbruggen, J.R., & de Vries, A.P. (2013). Do you need experts in the crowd? A case study in image annotation for marine biology. In Proceedings of Conference on Open Research Areas in Information Retrieval 2013 (OAIR 10) (pp. 57–60). ACM.