Obtaining High-Quality Relevance Judgments Using Crowdsourcing
IEEE Internet Computing, Volume 16, Issue 5, pp. 20-27
The performance of information retrieval (IR) systems is commonly evaluated using a test set with known relevance judgments. Crowdsourcing is one method for identifying the documents relevant to each query in the test set. However, the quality of crowdsourced relevance judgments can be questionable, because the workers are of unknown quality and may include spammers. To detect spammers, the authors' algorithm compares judgments between workers. They evaluate their approach by comparing the consistency of the crowdsourced ground truth with that obtained from expert annotators, and conclude that crowdsourcing can match the quality of expert annotation.
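
The abstract does not spell out how judgments are compared between workers, so the following is only a minimal sketch of one common way to do it, not the authors' algorithm. It assumes binary relevance labels, scores each worker by how often their label matches the majority label of the other workers on the same query-document pair, and flags workers whose agreement falls below a threshold. The function names, the sample data, and the 0.5 threshold are all hypothetical.

```python
from collections import Counter, defaultdict


def worker_agreement(judgments):
    """Return {worker: fraction of labels matching the majority of the other workers}.

    judgments: iterable of (worker, item, label) tuples, where an item is a
    query-document pair. Items where the other workers are tied, or where no
    other worker gave a label, are skipped for that worker.
    """
    by_item = defaultdict(list)
    for worker, item, label in judgments:
        by_item[item].append((worker, label))

    matches = Counter()
    totals = Counter()
    for votes in by_item.values():
        for worker, label in votes:
            # Leave-one-out: count only the labels given by other workers.
            others = Counter(l for w, l in votes if w != worker)
            if not others:
                continue
            ranked = others.most_common(2)
            if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
                continue  # no clear majority among the other workers
            totals[worker] += 1
            if label == ranked[0][0]:
                matches[worker] += 1
    return {w: matches[w] / totals[w] for w in totals}


def flag_suspects(judgments, threshold=0.5):
    """Return the sorted list of workers whose agreement is below threshold (assumed cutoff)."""
    scores = worker_agreement(judgments)
    return sorted(w for w, score in scores.items() if score < threshold)


# Hypothetical data: workers a, b, c mostly agree; worker s labels against them.
sample = [
    ("a", "q1-d1", 1), ("b", "q1-d1", 1), ("c", "q1-d1", 1), ("s", "q1-d1", 0),
    ("a", "q1-d2", 0), ("b", "q1-d2", 0), ("c", "q1-d2", 0), ("s", "q1-d2", 1),
    ("a", "q1-d3", 1), ("b", "q1-d3", 1), ("c", "q1-d3", 1), ("s", "q1-d3", 0),
    ("a", "q1-d4", 0), ("b", "q1-d4", 0), ("c", "q1-d4", 1), ("s", "q1-d4", 1),
]

print(worker_agreement(sample))
print(flag_suspects(sample))
```

Leaving out a worker's own vote when computing the majority keeps that worker from inflating their own score. A scheme like this can miss spammers who collude or who happen to agree with the majority by chance, which is one reason the comparison against expert annotators described above matters.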