The performance of information retrieval (IR) systems is commonly evaluated using a test set with known relevance judgments. Crowdsourcing is one way to determine which documents are relevant to each query in the test set. However, the quality of relevance judgments obtained through crowdsourcing can be questionable, because the workers are of unknown quality and may include spammers. To detect spammers, the authors' algorithm compares judgments between workers. The authors evaluate their approach by comparing the consistency of the crowdsourced ground truth with that obtained from expert annotators, and conclude that crowdsourcing can match the quality obtained from experts.
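The abstract does not specify how judgments are compared, so the following is only a minimal sketch of one generic way to compare judgments between workers: score each worker by how often their label matches the majority label of the other workers on the same query-document pair. It should not be read as the authors' algorithm. The function names `agreement_scores` and `flag_suspects`, the tuple layout of the input, and the 0.5 threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def agreement_scores(judgments):
    """Score each worker by agreement with the other workers.

    judgments: iterable of (worker, item, label) tuples, where item
    identifies a query-document pair (hypothetical input layout).
    Returns {worker: fraction of that worker's judgments that match the
    majority label given by the *other* workers on the same item}.
    """
    # Group labels per item: item -> {worker: label}
    by_item = defaultdict(dict)
    for worker, item, label in judgments:
        by_item[item][worker] = label

    agree = defaultdict(int)
    total = defaultdict(int)
    for labels in by_item.values():
        for worker, label in labels.items():
            others = [l for w, l in labels.items() if w != worker]
            if not others:
                continue  # nobody to compare against
            counts = defaultdict(int)
            for l in others:
                counts[l] += 1
            top = max(counts.values())
            winners = [l for l, c in counts.items() if c == top]
            if len(winners) != 1:
                continue  # the other workers are tied: skip this item
            total[worker] += 1
            if label == winners[0]:
                agree[worker] += 1
    return {w: agree[w] / total[w] for w in total}


def flag_suspects(judgments, threshold=0.5):
    """Return workers whose agreement score falls below the threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    scores = agreement_scores(judgments)
    return sorted(w for w, s in scores.items() if s < threshold)
```

A low agreement score is only a signal: a careful worker can disagree with the majority on hard items, so a real system would likely combine this with other evidence before rejecting anyone's work.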
IEEE Computer Society
IEEE Internet Computing
Human-Centered Data Analytics

Vuurens, J., & de Vries, A. (2012). Obtaining High-Quality Relevance Judgments Using Crowdsourcing. IEEE Internet Computing, 16(5), 20–27.