Cultural heritage institutions increasingly make their collections digitally available. Consequently, users of digital archives need to familiarize themselves with a range of new digital tools. This is particularly true for humanities scholars who include the results of their analyses in their publications. Judging whether insights derived from these analyses constitute a real trend or whether a potential conclusion is merely an artifact of the tools used can be difficult. To correct errors in data, human input is in many cases still indispensable. Since experts are expensive, we conducted a study showing how crowdsourcing tasks can be designed to allow lay users to contribute information at the expert level, increasing the number and quality of descriptions of collection items. However, to improve the quality of their data effectively, data custodians need to understand the (search) tasks their users perform and the level of trustworthiness they expect from the results.
Through interviews with historians, we studied their use of digital archives and classified typical research tasks and their requirements for data quality. Most archives provide, at best, very generic information about the data quality of their digitized collections. Humanities scholars, however, need to be able to assess how data quality and inherent tool bias influence their research tasks. Therefore, they need specific information on the data quality of the subcollection used and the biases the provided tools may have introduced into the analyses. We studied whether access to a historic newspaper archive is biased, and which types of documents benefit from, or are disadvantaged by, the bias. Using real and simulated search queries and page view data of real users, we investigated how well typical retrievability studies reflect the users' experience. We found large differences in the characteristics of the query sets and in the results for different parameter settings of the experiments. Within digital archives, OCR errors are a prevalent data quality issue. Since they are relatively easy to spot, they have raised concerns about the trustworthiness of results based on digitized documents. We evaluated the impact of OCR quality on retrieval tasks and studied the effect of manually improving (parts of) a collection on retrievability bias. The insights we gained helped us understand researchers' needs better.
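To make the notion of retrievability bias concrete: this text does not spell out the measure, but typical retrievability studies follow Azzopardi and Vinay's retrievability score, where a document's score counts how often a large set of queries returns it within a rank cutoff, and the spread of these scores is summarized with a Gini coefficient. The following is a minimal sketch under that assumption; `run_query`, the uniform query weights, and the cutoff `c` are illustrative placeholders, not part of the original study.

```python
# Illustrative sketch of a retrievability-bias measurement
# (in the spirit of Azzopardi & Vinay's retrievability score).
# Assumptions: `run_query(q, k)` is supplied by the reader's own
# retrieval system and returns a ranked list of document ids;
# query weights are uniform; a simple rank cutoff c serves as the
# scoring function.

from collections import defaultdict

def retrievability_scores(queries, run_query, doc_ids, c=100):
    """r(d): number of queries that return d within the top c results."""
    r = defaultdict(float)
    for q in queries:
        for rank, doc in enumerate(run_query(q, k=c), start=1):
            if rank <= c:
                r[doc] += 1.0  # cutoff-based scoring function
    # documents never retrieved keep a score of 0
    return [r.get(d, 0.0) for d in doc_ids]

def gini(scores):
    """Gini coefficient over scores: 0 = no bias, 1 = maximal bias."""
    xs = sorted(scores)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)
```

Varying the cutoff `c` and the query set in such a setup is precisely the kind of parameter choice that produced the large differences in results mentioned above.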

Our work provides a small number of examples that demonstrate that data quality and tool bias are real concerns for the Digital Humanities community. To address these challenges, intense multidisciplinary exchange is required:
• Humanities scholars need to become more aware that software tools and data sets are not free of bias, and develop skills to detect and evaluate biases and their impact on research tasks. Guidelines should be developed that help scholars perform tool criticism.
• Tool developers need to be more transparent and provide sufficient information about their tools to allow task-based evaluation of their performance.
• Data custodians need to make as much information about their collections available as possible. This should include which tools were used in the digitization process, as well as the limitations of both the provided data and the infrastructure used. The goal should be a mutual understanding of each other's assumptions, approaches and requirements, and more transparency concerning the use of tools in the processing of data. This will help scholars develop effective methods of digital tool criticism to critically assess the impact of existing tools on their (re-)search results, and to communicate on an equal footing with tool developers on how to develop future versions that better suit their needs.