Machine learning is widely used for mining collections of items such as images, sounds, or texts by classifying their elements into categories. Automatic classification based on supervised learning requires ground-truth datasets, both for modeling the elements to classify and for testing the quality of the classification. Because collecting ground truth is tedious, a method is needed for estimating the potential errors in large datasets from limited ground truth. We propose a method that improves classification quality by using limited ground-truth data to extrapolate the potential errors in larger datasets. It significantly improves the counting of elements per class. We further propose visualization designs for understanding and evaluating classification uncertainty. They help end-users consider the impact of potential misclassifications when interpreting the classification output. This work was developed to address the needs of ecologists studying fish population abundance using computer vision, but it generalizes to a wider range of applications. Our method is broadly applicable to a variety of machine learning technologies, and our visualizations further support their transfer to end-users.
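
To make the idea of extrapolating errors from limited ground truth concrete, the following is a minimal sketch of one common way to correct per-class counts. It is not necessarily the method of the paper. It assumes that the misclassification rates measured on a small ground-truth confusion matrix also hold in the larger dataset, that every class has at least one ground-truth item, and that the resulting rate matrix is invertible. The function names (`error_rates`, `solve`, `corrected_counts`) and all numbers are hypothetical.

```python
def error_rates(confusion):
    """Row-normalize a ground-truth confusion matrix.

    confusion[i][j] is the number of ground-truth items of true class i
    that the classifier labeled as class j (assumed layout).
    """
    return [[c / sum(row) for c in row] for row in confusion]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    # Augmented matrix [a | b]
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("error-rate matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def corrected_counts(confusion, observed):
    """Estimate true per-class counts from classifier output counts.

    Uses observed[j] = sum_i true[i] * rates[i][j], which is a linear
    system in the unknown true counts.
    """
    rates = error_rates(confusion)
    n = len(observed)
    transposed = [[rates[i][j] for i in range(n)] for j in range(n)]
    return solve(transposed, observed)


# Hypothetical two-class example: 100 ground-truth items per class,
# and classifier output counts for a larger unlabeled dataset.
ground_truth_confusion = [[90, 10], [20, 80]]
classifier_counts = [480, 520]
estimate = corrected_counts(ground_truth_confusion, classifier_counts)
print(estimate)
```

With these illustrative numbers the corrected counts come out at approximately 400 and 600, rather than the raw 480 and 520. The estimate is only as reliable as the ground-truth sample: with few ground-truth items the error rates themselves are uncertain, which is the kind of uncertainty the abstract's visualizations aim to convey.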

doi.org/10.1007/s00530-015-0479-0
Multimedia Systems
Human-Centered Data Analytics

Boom, B., Beauxis-Aussalet, E., Hardman, L., & Fisher, R. (2016). Uncertainty‑aware estimation of population abundance using machine learning. Multimedia Systems, 22, 737–749. doi:10.1007/s00530-015-0479-0