Machine Learning is widely used for mining collections, such as images, sounds, or texts, by classifying their elements into categories. Automatic classification based on supervised learning requires groundtruth datasets for modeling the elements to classify, and for testing the quality of the classification. Because collecting groundtruth is tedious, a method for estimating the potential errors in large datasets based on limited groundtruth is needed. We propose a method that improves classification quality by using limited groundtruth data to extrapolate the potential errors in larger datasets. It significantly improves the counting of elements per class. We further propose visualization designs for understanding and evaluating the classification uncertainty. They support end-users in considering the impact of potential misclassifications when interpreting the classification output. This work was developed to address the needs of ecologists studying fish population abundance using computer vision, but generalizes to a larger range of applications. Our method is applicable to a wide variety of Machine Learning technologies, and our visualizations further support their transfer to end-users.
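The idea of using limited groundtruth to extrapolate errors in a larger dataset can be illustrated with a generic confusion-matrix correction of class counts. The sketch below is an assumption for illustration only and is not the method described in the paper: it estimates per-class error rates from a small groundtruth confusion matrix, then solves a linear system to adjust the class counts observed in a larger, unlabeled dataset. The function names (`error_rates`, `solve_linear`, `corrected_counts`) and all numbers are hypothetical.

```python
def error_rates(confusion):
    """Row-normalize a groundtruth confusion matrix.

    confusion[i][j] = number of groundtruth items of true class i
    that the classifier assigned to class j.
    """
    rates = []
    for row in confusion:
        total = sum(row)
        if total == 0:
            raise ValueError("each true class needs at least one groundtruth item")
        rates.append([count / total for count in row])
    return rates


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("error-rate matrix is singular; counts cannot be corrected")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def corrected_counts(confusion, observed):
    """Estimate true per-class counts from classifier output counts.

    Assumes the error rates measured on the groundtruth also hold in the
    larger dataset: observed[j] = sum_i true[i] * rates[i][j].
    """
    rates = error_rates(confusion)
    n = len(rates)
    # Transpose so that each equation corresponds to one predicted class.
    transposed = [[rates[i][j] for i in range(n)] for j in range(n)]
    return solve_linear(transposed, observed)


# Hypothetical groundtruth: 100 items per class, two classes.
groundtruth_confusion = [[80, 20], [10, 90]]
# Hypothetical classifier output counts on a larger dataset.
observed_counts = [850, 650]
print(corrected_counts(groundtruth_confusion, observed_counts))
```

With these made-up numbers the corrected counts come out at roughly 1000 and 500, compared with the raw classifier counts of 850 and 650. The correction is only as reliable as the assumption that the groundtruth error rates carry over to the larger dataset, and small groundtruth samples make the estimated rates themselves uncertain.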
Additional Metadata
Keywords Supervised Machine Learning, Uncertainty Visualization, Logistic Regression
THEME Information (theme 2), Life Sciences (theme 5)
Publisher Springer
Journal ACM Multimedia Systems Journal
Project Supporting humans in knowledge gathering and question answering w.r.t. marine and environmental monitoring through analysis of multiple video streams
Boom, B.J., Beauxis-Aussalet, E.M.A.L., Hardman, L., & Fisher, R.B. (2015). Uncertainty-Aware Estimation of Population Abundance using Machine Learning. ACM Multimedia Systems Journal.