Estimating the F1 score for learning from positive and unlabeled examples
Semi-supervised learning can be applied to datasets that contain both labeled and unlabeled instances and can result in more accurate predictions than fully supervised or unsupervised learning when only limited labeled data is available. A subclass of problems, called Positive-Unlabeled (PU) learning, focuses on cases in which the labeled instances contain only positive examples. Given the lack of negatively labeled data, estimating the general performance is difficult. In this paper, we propose a new approach to approximate the F1 score for PU learning. It requires an estimate of what fraction of the total number of positive instances is available in the labeled set. We derive theoretical properties of the approach and apply it to several datasets to study its empirical behavior and to compare it to the best-known score in the field, the LL score. Results show that even when the estimate deviates substantially from the true fraction of positive labels, the approximation of the F1 score is significantly better than that of the LL score.
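The idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: it assumes that recall can be estimated on the labeled positives, that an estimate `beta` of the fraction of all positives present in the labeled set is supplied, and that precision is then derived from the implied total number of positives. The function name `approximate_f1_pu` and its signature are hypothetical.

```python
def approximate_f1_pu(preds_labeled, preds_unlabeled, beta):
    """Sketch of an F1 approximation for PU learning.

    preds_labeled:   0/1 predictions on the labeled (all truly positive) set
    preds_unlabeled: 0/1 predictions on the unlabeled set
    beta:            estimated fraction of all positives that are labeled
                     (0 < beta <= 1); this is the estimate the abstract
                     says the approach requires
    """
    n_labeled = len(preds_labeled)
    # Recall estimated on the labeled positives, since they are all truly positive.
    recall = sum(preds_labeled) / n_labeled
    # Implied total number of positives in the whole dataset.
    n_pos_total = n_labeled / beta
    # Estimated true positives among everything flagged positive.
    tp_est = recall * n_pos_total
    n_pred_pos = sum(preds_labeled) + sum(preds_unlabeled)
    if n_pred_pos == 0:
        return 0.0
    precision = min(1.0, tp_est / n_pred_pos)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with 4 labeled positives of which 3 are predicted positive, 5 positive predictions among 20 unlabeled instances, and `beta = 0.5`, the estimated recall and precision are both 0.75, giving an approximate F1 of 0.75. Note how a mis-specified `beta` shifts only the precision term, which is one way to study the sensitivity the abstract reports on.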
Lecture Notes in Computer Science
Tabatabaei, S.A., Klein, J.G., & Hoogendoorn, M. (2021). Estimating the F1 score for learning from positive and unlabeled examples. In LOD 2020: Machine Learning, Optimization, and Data Science (pp. 1–12). doi:10.1007/978-3-030-64583-0_15