Poster
Realistic Evaluation of Deep Partial-Label Learning Algorithms
Wei Wang · Dong-Dong Wu · Jindong Wang · Gang Niu · Min-Ling Zhang · Masashi Sugiyama
Hall 3 + Hall 2B #478
Partial-label learning (PLL) is a weakly supervised learning problem in which each example is associated with multiple candidate labels, only one of which is the true label. In recent years, many deep PLL algorithms have been developed to improve model performance. However, we find that some early-developed algorithms are often underestimated and can outperform many later algorithms with complicated designs. In this paper, we delve into the empirical perspective of PLL and identify several critical but previously overlooked issues. First, model selection for PLL is non-trivial, but has never been systematically studied. Second, the experimental settings are highly inconsistent, making it difficult to evaluate the effectiveness of the algorithms. Third, there is a lack of real-world image datasets compatible with modern network architectures. Based on these findings, we propose PLENCH, the first Partial-Label learning bENCHmark to systematically compare state-of-the-art deep PLL algorithms. We investigate the model selection problem for PLL for the first time, and propose novel model selection criteria with theoretical guarantees. We also create Partial-Label CIFAR-10 (PLCIFAR10), an image dataset of human-annotated partial labels collected from Amazon Mechanical Turk, to provide a testbed for evaluating the performance of PLL algorithms in more realistic scenarios. Researchers can quickly and conveniently perform a comprehensive and fair evaluation and verify the effectiveness of newly developed algorithms based on PLENCH. We hope that PLENCH will facilitate standardized, fair, and practical evaluation of PLL algorithms in the future.
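To make the problem setting concrete, the following is a minimal illustrative sketch of partial-label data and a classical uniform candidate-averaging objective. It is not an algorithm from PLENCH or this paper; the variable names and the toy model are assumptions for illustration only.

```python
import numpy as np

# Toy PLL dataset with 4 classes: each instance carries a candidate
# label set that contains the (hidden) true label. A singleton set
# corresponds to ordinary full supervision.
candidate_sets = [
    {0, 1},     # true label is one of {0, 1}
    {2},        # fully supervised example
    {1, 2, 3},  # highly ambiguous example
]

def uniform_candidate_loss(probs, candidates):
    """Negative log of the total probability mass the model assigns
    to the candidate set -- a classical baseline PLL objective."""
    mass = sum(probs[c] for c in candidates)
    return -np.log(mass)

# A model that spreads probability uniformly over all 4 classes.
uniform_probs = np.full(4, 0.25)
losses = [uniform_candidate_loss(uniform_probs, s) for s in candidate_sets]
```

Under this objective, larger candidate sets receive more probability mass and thus a lower loss, which illustrates why ambiguous partial labels provide a weaker training signal than exact labels.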