CLERF: Contrastive LEaRning for Full-Range Head Pose Estimation
Abstract
We propose a novel framework for representation learning in head pose estimation (HPE) that overcomes the challenges posed by sparse head pose data, which previously made triplet sampling infeasible. Leveraging recent advances in 3D-aware generative adversarial networks (3D GANs), we generate anchor-positive-negative triplets and perform contrastive learning on extensively augmented data, including geometric transformations. This enables the network to learn robust, geometry-aware representations that improve HPE accuracy. We observe that existing HPE models struggle when test images are slightly rotated or flipped, while our method maintains strong performance. Experiments show that our framework matches state-of-the-art models on standard test sets and outperforms them on augmented and full-range poses. Our model handles full-range HPE, accurately predicting head poses across the entire rotation spectrum, including upside-down orientations, and outperforms existing full-yaw-range methods.
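To make the anchor-positive-negative idea concrete, the following is a minimal sketch of a generic triplet margin loss on embedding vectors. It is an illustration only: the abstract does not specify the loss actually used, so the Euclidean distance, the margin value, and the names `l2_distance` and `triplet_margin_loss` are all assumptions, and the toy vectors are hypothetical stand-ins for network embeddings.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Generic triplet margin loss (assumed form, not taken from the paper).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`; otherwise it grows linearly with the gap.
    """
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings: the positive shares a similar pose with the
# anchor, the negative has a very different pose.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negative = [-1.0, 0.0]

# Well-separated triplet: the margin is already satisfied.
loss_easy = triplet_margin_loss(anchor, positive, negative)

# Swapping positive and negative violates the margin and yields a large loss.
loss_hard = triplet_margin_loss(anchor, negative, positive)
```

In this sketch, a triplet whose positive already sits near the anchor contributes no gradient signal, while a violated triplet is penalized in proportion to how far the ordering is off.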