Poster in Workshop: Privacy Regulation and Protection in Machine Learning
Cache Me If You Can: The Case For Retrieval Augmentation in Federated Learning
Aashiq Muhamed · Pratiksha Thaker · Mona Diab · Virginia Smith
We propose retrieval augmentation (RA) as an enhancement to federated learning (FL) that can improve privacy protection and ensure regulatory compliance. FL, primarily designed for data privacy preservation, faces challenges with conventional parametric models which are susceptible to privacy breaches and potentially non-compliant with regulations such as data erasure mandates. RA addresses these issues by integrating a retrieval-based method during the inference phase, achieving "perfect secrecy" by limiting server access to private documents and reducing barriers to compliance. This study conducts a thorough evaluation of RA's efficacy within the FL paradigm, positioning it as a preferable alternative to traditional parametric models within analogous memory constraints. We characterize potential applications that may benefit from RA in FL, showing in particular that it is well-suited for knowledge-intensive, few-shot environments—offering scalable inference-time operations, source attribution, and the ability to dynamically update and unlearn knowledge for compliance. We present a new modeling framework, named Raffle, to investigate RA for FL applications with labeled and unlabeled data. Implementing Raffle in homogeneous settings for few-shot question answering, we explore the influence on client participation dynamics and the importance of passage index composition for effective generalization.