Skip to yearly menu bar Skip to main content


Poster
in
Workshop: Privacy Regulation and Protection in Machine Learning

Differentially Private Best Subset Selection Via Integer Programming

Kayhan Behdin · Peter Prastakos · Rahul Mazumder


Abstract: We study the Best Subset Selection (BSS) problem under Differential Privacy (DP), where one seeks to find a subset of features and a linear estimator supported on the said subset that minimize the least squares error. Due to the combinatorial structure, solving BSS has been traditionally challenging. However, thanks to recent advancements in Mixed Integer Programming (MIP), solving large-scale (non-private) BSS with millions of variables has become feasible in minutes. Despite this, the application of such algorithmic developments to BSS under DP has remained limited. In this paper, we propose a new DP estimator for model selection in BSS. Particularly, inspired by the exponential mechanism, we design a new sampling procedure which makes use of MIP to explore the (non-convex) objective landscape of BSS in a structured fashion. This circumvents the need for combinatorial exhaustive search as required by the exponential mechanism. This allows us to solve BSS with hundreds of features and $(\varepsilon,0)$-DP guarantees in minutes, whereas standard (exhaustive search based) exponential mechanism would require tens of days of runtime and tens of terabytes of storage. To our knowledge, our work is a first to employ techniques from integer programming to design a differentially private algorithm for BSS, specially under pure differential privacy.

Chat is not available.