Poster in Workshop on Distributed and Private Machine Learning
On Privacy and Confidentiality of Communications in Organizational Graphs
Masoumeh Shafieinejad · Huseyin Inan · Marcello Hasegawa · Robert Sim
Machine-learned models trained on organizational communication data, such as emails in an enterprise, carry unique risks of breaching confidentiality, even if the model is intended only for internal use. This work shows how confidentiality is distinct from privacy in an enterprise context, and aims to formulate an approach to preserving confidentiality while leveraging principles from differential privacy (DP). Prior work applying DP techniques to natural language processing tasks usually assumes independently distributed data and overlooks potential correlation among the records. Ignoring this correlation yields an illusory promise of privacy; conversely, extending DP techniques to group privacy is over-cautious and severely degrades model utility. We introduce a middle-ground solution, proposing a model that captures the correlation in the social network graph and incorporates it into the privacy calculations through Pufferfish privacy principles.
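For context, a background sketch of the standard definitions the abstract appeals to (not material from the poster itself; the notation M for a randomized mechanism, D and D' for datasets, k for group size, Θ for a class of data distributions, and (s_i, s_j) for a pair of secrets is assumed here): the ε-DP guarantee, its degradation under group privacy for k correlated records, and the ε-Pufferfish guarantee can be written in LaTeX as

% Background sketch; notation assumed, not taken from the abstract.
% (1) \varepsilon-differential privacy, for datasets D, D' differing in one record:
\[
  \Pr[M(D) \in S] \le e^{\varepsilon}\, \Pr[M(D') \in S].
\]
% (2) Group privacy: if D, D' differ in k (possibly correlated) records,
%     the same mechanism only guarantees the weaker bound
\[
  \Pr[M(D) \in S] \le e^{k\varepsilon}\, \Pr[M(D') \in S],
\]
%     which becomes over-cautious (large effective budget k\varepsilon) as k grows.
% (3) \varepsilon-Pufferfish privacy: for every data distribution \theta \in \Theta
%     (e.g., one induced by the organizational graph), every pair of secrets
%     (s_i, s_j) \in Q with nonzero probability under \theta, and every output w,
\[
  e^{-\varepsilon}
  \le \frac{\Pr[M(X) = w \mid s_i, \theta]}{\Pr[M(X) = w \mid s_j, \theta]}
  \le e^{\varepsilon}.
\]

Encoding the graph-induced correlation in Θ is what allows a guarantee between the two extremes in (1) and (2).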