Abstract

How to allow deep learning on your data without revealing your data

Sanjeev Arora
Princeton University

In many settings involving multiple parties with private data, such as Federated Learning, the parties want to run deep learning on their combined data without any party having to reveal its data to the others. Well-publicized attacks have shown how to compromise privacy in current Federated Learning frameworks.

We describe InstaHide and TextHide, methods to "encrypt" images and text (or gradients based on them) that aim to provide such security. The main idea, inspired by the MixUp data augmentation technique, is similar to classical subset-sum encryption. We also describe a new attack on InstaHide by Carlini et al. (2020) showing that the subset-sum framework is not the right way to argue security in this setting. The attack is a clever mix of combinatorial algorithms and deep learning. However, its running time and other empirical properties do not yet compromise InstaHide and TextHide in the intended applications, including Federated Learning.
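As a rough illustration of the MixUp and subset-sum analogy, the Python sketch below encodes one flattened image in the spirit of InstaHide. It mixes the private image with k-1 images drawn from a pool, using random nonnegative coefficients that sum to 1, and then multiplies each pixel by a random sign. The function name `instahide_encode`, the uniform-then-normalize choice of coefficients, and the default k=4 are assumptions made for illustration. This is not the published implementation, and labels are ignored here.

```python
import random


def instahide_encode(private_img, pool, k=4, rng=None):
    """Hypothetical InstaHide-style encoding of one flattened image (illustration only).

    private_img: list of floats (pixel values).
    pool: list of other flattened images of the same length (needs len(pool) >= k - 1).
    k: total number of images mixed, including the private one.
    Returns the encoded image and the secret "key" (pool indices, coefficients, sign mask).
    """
    rng = rng or random.Random()
    d = len(private_img)
    # Secret subset: which k-1 pool images are mixed with the private image.
    idxs = rng.sample(range(len(pool)), k - 1)
    images = [private_img] + [pool[i] for i in idxs]
    # Secret nonnegative mixing coefficients summing to 1 (MixUp-style; assumed distribution).
    raw = [rng.random() for _ in range(k)]
    total = sum(raw)
    coeffs = [r / total for r in raw]
    # Pixel-wise weighted sum of the chosen images.
    mixed = [sum(c * img[j] for c, img in zip(coeffs, images)) for j in range(d)]
    # Secret per-pixel sign flip applied to the mixture.
    mask = [rng.choice((-1, 1)) for _ in range(d)]
    encoded = [m * s for m, s in zip(mixed, mask)]
    return encoded, (idxs, coeffs, mask)
```

For example, `instahide_encode(img, pool, k=3, rng=random.Random(0))` returns a sign-scrambled weighted sum of `img` and two pool images, together with the key that produced it. Without that key, an observer sees only a combination of a secret subset of images, and this is where the resemblance to subset-sum encryption comes from.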