‘Differential Privacy,’ or How Apple Finds the Most Popular Emojis Without Reading Your Texts


The ability to amass, store, manipulate and analyze information from millions of people at once has opened a vast frontier of new research methods. But whether these methods are used in the service of new business models or new scientific findings, they also raise questions for the individuals whose information comprises these “big data” sets. Can anyone really guarantee that these individuals’ information will remain private?

As a member of The Warren Center for Network & Data Sciences, Aaron Roth, Class of 1940 Bicentennial Term Associate Professor in the Department of Computer and Information Science, is trying to answer that question.

Roth is one of the computer science researchers who first developed the idea of “differential privacy,” an algorithmic approach to data analysis that makes formal guarantees about how much it can reveal about individual members of a given dataset. The “differential” in its name refers to the requirement that a differentially private analysis should produce nearly indistinguishable results when applied to two datasets that differ only in one respect: one includes the data from a given individual and the other does not.
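Stated formally, in the standard ε-differential privacy definition (where ε is a small, nonnegative privacy parameter), a randomized algorithm M satisfies the guarantee if, for every pair of datasets D and D′ that differ only in one individual’s data, and for every set S of possible outputs:

```latex
\Pr\bigl[\,M(D) \in S\,\bigr] \;\le\; e^{\varepsilon} \cdot \Pr\bigl[\,M(D') \in S\,\bigr]
```

Because the roles of D and D′ can be swapped, the bound holds in both directions. The smaller ε is, the less any single person’s data can shift the probability of any outcome, and so the less an observer can learn about that person from the result.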

Last year, Apple announced that it would be incorporating differential privacy into its operating systems. Now, in its Machine Learning Journal, the company’s Differential Privacy Team explains how they can gain insight into the overall user experience without ever having data that is traceable to an individual user. The relative popularity of different emojis is one application, but similar approaches are used to add new words to autocorrect databases and to figure out which websites use the most computational resources.
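To make the emoji example concrete, here is a minimal sketch of “local” differential privacy using classic k-ary randomized response. This is not Apple’s mechanism; the journal post describes the company’s own, more elaborate techniques. The emoji vocabulary, the ε value, and the function names randomize and estimate_counts are hypothetical, chosen only for illustration. Each simulated device perturbs its own answer before sending it, so no single report can be trusted, yet the noise averages out across many users and the server can still estimate overall popularity.

```python
import math
import random
from collections import Counter

# Hypothetical emoji vocabulary (plain names, for illustration only).
EMOJIS = ["laughing", "heart", "thumbs_up", "crying", "pray"]


def randomize(true_emoji: str, epsilon: float, rng: random.Random) -> str:
    """Device side: k-ary randomized response, an epsilon-local-DP mechanism.

    Reports the true emoji with probability p = e^eps / (e^eps + k - 1);
    otherwise reports one of the other k - 1 emojis uniformly at random,
    so each specific other emoji is reported with q = 1 / (e^eps + k - 1).
    Since p / q = e^eps, any report is at most e^eps times likelier under
    one true value than under another.
    """
    k = len(EMOJIS)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return true_emoji
    others = [e for e in EMOJIS if e != true_emoji]
    return rng.choice(others)


def estimate_counts(reports: list[str], epsilon: float) -> dict[str, float]:
    """Server side: unbiased estimate of how many users truly chose each emoji."""
    n, k = len(reports), len(EMOJIS)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = 1 / (math.exp(epsilon) + k - 1)
    observed = Counter(reports)
    # E[observed[v]] = true[v] * p + (n - true[v]) * q; solve for true[v].
    return {e: (observed[e] - n * q) / (p - q) for e in EMOJIS}


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated ground truth: 100,000 users with a skewed preference.
    true_choices = rng.choices(EMOJIS, weights=[40, 25, 20, 10, 5], k=100_000)
    reports = [randomize(e, epsilon=1.0, rng=rng) for e in true_choices]
    estimates = estimate_counts(reports, epsilon=1.0)
    true_counts = Counter(true_choices)
    for emoji in EMOJIS:
        print(emoji, true_counts[emoji], round(estimates[emoji]))
```

The trade-off lives in the epsilon parameter: smaller values make each device’s report more random (stronger privacy for the individual) and therefore require more users before the aggregate estimates become accurate.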

Rob Verger has an in-depth analysis of the paper, along with commentary from Roth, at Popular Science.