What does it mean to be differentially private?


Paul Francis at IAPP: “Back in June 2016, Apple announced it will use differential privacy to protect individual privacy for certain data that it collects. Though already a hot research topic for over a decade, this announcement introduced differential privacy to the broader public. Before that announcement, Google had already been using differential privacy for collecting Chrome usage statistics. And within the last month, Uber announced that they too are using differential privacy.

If you’ve done a little homework on differential privacy, you may have learned that it provides provable guarantees of privacy and concluded that a database that is differentially private is, well, private — in other words, that it protects individual privacy. But that isn’t necessarily the case. When someone says, “a database is differentially private,” they don’t mean that the database is private. Rather, they mean, “the privacy of the database can be measured.”

Really, it is like saying that “a bridge is weight limited.” If you know the weight limit of a bridge, then yes, you can use the bridge safely. But the bridge isn’t safe under all conditions. You can exceed the weight limit and hurt yourself.

The weight limit of bridges is expressed in tons, kilograms or number of people. Simplifying here a bit, the amount of privacy afforded by a differentially private database is expressed as a number, by convention labeled ε (epsilon). Lower ε means more private.

All bridges have a weight limit. Everybody knows this, so it sounds dumb to say, “a bridge is weight limited.” And guess what? All databases are differentially private. Or, more precisely, all databases have an ε. A database with no privacy protections at all has an ε of infinity. It is pretty misleading to call such a database differentially private, but mathematically speaking, it is not incorrect to do so. A database that can’t be queried at all has an ε of zero. Private, but useless.

In their paper on differential privacy for statistics, Cynthia Dwork and Adam Smith write, “The choice of ε is essentially a social question. We tend to think of ε as, say, 0.01, 0.1, or in some cases, ln 2 or ln 3.” The natural logarithm of 3 (ln 3) is around 1.1….(More)”.