Abstract:
The goal of private data analysis is to release aggregate statistics
about a data set while protecting the privacy of the individuals
whose information the data set contains.
We introduce a new generic framework for private data analysis, where
functions f of the data are released with instance-based additive
noise. That is, the noise magnitude is determined not only by the
function we want to release, but also by the database itself. One of
the challenges we address is to guarantee that the noise magnitude
itself should not leak information about the database. For that we
calibrate the noise magnitude to the *smooth sensitivity* of f on
the database x - a measure of how variable f is in the neighborhood
of the instance x. To our knowledge, this is the first formal
analysis of the effect of instance-based noise in the context of
data privacy.
To apply the framework, one should compute or approximate the smooth
sensitivity of f on x. We show how to do this efficiently for several
functions including the median and the cost of the minimum spanning
tree. We also give a generic procedure based on sampling that allows
one to release f(x) accurately on many databases x even when no
efficient algorithm for approximating smooth sensitivity of f is
known or when f is given as a black box.
Joint work with Sofya Raskhodnikova and Adam Smith, STOC 2007.