Known Vulnerabilities#
This page describes known vulnerabilities in Tumult Core that we intend to fix.
Stability imprecision bug#
Tumult Core is susceptible to the class of vulnerabilities described in Section 6 of [Mir12]. In particular, when summing floating point numbers, the claimed sensitivity may be smaller than the true sensitivity. This vulnerability affects the Sum transformation when the domain of the measure_column is SparkFloatColumnDescriptor. Measurements that involve a Sum transformation on floating point numbers may therefore have a privacy loss that is larger than the claimed privacy loss.
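The root cause is that floating point addition is not associative: rounding in the partial sums depends on all of the values involved, so the realized sensitivity of an implemented sum need not match the sensitivity computed over the reals. A minimal illustration in plain Python (not Core code):

```python
# Two orderings of the same three values: the 1.0 is absorbed by
# rounding when it is added to 2**53, but survives when it is not.
a = [2.0**53, 1.0, -2.0**53]
b = [2.0**53, -2.0**53, 1.0]

print(sum(a))  # 0.0
print(sum(b))  # 1.0
```

Over the reals both sums would be identical; in IEEE arithmetic, one record can perturb the rounding of later additions, which is how the true sensitivity of a floating point sum can exceed the claimed bound.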
Floating point overflow/underflow#
Tumult Core is susceptible to privacy leakage from floating point overflow and underflow. Users should not perform operations that may cause overflow/underflow.
Tumult Core does have some basic measures to protect users from certain floating point overflow and underflow vulnerabilities: for Sum aggregations, values must be clamped to \([-2^{970}, 2^{970}]\), so an overflow or underflow can only happen when the number of values summed exceeds \(2^{52}\): a sum of at most \(2^{52}\) clamped values has magnitude at most \(2^{1022}\), which is below the largest finite double (just under \(2^{1024}\)). We expect users to hit performance bottlenecks before summing that many values.
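The arithmetic behind that threshold can be checked directly (a quick sanity check in plain Python, not Core code):

```python
import math
import sys

clamp_bound = 2.0 ** 970   # magnitude bound on each clamped value
threshold = 2 ** 52        # count threshold quoted above

# A sum of at most 2**52 clamped values has magnitude at most 2**1022,
# below the largest finite double (just under 2**1024):
print(math.isfinite(clamp_bound * threshold))   # True
print(sys.float_info.max)                       # ~1.7977e+308

# Far enough past the threshold, the bound exceeds that limit and
# the computation overflows to infinity:
print(clamp_bound * 2 ** 55)                    # inf
```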
Parallel composition#
Measurements in Tumult Core provide privacy guarantees in the form of privacy functions and relations (see Tumult Core Privacy Guarantee for more details). Let \(M\) be a Core measurement and \(f\) be its privacy function. If \(f(k) = \epsilon\), then for any pair of inputs that are at most \(k\) apart under the input metric, the distance between the outputs of the measurement under the output measure is at most \(\epsilon\). Crucially, this is the exclusive guarantee provided by a Core measurement: because of the generality of the Core privacy framework, it is possible to construct measurements with certain combinations of input domain, input metric, and output measure for which this guarantee does not extend to group privacy. In particular, it is possible to construct a measurement for which inputs that are \(c \cdot k\) apart produce outputs that are more than \(c \cdot \epsilon\) apart.
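For intuition, a privacy function is simply a map from input distance to privacy loss. As an illustrative example (a standard differential privacy fact, not the Core API): a pure-DP Laplace mechanism with noise scale \(b\), applied to a query that changes by at most 1 per record, has the linear privacy function \(f(k) = k/b\).

```python
def laplace_privacy_function(k: int, noise_scale: float) -> float:
    """Privacy function of a pure-DP Laplace mechanism on a query that
    changes by at most 1 per record: inputs k apart yield outputs that
    are (k / noise_scale)-indistinguishable. Illustrative sketch only."""
    return k / noise_scale

# Linear in the input distance: doubling k doubles the privacy loss.
print(laplace_privacy_function(1, noise_scale=2.0))  # 0.5
print(laplace_privacy_function(2, noise_scale=2.0))  # 1.0
```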
The ParallelComposition measurement allows different Measurements to be evaluated on different partitions of an input. Given a list \(M_1, M_2, \ldots, M_n\) of measurements, the privacy function for their parallel composition \(M\) is

\(F(d) = \max(F_1(d), F_2(d), \ldots, F_n(d))\)

where \(F\) corresponds to the privacy function of \(M\) and \(F_i\) corresponds to the privacy function of \(M_i\).
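In code, this privacy function is just a pointwise maximum over the component privacy functions. A minimal sketch under the definitions above (not the actual Core implementation):

```python
from typing import Callable, Sequence

PrivacyFunction = Callable[[int], float]

def parallel_composition_privacy_function(
    components: Sequence[PrivacyFunction],
) -> PrivacyFunction:
    """Privacy function of a parallel composition: F(d) = max_i F_i(d)."""
    return lambda d: max(f(d) for f in components)

# Composing two linear privacy functions with scales 2 and 4:
F = parallel_composition_privacy_function(
    [lambda d: d / 2.0, lambda d: d / 4.0]
)
print(F(2))  # 1.0 -- the larger of the two component losses
```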
An implicit assumption is made in this privacy function: privacy functions for all PureDP measurements are linear (or quadratic for RhoZCDP measurements).
While this assumption holds for standard notions of pure differential privacy and zCDP – where the domain is the set of all datasets with some schema, and neighboring datasets are obtained by adding or removing a record – it is not true of privacy functions for Core measurements in general, as a consequence of the group privacy caveat described above.

Consider the following example. Let \(M_1\) and \(M_2\) denote two Core measurements composed together in a ParallelComposition measurement \(M\). To verify that inputs that differ in 2 records produce outputs that are \(\epsilon\)-indistinguishable under PureDP, we need to verify that \(F(2) \leq \epsilon\) holds. The parallel composition measurement verifies this by checking that \(F_1(2) \leq \epsilon \wedge F_2(2) \leq \epsilon\). Let \(X\) and \(X'\) be two inputs to \(M\) that differ in 2 records. Let \(x_1\) and \(x_2\) be the partitions of \(X\) corresponding to \(M_1\) and \(M_2\), and let \(x_1'\) and \(x_2'\) be the corresponding partitions of \(X'\). Consider the possible combinations of distances between the corresponding partitions:
\(d(x_1, x_1') = 0\) and \(d(x_2, x_2') = 2\)
\(d(x_1, x_1') = 1\) and \(d(x_2, x_2') = 1\)
\(d(x_1, x_1') = 2\) and \(d(x_2, x_2') = 0\)
In order to verify that \(F(2) \leq \epsilon\) holds, we need to verify that all of the following hold (a sketch of this check appears after the list):
\(F_1(0) \leq 0 \wedge F_2(2) \leq \epsilon\)
\(F_1(1) \leq \epsilon_1 \wedge F_2(1) \leq \epsilon_2\) for some \(\epsilon_1, \epsilon_2 \geq 0\) such that \(\epsilon_1 + \epsilon_2 = \epsilon\)
\(F_1(2) \leq \epsilon \wedge F_2(0) \leq 0\)
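A sound verifier would have to account for every way the distance can split across the two partitions. Here is a minimal sketch of such a check, using a hypothetical helper written for this page rather than anything in the Core codebase:

```python
from typing import Callable

PrivacyFunction = Callable[[int], float]

def sound_parallel_check(
    F1: PrivacyFunction, F2: PrivacyFunction, k: int, epsilon: float
) -> bool:
    """Check F(k) <= epsilon for a two-way parallel composition by
    considering every split (d1, d2) of the input distance k across
    the partitions; each split must fit within the total budget."""
    return all(F1(d1) + F2(k - d1) <= epsilon for d1 in range(k + 1))

# A linear privacy function such as d -> d / 2 passes at epsilon = 1:
linear = lambda d: d / 2.0
print(sound_parallel_check(linear, linear, 2, epsilon=1.0))  # True
```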
Currently, ParallelComposition's privacy function assumes that the privacy functions of PureDP measurements are linear, and infers all three of these conditions from \(F_1(2) \leq \epsilon \wedge F_2(2) \leq \epsilon\). However, it is possible to add new Core measurements whose privacy functions violate this assumption of linearity. When composing such measurements, ParallelComposition will not actually provide the privacy guarantee it claims. Note that no existing Core component violates the implicit assumption made by the privacy function of ParallelComposition.
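To see how a hypothetical non-linear measurement would defeat this inference, consider a made-up privacy function that charges the full budget at every nonzero distance. It passes the shortcut check but fails the sound per-split check sketched above:

```python
# Hypothetical, deliberately non-linear privacy function: full cost
# epsilon = 1.0 at any nonzero distance.
const = lambda d: 0.0 if d == 0 else 1.0

# The shortcut used by ParallelComposition only looks at distance 2:
print(const(2) <= 1.0 and const(2) <= 1.0)   # True -- claim accepted

# But on the (1, 1) split, the two partitions together spend
# 1.0 + 1.0 = 2.0, twice the claimed budget:
print(const(1) + const(1) <= 1.0)            # False
```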
Evaluating privacy and stability relations using SymPy#
Tumult Core uses SymPy to evaluate privacy and stability relations for all measurements and transformations. Real numbers are represented as symbolic expressions, and the inequalities needed to evaluate the privacy relation of a measurement for a given pair of inputs are solved with SymPy. In most cases, these inequalities are evaluated analytically; however, this is not always possible, and SymPy occasionally resorts to numerical approximation of the expressions. In such cases, SymPy does not provide any correctness guarantees. In some cases, SymPy may fail to solve an inequality altogether, making it impossible to verify the privacy properties of the affected Core measurements.
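The three regimes can be seen with plain SymPy (toy comparisons, not the expressions Core actually produces):

```python
import sympy as sp

# Exact symbolic comparison: resolved analytically.
print(sp.sqrt(2) * sp.sqrt(2) <= 2)     # True

# Constant but transcendental comparison: SymPy decides this by
# numerical evaluation, with no formal correctness guarantee.
print(sp.exp(sp.pi) > sp.pi ** sp.E)    # True (~23.14 vs ~22.46)

# With free symbols, the relation can stay unevaluated entirely:
x = sp.Symbol("x", positive=True)
print(sp.exp(x) > x)                    # exp(x) > x -- unresolved
```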