Known Vulnerabilities

This page describes known vulnerabilities in Tumult Core that we intend to fix.

Stability imprecision bug

Tumult Core is susceptible to the class of vulnerabilities described in Section 6 of [Mir12]. In particular, when summing floating-point numbers, the claimed sensitivity may be smaller than the true sensitivity. This vulnerability affects the Sum transformation when the domain of the measure_column is SparkFloatColumnDescriptor. Measurements that involve a Sum transformation on floating-point numbers may therefore have a privacy loss that is larger than the claimed privacy loss.
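The underlying mechanism is that floating-point addition is not associative: small contributions can be silently lost to rounding, and whether they are lost depends on the other values present and on the order of summation, so the computed sum does not behave like the mathematical sum that the sensitivity analysis models. A minimal illustration of this arithmetic behavior in plain Python (this is not Core code):

```python
# Floating-point addition is not associative: the same values summed
# in different orders give different results, because the small term
# is absorbed by rounding when added directly to the large one.
big, small = 2.0**53, 1.0
print((big + small) - big)  # 0.0 -- small is lost to rounding
print((big - big) + small)  # 1.0 -- same values, different order
```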

Floating point overflow/underflow

Tumult Core is susceptible to privacy leakage from floating-point overflow and underflow. Users should not perform operations that may cause overflow or underflow.

Tumult Core does have some basic measures to protect users from certain floating-point overflow and underflow vulnerabilities: for Sum aggregations, values must be clamped to \([-2^{970}, 2^{970}]\), so an overflow or underflow can only happen when more than \(2^{52}\) values are summed. We expect users to hit performance bottlenecks before reaching that scale.
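To see why the clamping bound rules out overflow below that count: the largest finite double is just under \(2^{1024}\), and \(2^{52}\) values of magnitude at most \(2^{970}\) sum to at most \(2^{1022}\). A quick sanity check in Python:

```python
import sys

# The largest finite IEEE-754 double is just under 2**1024.
print(sys.float_info.max)  # 1.7976931348623157e+308

# 2**52 values, each clamped to magnitude at most 2**970, sum to at
# most 2**1022, which is still a finite double.
print(2.0**970 * 2.0**52)                       # ~4.49e+307
print(2.0**970 * 2.0**52 < sys.float_info.max)  # True
```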

Parallel composition

Measurements in Tumult Core provide privacy guarantees in the form of privacy functions and relations (see Tumult Core Privacy Guarantee for more details). Let \(M\) be a Core measurement and \(f\) be its privacy function. If \(f(k) = \epsilon\), then for any pair of inputs that are at most \(k\) apart under the input metric, the distance between the outputs of the measurement under the output measure is at most \(\epsilon\). Crucially, this is the only guarantee provided by a Core measurement; because of the generality of the Core privacy framework, it is possible to construct measurements with certain combinations of input domain, input metric, and output measure for which this guarantee does not extend to group privacy. In particular, it is possible to construct a measurement such that for inputs that are \(c \cdot k\) apart, the outputs are more than \(c \cdot \epsilon\) apart.
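For instance, nothing in the framework forbids a hypothetical measurement whose privacy function under PureDP is

\[f(d) = \epsilon \cdot d^2\]

Such a measurement satisfies \(f(1) = \epsilon\), but inputs that are \(2\) apart are only guaranteed to produce outputs at most \(4\epsilon\) apart, not \(2\epsilon\).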

The ParallelComposition measurement allows different Measurements to be evaluated on different partitions of an input. Given a list \(M_1, M_2, \ldots, M_n\) of measurements, the privacy function \(F(d)\) of their parallel composition \(M\) is evaluated as the maximum of the privacy functions of the \(n\) component measurements at \(d\):

\[F(d) = \max_{i=1}^n F_i(d)\]

where \(F\) is the privacy function of \(M\) and \(F_i\) is the privacy function of \(M_i\).
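A minimal sketch of this rule (illustrative only; the function name and types here are hypothetical, not Tumult Core's API):

```python
from typing import Callable, Sequence

def parallel_privacy_function(
    privacy_functions: Sequence[Callable[[float], float]], d: float
) -> float:
    """Privacy function of a parallel composition: the pointwise
    maximum of the component privacy functions at distance d."""
    return max(f(d) for f in privacy_functions)

# Two components with linear privacy functions under PureDP:
f1 = lambda d: 0.5 * d
f2 = lambda d: 1.0 * d
print(parallel_privacy_function([f1, f2], 2))  # 2.0
```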

This privacy function makes an implicit assumption: that the privacy functions of all PureDP measurements are linear (and that those of all RhoZCDP measurements are quadratic).

This assumption holds for the standard notions of pure differential privacy and zCDP, where the domain is the set of all datasets with a given schema and neighboring datasets are obtained by adding or removing a record: for those notions, group privacy yields exactly this linear (or quadratic) scaling. However, it is not true of privacy functions for Core measurements in general.

Consider the following example. Let \(M_1\) and \(M_2\) denote two Core measurements composed together in a ParallelComposition measurement \(M\). To verify that inputs that differ in 2 records produce outputs that are \(\epsilon\)-indistinguishable under PureDP, we need to verify that \(F(2) \leq \epsilon\) holds. The parallel composition measurement verifies this by checking that \(F_1(2) \leq \epsilon \wedge F_2(2) \leq \epsilon\). Let \(X\) and \(X'\) be two inputs to \(M\) that differ in 2 records. Let \(x_1\) and \(x_2\) be the partitions of \(X\) corresponding to \(M_1\) and \(M_2\), and let \(x_1'\) and \(x_2'\) be the corresponding partitions of \(X'\). Consider the possible combinations of distances between the corresponding partitions:

  1. \(d(x_1, x_1') = 0\) and \(d(x_2, x_2') = 2\)

  2. \(d(x_1, x_1') = 1\) and \(d(x_2, x_2') = 1\)

  3. \(d(x_1, x_1') = 2\) and \(d(x_2, x_2') = 0\)

To verify that \(F(2) \leq \epsilon\), we need to check that all of the following hold:

  1. \(F_1(0) \leq 0 \wedge F_2(2) \leq \epsilon\)

  2. \(F_1(1) \leq \epsilon_1 \wedge F_2(1) \leq \epsilon_2\) for some \(\epsilon_1, \epsilon_2 \geq 0\) such that \(\epsilon_1 + \epsilon_2 = \epsilon\)

  3. \(F_1(2) \leq \epsilon \wedge F_2(0) \leq 0\)

Currently, ParallelComposition's privacy function assumes that the privacy functions of PureDP measurements are linear, and infers all three of these conditions from \(F_1(2) \leq \epsilon \wedge F_2(2) \leq \epsilon\). However, it is possible to add new Core measurements whose privacy functions violate this linearity assumption. When composing such measurements, ParallelComposition will not actually provide the privacy guarantee it claims. Note that no existing Core component violates the implicit assumption made by the privacy function of ParallelComposition.
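To see why linearity makes this inference sound: if each \(F_i\) is linear, then \(F_i(0) = 0\) and \(F_i(1) = F_i(2)/2\), so \(F_i(2) \leq \epsilon\) gives

\[F_1(1) + F_2(1) \leq \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon\]

which covers case 2 with \(\epsilon_1 = \epsilon_2 = \epsilon/2\); cases 1 and 3 follow directly from \(F_i(0) = 0\) and \(F_i(2) \leq \epsilon\). By contrast, a nonlinear privacy function with \(F_i(1) > F_i(2)/2\) (for example, one that is constant for all distances \(d \geq 1\)) could satisfy \(F_i(2) \leq \epsilon\) while violating case 2.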

Evaluating privacy and stability relations using SymPy

Tumult Core uses SymPy to evaluate privacy and stability relations for all measurements and transformations. Real numbers are represented using symbolic expressions, and inequalities relevant to evaluating the privacy relation of a measurement for a given pair of inputs are solved using SymPy. In most cases, these inequalities are evaluated analytically; however, this is not always possible, and SymPy occasionally resorts to numerical approximation of the expressions. In such cases, SymPy does not provide any correctness guarantees. In some cases, SymPy may fail to solve the inequality altogether, making it impossible to verify the privacy properties of Core measurements.
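A minimal illustration of the two evaluation modes (this is generic SymPy usage, not Core's actual relation-checking code):

```python
import sympy as sp

# Decided exactly: sqrt(2)**2 simplifies symbolically to 2.
print(sp.sqrt(2) ** 2 <= 2)  # True

# Comparisons involving transcendental values may instead be decided
# by numerical evaluation, which offers no formal correctness
# guarantee when the two sides are extremely close.
print(sp.exp(sp.Rational(1, 10)) > sp.Rational(11, 10))  # True
```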