# Understanding sensitivity

This topic guide goes into detail on the concept of *sensitivity* in
differential privacy.

Sensitivity is the maximum impact that a protected change can have on a query’s results. It directly impacts how much noise must be added to achieve differential privacy: the bigger the sensitivity, the more noise needs to be added to the result.

With Tumult Analytics, the type of protected change depends on whether the goal is
to hide a fixed number of rows, using `AddMaxRows`, or arbitrarily many
rows sharing the same privacy identifier, using `AddRowsWithID`.

A simple example of sensitivity is the explanation of clamping bounds in tutorial three: larger clamping bounds mean that a single row can have more influence, so more noise needs to be added. However, the sensitivity is not always so straightforward to estimate. In this topic guide, we will examine how different types of inputs and transformations to a query can affect its sensitivity. Understanding this relationship will help you choose which transformations to use to ensure accurate results while maintaining strong privacy guarantees.
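To make the link between clamping bounds, sensitivity, and noise concrete, here is a minimal sketch of the textbook Laplace mechanism for a clamped sum. It assumes a protected change of `max_rows` added or removed rows. The helper names `clamped_sum_sensitivity` and `laplace_scale` are hypothetical and are not part of Tumult Analytics, which tracks these quantities internally.

```python
def clamped_sum_sensitivity(lower: float, upper: float, max_rows: int = 1) -> float:
    """Largest change in a clamped sum when up to `max_rows` rows are added or removed."""
    # Each row contributes a value clamped to [lower, upper], so a single row
    # can move the sum by at most max(|lower|, |upper|).
    return max_rows * max(abs(lower), abs(upper))


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Scale parameter of the Laplace noise in the textbook Laplace mechanism."""
    return sensitivity / epsilon


# Same privacy budget, different clamping bounds (illustrative values).
narrow = laplace_scale(clamped_sum_sensitivity(0, 10), epsilon=1.0)
wide = laplace_scale(clamped_sum_sensitivity(0, 100), epsilon=1.0)
print(narrow, wide)
```

Widening the bounds tenfold multiplies the noise scale by ten for the same privacy budget, which is the trade-off described above.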

## Queries on tables using `AddMaxRows`


Queries on tables using the `AddMaxRows` or `AddOneRow` protected change
protect the addition or removal of rows in the table. This means that any
operation which changes the number of rows requires a corresponding increase to the
protected change. A larger protected change corresponds to a higher sensitivity for a query,
which means more noise needs to be added to the query result. Specifically, sensitivity
scales *linearly* with the protected change.

A few operations can increase the sensitivity of a query in this way:
`flat maps`, `public joins`, and `private joins`.

### Flat maps

A `flat_map` maps each input row to zero or more new rows. Consider the example from
tutorial five, where each input row is mapped to up
to three new rows, using the `max_rows=3` parameter. On a per-row basis, this operation might look like this:

In this example, the input table was initialized with the `AddOneRow` protected change,
which is equivalent to `AddMaxRows` with `max_rows=1`. However, because the flat map
can produce up to three rows for each input row, the protected change needs to be
increased threefold to `max_rows=3`, which results in a corresponding threefold
increase in sensitivity for the query.

Note

The sensitivity of a query is not affected by the number of rows *actually*
produced by a flat map, but only by the *maximum* number
of rows produced by the flat map. In the example above, the sensitivity would be
the same if all the input rows only had 1 or 2 genres, and no input row produced 3 output rows.
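The effect of the `max_rows` bound can be sketched in plain Python. This is a simplified stand-in for a flat map, not the Tumult Analytics implementation: the `flat_map` and `split_genres` helpers and the sample rows are hypothetical. It shows that adding one input row changes the output by at most `max_rows` rows, however many genres that row has.

```python
def flat_map(rows, f, max_rows):
    """Apply f to each row, keeping at most max_rows outputs per input row."""
    output = []
    for row in rows:
        output.extend(f(row)[:max_rows])
    return output


def split_genres(row):
    """Produce one output row per genre of the input row."""
    return [{"title": row["title"], "genre": g} for g in row["genres"]]


# Hypothetical sample data.
books = [
    {"title": "A", "genres": ["Mystery"]},
    {"title": "B", "genres": ["Fantasy", "Romance"]},
]
# A row with four genres, which is more than the bound allows.
extra = {"title": "C", "genres": ["Horror", "Thriller", "Mystery", "Drama"]}

before = flat_map(books, split_genres, max_rows=3)
after = flat_map(books + [extra], split_genres, max_rows=3)
added_rows = len(after) - len(before)
print(added_rows)
```

Because the bound is enforced per input row, the worst case depends only on `max_rows`, never on the data itself.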

### Public joins

Suppose we have two tables, `People` (private table) and `States` (public table),
which share a common column, `zipcode`. A public join between these tables might look
like:

The join output contains one row for each match between the two tables. In this example,
Susie’s ZIP code happens to cross state boundaries: the `zipcode` value 37752 appears
twice in the `States` table! This means that Susie’s name and age appear in two rows
in the output table. To hide her contribution to the joined table, we need to increase
the protected change from `max_rows=1` to `max_rows=2`. More generally, if the
protected change protects \(n\) rows in the private table, and each join key value
appears in at most \(m\) rows in the public table, then the sensitivity of the join
is \(n * m\).

Note

As with flat maps, the sensitivity increase doesn’t depend on the *contents* of the
private table. It only depends on the contents of the public table, i.e. the
number of rows in the public table with each value of the join key.
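The \(n * m\) rule can be sketched as a small calculation over the public table. The function `public_join_max_rows` and the sample `states` rows below are hypothetical illustrations, not Tumult Analytics APIs or real data. Only the public table is inspected, which is why this computation is safe.

```python
from collections import Counter


def public_join_max_rows(private_max_rows, public_rows, key):
    """Return n * m, where m is the largest number of public rows sharing a join key."""
    counts = Counter(row[key] for row in public_rows)
    m = max(counts.values(), default=0)
    return private_max_rows * m


# Hypothetical public table in which one ZIP code maps to two states.
states = [
    {"zipcode": "37752", "state": "state_1"},
    {"zipcode": "37752", "state": "state_2"},
    {"zipcode": "27517", "state": "state_3"},
]

new_max_rows = public_join_max_rows(1, states, "zipcode")
print(new_max_rows)
```

With a protected change of `max_rows=1` on the private table, the duplicated ZIP code doubles the protected change after the join.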

### Private joins

With private joins, *both* tables are private. This means that, unlike with a public
table in a public join, we cannot use the contents of either table directly to determine
the sensitivity: doing so would reveal information about individuals within the tables,
thus violating the privacy guarantee.

Suppose we have two tables, a `Users` table and a `Purchases` table, which share a
common column, `user_id`. Each is initialized with a protected change of `AddMaxRows(max_rows=1)`:

Since both tables contain sensitive information, we cannot look at
the data directly to calculate the sensitivity. Therefore, we need to truncate both tables by specifying a
`TruncationStrategy` for each. The sensitivity computation is more complicated than before:

\(\text{sensitivity} = (T_{left} * S_{right} * M_{right}) + (T_{right} * S_{left} * M_{left})\)

where:

- \(T_{left}\) and \(T_{right}\) are the truncation thresholds, i.e. `max_rows`, for the left and right tables, respectively. When using `DropNonUnique`, these values are always 1.
- \(S_{left}\) and \(S_{right}\) are factors called the *stability* of each `TruncationStrategy`. These values are always 2 for `DropExcess` and 1 for `DropNonUnique`.
- \(M_{left}\) and \(M_{right}\) are the `max_rows` parameters of the protected change on the left and right tables, respectively.

In this example, if we choose a truncation strategy of `DropExcess(max_rows=2)` for
both tables, they will be truncated to include no more than two rows for each value of
our join key, `user_id`. The private join might look something like:

In this case, our `DropExcess()` truncation strategies each had bounds of
`max_rows=2`, and our tables each had a protected change of
`AddMaxRows(max_rows=1)`. The sensitivity of the join is then:
\(\text{sensitivity} = 2 * 2 * 1 + 2 * 2 * 1 = 8\).

Note

Even though the `Users` table did not *actually* contain more than one row per
`user_id`, the sensitivity is still increased via the
`DropExcess(max_rows=2)` truncation strategy. Again, this is because we don’t
look at the contents of private tables directly, and instead use the information
given by the `TruncationStrategy` for each table.

Note

When we know that a table always contains only one row per join key, it’s preferable
to use `DropNonUnique`, due to the smaller truncation stability. In this case,
using `DropNonUnique` for the `Users` table and `DropExcess(max_rows=2)` for the
`Purchases` table would have led to a join sensitivity of \(1 * 2 * 1 + 2 * 1 * 1 = 4\).
Using `DropExcess(max_rows=1)` for the `Users` table would have led to a sensitivity of
\(1 * 2 * 1 + 2 * 2 * 1 = 6\) instead.
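The private join formula is easy to check by hand with a few lines of Python. The sketch below is only an illustration of the arithmetic: `Truncation`, `drop_excess`, `drop_non_unique`, and `private_join_sensitivity` are hypothetical helpers that mirror the thresholds and stabilities listed above, not the Tumult Analytics classes themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Truncation:
    threshold: int  # T: maximum rows kept per join key
    stability: int  # S: stability factor of the truncation strategy


def drop_excess(max_rows: int) -> Truncation:
    # Stability is always 2 for DropExcess, per the list above.
    return Truncation(threshold=max_rows, stability=2)


def drop_non_unique() -> Truncation:
    # Threshold and stability are both 1 for DropNonUnique.
    return Truncation(threshold=1, stability=1)


def private_join_sensitivity(left, right, m_left=1, m_right=1):
    """(T_left * S_right * M_right) + (T_right * S_left * M_left)."""
    return (left.threshold * right.stability * m_right
            + right.threshold * left.stability * m_left)


both_excess = private_join_sensitivity(drop_excess(2), drop_excess(2))
non_unique_users = private_join_sensitivity(drop_non_unique(), drop_excess(2))
excess_one_users = private_join_sensitivity(drop_excess(1), drop_excess(2))
print(both_excess, non_unique_users, excess_one_users)
```

The three results correspond to the three strategy combinations discussed in this section, with `Users` as the left table and `Purchases` as the right table.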

As you can see, tracking stability can be complicated!

## Queries on tables using `AddRowsWithID`


Queries on tables using the `AddRowsWithID` protected change
protect the presence of arbitrarily many rows associated with the same privacy ID. In this case,
transformations don’t change the protected change: you can perform flat maps, public
joins, or private joins, and the protected change is still `AddRowsWithID`.

However, before running aggregations, we must use `enforce` to specify truncation
bounds via constraints. Constraints can be enforced at any point, but it’s generally
better to specify them immediately before performing aggregations. There are two main
ways to specify constraints: via a `MaxRowsPerID` constraint, or a combination of
`MaxGroupsPerID` and `MaxRowsPerGroupPerID`. See the
Summary section of tutorial 6 for a visualization of these
truncation paths.

The sensitivity of a query using the `AddRowsWithID` protected change is impacted by
the type of constraint(s) used to truncate the tables, as well as the type of noise
added to the data. There are three cases:

- Using `MaxRowsPerID`, the sensitivity increases linearly with the truncation parameter.
- Using `MaxGroupsPerID` and `MaxRowsPerGroupPerID`, the sensitivity depends on the type of noise added to the data.
  - With *Laplace* noise (the default under `PureDP`), the sensitivity increases like a product of the two `max` truncation parameters: \(\text{sensitivity} = \text{MaxRowsPerGroupPerID.max} * \text{MaxGroupsPerID.max}\)
  - With *Gaussian* noise (the default under `rhoZCDP`), the sensitivity increases like a product of the `max` truncation parameter for `MaxRowsPerGroupPerID` and the square root of the `max` for `MaxGroupsPerID`: \(\text{sensitivity} = \text{MaxRowsPerGroupPerID.max} * \sqrt{\text{MaxGroupsPerID.max}}\)

For this last case, combining `MaxGroupsPerID` and `MaxRowsPerGroupPerID`, we
visualize the sensitivity in the diagram below.

Note that the sensitivity determines the noise *multiplier*, but different noise
distributions also have different behaviors: for low sensitivity values and comparable
privacy budgets, Laplace noise tends to have a smaller variance than Gaussian noise. But
for large values of `MaxGroupsPerID`, the sensitivity used with Gaussian noise will be
much smaller than that of Laplace noise, because it grows only with the square root of
`MaxGroupsPerID`, and Gaussian noise will be a better choice.
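The two sensitivity formulas for the `MaxGroupsPerID` and `MaxRowsPerGroupPerID` combination can be compared directly. The sketch below only evaluates those formulas: the function names are hypothetical, and the values are sensitivities, not noise variances, which also depend on the privacy budget and the noise distribution.

```python
import math


def laplace_sensitivity(max_groups: int, max_rows_per_group: int) -> float:
    """Sensitivity used with Laplace noise: product of the two bounds."""
    return max_rows_per_group * max_groups


def gaussian_sensitivity(max_groups: int, max_rows_per_group: int) -> float:
    """Sensitivity used with Gaussian noise: grows with sqrt(max_groups)."""
    return max_rows_per_group * math.sqrt(max_groups)


# Illustrative truncation parameters, with two rows per group per ID.
for max_groups in (1, 4, 100):
    print(
        max_groups,
        laplace_sensitivity(max_groups, 2),
        gaussian_sensitivity(max_groups, 2),
    )
```

When `MaxGroupsPerID` is 1 the two sensitivities coincide, and the gap between them widens as the number of groups per ID grows.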

For a more in-depth comparison of both kinds of noise, you can consult this blog post.

While this topic guide covers the most common cases of sensitivity tracking in Tumult
Analytics, it is certainly not exhaustive. If you have additional questions, feel free
to reach out to us on our Slack server in the
**#library-questions** channel!