Specifying privacy guarantees#

The Session is the main object used to specify formal privacy guarantees on sensitive data. Users specify these guarantees when initializing the Session, by providing one protected change per sensitive table and an overall privacy budget. Together, these define the formal guarantee that the Session then enforces.

Once the Session is initialized, it ensures that all subsequent interactions with it satisfy the specified privacy guarantee. In particular, the queries evaluated using evaluate() cannot, in total, consume more than the specified privacy budget.
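The budget enforcement described above can be pictured with a small toy model. The sketch below is not the Session's implementation: ToyBudgetLedger and its charge() method are hypothetical names introduced for illustration, and it assumes pure differential privacy with basic sequential composition, where the epsilons of successive queries simply add up.

```python
class ToyBudgetLedger:
    """Toy model of budget enforcement (hypothetical, not the library's code).

    Assumes pure DP with basic sequential composition: the total privacy
    loss is the sum of the epsilons charged so far.
    """

    def __init__(self, total_epsilon: float) -> None:
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self) -> float:
        """Budget that has not been consumed yet."""
        return self.total - self.spent

    def charge(self, epsilon: float) -> None:
        """Record the cost of one query, or refuse if it would overspend."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise RuntimeError("query would exceed the remaining privacy budget")
        self.spent += epsilon


ledger = ToyBudgetLedger(total_epsilon=1.0)
ledger.charge(0.5)   # first query is accepted
ledger.charge(0.25)  # second query is accepted
try:
    ledger.charge(0.5)  # only 0.25 is left, so this query is refused
except RuntimeError as error:
    print(error)
```

The key design point is that a refused query leaves the ledger unchanged, so the remaining budget stays available for a cheaper query.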

A simple introduction to Session initialization and use can be found in the first and second tutorials. More details on the exact privacy promise provided by the Session can be found in the Privacy promise topic guide.

Session#

The Session is the fundamental abstraction used to enforce formal privacy guarantees on sensitive data.

Session

Allows differentially private query evaluation on sensitive data.

Initializing the Session#

Sessions can be initialized using the from_dataframe() method, or using a Builder.

Session.from_dataframe(privacy_budget, ...)

Initializes a DP session from a Spark dataframe.

Session.Builder()

Builder for Session.

Protected changes#

Each private table in a Session needs a protected change, which describes the maximal change in a table that will be protected by the privacy guarantees.

ProtectedChange()

Base class describing the change in a dataset that is protected under DP.

AddOneRow()

Protects the addition or removal of a single row.

AddMaxRows(max_rows)

Protects the addition or removal of any set of max_rows rows.

AddMaxRowsInMaxGroups(grouping_column, ...)

Protects the addition or removal of a bounded number of rows in each of a bounded number of groups.

AddRowsWithID(id_column[, id_space])

Protects the addition or removal of all rows that share a given identifier.
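To see why the choice of protected change matters, consider how it affects the noise needed for a simple count. The sketch below uses the textbook Laplace mechanism as an assumption; the library may well use a different noise mechanism internally. The helper names count_sensitivity and laplace_scale are hypothetical and exist only for this illustration.

```python
def count_sensitivity(max_rows: int = 1) -> int:
    """Largest change in a row count when up to max_rows rows are added or removed."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    return max_rows


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Noise scale b of the textbook Laplace mechanism: b = sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


# Protecting a single row (in the spirit of AddOneRow) versus any set of
# five rows (in the spirit of AddMaxRows(5)), at the same epsilon.
one_row_scale = laplace_scale(count_sensitivity(1), epsilon=0.5)
five_row_scale = laplace_scale(count_sensitivity(5), epsilon=0.5)
print(one_row_scale, five_row_scale)
```

Under these assumptions, protecting a larger change at the same budget requires proportionally more noise, which is the trade-off to keep in mind when picking a protected change.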

Privacy budgets#

Finally, the Session must be initialized with a privacy budget, which quantifies the maximum privacy loss of a differentially private program. There are different kinds of privacy budgets, depending on which variant of differential privacy is used for this quantification.

PrivacyBudget()

Base class for specifying the maximal privacy loss of a Session or a query.

PureDPBudget(epsilon)

A privacy budget under pure differential privacy.

ApproxDPBudget(epsilon, delta)

A privacy budget under approximate differential privacy.

RhoZCDPBudget(rho)

A privacy budget under rho-zero-concentrated differential privacy.
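The three budget types measure privacy loss on different scales, and standard results relate them. The sketch below implements two well-known conversions from the zero-concentrated differential privacy literature: epsilon-DP implies (epsilon^2 / 2)-zCDP, and rho-zCDP implies (rho + 2 * sqrt(rho * ln(1 / delta)), delta)-DP for any delta in (0, 1). The function names are hypothetical, and nothing here is a claim about how the Session converts budgets internally.

```python
import math


def pure_dp_to_zcdp(epsilon: float) -> float:
    """Return a rho such that epsilon-DP implies rho-zCDP (rho = epsilon^2 / 2)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return epsilon ** 2 / 2


def zcdp_to_approx_dp(rho: float, delta: float) -> float:
    """Return an epsilon such that rho-zCDP implies (epsilon, delta)-DP."""
    if rho <= 0:
        raise ValueError("rho must be positive")
    if not 0 < delta < 1:
        raise ValueError("delta must be strictly between 0 and 1")
    return rho + 2 * math.sqrt(rho * math.log(1 / delta))


rho = pure_dp_to_zcdp(1.0)                   # rho implied by a pure DP budget of 1
epsilon = zcdp_to_approx_dp(rho, delta=1e-6)  # epsilon at delta = 1e-6
print(rho, epsilon)
```

These conversions are upper bounds in one direction only: a guarantee converted from one variant to another is valid, but it is usually looser than one accounted for natively in the target variant.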

Inspecting Session state#

The Session provides multiple properties and methods allowing users to inspect its state.

Session.private_sources

Returns the IDs of the private sources.

Session.public_sources

Returns the IDs of the public sources.

Session.public_source_dataframes

Returns a dictionary of public source DataFrames.

Session.remaining_privacy_budget

Returns the privacy budget remaining in the session.

Session.describe([obj])

Describes this session, or one of its tables, or the result of a query.

Inspecting specific sources#

The schema and properties of each table in a Session can be inspected using the following methods.

Session.get_schema(source_id)

Returns the schema for any data source.

Session.get_column_types(source_id)

Returns the column types for any data source.

Session.get_grouping_column(source_id)

Returns the column that queries on this source must group by, if there is one.

Session.get_id_column(source_id)

Returns the ID column of a table, if it has one.

Session.get_id_space(source_id)

Returns the ID space of a table, if it has one.

Evaluating queries with the Session#

Once a Session is initialized, users can build queries and evaluate them using the relevant Session methods.