Defining and using programs

Note

PRO The features described on this page are only available in a paid version of the Tumult Platform. If you would like to hear more, please contact us at info@tmlt.io.

SessionPrograms are used to define structured differentially private (DP) programs that rely on the privacy protection provided by the Session API. Because all such programs share a standard interface for creation and execution, higher-level tools such as SessionProgramTuner can interact with them in a consistent way.

The SessionProgram class is an abstract base class that defines the interface for a structured DP program. It is designed to be subclassed to define specific programs.

Every SessionProgram must meet three minimal requirements:

  • It declares at least one protected input in a nested ProtectedInputs class. A protected input is a DataFrame that can only be accessed through the Session API.

  • It declares at least one output in a nested Outputs class. An output is a DataFrame produced by the program.

  • It defines a session_interaction() method that takes a Session as its argument and returns a dictionary mapping each output name to the corresponding DataFrame.

>>> class MinimalProgram(SessionProgram):
...     class ProtectedInputs:
...         protected_df: DataFrame  # DataFrame type annotation is required
...     class Outputs:
...         total_count: DataFrame  # required here too
...     def session_interaction(self, session: Session):
...         count_query = QueryBuilder("protected_df").count()
...         budget = self.privacy_budget  # session.remaining_privacy_budget also works
...         total_count = session.evaluate(count_query, budget)
...         return {"total_count": total_count}
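Because the requirements are expressed as class annotations and a method, a tool can check them mechanically. The following library-free sketch illustrates that idea only: `check_requirements` is a hypothetical helper, not part of Tumult Analytics, and `DataFrame` is a stand-in for `pyspark.sql.DataFrame` so that the snippet runs without Spark.

```python
import typing


class DataFrame:
    """Stand-in for pyspark.sql.DataFrame (assumption: keeps the sketch Spark-free)."""


def check_requirements(program_cls: type) -> list:
    """Return a list of unmet requirements; an empty list means all three hold."""
    problems = []
    for section in ("ProtectedInputs", "Outputs"):
        inner = getattr(program_cls, section, None)
        # Read the annotations on the nested class, e.g. {"protected_df": DataFrame}.
        hints = typing.get_type_hints(inner) if inner is not None else {}
        if not any(tp is DataFrame for tp in hints.values()):
            problems.append(f"{section} must declare at least one DataFrame")
    if not callable(getattr(program_cls, "session_interaction", None)):
        problems.append("session_interaction() must be defined")
    return problems


class Complete:
    class ProtectedInputs:
        protected_df: DataFrame

    class Outputs:
        total_count: DataFrame

    def session_interaction(self, session):
        return {"total_count": None}


class Incomplete:
    class Outputs:
        total_count: DataFrame


print(check_requirements(Complete))    # []
print(check_requirements(Incomplete))  # reports the missing protected input and method
```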

Once a program is defined, it can be instantiated using the Builder that is automatically generated for that class. Its interface is very similar to that of Session.Builder.

>>> protected_df = spark.createDataFrame([(1, 2), (3, 4)], ["a", "b"])
>>> program = (
...     MinimalProgram.Builder()
...     .with_privacy_budget(PureDPBudget(epsilon=1))
...     .with_private_dataframe("protected_df", protected_df, AddOneRow())
...     .build()
... )
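The idea behind an automatically generated builder can be illustrated in plain Python: the builder derives the accepted input names from the program's nested `ProtectedInputs` class and refuses to build until each one has been supplied. This is a sketch under stated assumptions: `ToyBuilder` and `with_input` are hypothetical, and the real SessionProgram.Builder also handles the privacy budget and protected changes, with checks not shown here.

```python
import typing


class ToyBuilder:
    """Hypothetical, minimal builder driven by a program's ProtectedInputs annotations."""

    def __init__(self, program_cls: type):
        self._program_cls = program_cls
        # The declared names are the only inputs this builder accepts.
        self._expected = set(typing.get_type_hints(program_cls.ProtectedInputs))
        self._inputs = {}

    def with_input(self, name: str, value: object) -> "ToyBuilder":
        if name not in self._expected:
            raise ValueError(f"{name!r} is not a declared protected input")
        self._inputs[name] = value
        return self  # returning self allows chaining, as in the example above

    def build(self) -> dict:
        missing = self._expected - set(self._inputs)
        if missing:
            raise ValueError(f"missing protected inputs: {sorted(missing)}")
        # A real builder would return a program instance; this toy returns the inputs.
        return dict(self._inputs)


class Example:
    class ProtectedInputs:
        protected_df: object


inputs = ToyBuilder(Example).with_input("protected_df", [(1, 2), (3, 4)]).build()
print(inputs)  # {'protected_df': [(1, 2), (3, 4)]}
```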

The program can then be run to produce the expected outputs.

>>> program.run()
{'total_count': DataFrame[count: bigint]}

Each instance of a program provides the same privacy guarantee as a Session with the same privacy budget, protected DataFrames, and protected changes. To preserve this guarantee, each instance can only be run once. To run the program again, users must create a new instance of the program, which consumes additional privacy-loss budget.

>>> program.run()
Traceback (most recent call last):
...
RuntimeError: run cannot be called more than once
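One way to enforce the single-run rule is a flag that run() sets on its first call. The sketch below is hypothetical and library-free: `ToyProgram` is not the SessionProgram implementation, and the `session` argument is only a placeholder. It also checks that session_interaction() returns exactly the declared outputs.

```python
import typing


class ToyProgram:
    """Hypothetical base class illustrating a run-once guard; not the library's code."""

    class Outputs:
        pass

    def __init__(self, session: object):
        self._session = session  # placeholder for a Session built from the program's inputs
        self._has_run = False

    def session_interaction(self, session: object) -> dict:
        raise NotImplementedError

    def run(self) -> dict:
        if self._has_run:
            # Mirrors the error shown above: a second run would spend budget
            # beyond what this instance was built with.
            raise RuntimeError("run cannot be called more than once")
        # Set the flag before interacting, so even a failed run cannot be retried.
        self._has_run = True
        outputs = self.session_interaction(self._session)
        declared = set(typing.get_type_hints(self.Outputs))
        if set(outputs) != declared:
            raise ValueError(f"expected outputs {sorted(declared)}, got {sorted(outputs)}")
        return outputs


class CountProgram(ToyProgram):
    class Outputs:
        total_count: int

    def session_interaction(self, session):
        return {"total_count": 2}  # stands in for session.evaluate(...)


program = CountProgram(session=None)
result = program.run()  # {'total_count': 2}
# Calling program.run() again raises RuntimeError.
```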

Often, users will also want to define Parameters and/or UnprotectedInputs in their programs. These are added to the program class as nested classes, in the same way as ProtectedInputs and Outputs.

Once a program is structured as a SessionProgram, users can take advantage of tools such as SessionProgramTuner. More information can be found in the Tumult Tune tutorials.
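Part of what makes this structure useful is that a tool can treat every program the same way. The sketch below is hypothetical and is not the SessionProgramTuner API: `run_fresh_instances` accepts any callable that builds a program for a given epsilon, and it builds a new instance for every run because an instance can only be run once. `FakeProgram` is a stand-in for a built program.

```python
from typing import Callable, Dict, List


def run_fresh_instances(
    build_program: Callable[[float], object], epsilons: List[float]
) -> Dict[float, dict]:
    """Build and run one new program instance per epsilon value."""
    results = {}
    for epsilon in epsilons:
        # Each build creates a new instance, which spends its own privacy-loss budget.
        program = build_program(epsilon)
        results[epsilon] = program.run()
    return results


class FakeProgram:
    """Hypothetical stand-in for a built program; run() just echoes its budget."""

    def __init__(self, epsilon: float):
        self.epsilon = epsilon

    def run(self) -> dict:
        return {"epsilon_used": self.epsilon}


results = run_fresh_instances(FakeProgram, [0.5, 1.0])
print(results)  # {0.5: {'epsilon_used': 0.5}, 1.0: {'epsilon_used': 1.0}}
```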

Defining and initializing a program

SessionProgram

Base class for defining a structured DP program that uses the Session API.

SessionProgram.Builder

Automatically generated builder for initializing a SessionProgram.

Inspecting a program

SessionProgram.privacy_budget

Privacy budget for this program.

SessionProgram.unprotected_inputs

Unprotected inputs for this program.

SessionProgram.parameters

Parameters for this program.

SessionProgram.output_types()

Returns a dictionary associating each program output name with its type.
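Because outputs are declared as annotations on the nested Outputs class, a name-to-type mapping of this kind can in principle be read directly from the class. The snippet below only illustrates that idea with the standard library's `typing.get_type_hints`; `toy_output_types` and `SketchProgram` are hypothetical, `DataFrame` is a stand-in for `pyspark.sql.DataFrame`, and the real method's return value may differ.

```python
import typing


class DataFrame:
    """Stand-in for pyspark.sql.DataFrame so the sketch runs without Spark."""


def toy_output_types(program_cls: type) -> dict:
    """Map each name declared on program_cls.Outputs to its annotated type."""
    return typing.get_type_hints(program_cls.Outputs)


class SketchProgram:
    class Outputs:
        total_count: DataFrame


types = toy_output_types(SketchProgram)  # maps "total_count" to the DataFrame class
```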

Running a program

SessionProgram.run()

Runs the program and returns its outputs.