Defining and using programs
Note
The features described in this page are only available on a paid version of the Tumult Platform. If you would like to hear more, please contact us at info@tmlt.io.
SessionPrograms are used to define structured differentially private (DP) programs that rely on the privacy protection provided by the Session API. Defining a standard interface for creating and running these programs makes it possible to build higher-level tools, such as SessionProgramTuner, that interact with them in a consistent way.
The SessionProgram class is an abstract base class that defines the interface for a structured DP program. It is designed to be subclassed to define specific programs.
Every SessionProgram has three minimal requirements:
It defines at least one protected input, which is a DataFrame that can only be accessed through the Session API.
It defines at least one output, which is a DataFrame that is produced by the program.
It defines a session_interaction() method that takes a Session as an argument and returns a dictionary containing the expected outputs.
>>> class MinimalProgram(SessionProgram):
...     class ProtectedInputs:
...         protected_df: DataFrame  # DataFrame type annotation is required
...     class Outputs:
...         total_count: DataFrame  # required here too
...     def session_interaction(self, session: Session):
...         count_query = QueryBuilder("protected_df").count()
...         budget = self.privacy_budget  # session.remaining_privacy_budget also works
...         total_count = session.evaluate(count_query, budget)
...         return {"total_count": total_count}
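The three requirements can be checked mechanically when a subclass is defined. The following is a minimal, library-free sketch of one way a base class could enforce them. `StructuredProgram`, `CountProgram`, and the placeholder `DataFrame` class are hypothetical names introduced for illustration; this is not how SessionProgram itself is implemented.

```python
import abc


class DataFrame:
    """Placeholder for a Spark DataFrame so the sketch runs without Spark."""


class StructuredProgram(abc.ABC):
    """Hypothetical base class; an illustration, not the Tumult implementation."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Requirements 1 and 2: both nested classes must declare at least
        # one field, and every field must be annotated as a DataFrame.
        for section in ("ProtectedInputs", "Outputs"):
            inner = getattr(cls, section, None)
            fields = getattr(inner, "__annotations__", {})
            if not fields:
                raise TypeError(f"{cls.__name__}.{section} needs at least one field")
            for name, annotation in fields.items():
                if annotation is not DataFrame:
                    raise TypeError(f"{section}.{name} must be annotated as DataFrame")

    # Requirement 3: subclasses cannot be instantiated without this method.
    @abc.abstractmethod
    def session_interaction(self, session):
        """Return a dictionary mapping each output name to its DataFrame."""


class CountProgram(StructuredProgram):
    class ProtectedInputs:
        protected_df: DataFrame

    class Outputs:
        total_count: DataFrame

    def session_interaction(self, session):
        # A real program would evaluate queries on the session here.
        return {"total_count": DataFrame()}
```

In this sketch, a subclass that omits `Outputs` or annotates a field with another type fails with a `TypeError` as soon as the class statement runs, while a subclass that omits `session_interaction()` fails only when it is instantiated.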
Once a program is defined, it can be instantiated using the automatically generated Builder for that class. Its interface is very similar to that of Session.Builder.
>>> protected_df = spark.createDataFrame([(1, 2), (3, 4)], ["a", "b"])
>>> program = (
...     MinimalProgram.Builder()
...     .with_privacy_budget(PureDPBudget(epsilon=1))
...     .with_private_dataframe("protected_df", protected_df, AddOneRow())
...     .build()
... )
The program can then be run to produce the expected outputs.
>>> program.run()
{'total_count': DataFrame[count: bigint]}
Each instance of a program has the same privacy guarantee as a Session with the same privacy budget, protected DataFrames, and protected changes. Therefore, each instance of a program can only be run once. To run the program again, users must create a new instance of the program (which will consume additional privacy loss budget).
>>> program.run()
Traceback (most recent call last):
    ...
RuntimeError: run cannot be called more than once
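The run-once behavior can be pictured as a simple guard around the program's interaction. The sketch below is a library-free illustration under that assumption; `RunOnceProgram` is a hypothetical name and the real class may track this state differently. Only the error message is taken from the traceback above.

```python
class RunOnceProgram:
    """Hypothetical wrapper; an illustration, not the Tumult implementation."""

    def __init__(self, interaction):
        # Zero-argument callable that produces the program's outputs.
        self._interaction = interaction
        self._has_run = False

    def run(self):
        if self._has_run:
            raise RuntimeError("run cannot be called more than once")
        # Mark the instance as used before doing any work, so that a run
        # that fails partway through cannot be retried on the same budget.
        self._has_run = True
        return self._interaction()


program = RunOnceProgram(lambda: {"total_count": 2})
first_outputs = program.run()

try:
    program.run()
    second_call_error = None
except RuntimeError as error:
    second_call_error = str(error)
```

Setting the flag before the interaction runs is a design choice of this sketch: it treats a partially completed run as having spent the instance's budget.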
Often, users will also want to define Parameters and/or UnprotectedInputs in their programs. This can be done by adding them to the program class in the same way as ProtectedInputs and Outputs.
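As a rough picture of what such a class body might look like, the sketch below declares all four sections side by side. It runs without the library: `SessionProgram` and `DataFrame` here are empty stand-ins, `FilteredCountProgram`, `public_df`, `min_value`, and `declared_fields` are hypothetical names, and the annotation types shown for UnprotectedInputs and Parameters are assumptions. The `session_interaction()` method is omitted because it depends on the real Session API.

```python
class DataFrame:
    """Placeholder for a Spark DataFrame so the sketch runs on its own."""


class SessionProgram:
    """Stand-in for the real base class, which this sketch does not import."""


class FilteredCountProgram(SessionProgram):
    class ProtectedInputs:
        protected_df: DataFrame

    class UnprotectedInputs:
        public_df: DataFrame  # assumed to need a DataFrame annotation as well

    class Parameters:
        min_value: int  # assumed: parameters take ordinary type annotations

    class Outputs:
        total_count: DataFrame


def declared_fields(program_class):
    """Map each declaration section to the field names it declares."""
    sections = ("ProtectedInputs", "UnprotectedInputs", "Parameters", "Outputs")
    result = {}
    for section in sections:
        inner = getattr(program_class, section, None)
        result[section] = list(getattr(inner, "__annotations__", {}))
    return result


fields = declared_fields(FilteredCountProgram)
```

The helper only inspects annotations, which is enough to show that the new sections follow the same nested-class pattern as ProtectedInputs and Outputs.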
Once a program is structured as a SessionProgram, users can take advantage of tools like SessionProgramTuner. More information can be found in the Tumult Tune tutorials.
Defining and initializing a program
Base class for defining a structured DP program that uses the Session API.
Automatically generated builder for initializing a
Inspecting a program
Privacy budget for this program.
Unprotected inputs for this program.
The parameters for this program.
Returns a dictionary associating each program output name with its type.
Running a program
Runs the program and returns its outputs.