SessionProgram

from tmlt.analytics import SessionProgram
class tmlt.analytics.SessionProgram(builder)

Bases: ABC

Base class for defining a structured DP program that uses the Session API.

Example usage can be found in the API reference, and in the Tumult Tune tutorials.

Warning

SessionPrograms should not be directly constructed. Instead, users should create a subclass of SessionProgram, then create an instance of their SessionProgram using the automatically generated Builder attribute of that subclass.
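
How a subclass can end up with its own Builder can be sketched in plain Python. The following only illustrates the general pattern, assuming an `__init_subclass__` hook; it is not Tumult Analytics' implementation, and `ToyProgram`, `_ToyBuilder`, and `with_value` are hypothetical names.

```python
class _ToyBuilder:
    """Hypothetical builder: collects settings, then constructs the program."""

    program_cls = None  # filled in for each generated builder class

    def __init__(self):
        self._values = {}

    def with_value(self, name, value):
        self._values[name] = value
        return self  # returning self allows method chaining

    def build(self):
        return self.program_cls(self)


class ToyProgram:
    """Hypothetical base class whose subclasses each get their own Builder."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Create a builder class bound to this specific subclass.
        cls.Builder = type(
            f"{cls.__name__}Builder", (_ToyBuilder,), {"program_cls": cls}
        )

    def __init__(self, builder):
        self.values = dict(builder._values)


class MyProgram(ToyProgram):
    pass


# The subclass is instantiated through its generated Builder, not directly.
program = MyProgram.Builder().with_value("threshold", 10).build()
```

In this sketch the base class itself has no Builder attribute; only subclasses receive one, which mirrors the advice to subclass first and then build.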

class ProtectedInputs

Bases: object

Annotation class for protected inputs to a SessionProgram.

The ProtectedInputs class enumerates the protected DataFrames that the program expects. These DataFrames are protected by differential privacy according to their protected change and the privacy budget provided in the builder.

Each protected DataFrame is specified in the builder using with_private_dataframe(). It is then accessible in the session_interaction() method as a private source with the same name in the given Session.

Example

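>>> # Setup shared by the examples on this page.
>>> from pyspark.sql import DataFrame, SparkSession
>>> from tmlt.analytics import (
...     AddOneRow, PureDPBudget, QueryBuilder, Session, SessionProgram
... )
>>> spark = SparkSession.builder.getOrCreate()
>>> class ProgramWithProtectedInputs(SessionProgram):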
...     class ProtectedInputs:
...         protected_df: DataFrame
...     class Outputs:
...         total_count: DataFrame
...     def session_interaction(self, session: Session):
...         print("Private sources:", session.private_sources)
...         count_query = QueryBuilder("protected_df").count()
...         total_count = session.evaluate(count_query, self.privacy_budget)
...         return {"total_count": total_count}
>>> protected_df = spark.createDataFrame([(1, 2), (3, 4)], ["a", "b"])
>>> program = (
...     ProgramWithProtectedInputs.Builder()
...     .with_privacy_budget(PureDPBudget(epsilon=1))
...     .with_private_dataframe("protected_df", protected_df, AddOneRow())
...     .build()
... )
>>> program.run()
Private sources: ['protected_df']
{'total_count': DataFrame[count: bigint]}

class UnprotectedInputs

Bases: object

Annotation class for unprotected inputs to a SessionProgram.

The UnprotectedInputs class enumerates the unprotected DataFrames that the program expects. These DataFrames are not protected by differential privacy, and can be accessed directly by the program. They are typically used to specify public information used in a public join or in a KeySet.

Each unprotected DataFrame is specified in the builder using with_public_dataframe(). It is then accessible in the session_interaction() method as a public source with the same name in the given Session, or through the unprotected_inputs property.

Example

>>> class ProgramWithUnprotectedInputs(SessionProgram):
...     class ProtectedInputs:
...         protected_df: DataFrame
...     class UnprotectedInputs:
...         public_df: DataFrame
...     class Outputs:
...         total_count: DataFrame
...     def session_interaction(self, session: Session):
...         print("Public sources:", session.public_sources)
...         assert session.public_source_dataframes == {
...             "public_df": public_df
...         }
...         assert self.unprotected_inputs == {"public_df": public_df}
...         count_query = QueryBuilder("protected_df").count()
...         total_count = session.evaluate(count_query, self.privacy_budget)
...         return {"total_count": total_count}
>>> protected_df = spark.createDataFrame([(1, 2), (3, 4)], ["a", "b"])
>>> public_df = spark.createDataFrame([(1, 2), (3, 4)], ["c", "d"])
>>> program = (
...     ProgramWithUnprotectedInputs.Builder()
...     .with_privacy_budget(PureDPBudget(epsilon=1))
...     .with_private_dataframe("protected_df", protected_df, AddOneRow())
...     .with_public_dataframe("public_df", public_df)
...     .build()
... )
>>> program.run()
Public sources: ['public_df']
{'total_count': DataFrame[count: bigint]}

class Parameters

Bases: object

Annotation class for parameters to a SessionProgram.

The Parameters class enumerates the parameters that the program expects. These parameters are arbitrary (typically simple) Python objects that are most often used to configure the behavior of the program, such as thresholds, clamping bounds, budget allocations, or the choice among algorithms.

Each parameter is specified in the builder using with_parameter(). It is then accessible in the session_interaction() method through the parameters property.

Example

>>> class ProgramWithParameters(SessionProgram):
...     class ProtectedInputs:
...         protected_df: DataFrame
...     class Outputs:
...         a_sum: DataFrame
...     class Parameters:
...         low: int
...         high: int
...     def session_interaction(self, session: Session):
...         low = self.parameters["low"]
...         high = self.parameters["high"]
...         sum_query = QueryBuilder("protected_df").sum("a", low, high)
...         a_sum = session.evaluate(sum_query, self.privacy_budget)
...         return {"a_sum": a_sum}
>>> protected_df = spark.createDataFrame([(1, 2), (3, 4)], ["a", "b"])
>>> program = (
...     ProgramWithParameters.Builder()
...     .with_privacy_budget(PureDPBudget(epsilon=1))
...     .with_private_dataframe("protected_df", protected_df, AddOneRow())
...     .with_parameter("low", 0)
...     .with_parameter("high", 5)
...     .build()
... )
>>> program.run()
{'a_sum': DataFrame[a_sum: bigint]}
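
Budget allocation is one of the parameter uses named above. Under pure differential privacy, the epsilons of sequentially evaluated queries add up, so a program can accept allocation fractions as parameters and split its total epsilon accordingly. The sketch below shows only that arithmetic in plain Python; `split_epsilon` and the parameter names are hypothetical, and the code does not call the Tumult Analytics API.

```python
import math


def split_epsilon(total_epsilon, fractions):
    """Split a total epsilon into per-query shares.

    `fractions` maps a (hypothetical) query name to its share of the budget;
    the shares must be positive and sum to 1.
    """
    if total_epsilon <= 0:
        raise ValueError("total_epsilon must be positive")
    if any(f <= 0 for f in fractions.values()):
        raise ValueError("every fraction must be positive")
    if not math.isclose(sum(fractions.values()), 1.0):
        raise ValueError("fractions must sum to 1")
    return {name: total_epsilon * f for name, f in fractions.items()}


# Hypothetical parameter values, as they might be supplied via with_parameter().
parameters = {"count_fraction": 0.25, "sum_fraction": 0.75}
shares = split_epsilon(
    1.0,
    {"count": parameters["count_fraction"], "sum": parameters["sum_fraction"]},
)
```

Inside session_interaction(), each share could then be wrapped in a budget object, for example PureDPBudget(epsilon=share), before being passed to session.evaluate().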

class Outputs

Bases: object

Annotation class for the outputs of a SessionProgram.

These outputs are expected to be returned by the session_interaction() method as a dictionary, where the keys are the names of the outputs and the values are the corresponding DataFrames.

Example

>>> class ProgramWithOutputs(SessionProgram):
...     class ProtectedInputs:
...         protected_df: DataFrame
...     class Outputs:
...         total_count: DataFrame
...     def session_interaction(self, session: Session):
...         count_query = QueryBuilder("protected_df").count()
...         total_count = session.evaluate(count_query, self.privacy_budget)
...         return {"total_count": total_count}
>>> protected_df = spark.createDataFrame([(1, 2), (3, 4)], ["a", "b"])
>>> program = (
...     ProgramWithOutputs.Builder()
...     .with_privacy_budget(PureDPBudget(epsilon=1))
...     .with_private_dataframe("protected_df", protected_df, AddOneRow())
...     .build()
... )
>>> program.run()
{'total_count': DataFrame[count: bigint]}
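
The relationship between the Outputs annotations and the returned dictionary can be illustrated without Spark. This sketch assumes the contract amounts to comparing the returned keys with the annotated names; the helper `check_outputs` is hypothetical and not part of the library, and `str` stands in for DataFrame.

```python
from typing import get_type_hints


class Outputs:
    # Stand-in annotation class; `str` replaces DataFrame so Spark is not needed.
    total_count: str
    a_sum: str


def check_outputs(annotation_cls, returned):
    """Raise ValueError if the returned keys differ from the annotated names."""
    expected = set(get_type_hints(annotation_cls))
    actual = set(returned)
    if expected != actual:
        missing = sorted(expected - actual)
        extra = sorted(actual - expected)
        raise ValueError(
            f"missing outputs: {missing}; unexpected outputs: {extra}"
        )


# Keys match the annotated names, so this call passes silently.
check_outputs(Outputs, {"total_count": "...", "a_sum": "..."})
```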

abstract session_interaction(session)

The interaction with the Session that this program performs.

This method should be overridden by subclasses to generate the expected outputs of the program using the given session. The method should return a dictionary of the expected outputs, where the keys are the names of the outputs and the values are the corresponding DataFrames.

Warning

Do not call this method directly. Instead, call the run() method.

Parameters:

session (Session) – The Session to interact with. It will be initialized with the protected and unprotected DataFrames as well as the privacy budget.

Return type:

Dict[str, DataFrame]
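
The split between run() and session_interaction() follows a familiar pattern: the base class owns the setup, and the subclass supplies only the interaction. Below is a minimal sketch of that pattern, with the hypothetical `ToySession` and `ToyProgramBase` standing in for the real classes; the library's actual run() may do considerably more.

```python
from abc import ABC, abstractmethod


class ToySession:
    """Hypothetical stand-in for a Session; it only holds named sources."""

    def __init__(self, private_sources, budget):
        self.private_sources = list(private_sources)
        self.remaining_budget = budget


class ToyProgramBase(ABC):
    def __init__(self, private_sources, budget):
        self._private_sources = private_sources
        self._budget = budget

    @abstractmethod
    def session_interaction(self, session):
        """Subclasses override this to produce the outputs."""

    def run(self):
        # run() builds the session, so callers never construct one themselves.
        session = ToySession(self._private_sources, self._budget)
        return self.session_interaction(session)


class CountSources(ToyProgramBase):
    def session_interaction(self, session):
        return {"n_sources": len(session.private_sources)}


outputs = CountSources(["protected_df"], budget=1.0).run()
```

Because session_interaction() is abstract, the base class in this sketch cannot be instantiated on its own, and calling run() is the only path that supplies a properly initialized session.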

Attributes

SessionProgram.parameters

The parameters for this program.

SessionProgram.privacy_budget

Privacy budget for this program.

SessionProgram.unprotected_inputs

Unprotected inputs for this program.

Methods

SessionProgram.output_types

Returns a dictionary associating each program output name with its type.

SessionProgram.run

Runs the program and returns its outputs.