SessionProgram#
from tmlt.analytics import SessionProgram
- class tmlt.analytics.SessionProgram(builder)#
Bases: ABC
Base class for defining a structured DP program that uses the Session API.
Example usage can be found in the API reference, and in the Tumult Tune tutorials.
Warning
SessionPrograms should not be constructed directly. Instead, create a subclass of SessionProgram, then create an instance of that subclass using its automatically generated Builder attribute.
- class ProtectedInputs#
Bases: object
Annotation class for protected inputs to a SessionProgram.
The ProtectedInputs class enumerates the expected protected DataFrames that will be used in the program. These are the DataFrames that will be protected by differential privacy according to their protected change and the privacy budget provided in the builder.
Each protected DataFrame can be specified in the builder using with_private_dataframe(). They are then accessible in the session_interaction() method as a private source with the same name in the given Session.
Example
>>> class ProgramWithProtectedInputs(SessionProgram):
...     class ProtectedInputs:
...         protected_df: DataFrame
...     class Outputs:
...         total_count: DataFrame
...     def session_interaction(self, session: Session):
...         print("Private sources:", session.private_sources)
...         count_query = QueryBuilder("protected_df").count()
...         total_count = session.evaluate(count_query, self.privacy_budget)
...         return {"total_count": total_count}
>>> protected_df = spark.createDataFrame([(1, 2), (3, 4)], ["a", "b"])
>>> program = (
...     ProgramWithProtectedInputs.Builder()
...     .with_privacy_budget(PureDPBudget(epsilon=1))
...     .with_private_dataframe("protected_df", protected_df, AddOneRow())
...     .build()
... )
>>> program.run()
Private sources: ['protected_df']
{'total_count': DataFrame[count: bigint]}
- class UnprotectedInputs#
Bases: object
An annotation class for unprotected inputs to a SessionProgram.
The UnprotectedInputs class enumerates the expected unprotected DataFrames that will be used by the program. These DataFrames are not protected by differential privacy, and can be accessed directly by the program. They are typically used to specify public information used in a public join or in a KeySet.
Each unprotected DataFrame can be specified in the builder using with_public_dataframe(). They are then accessible in the session_interaction() method as a public source with the same name in the given session, or through the unprotected_inputs property.
Example
>>> class ProgramWithUnprotectedInputs(SessionProgram):
...     class ProtectedInputs:
...         protected_df: DataFrame
...     class UnprotectedInputs:
...         public_df: DataFrame
...     class Outputs:
...         total_count: DataFrame
...     def session_interaction(self, session: Session):
...         print("Public sources:", session.public_sources)
...         assert session.public_source_dataframes == {
...             "public_df": public_df
...         }
...         assert self.unprotected_inputs == {"public_df": public_df}
...         count_query = QueryBuilder("protected_df").count()
...         total_count = session.evaluate(count_query, self.privacy_budget)
...         return {"total_count": total_count}
>>> protected_df = spark.createDataFrame([(1, 2), (3, 4)], ["a", "b"])
>>> public_df = spark.createDataFrame([(1, 2), (3, 4)], ["c", "d"])
>>> program = (
...     ProgramWithUnprotectedInputs.Builder()
...     .with_privacy_budget(PureDPBudget(epsilon=1))
...     .with_private_dataframe("protected_df", protected_df, AddOneRow())
...     .with_public_dataframe("public_df", public_df)
...     .build()
... )
>>> program.run()
Public sources: ['public_df']
{'total_count': DataFrame[count: bigint]}
- class Parameters#
Bases: object
Annotation class for parameters to a SessionProgram.
The Parameters class enumerates the expected parameters that will be used by the program. These parameters are arbitrary (typically simple) Python objects that are most often used to configure the behavior of the program, for example thresholds, clamping bounds, budget allocations, or the choice among algorithms.
Each parameter can be specified in the builder using with_parameter(). They are then accessible in the session_interaction() method through the parameters property.
Example
>>> class ProgramWithParameters(SessionProgram):
...     class ProtectedInputs:
...         protected_df: DataFrame
...     class Outputs:
...         a_sum: DataFrame
...     class Parameters:
...         low: int
...         high: int
...     def session_interaction(self, session: Session):
...         low = self.parameters["low"]
...         high = self.parameters["high"]
...         sum_query = QueryBuilder("protected_df").sum("a", low, high)
...         a_sum = session.evaluate(sum_query, self.privacy_budget)
...         return {"a_sum": a_sum}
>>> protected_df = spark.createDataFrame([(1, 2), (3, 4)], ["a", "b"])
>>> program = (
...     ProgramWithParameters.Builder()
...     .with_privacy_budget(PureDPBudget(epsilon=1))
...     .with_private_dataframe("protected_df", protected_df, AddOneRow())
...     .with_parameter("low", 0)
...     .with_parameter("high", 5)
...     .build()
... )
>>> program.run()
{'a_sum': DataFrame[a_sum: bigint]}
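The link between an annotation class and the values supplied to the builder can be illustrated in plain Python. The sketch below is not Tumult Analytics code: `ExampleParameters` and `check_supplied` are hypothetical names, and the only assumption is that the expected names are the annotations declared on the class. It reports which names are missing and which are unexpected.

```python
from typing import Any, Dict, List, Tuple


class ExampleParameters:
    """Hypothetical annotation class, mirroring the Parameters example above."""

    low: int
    high: int


def check_supplied(
    annotation_class: type, supplied: Dict[str, Any]
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) names by comparing supplied values to annotations."""
    expected = set(getattr(annotation_class, "__annotations__", {}))
    missing = sorted(expected - set(supplied))
    unexpected = sorted(set(supplied) - expected)
    return missing, unexpected


print(check_supplied(ExampleParameters, {"low": 0, "high": 5}))  # ([], [])
print(check_supplied(ExampleParameters, {"low": 0, "hi": 5}))  # (['high'], ['hi'])
```

The same comparison would apply to any annotation class that only lists names and types, since nothing in it is specific to parameters.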
- class Outputs#
Bases: object
Annotation class for the outputs of a SessionProgram.
These outputs are expected to be returned by the session_interaction() method as a dictionary, where the keys are the names of the outputs and the values are the corresponding DataFrames.
Example
>>> class ProgramWithOutputs(SessionProgram):
...     class ProtectedInputs:
...         protected_df: DataFrame
...     class Outputs:
...         total_count: DataFrame
...     def session_interaction(self, session: Session):
...         count_query = QueryBuilder("protected_df").count()
...         total_count = session.evaluate(count_query, self.privacy_budget)
...         return {"total_count": total_count}
>>> protected_df = spark.createDataFrame([(1, 2), (3, 4)], ["a", "b"])
>>> program = (
...     ProgramWithOutputs.Builder()
...     .with_privacy_budget(PureDPBudget(epsilon=1))
...     .with_private_dataframe("protected_df", protected_df, AddOneRow())
...     .build()
... )
>>> program.run()
{'total_count': DataFrame[count: bigint]}
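The dictionary contract described above (keys are output names, values have the annotated type) can be checked with a few lines of plain Python. This is a minimal sketch under stated assumptions, not the library's behavior: `FakeFrame`, `ExampleOutputs`, and `validate_outputs` are hypothetical names, and `FakeFrame` stands in for a DataFrame so the example runs without Spark. Whether and how SessionProgram itself validates outputs is not stated here.

```python
from typing import Any, Dict


class FakeFrame:
    """Stand-in for a DataFrame so the sketch runs without Spark."""


class ExampleOutputs:
    """Hypothetical annotation class declaring one expected output."""

    total_count: FakeFrame


def validate_outputs(outputs_class: type, returned: Dict[str, Any]) -> None:
    """Raise ValueError unless `returned` matches the annotated names and types."""
    expected = dict(getattr(outputs_class, "__annotations__", {}))
    if set(returned) != set(expected):
        raise ValueError(
            f"expected outputs {sorted(expected)}, got {sorted(returned)}"
        )
    for name, expected_type in expected.items():
        if not isinstance(returned[name], expected_type):
            raise ValueError(f"output {name!r} is not a {expected_type.__name__}")


# A dictionary with the declared name and type passes silently.
validate_outputs(ExampleOutputs, {"total_count": FakeFrame()})

# A misspelled output name is rejected.
try:
    validate_outputs(ExampleOutputs, {"count": FakeFrame()})
except ValueError as err:
    print(err)
```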
- abstract session_interaction(session)#
The interaction with the Session that this program performs.
This method should be overridden by subclasses to generate the expected outputs of the program using the given session. The method should return a dictionary of the expected outputs, where the keys are the names of the outputs and the values are the corresponding DataFrames.
Warning
Do not call this method directly. Instead, call the run() method.
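The split between an overridable session_interaction() and a public run() resembles the template-method pattern. The following sketch shows that pattern with Python's `abc` module only. `ToyProgram` and `CountProgram` are hypothetical, the "session" is just a list of rows, and no differential privacy is applied, so it illustrates the calling convention rather than the library's implementation.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class ToyProgram(ABC):
    """Hypothetical base class: subclasses override the hook, callers use run()."""

    def __init__(self, session: Any) -> None:
        self._session = session

    @abstractmethod
    def session_interaction(self, session: Any) -> Dict[str, Any]:
        """Produce the outputs; meant to be overridden, not called directly."""

    def run(self) -> Dict[str, Any]:
        """Public entry point: call the hook, then check the result is a dict."""
        outputs = self.session_interaction(self._session)
        if not isinstance(outputs, dict):
            raise TypeError("session_interaction must return a dictionary")
        return outputs


class CountProgram(ToyProgram):
    def session_interaction(self, session: Any) -> Dict[str, Any]:
        # The toy "session" is a plain list, so counting rows is just len().
        return {"total_count": len(session)}


print(CountProgram([(1, 2), (3, 4)]).run())  # {'total_count': 2}
```

Routing every call through run() gives the base class one place to perform setup and checks around the subclass's logic, which is one plausible reason for the warning above.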
Attributes
parameters | The parameters for this program.
privacy_budget | Privacy budget for this program.
unprotected_inputs | Unprotected inputs for this program.
Methods
Returns a dictionary associating each program output name with its type.
Runs the program and returns its outputs.