Tuning programs

Note

PRO: The features described on this page are only available in a paid version of the Tumult Platform. If you would like to hear more, please contact us at info@tmlt.io.

Parameter tuning and optimization in Tumult is done with the SessionProgramTuner class, an abstract base class that defines the interface for tuning SessionPrograms.

To tune a specific program, users should subclass SessionProgramTuner, passing their SessionProgram subclass through the program keyword argument of the class statement.
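
The `program=Program` keyword in a class header is ordinary Python: keyword arguments in a class statement are forwarded to the base class's `__init_subclass__` hook. The sketch below shows that general mechanism with a hypothetical `TunerBase` class and a stand-in `MyProgram`; it is not Tumult's implementation, which may work differently.

```python
class TunerBase:
    """Hypothetical base class illustrating class-keyword arguments."""

    program = None  # filled in on each subclass by __init_subclass__

    def __init_subclass__(cls, program=None, **kwargs):
        # Keyword arguments from the class header (e.g. program=...) arrive here.
        super().__init_subclass__(**kwargs)
        if program is None:
            raise TypeError(f"{cls.__name__} must be defined with a program= argument")
        cls.program = program


class MyProgram:
    """Stand-in for a SessionProgram subclass."""


class MyTuner(TunerBase, program=MyProgram):
    pass


print(MyTuner.program.__name__)  # MyProgram
```

Raising at class-definition time means a tuner that forgets its program fails immediately, rather than when it is first used.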

>>> import pyspark.sql.functions as sf
>>> from pyspark.sql import DataFrame
>>> class Program(SessionProgram):
...     class ProtectedInputs:
...         protected_df: DataFrame
...     class Outputs:
...         b_sum: DataFrame
...     class Parameters:
...         low: int
...         high: int
...     def session_interaction(self, session: Session):
...         low = self.parameters["low"]
...         high = self.parameters["high"]
...         a_values = KeySet.from_dict({"a": ["x", "y"]})
...         sum_query = QueryBuilder("protected_df").groupby(a_values).sum("b", low, high)
...         b_sum = session.evaluate(sum_query, self.privacy_budget)
...         return {"b_sum": b_sum}
>>> class Tuner(SessionProgramTuner, program=Program):
...     @joined_output_metric(name="root_mean_squared_error", output="b_sum", join_columns=["a"])
...     @staticmethod
...     def compute_rmse(joined_output: DataFrame):
...         err = sf.col("b_sum_dp") - sf.col("b_sum_baseline")
...         rmse = joined_output.agg(sf.sqrt(sf.avg(sf.pow(err, sf.lit(2)))).alias("rmse"))
...         return rmse.collect()[0]["rmse"]
...
...     metrics = [
...         MedianRelativeError(
...             output="b_sum",
...             measure_column="b_sum",
...             name="mre",
...             join_columns=["a"],
...         )
...     ]
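
To make the two metrics concrete, the following plain-Python sketch computes them on a small list of joined rows. It assumes that the joined output pairs each value of `a` with a DP value (`b_sum_dp`) and a baseline value (`b_sum_baseline`), as `compute_rmse` above does, and that median relative error is the median of abs(dp - baseline) / abs(baseline) over the rows. The row values are made up, and skipping rows with a zero baseline is an assumption; the library's handling of that edge case may differ.

```python
import math
import statistics

# Hypothetical joined output: one dict per value of the join column "a".
joined_rows = [
    {"a": "x", "b_sum_dp": 3.0, "b_sum_baseline": 2.0},
    {"a": "y", "b_sum_dp": 2.0, "b_sum_baseline": 4.0},
]

def rmse(rows):
    """Root mean squared error between DP and baseline values."""
    squared = [(r["b_sum_dp"] - r["b_sum_baseline"]) ** 2 for r in rows]
    return math.sqrt(sum(squared) / len(squared))

def median_relative_error(rows):
    """Median of |dp - baseline| / |baseline|; zero baselines are skipped (assumption)."""
    rel = [
        abs(r["b_sum_dp"] - r["b_sum_baseline"]) / abs(r["b_sum_baseline"])
        for r in rows
        if r["b_sum_baseline"] != 0
    ]
    return statistics.median(rel)

print(rmse(joined_rows))                   # sqrt(2.5), about 1.581
print(median_relative_error(joined_rows))  # 0.5
```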

Just like a SessionProgram, once a subclass of SessionProgramTuner is defined, it can be instantiated using the automatically-generated builder for that class. Unlike a SessionProgram, you can pass Tunable objects to the builder methods instead of concrete values.

>>> protected_df = spark.createDataFrame([("x", 2), ("y", 4)], ["a", "b"])
>>> tuner = (
...     Tuner.Builder()
...     .with_privacy_budget(Tunable("budget"))
...     .with_private_dataframe("protected_df", protected_df, AddOneRow())
...     .with_parameter("low", 0)
...     .with_parameter("high", Tunable("high"))
...     .build()
... )

The run() method runs the program with the given values for the Tunables and returns the outputs of both the DP program and the baselines.

>>> outputs = tuner.run({"budget": PureDPBudget(1), "high": 1})

The error_report() method runs the program in the same way, and returns the DP and baseline outputs together with the values of the metrics defined in the Tuner class.

>>> tuner.error_report({"budget": PureDPBudget(1), "high": 1}).show()  
Error report ran with budget PureDPBudget(epsilon=1) and the following tunable parameters:
budget: PureDPBudget(epsilon=1)
high: 1
and the following additional parameters:
low: 0

Metric results:
+---------+-------------------------+-------------------------------------------------------+
|   Value | Metric                  | Description                                           |
+=========+=========================+=======================================================+
|    0.5  | mre                     | Median relative error for column b_sum of table b_sum |
+---------+-------------------------+-------------------------------------------------------+
|    3.16 | root_mean_squared_error | User-defined metric (no description)                  |
+---------+-------------------------+-------------------------------------------------------+

Another illustrated example of how to use a SessionProgramTuner to tune parameters can be found in the Tuning parameters tutorial.

Defining a SessionProgramTuner

Classes and methods that can be used or subclassed to define a SessionProgramTuner, which measures error for and tunes a specific SessionProgram.

SessionProgramTuner

Base class to define tuners to evaluate and optimize DP programs.

Defining baselines

Baselines can be specified using the baseline_options class variable, or the @baseline decorator.

SessionProgramTuner.baseline_options

Configuration for how baseline outputs are computed.

baseline(name)

Decorator to define a custom baseline in a SessionProgramTuner.

Defining views

Views can be specified using the views class variable, or the @view decorator.

SessionProgramTuner.views

A list of View objects defined on output tables.

View(name, func)

Wrapper to allow users to define a view of the output table.

view(name)

Decorator to define a view of an output table, to be used across metrics in place of program outputs.
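
Conceptually, a view is a function applied once to an output table, whose result is then shared by several metrics. The sketch below illustrates that idea in plain Python with a hypothetical `top_groups` view and two hypothetical metrics. It does not use the `View` class or the `@view` decorator, because the signature expected for the wrapped function is not shown on this page.

```python
# Hypothetical output table: rows of (group, dp_value, baseline_value).
output_rows = [("x", 10.0, 12.0), ("y", 50.0, 48.0), ("z", 3.0, 1.0)]

def top_groups(rows, min_baseline=5.0):
    """Hypothetical view: keep only groups whose baseline value is large enough."""
    return [r for r in rows if r[2] >= min_baseline]

def max_abs_error(rows):
    return max(abs(dp - base) for _, dp, base in rows)

def mean_abs_error(rows):
    return sum(abs(dp - base) for _, dp, base in rows) / len(rows)

# Compute the view once, then reuse it for every metric that needs it.
view_rows = top_groups(output_rows)
report = {
    "max_abs_error_top": max_abs_error(view_rows),
    "mean_abs_error_top": mean_abs_error(view_rows),
}
print(report)  # {'max_abs_error_top': 2.0, 'mean_abs_error_top': 2.0}
```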

Defining metrics

Metrics can be specified using the metrics class variable, or by defining a custom method with a metric decorator like @metric, @single_output_metric, or @joined_output_metric. More information about metrics can be found on the metrics API reference page.

SessionProgramTuner.metrics

A list of metrics to compute in each error_report.
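
Each entry in `metrics` contributes one row to the metric results of an error report, as in the table earlier on this page. The plain-Python sketch below mimics that structure with a hypothetical `MetricSpec` record; the metric names and computations are illustrative stand-ins, not Tumult's metric classes, and the fallback description string is copied from the example report.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class MetricSpec:
    """Hypothetical stand-in for a metric: a name, a function, and a description."""
    name: str
    func: Callable[[List[float], List[float]], float]
    description: Optional[str] = None

def abs_errors(dp, baseline):
    return [abs(d - b) for d, b in zip(dp, baseline)]

metrics = [
    MetricSpec("max_abs_error", lambda dp, base: max(abs_errors(dp, base)),
               "Maximum absolute error"),
    MetricSpec("total_abs_error", lambda dp, base: sum(abs_errors(dp, base))),
]

def metric_rows(metrics, dp, baseline):
    """Evaluate every metric and return (value, name, description) rows."""
    return [
        (m.func(dp, baseline), m.name,
         m.description or "User-defined metric (no description)")
        for m in metrics
    ]

for row in metric_rows(metrics, dp=[3.0, 2.0], baseline=[2.0, 4.0]):
    print(row)
```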

Initializing a SessionProgramTuner

User-defined subclasses of SessionProgramTuner can be instantiated with the automatically-generated Builder. Each parameter can be specified either with a fixed value or with a Tunable, so that error can be measured for different values of that parameter.

SessionProgramTuner.Builder()

The builder for a specific subclass of SessionProgramTuner.

Tunable(name)

Named placeholder for a single input to a SessionProgramTuner.Builder.
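
A Tunable is a named placeholder that is filled in later, when run() or error_report() receives a dictionary such as `{"budget": PureDPBudget(1), "high": 1}`. The sketch below shows that substitution step in plain Python with a hypothetical `Placeholder` class and `resolve` helper; it illustrates the idea only and is not how SessionProgramTuner.Builder is implemented.

```python
from dataclasses import dataclass
from typing import Any, Dict

@dataclass(frozen=True)
class Placeholder:
    """Hypothetical stand-in for Tunable: just a name to be filled in later."""
    name: str

def resolve(inputs: Dict[str, Any], tunable_values: Dict[str, Any]) -> Dict[str, Any]:
    """Replace every placeholder with its value; keep fixed values unchanged."""
    missing = {
        v.name for v in inputs.values()
        if isinstance(v, Placeholder) and v.name not in tunable_values
    }
    if missing:
        raise KeyError(f"no value given for tunable(s): {sorted(missing)}")
    return {
        key: tunable_values[v.name] if isinstance(v, Placeholder) else v
        for key, v in inputs.items()
    }

# Mirrors the builder call earlier on this page: "low" is fixed, "high" is tunable.
parameters = {"low": 0, "high": Placeholder("high")}
print(resolve(parameters, {"high": 1}))  # {'low': 0, 'high': 1}
```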

Inspecting a SessionProgramTuner

Methods to get information about an instance of SessionProgramTuner.

SessionProgramTuner.program

A subclass of SessionProgram to be tuned.

SessionProgramTuner.get_baselines()

Return all baselines defined in the class.

SessionProgramTuner.tunables

Returns a list of tunable inputs associated with this tuner.

SessionProgramTuner.get_concrete_program()

Returns the program.

Using a SessionProgramTuner

Methods to use a SessionProgramTuner to compute outputs and generate error reports, and related classes.

NamedValue(value, name)

A parameter value associated with a human-readable name.

SessionProgramTuner.run([tunable_values])

Computes all outputs for a single run.

SessionProgramTuner.error_report([spec])

Computes a single error report.

SessionProgramTuner.multi_error_report(...)

Runs an error report for each set of values for the Tunables.

RunOutputs

The results of a single run of the DP program and the baselines.

ProtectedInput

A protected input that was used for an ErrorReport.

UnprotectedInput

An unprotected input that was used for an ErrorReport.

ErrorReport

Output of a single error report run.

MultiErrorReport

Output of an error report run across multiple input combinations.
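
multi_error_report runs one error report for each combination of values for the Tunables. Its full signature is elided above, so the sketch below only shows, in plain Python, one way to enumerate such combinations as dictionaries shaped like those passed to run() and error_report(). The helper name `tunable_grid` and the candidate values are hypothetical, and plain floats stand in for PureDPBudget objects.

```python
import itertools
from typing import Any, Dict, Iterable, List

def tunable_grid(candidates: Dict[str, Iterable[Any]]) -> List[Dict[str, Any]]:
    """Cartesian product of candidate values, one dict per combination."""
    names = list(candidates)
    return [
        dict(zip(names, combo))
        for combo in itertools.product(*(candidates[n] for n in names))
    ]

# Hypothetical candidates; floats stand in for PureDPBudget(epsilon) objects.
grid = tunable_grid({"budget": [0.5, 1.0], "high": [1, 5, 10]})
print(len(grid))  # 6
print(grid[0])    # {'budget': 0.5, 'high': 1}
```

The grid grows multiplicatively with each tunable, so a handful of candidates per parameter is usually enough for a first pass.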

NoPrivacySession

To compute baselines, the SessionProgramTuner relies on NoPrivacySession, a class with the same interface as Session that runs queries without any privacy guarantees. Users should generally not use NoPrivacySession directly.

NoPrivacySession

Session-like class to evaluate queries without privacy guarantees.
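
The difference between a DP output and a baseline output can be illustrated without Tumult. The sketch below computes the per-group clamped sum from the example above exactly, as a no-privacy evaluation would under the assumption that the baseline applies the same clamping bounds, and a noisy version of it. The noise uses the textbook Laplace mechanism, chosen here only for illustration; it is not necessarily the mechanism Tumult uses.

```python
import random

def clamped_sum(values, low, high):
    """Exact sum after clamping each value to [low, high]; no noise (baseline-style)."""
    return sum(min(max(v, low), high) for v in values)

def laplace_sum(values, low, high, epsilon, rng):
    """Textbook Laplace-mechanism sum under add/remove-one-row neighbors (illustrative)."""
    sensitivity = max(abs(low), abs(high))
    scale = sensitivity / epsilon
    if scale == 0:
        return clamped_sum(values, low, high)  # nothing to protect: every value clamps to 0
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return clamped_sum(values, low, high) + noise

groups = {"x": [2], "y": [4]}  # "b" values per "a" key, from the example DataFrame
rng = random.Random(0)
for key, values in groups.items():
    baseline = clamped_sum(values, low=0, high=1)
    dp = laplace_sum(values, low=0, high=1, epsilon=1.0, rng=rng)
    # The baseline is 1 for both keys; the DP value depends on the random seed.
    print(key, baseline, round(dp, 3))
```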