tuner
Interface for tuning SessionPrograms.
Warning
SessionProgramTuner is intended to be used for tuning DP programs. It does not provide any privacy guarantees. It is recommended to use synthetic or historical data for tuning instead of the data that will be used in production.
The SessionProgramTuner class is an abstract base class that defines the interface for tuning SessionPrograms. To tune a specific program, users should subclass SessionProgramTuner, passing their SessionProgram as the program class argument.
>>> from typing import Dict
>>> from pyspark.sql import DataFrame
>>> from pyspark.sql.functions import sum as sf_sum
>>> from tmlt.analytics.metrics import RelativeError
>>> from tmlt.analytics.no_privacy_session import NoPrivacySession
>>> from tmlt.analytics.program import SessionProgram
>>> from tmlt.analytics.query_builder import QueryBuilder
>>> from tmlt.analytics.session import Session
>>> from tmlt.analytics.tuner import SessionProgramTuner, baseline
>>> class Program(SessionProgram):
...     class ProtectedInputs:
...         protected_df: DataFrame
...     class Outputs:
...         a_sum: DataFrame
...     class Parameters:
...         low: int
...         high: int
...     def session_interaction(self, session: Session):
...         low = self.parameters["low"]
...         high = self.parameters["high"]
...         sum_query = QueryBuilder("protected_df").sum("a", low, high)
...         a_sum = session.evaluate(sum_query, self.privacy_budget)
...         return {"a_sum": a_sum}
>>> class Tuner(SessionProgramTuner, program=Program):
...     baseline_options = {
...         "use_clamping_bounds": NoPrivacySession.Options(
...             enforce_clamping_bounds=True
...         ),
...         "ignore_clamping_bounds": NoPrivacySession.Options(
...             enforce_clamping_bounds=False
...         ),
...     }
...
...     @baseline("custom_baseline")
...     def no_clamping_bounds_baseline(
...         self,
...         protected_inputs: Dict[str, DataFrame],
...     ) -> Dict[str, DataFrame]:
...         df = protected_inputs["protected_df"]
...         sum_value = df.agg(sf_sum("a").alias("a_sum"))
...         return {"a_sum": sum_value}
...
...     metrics = [
...         RelativeError(
...             "a_sum",
...             column="a_sum",
...             baselines=list(baseline_options.keys()) + ["custom_baseline"],
...         ),
...     ]
Just like a SessionProgram, once a subclass of SessionProgramTuner is defined, it can be instantiated using the automatically-generated builder for that class. Unlike a SessionProgram, you can pass Tunable objects to the builder methods instead of concrete values.
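>>> from pyspark.sql import SparkSession
>>> from tmlt.analytics.protected_change import AddOneRow
>>> from tmlt.analytics.tuner import Tunable
>>> spark = SparkSession.builder.getOrCreate()
>>> protected_df = spark.createDataFrame([(1, 2), (3, 4)], ["a", "b"])
>>> tuner = (
...     Tuner.Builder()
...     .with_privacy_budget(Tunable("budget"))
...     .with_private_dataframe("protected_df", protected_df, AddOneRow())
...     .with_parameter("low", 0)
...     .with_parameter("high", Tunable("high"))
...     .build()
... )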
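The class-keyword syntax used above (`class Tuner(SessionProgramTuner, program=Program)`) and the per-subclass `Builder` attribute are ordinary Python features. The sketch below shows one way to wire them up with `__init_subclass__`. It is an illustration under assumptions, not the Tumult Analytics implementation; `ToyTunerBase`, `ToyProgram`, and the toy builder's methods are hypothetical names.

```python
from typing import Any, Dict, Optional, Type


class ToyProgram:
    """Hypothetical stand-in for a SessionProgram subclass."""


class ToyTunerBase:
    """Hypothetical base class; NOT the Tumult Analytics implementation.

    It shows how a class keyword argument (program=...) and a per-subclass
    Builder attribute can be wired up with __init_subclass__.
    """

    program: Optional[Type[Any]] = None

    def __init_subclass__(cls, program: Optional[Type[Any]] = None, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        if program is None:
            raise TypeError(f"{cls.__name__} must be declared with program=...")
        cls.program = program
        tuner_cls = cls  # captured so each subclass gets its own builder

        class Builder:
            """Collects builder arguments, then constructs the tuner subclass."""

            def __init__(self) -> None:
                self.parameters: Dict[str, Any] = {}

            def with_parameter(self, name: str, value: Any) -> "Builder":
                self.parameters[name] = value
                return self  # returning self enables method chaining

            def build(self) -> Any:
                return tuner_cls(self)

        cls.Builder = Builder

    def __init__(self, builder: Any) -> None:
        self.parameters = dict(builder.parameters)


class ToyTuner(ToyTunerBase, program=ToyProgram):
    pass


toy = ToyTuner.Builder().with_parameter("low", 0).with_parameter("high", 1).build()
print(ToyTuner.program.__name__, toy.parameters)
```

Generating the builder inside `__init_subclass__` means every subclass gets a builder bound to itself, which matches the documented pattern of calling `Tuner.Builder()` on the subclass rather than on the base class.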
The outputs() method runs the program and returns the outputs of the DP and baseline programs.
>>> from tmlt.analytics.privacy_budget import PureDPBudget
>>> dp_outputs, baseline_outputs = (
...     tuner.outputs({"budget": PureDPBudget(1), "high": 1})
... )
The error_report() method on the tuner runs the program to get the DP and baseline outputs, and also computes the metrics defined in the Tuner class.
>>> tuner.error_report({"budget": PureDPBudget(1), "high": 1}).show()
Error report ran with budget PureDPBudget(epsilon=1) and the following params:
budget: PureDPBudget(epsilon=1)
high: 1.
Metric results:
- use_clamping_bounds: Relative error for column a_sum of output a_sum: 0.00%
- ignore_clamping_bounds: Relative error for column a_sum of output a_sum: 50.00%
- custom_baseline: Relative error for column a_sum of table a_sum: 50.00%
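The percentages in this report can be checked by hand. Column a of protected_df holds 1 and 3, so clamping to [0, 1] gives a sum of 2, while ignoring the bounds (or the custom sf_sum baseline) gives 4. The sketch below assumes that RelativeError computes |dp - baseline| / |baseline| and that the DP output in this run was 2, which is what a 0.00% error against the clamped baseline implies under that formula. Both are assumptions; real DP outputs are noisy.

```python
from typing import Iterable


def clamp(value: float, low: float, high: float) -> float:
    """Restrict a value to the interval [low, high]."""
    return max(low, min(high, value))


def clamped_sum(values: Iterable[float], low: float, high: float) -> float:
    """Sum of the values after clamping each one to [low, high]."""
    return sum(clamp(v, low, high) for v in values)


def relative_error(dp_value: float, baseline_value: float) -> float:
    """Assumed definition of relative error: |dp - baseline| / |baseline|."""
    return abs(dp_value - baseline_value) / abs(baseline_value)


column_a = [1, 3]  # column "a" of protected_df in the example
low, high = 0, 1   # parameter values used for this run

baselines = {
    "use_clamping_bounds": clamped_sum(column_a, low, high),  # 1 + 1 = 2
    "ignore_clamping_bounds": float(sum(column_a)),            # 1 + 3 = 4
    "custom_baseline": float(sum(column_a)),                   # sf_sum("a") = 4
}

dp_value = 2.0  # assumed DP output for this run, for illustration only

for name, value in baselines.items():
    print(f"{name}: {relative_error(dp_value, value):.2%}")
```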
Functions
baseline | Decorator to define a custom baseline method for SessionProgramTuner.
- baseline(name)
Decorator to define a custom baseline method for SessionProgramTuner.
To use the default baseline in addition to custom baselines, you must also specify baseline_options explicitly, as in the example below.
>>> from tmlt.analytics.session import Session
>>> class Program(SessionProgram):
...     class ProtectedInputs:
...         protected_df: DataFrame
...     class UnprotectedInputs:
...         unprotected_df: DataFrame
...     class Outputs:
...         output_df: DataFrame
...     def session_interaction(self, session: Session):
...         ...
>>> class Tuner(SessionProgramTuner, program=Program):
...     @baseline("custom_baseline")
...     def custom_baseline(
...         self,
...         protected_inputs: Dict[str, DataFrame],
...     ) -> Dict[str, DataFrame]:
...         ...
...     @baseline("another_custom_baseline")
...     def another_custom_baseline(
...         self,
...         protected_inputs: Dict[str, DataFrame],
...         unprotected_inputs: Dict[str, DataFrame],
...     ) -> Dict[str, DataFrame]:
...         # If the program has unprotected inputs, the custom baseline method
...         # can take them as an argument.
...         ...
...     baseline_options = {
...         "default": NoPrivacySession.Options()
...     }  # This is required to keep the default baseline
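- Parameters
name (str) – The name of the custom baseline. Metrics refer to the baseline by this name, for example in the baselines argument of RelativeError.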
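A name-registering method decorator like this can be sketched in plain Python. In this hypothetical version, `baseline(name)` only tags the method with its baseline name, and a hypothetical helper, `collect_baselines`, finds the tagged methods on a class. The real `baseline` decorator may work differently; this only illustrates the pattern.

```python
from typing import Any, Callable, Dict, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def baseline(name: str) -> Callable[[F], F]:
    """Hypothetical decorator: mark a method as the custom baseline `name`."""

    def decorator(func: F) -> F:
        func._baseline_name = name  # type: ignore[attr-defined]
        return func

    return decorator


def collect_baselines(cls: type) -> Dict[str, Callable[..., Any]]:
    """Return every @baseline-marked method on cls, keyed by baseline name."""
    found: Dict[str, Callable[..., Any]] = {}
    # Walk base classes first so a subclass can override a baseline by name.
    for klass in reversed(cls.__mro__):
        for attr in vars(klass).values():
            name = getattr(attr, "_baseline_name", None)
            if name is not None:
                found[name] = attr
    return found


class DemoTuner:
    @baseline("custom_baseline")
    def unclamped_sum(self, protected_inputs: Dict[str, list]) -> Dict[str, int]:
        return {"a_sum": sum(protected_inputs["protected_df"])}

    def helper(self) -> None:  # not marked, so not collected
        pass


baselines = collect_baselines(DemoTuner)
print(sorted(baselines))
print(baselines["custom_baseline"](DemoTuner(), {"protected_df": [1, 3]}))
```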
Classes
SessionProgramTuner | Base class for defining an object to tune inputs to a SessionProgram.
Tunable | Named placeholder for a single input to a SessionProgramTunerBuilder.
ErrorReport | Output of a single error report run.
MultiErrorReport | Output of an error report run across multiple input combinations.
UnprotectedInput | An unprotected input that was used for an ErrorReport.
ProtectedInput | A protected input that was used for an ErrorReport.
- class SessionProgramTuner(builder)
Base class for defining an object to tune inputs to a SessionProgram.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
SessionProgramTuners should not be constructed directly. Instead, users should create a subclass of SessionProgramTuner, then construct their SessionProgramTuner using the auto-generated Builder attribute of the subclass.
- Parameters
builder (SessionProgramTunerBuilder) –
- baseline_options: Optional[Union[Dict[str, tmlt.analytics.no_privacy_session.NoPrivacySession.Options], tmlt.analytics.no_privacy_session.NoPrivacySession.Options]]
Configuration for how baseline outputs are computed.
By default, a SessionProgramTuner computes both the DP outputs and the baseline outputs of a SessionProgram, and compares them to compute metrics. The baseline outputs are computed by calling the session_interaction() method with a NoPrivacySession. The baseline_options attribute allows you to override the default options for the NoPrivacySession used to compute the baseline. You can also specify multiple configurations to compute baselines with different options. When multiple baseline configurations are specified, each metric is computed with respect to each baseline configuration, unless the metric definition specifies otherwise.
To override the default baseline options (see Options), set this attribute to an Options object.
To specify multiple baseline configurations, set this attribute to a dictionary mapping baseline names to Options.
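The shapes that baseline_options can take can be summarized as a small normalization step. This sketch uses a hypothetical `Options` dataclass in place of NoPrivacySession.Options (the field name comes from the example above; its default value is assumed). The key "default" for the single-Options case, and the rule that custom baselines replace the implicit default (suggested by the comment in the baseline() example), are assumptions rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class Options:
    """Hypothetical stand-in for NoPrivacySession.Options (default value assumed)."""

    enforce_clamping_bounds: bool = True


def resolve_baselines(
    baseline_options: Optional[Union[Options, Dict[str, Options]]],
    has_custom_baselines: bool,
) -> Dict[str, Options]:
    """Normalize baseline_options into a {baseline name: Options} mapping."""
    if baseline_options is None:
        # Assumed: defining custom baselines drops the implicit default one.
        return {} if has_custom_baselines else {"default": Options()}
    if isinstance(baseline_options, Options):
        # Assumed: a single Options object overrides the one default baseline.
        return {"default": baseline_options}
    return dict(baseline_options)


def baselines_for_metric(
    requested: Optional[List[str]], available: List[str]
) -> List[str]:
    """A metric is computed against every baseline unless it names a subset."""
    return list(available) if requested is None else list(requested)


configs = resolve_baselines(
    {
        "use_clamping_bounds": Options(enforce_clamping_bounds=True),
        "ignore_clamping_bounds": Options(enforce_clamping_bounds=False),
    },
    has_custom_baselines=True,
)
all_baselines = list(configs) + ["custom_baseline"]
print(baselines_for_metric(None, all_baselines))
```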
- metrics: Optional[List[tmlt.analytics.metrics.Metric]]
A list of metrics to compute in each error_report.
- Builder: Type[SessionProgramTunerBuilder]
The builder for a specific subclass of SessionProgramTuner.
- program: Type[tmlt.analytics.program.SessionProgram]
A subclass of SessionProgram to be tuned.
- __init__(builder)
Constructor.
Warning
This constructor is not intended to be used directly. Use the automatically generated builder instead, which can be accessed using the Builder attribute of the subclass.
- Parameters
builder (tmlt.analytics.tuner._tuner.SessionProgramTunerBuilder) –
- property tunables
Returns a list of tunable inputs associated with this tuner.
- Return type
List[Tunable]
- outputs(tunable_values=None)
Computes all outputs for a single run.
- Parameters
tunable_values (Optional[Dict[str, Any]]) – A dictionary mapping names of Tunables to the concrete values to use for this run. Every Tunable used in building this tuner must have a value in this dictionary. This can be None only if no Tunables were used.
- Returns
A tuple (dp_outputs, baseline_outputs). dp_outputs maps each output name to a DataFrame; baseline_outputs maps each baseline name to a dictionary of that baseline's outputs.
- Return type
Tuple[Dict[str, pyspark.sql.DataFrame], Dict[str, Dict[str, pyspark.sql.DataFrame]]]
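The rule for tunable_values can be expressed as a short check. This is a hypothetical sketch with a stand-in `Tunable` dataclass and a hypothetical `check_tunable_values` helper; whether the real outputs() raises ValueError, or validates at a different point, is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class Tunable:
    """Hypothetical stand-in for the Tunable placeholder class."""

    name: str


def check_tunable_values(
    tunables: Iterable[Tunable], tunable_values: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Every Tunable needs a value; None is allowed only if there are no Tunables."""
    names = {t.name for t in tunables}
    if tunable_values is None:
        if names:
            raise ValueError(f"tunable_values is required for: {sorted(names)}")
        return {}
    missing = names - set(tunable_values)
    if missing:
        raise ValueError(f"missing values for tunables: {sorted(missing)}")
    return dict(tunable_values)


tunables = [Tunable("budget"), Tunable("high")]
print(check_tunable_values(tunables, {"budget": 1.0, "high": 1}))
try:
    check_tunable_values(tunables, {"budget": 1.0})
except ValueError as err:
    print(err)
```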
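- error_report(tunable_values=None)
Computes DP outputs, baseline outputs, and metrics for a single run.
- Parameters
tunable_values (Optional[Dict[str, Any]]) – A dictionary mapping names of Tunables to the concrete values to use for this run, as for outputs().
- Return type
ErrorReport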
- class Tunable
Named placeholder for a single input to a SessionProgramTunerBuilder.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
When a Tunable is passed to a SessionProgramTunerBuilder, it is replaced with the concrete value for the tunable parameter when SessionPrograms are built inside methods like error_report() and multi_error_report().
- name: str
Name of the tunable parameter.
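The substitution described above amounts to swapping each placeholder for the value supplied for a given run. The sketch below uses a hypothetical stand-in `Tunable` dataclass and a hypothetical `resolve()` helper, with a float in place of a PureDPBudget; it illustrates the idea rather than the library's code.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Tunable:
    """Hypothetical stand-in for the Tunable placeholder class."""

    name: str


def resolve(value: Any, tunable_values: Dict[str, Any]) -> Any:
    """Swap a Tunable for its concrete value; leave ordinary values alone."""
    if isinstance(value, Tunable):
        return tunable_values[value.name]
    return value


# Arguments as recorded by a builder: a mix of placeholders and concrete values.
recorded = {"budget": Tunable("budget"), "low": 0, "high": Tunable("high")}

# Each run supplies its own concrete values (a float stands in for a budget).
for run_values in ({"budget": 1.0, "high": 1}, {"budget": 1.0, "high": 5}):
    concrete = {key: resolve(value, run_values) for key, value in recorded.items()}
    print(concrete)
```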
- class ErrorReport
Output of a single error report run.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
This class is not intended to be constructed directly. Instead, it is returned by the error_report() method.
- tunable_values: Dict[str, Any]
The values of the tunable parameters used for this error report.
- parameters: Dict[str, Any]
The non-tunable parameters used for this error report.
- protected_inputs: Dict[str, ProtectedInput]
The protected inputs used for this error report.
- unprotected_inputs: Dict[str, UnprotectedInput]
The unprotected inputs used for this error report.
- privacy_budget: tmlt.analytics.privacy_budget.PrivacyBudget
The privacy budget used for this error report.
- dp_outputs: Dict[str, pyspark.sql.DataFrame]
The differentially private outputs of the program.
- baseline_outputs: Dict[str, Dict[str, pyspark.sql.DataFrame]]
The outputs of each baseline, keyed by baseline name.
- metrics: List[tmlt.analytics.metrics._base.MetricOutput]
The metrics computed on the outputs of the DP and baseline programs.
- format()
Return a string representation of this object.
- show()
Prints the error report in a nicely-formatted, human-readable way.
- class MultiErrorReport(reports)
Output of an error report run across multiple input combinations.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
This class is not intended to be constructed directly. Instead, it is returned by the multi_error_report() method.
- Parameters
reports (List[ErrorReport]) – An error report for each run.
- __init__(reports)
Constructor.
Warning
This class is not intended to be constructed directly. Instead, it is returned by the multi_error_report() method.
- Parameters
reports (List[ErrorReport]) – An error report for each run.
- property reports
Return the error reports.
- Return type
List[ErrorReport]
- __iter__()
Return an iterator over the error reports.
- Return type
Iterator[ErrorReport]
- to_dataframe()
Return a dataframe representation of the error reports.
The dataframe will have a row for each error report (run) and a column for each tunable and metric.
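The row-per-run layout can be sketched with plain dictionaries. The report structure, the relative_error column name, and the noise-free metric values below are hypothetical; the real method returns a dataframe, and its column names may differ.

```python
from typing import Any, Dict, List

column_a = [1, 3]         # column "a" of protected_df from the example
true_sum = sum(column_a)  # unclamped sum = 4

# One hypothetical report per run; metrics are computed noise-free here.
reports: List[Dict[str, Dict[str, Any]]] = []
for high in (1, 3):
    clamped = sum(min(max(v, 0), high) for v in column_a)
    reports.append(
        {
            "tunable_values": {"high": high},
            "metrics": {"relative_error": abs(clamped - true_sum) / true_sum},
        }
    )


def to_rows(reports: List[Dict[str, Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Flatten reports: one row per run, one column per tunable and metric."""
    rows = []
    for report in reports:
        row: Dict[str, Any] = {}
        row.update(report["tunable_values"])
        row.update(report["metrics"])
        rows.append(row)
    return rows


for row in to_rows(reports):
    print(row)
```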
- Return type
- class UnprotectedInput
Bases: NamedTuple
An unprotected input that was used for an ErrorReport.
- name: str
The name of the input.
- dataframe: pyspark.sql.DataFrame
A dataframe containing the unprotected data used for the report.
- class ProtectedInput
Bases: NamedTuple
A protected input that was used for an ErrorReport.
Warning
Normally, protected inputs are treated as sensitive and are not accessible to the user except through the Session API, to avoid violating differential privacy. Error reports, however, are not differentially private. For this reason, it is highly recommended to avoid using sensitive data in error reports and to use synthetic or other non-sensitive data instead.
For convenience, the protected inputs used in error reports are attached to the outputs, but it is ultimately your responsibility to ensure that truly sensitive data is not used inappropriately.
- name: str
The name of the input.
- dataframe: pyspark.sql.DataFrame
A dataframe containing the protected data used for the report.
- protected_change: tmlt.analytics.protected_change.ProtectedChange
What changes to the protected data the Session should protect.