tuner#

Interface for tuning SessionPrograms.

Warning

SessionProgramTuner is intended to be used for tuning DP programs. It does not provide any privacy guarantees. It is recommended to use synthetic or historical data for tuning instead of the data that will be used in production.

The SessionProgramTuner class is an abstract base class that defines the interface for tuning SessionPrograms. To tune a specific program, users should subclass SessionProgramTuner, passing their SessionProgram as the program class argument.

>>> class Program(SessionProgram):
...     class ProtectedInputs:
...         protected_df: DataFrame
...     class Outputs:
...         b_sum: DataFrame
...     class Parameters:
...         low: int
...         high: int
...     def session_interaction(self, session: Session):
...         low = self.parameters["low"]
...         high = self.parameters["high"]
...         a_values = KeySet.from_dict({"a": ["x", "y"]})
...         sum_query = QueryBuilder("protected_df").groupby(a_values).sum("b", low, high)
...         b_sum = session.evaluate(sum_query, self.privacy_budget)
...         return {"b_sum": b_sum}
>>> class Tuner(SessionProgramTuner, program=Program):
...     baseline_options = {
...         "use_clamping_bounds": NoPrivacySession.Options(
...             enforce_clamping_bounds=True
...         ),
...         "ignore_clamping_bounds": NoPrivacySession.Options(
...             enforce_clamping_bounds=False
...         ),
...     }
...
...     @baseline("custom_baseline")
...     @staticmethod
...     def no_clamping_bounds_baseline(protected_inputs: Dict[str, DataFrame]) -> Dict[str, DataFrame]:
...         df = protected_inputs["protected_df"]
...         sum_value = df.groupBy("a").agg(sf.sum("b").alias('b_sum'))
...         return {"b_sum": sum_value}
...
...     @joined_output_metric(name="root_mean_squared_error", output="b_sum", join_columns=["a"], baseline="custom_baseline")
...     @staticmethod
...     def compute_rmse(joined_output: DataFrame):
...         err = sf.col("b_sum_dp") - sf.col("b_sum_baseline")
...         rmse = joined_output.agg(sf.sqrt(sf.avg(sf.pow(err, sf.lit(2)))).alias("rmse"))
...         return rmse.collect()[0]["rmse"]
...
...     metrics = [
...         MedianRelativeError(
...             output="b_sum",
...             measure_column="b_sum",
...             name=f"mre_{index}",
...             join_columns=["a"],
...             baseline=baseline,
...         )
...         for index, baseline in enumerate(
...             list(baseline_options.keys()) + ["custom_baseline"]
...         )
...     ] # This is required to use the built-in metrics

Just like a SessionProgram, once a subclass of SessionProgramTuner is defined, it can be instantiated using the automatically-generated builder for that class. Unlike a SessionProgram, you can pass Tunable objects to the builder methods instead of concrete values.

>>> protected_df = spark.createDataFrame([("x", 2), ("y", 4)], ["a", "b"])
>>> tuner = (
...     Tuner.Builder()
...     .with_privacy_budget(Tunable("budget"))
...     .with_private_dataframe("protected_df", protected_df, AddOneRow())
...     .with_parameter("low", 0)
...     .with_parameter("high", Tunable("high"))
...     .build()
... )

The run() method can be used to run the program to get the outputs of the DP and baseline programs.

>>> outputs = tuner.run({"budget": PureDPBudget(1), "high": 1})

The error_report() method on the tuner can be used to run the program to get the DP and baseline outputs as well as the metrics defined in the Tuner class.

>>> tuner.error_report({"budget": PureDPBudget(1), "high": 1}).show()  
Error report ran with budget PureDPBudget(epsilon=1) and the following tunable parameters:
budget: PureDPBudget(epsilon=1)
high: 1
and the following additional parameters:
low: 0

Metric results:
+---------+-------------------------+------------------------+-------------------------------------------------------+
|   Value | Metric                  | Baseline               | Description                                           |
+=========+=========================+========================+=======================================================+
|    0    | mre                     | use_clamping_bounds    | Median relative error for column b_sum of table b_sum |
+---------+-------------------------+------------------------+-------------------------------------------------------+
|    0.5  | mre                     | ignore_clamping_bounds | Median relative error for column b_sum of table b_sum |
+---------+-------------------------+------------------------+-------------------------------------------------------+
|    0.5  | mre                     | custom_baseline        | Median relative error for column b_sum of table b_sum |
+---------+-------------------------+------------------------+-------------------------------------------------------+
|    0    | root_mean_squared_error | use_clamping_bounds    | User-defined metric (no description)                  |
+---------+-------------------------+------------------------+-------------------------------------------------------+
|    3.16 | root_mean_squared_error | ignore_clamping_bounds | User-defined metric (no description)                  |
+---------+-------------------------+------------------------+-------------------------------------------------------+
|    3.16 | root_mean_squared_error | custom_baseline        | User-defined metric (no description)                  |
+---------+-------------------------+------------------------+-------------------------------------------------------+

Another illustrated example of how to use a SessionProgramTuner to tune parameters can be found in the Tuning parameters tutorial.

Functions#

`baseline()`	Decorator to define a custom baseline method for `SessionProgramTuner`.
`metric()`	Decorator to define a custom metric method for `SessionProgram`.
`view()`	Views of the output table to be used across metrics in place of program outputs.
`single_output_metric()`	Decorator to define a custom metric method for `SessionProgram`.
`joined_output_metric()`	Decorator to define a custom metric method for `SessionProgram`.

baseline(name)#

Decorator to define a custom baseline method for SessionProgramTuner.

To use the “default” baseline in addition to this custom baseline, you need to separately specify “default”: NoPrivacySession.Options() in baseline_options class variable.

Parameters:: name (str) – A name for the custom baseline.

>>> from tmlt.analytics import Session

>>> class Program(SessionProgram):
...     class ProtectedInputs:
...         protected_df: DataFrame
...     class UnprotectedInputs:
...         unprotected_df: DataFrame
...     class Outputs:
...         output_df: DataFrame
...     def session_interaction(self, session: Session):
...         ...
>>> class Tuner(SessionProgramTuner, program=Program):
...     @baseline("custom_baseline")
...     @staticmethod
...     def custom_baseline(
...         protected_inputs: Dict[str, DataFrame],
...     ) -> Dict[str, DataFrame]:
...         ...
...     @baseline("another_custom_baseline")
...     @staticmethod
...     def another_custom_baseline(
...         protected_inputs: Dict[str, DataFrame],
...         unprotected_inputs: Dict[str, DataFrame],
...     ) -> Dict[str, DataFrame]:
...         # If the program has unprotected inputs or parameters, the custom
...         # baseline method can take them as an argument.
...         ...
...     baseline_options = {
...         "default": NoPrivacySession.Options()
...     }  # This is required to keep the default baseline

metric(name, description=None, grouping_columns=None, measure_column=None, empty_value=None)#

Decorator to define a custom metric method for SessionProgram.

This decorator corresponds to Metric, and is the most generic custom metric decorator. If you can use the joined_output_metric() or single_output_metric() decorators instead, they will likely be easier to use.

The decorated function must have the following parameters:

dp_outputs: a dictionary of DataFrames containing the program’s outputs.
baseline_outputs: a dictionary mapping baseline names to dictionaries of output DataFrames.

It may also have the following optional parameters:

result_column_name: if the function returns a DataFrame, the metric results should be in a column with this name
unprotected_inputs: A dictionary containing the program’s unprotected inputs.
parameters: A dictionary containing the program’s parameters.

If the metric does not have grouping columns, the function must return a numeric value, a boolean, or a string. If the metric has grouping columns, then it must return a DataFrame. This DataFrame should contain the grouping columns, and exactly one additional column containing the metric value for each group. This column’s type should be numeric, boolean, or string.

To use the built-in metrics in addition to this custom metric, you can separately specify metrics class variable.

Parameters:

name (str) – A name for the metric.
description (Optional[str]) – A description of the metric.
grouping_columns (Optional[List[str]]) – If specified, the metric should group the outputs by the given columns, and calculate the metric for each group.
measure_column (Optional[str]) – If specified, the column in the outputs to measure.
empty_value (Optional[Any]) – If all dp and baseline outputs are empty, the metric will return this value.

>>> from tmlt.analytics import Session
>>> from tmlt.tune import MedianAbsoluteError
>>> from pyspark.sql import DataFrame
>>> from typing import Dict

>>> class Program(SessionProgram):
...     class ProtectedInputs:
...         protected_df: DataFrame
...     class UnprotectedInputs:
...         unprotected_df: DataFrame
...     class Outputs:
...         output_df: DataFrame
...     def session_interaction(self, session: Session):
...         return {"output_df": dp_output}
>>> class Tuner(SessionProgramTuner, program=Program):
...     @metric(name="custom_metric")
...     @staticmethod
...     def custom_metric(
...         dp_outputs: Dict[str, DataFrame],
...         baseline_outputs: Dict[str, Dict[str, DataFrame]]
...     ):
...         # If the program has unprotected inputs and/or parameters, the custom
...         #  metric method can take them as an argument.
...         ...
...     metrics = [
...         MedianAbsoluteError(
...             output="output_df",
...             join_columns=["join_column"],
...             measure_column="Y"
...         ),
...     ]  # You can mix custom and built-in metrics.

view(name)#

Views of the output table to be used across metrics in place of program outputs.

Alternatively, you can specify as list using tmlt.analytics.tuner.SessionProgramTuner.views class variable.

If the program has outputs, unprotected inputs or parameters, the view method can pass outputs, unprotected_inputs and/or parameters as arguments and the value gets auto-populated from the program.

Parameters:: name (str) – A name for the output view.

>>> from tmlt.analytics import Session
>>> from tmlt.tune import MedianRelativeError

>>> class Program(SessionProgram):
...     class ProtectedInputs:
...         protected_df: DataFrame
...     class UnprotectedInputs:
...         unprotected_df: DataFrame
...     class Outputs:
...         output_df: DataFrame
...     def session_interaction(self, session: Session):
...         ...
>>> class Tuner(SessionProgramTuner, program=Program):
...     @view("output_view")
...     @staticmethod
...     def custom_view1(
...         outputs: Dict[str, DataFrame],
...     ) -> DataFrame:
...         ...
...     @view("another_output_view")
...     @staticmethod
...     def custom_view2(
...         outputs: Dict[str, DataFrame],
...         unprotected_inputs: Dict[str, DataFrame],
...     ) -> DataFrame:
...         ...
...     metrics = [
...         MedianRelativeError(
...             output="output_view",
...             join_columns=["x"],
...             measure_column="a_sum"
...         ),
...     ] # The view can be used instead of output when metric is defined

single_output_metric(name, description=None, baseline=None, output=None, grouping_columns=None, measure_column=None, empty_value=None)#

Decorator to define a custom metric method for SessionProgram.

This decorator corresponds to SingleOutputMetric. If you can use the joined_output_metric() decorator instead, it will likely be easier to use.

The decorated function must have the following parameters:

dp_output: the chosen DP output DataFrame.
baseline_outputs: the chosen baseline output DataFrame.

It may also have the following optional parameters:

result_column_name: if the function returns a DataFrame, the metric results should be in a column with this name
unprotected_inputs: A dictionary containing the program’s unprotected inputs.
parameters: A dictionary containing the program’s parameters.

If the metric does not have grouping columns, the function must return a numeric value, a boolean, or a string. If the metric has grouping columns, then it must return a DataFrame. This DataFrame should contain the grouping columns, and exactly one additional column containing the metric value for each group. This column’s type should be numeric, boolean, or string.

To use the built-in metrics in addition to this custom metric, you can separately specify metrics class variable.

Parameters:

name (str) – A name for the metric.
description (Optional[str]) – A description of the metric.
baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).
output (Optional[str]) – The name of the program output to be used for the metric. If None, the program must have only one output (which will be used).
grouping_columns (Optional[List[str]]) – If specified, the metric should group the outputs by the given columns, and calculate the metric for each group.
measure_column (Optional[str]) – If specified, the column in the outputs to measure.
empty_value (Optional[Any]) – If all dp and baseline outputs are empty, the metric will return this value.

>>> from tmlt.analytics import Session
>>> from tmlt.tune import MedianAbsoluteError
>>> from pyspark.sql import DataFrame
>>> from typing import Dict

>>> class Program(SessionProgram):
...     class ProtectedInputs:
...         protected_df: DataFrame
...     class UnprotectedInputs:
...         unprotected_df: DataFrame
...     class Outputs:
...         output_df: DataFrame
...     def session_interaction(self, session: Session):
...         return {"output_df": dp_output}
>>> class Tuner(SessionProgramTuner, program=Program):
...     @single_output_metric(name="custom_metric")
...     @staticmethod
...     def custom_metric(
...         dp_output: DataFrame,
...         baseline_output: DataFrame
...     ):
...         # If the program has unprotected inputs and/or parameters, the custom
...         #  metric method can take them as an argument.
...         ...
...     metrics = [
...         MedianAbsoluteError(
...             output="output_df",
...             join_columns=["join_column"],
...             measure_column="Y"
...         ),
...     ]  # You can mix custom and built-in metrics.

joined_output_metric(name, join_columns, description=None, baseline=None, output=None, grouping_columns=None, measure_column=None, empty_value=None, join_how='inner', dropna_columns=None, indicator_column_name=None)#

Decorator to define a custom metric method for SessionProgram.

This decorator corresponds to JoinedOutputMetric.

The decorated function must have the following parameters:

joined_output: A DataFrame created by joining the selected DP and baseline outputs.

It may also have the following optional parameters:

result_column_name: if the function returns a dataframe, the metric results should be in a column with this name
unprotected_inputs: A dictionary containing the program’s unprotected inputs.
parameters: A dictionary containing the program’s parameters.

The function should return a single numeric value if there are no grouping columns, or a dataframe with one column for each grouping column, and one numeric result column with the specified name.

If the metric does not have grouping columns, the function must return a numeric value, a boolean, or a string. If the metric has grouping columns, then it must return a DataFrame. This DataFrame should contain the grouping columns, and exactly one additional column containing the metric value for each group. This column’s type should be numeric, boolean, or string.

To use the built-in metrics in addition to this custom metric, you can separately specify metrics class variable.

Parameters:

name (str) – A name for the metric.
join_columns (List[str]) – The columns to join on.
description (Optional[str]) – A description of the metric.
baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).
output (Optional[str]) – The name of the program output to be used for the metric. If None, the program must have only one output (which will be used).
grouping_columns (Optional[List[str]]) – If specified, the metric should group the outputs by the given columns, and calculate the metric for each group.
measure_column (Optional[str]) – If specified, the column in the outputs to measure.
empty_value (Optional[Any]) – If all dp and baseline outputs are empty, the metric will return this value.
join_how (str) – The type of join to perform. Must be one of “left”, “right”, “inner”, “outer”. Defaults to “inner”.
dropna_columns (Optional[List[str]]) – If specified, rows with nulls in these columns will be dropped.
indicator_column_name (Optional[str]) – If specified, we will add a column with the specified name to the joined data that contains either “dp”, “baseline”, or “both” to indicate where the values in the row came from.

>>> from tmlt.analytics import Session
>>> from tmlt.tune import MedianAbsoluteError
>>> from pyspark.sql import DataFrame
>>> from typing import Dict

>>> class Program(SessionProgram):
...     class ProtectedInputs:
...         protected_df: DataFrame
...     class UnprotectedInputs:
...         unprotected_df: DataFrame
...     class Outputs:
...         output_df: DataFrame
...     def session_interaction(self, session: Session):
...         return {"output_df": dp_output}
>>> class Tuner(SessionProgramTuner, program=Program):
...     @joined_output_metric(name="custom_metric", join_columns=["join_column"])
...     @staticmethod
...     def custom_metric(
...         joined_output: DataFrame,
...     ):
...         # If the program has unprotected inputs and/or parameters, the custom
...         #  metric method can take them as an argument.
...         ...
...     metrics = [
...         MedianAbsoluteError(
...             output="output_df",
...             join_columns=["join_column"],
...             measure_column="Y"
...         ),
...     ]  # You can mix custom and built-in metrics.

Classes#

`RunOutputs`	The results of a single run of the DP program and the baselines.
`SessionProgramTuner`	Base class for defining an object to tune inputs to a `SessionProgram`.
`Tunable`	Named placeholder for a single input to a `Builder`.
`View`	Wrapper to allow users to define a view of the output table.
`ErrorReport`	Output of a single error report run.
`MultiErrorReport`	Output of an error report run across multiple input combinations.
`UnprotectedInput`	An unprotected input that was used for an `ErrorReport`.
`ProtectedInput`	A protected input that was used for an `ErrorReport`.

class RunOutputs#

The results of a single run of the DP program and the baselines.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

tunable_values: Dict[str, Any] | None#: The tunable values used for this run.

dp_outputs: Dict[str, pyspark.sql.DataFrame]#: The outputs of the DP program.

baseline_outputs: Dict[str, Dict[str, pyspark.sql.DataFrame]]#: The outputs of the baselines.

class SessionProgramTuner(builder)#

Base class for defining an object to tune inputs to a SessionProgram.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

SessionProgramTuners should not be directly constructed. Instead, users should create a subclass of SessionProgramTuner, then construct their SessionProgramTuner using the auto-generated Builder attribute of the subclass.

Parameters:: builder (SessionProgramTuner)

class Builder#

The builder for a specific subclass of SessionProgramTuner.

with_private_dataframe(source_id, dataframe, protected_change)#

Add a tunable private dataframe to the builder.

Parameters:

source_id (str)
dataframe (Union[pyspark.sql.DataFrame, Tunable])
protected_change (Union[tmlt.analytics.protected_change.ProtectedChange, Tunable])

Return type:

SessionProgramTuner

with_public_dataframe(source_id, dataframe)#

Add a tunable public dataframe to the builder.

Parameters:

source_id (str)
dataframe (Union[pyspark.sql.DataFrame, Tunable])

Return type:

SessionProgramTuner

with_parameter(name, value)#

Set the value of a parameter.

Parameters:

name (str)
value (Any)

with_cache()#: Enables caching for the object being built.

build()#

Returns an instance of the matching SessionProgramTuner subtype.

Return type:: SessionProgramTuner

with_id_space(id_space)#

Adds an identifier space.

This defines a space of identifiers that map 1-to-1 to the identifiers being protected by a table with the AddRowsWithID protected change. Any table with such a protected change must be a member of some identifier space.

Parameters:: id_space (str)

with_privacy_budget(privacy_budget)#

Set the privacy budget for the object being built.

Parameters:: privacy_budget (Union[tmlt.analytics.privacy_budget.PrivacyBudget, Tunable])

baseline_options: Dict[str, tmlt.analytics.no_privacy_session.NoPrivacySession.Options] | tmlt.analytics.no_privacy_session.NoPrivacySession.Options | None = None#

Configuration for how baseline outputs are computed.

By default, a SessionProgramTuner computes both the DP outputs and the baseline outputs for a SessionProgram to compute metrics. The baseline outputs are computed by calling the session_interaction() method with a NoPrivacySession. The baseline_options attribute allows you to override the default options for the NoPrivacySession used to compute the baseline. You can also specify multiple configurations to compute the baselines with different options. When multiple baseline configurations are specified, the metrics are computed with respect to each of the baseline configurations (unless specified otherwise in the metric definitions).

To override the default baseline options (see Options), you can set this to an Options object.

If you want to specify multiple baseline configurations, you can set this to a dictionary mapping baseline names to Options.

metrics: Sequence[tmlt.analytics.metrics.Metric] | None = None#: A list of metrics to compute in each error_report.

views: Sequence[View] | None = None#: A list of View on output tables.

program: Type[tmlt.analytics.program.SessionProgram] | None#: A subclass of SessionProgram to be tuned.

property tunables: List[Tunable]#

Returns a list of tunable inputs associated with this tuner.

Return type:: List[Tunable]

classmethod get_concrete_program()#

Returns the program. Throws an error if the program is none.

Return type:: Type[tmlt.analytics.program.SessionProgram]

classmethod get_baselines()#

Return all baselines defined in the class.

If no baseline options or custom baselines are specified, returns the dictionary mapping ‘default` baseline with default NoPrivacySession configuration options. Otherwise, retrieves baselines specified using the baseline_options class variable and @baseline decorator and returns a dictionary mapping baseline names to either Options objects or callables for custom baselines.

Parameters:: cls – The class to search for custom baselines.
Raises:: ValueError – If baseline options are not distinct or not of appropriate type.
Return type:: Dict[str, Union[tmlt.analytics.no_privacy_session.NoPrivacySession.Options, Callable]]

run(tunable_values=None)#

Computes all outputs for a single run.

Does not compute views nor metrics.

Parameters:: tunable_values (Optional[Dict[str, Any]]) – A dictionary mapping names of Tunables to concrete values to use for this run. Every Tunable used in building this tuner must have a value in this dictionary. This can be None only if no Tunables were used.
Return type:: RunOutputs

error_report(spec=None)#

Computes a single error report.

An error report can be computed by specifying a concrete value for each Tunable in the tuner (this can be None if no Tunable was used), or by passing the output of a previous run(). In the former case, DP outputs and baseline outputs are first computed, before computing the views and the metrics. In the latter case, the contents of the RunOutputs will be used to compute the views and the metrics.

Parameters:: spec (Optional[Union[Dict[str, Any], RunOutputs]]) – Either a dictionary specifying a concrete value for each Tunable in this tuner, or the output of a previous run().
Return type:: tmlt.analytics.tuner._error_report.ErrorReport

multi_error_report(tunable_values_list)#

Runs the error_report for each set of values for the Tunables.

Parameters:: tunable_values_list (List[Dict[str, Any]])
Return type:: tmlt.analytics.tuner._error_report.MultiErrorReport

class Tunable#

Named placeholder for a single input to a Builder.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

When a Tunable is passed to a Builder, it is replaced with the concrete values for the tunable parameter when building SessionProgram s inside of methods like error_report() and multi_error_report().

name: str#: Name of the tunable parameter.

class View(name, func)#

Wrapper to allow users to define a view of the output table.

If the program has outputs, unprotected inputs or parameters, the view method can pass outputs, unprotected_inputs and/or parameters arguments and the value gets auto-populated from the program.

>>> from tmlt.analytics import Session
>>> from tmlt.tune import MedianRelativeError

>>> class Program(SessionProgram):
...     class ProtectedInputs:
...         protected_df: DataFrame
...     class UnprotectedInputs:
...         unprotected_df: DataFrame
...     class Outputs:
...         output_df: DataFrame
...     def session_interaction(self, session: Session):
...         ...
>>> class Tuner(SessionProgramTuner, program=Program):
...     @view("output_view")
...     @staticmethod
...     def custom_view1(
...         outputs: Dict[str, DataFrame],
...     ) -> DataFrame:
...         ...
...
...     def create_custom_view2(arbitrary_param):
...         def custom_view2(outputs: Dict[str, DataFrame]) -> DataFrame:
...             assert arbitrary_param == ["a", "b"]
...             return outputs["output_df"].groupby(arbitrary_param).sum()
...         return custom_view2
...
...     views = [
...               View(
...                    name="another_output_view",
...                    func=create_custom_view2(["a", "b"])
...               )
...     ]
...     metrics = [
...         MedianRelativeError(
...             output="another_output_view",
...             measure_column="a_sum",
...             join_columns=["a"],
...         ),
...     ] # The view can be used instead of output when metric is defined

Parameters:

name (str)
func (Union[Callable, staticmethod])

property name: str#

Returns the name of the metric.

Return type:: str

property func: Callable#

Function that returns rhe actual view.

Return type:: Callable

__init__(name, func)#

Constructor.

Parameters:

name (str) – A name for the metric.
func (Union[Callable, staticmethod]) – View method to be applied.

class ErrorReport#

Output of a single error report run.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

This class is not intended to be constructed directly. Instead, it is returned by the error_report() method.

Attributes#
`tunable_values`	The value of each tunable used for this error report.
`parameters`	The value of each parameter used in this error report. This includes
`protected_inputs`	The protected inputs used for this error report.
`unprotected_inputs`	The unprotected inputs used for this error report.
`privacy_budget`	The privacy budget used for this error report.
`dp_outputs`	The differentially private outputs of the program.
`baseline_outputs`	The outputs of the baseline program.
`metric_results`	The metrics computed on the outputs of the dp and baseline programs.

Methods#
`dataframes()`	Returns a DataFrame for each metric, keyed by the result column name.
`dataframe()`	Returns a DataFrame representation of the error report.
`format()`	Return a string representation of this object.
`show()`	Prints the error report in a nicely-formatted, human-readable way.

tunable_values: Dict[str, Any]#: The value of each tunable used for this error report.

parameters: Dict[str, Any]#: The value of each parameter used in this error report. This includes both tunable and non-tunable parameters.

protected_inputs: Dict[str, ProtectedInput]#: The protected inputs used for this error report.

unprotected_inputs: Dict[str, UnprotectedInput]#: The unprotected inputs used for this error report.

privacy_budget: tmlt.analytics.privacy_budget.PrivacyBudget#: The privacy budget used for this error report.

dp_outputs: Dict[str, pyspark.sql.DataFrame]#: The differentially private outputs of the program.

baseline_outputs: Dict[str, Dict[str, pyspark.sql.DataFrame]]#: The outputs of the baseline program.

metric_results: List[tmlt.analytics.metrics._base.MetricResult]#: The metrics computed on the outputs of the dp and baseline programs.

dataframes()#

Returns a DataFrame for each metric, keyed by the result column name.

Return type:: Dict[str, pandas.DataFrame]

dataframe()#

Returns a DataFrame representation of the error report.

The DataFrame will have a column for each parameter, tunable, and metric. If all metrics have the same grouping columns, the dataframe will have one row per combination of grouping columns values. If the metrics have different groupings columns, this method will throw an error.

If some combinations of grouping columns values are associated with only some of the metrics, the missing metrics appear as null values in the output DataFrame.

Return type:: pandas.DataFrame

format()#: Return a string representation of this object.

show()#: Prints the error report in a nicely-formatted, human-readable way.

class MultiErrorReport(reports)#

Output of an error report run across multiple input combinations.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

This class is not intended to be constructed directly. Instead, it is returned by the multi_error_report() method.

Parameters:: reports (List[ErrorReport])

property reports: List[ErrorReport]#

Return the error reports.

Return type:: List[ErrorReport]

__init__(reports)#

Constructor.

Warning

This class is not intended to be constructed directly. Instead, it is returned by the multi_error_report() method.

Parameters:: reports (List[ErrorReport]) – An error report for each run.

__iter__()#

Return an iterator over the error reports.

Return type:: Iterator[ErrorReport]

dataframes()#

Return the result dataframes for each metric combined across runs.

Each metric produces a separate DataFrame. The DataFrames are keyed by the result column names.

Return type:: Dict[str, pandas.DataFrame]

dataframe()#

Return a DataFrame representation of the error reports.

The DataFrame will have a column for each parameter and metric. If all metrics have the same grouping columns, the DataFrame will include these grouping columns. If not all metrics have the same (or no) grouping column, this method will throw an error.

If some combinations of grouping columns values are associated with only some of the metrics, the missing metrics appear as null values in the output DataFrame.

Return type:: pandas.DataFrame

class UnprotectedInput#

Bases: NamedTuple

An unprotected input that was used for an ErrorReport.

name: str#: The name of the input.

dataframe: pyspark.sql.DataFrame#: A DataFrame containing the unprotected data used for the report.

class ProtectedInput#

Bases: NamedTuple

A protected input that was used for an ErrorReport.

Warning

Note that normally ProtectedInputs are treated as sensitive and would not accessible to the user except through the Session API to avoid violating differential privacy. But these error reports are not differentially private, and for this reason it is highly recommended to avoid using sensitive data in error reports, and to instead use synthetic data or other non-sensitive data.

For these reasons, the protected inputs used in error reports are attached to the outputs for your convenience, but it is ultimately your responsibility to ensure that truly sensitive data is not used inappropriately.

name: str#: The name of the input.

dataframe: pyspark.sql.DataFrame#: A DataFrame containing the protected data used for the report.

protected_change: tmlt.analytics.protected_change.ProtectedChange#: What changes to the protected data the Session should protect.

Tumult Analytics Pro

tuner#

Functions#

Classes#