_user_defined

Metric functions for wrapping user-defined functions.

Classes

`CustomSingleOutputMetric`	Wrapper to allow users to define a metric that operates on a single output table.
`CustomMultiBaselineMetric`	Wrapper to turn a function into a metric using DP outputs and multiple baselines’ outputs.

class CustomSingleOutputMetric(func, *, name, description=None, output)

Bases: tmlt.analytics.metrics._base.SingleBaselineMetric

Wrapper to allow users to define a metric that operates on a single output table.

Turns a function that calculates the error between two DataFrames (one DP, one baseline) into a metric.

Note

This is only available in a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

Example

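>>> import pandas as pd
>>> from pyspark.sql import DataFrame, SparkSession
>>> from tmlt.analytics.metrics import CustomSingleOutputMetric
>>> spark = SparkSession.builder.getOrCreate()
>>> dp_df = spark.createDataFrame(pd.DataFrame({"A": [5]}))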
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(pd.DataFrame({"A": [5]}))
>>> baseline_outputs = {"O": baseline_df}

>>> def size_difference(dp: DataFrame, baseline: DataFrame):
...     return baseline.count() - dp.count()

>>> metric = CustomSingleOutputMetric(
...     func=size_difference,
...     name="Output size difference",
...     description="Difference in number of rows.",
...     output="O",
... )
>>> result = metric.compute_for_baseline(dp_outputs, baseline_outputs)
>>> result
0
>>> metric.format(result)
'0'

Parameters

func (Callable[[pyspark.sql.DataFrame, pyspark.sql.DataFrame], Any])
name (str)
description (Optional[str])
output (str)

__init__(func, *, name, description=None, output)

Constructor.

Parameters

func (Callable[[DataFrame, DataFrame], Any]) – Function that computes a metric value from the DP output table and the corresponding output table of a single baseline.
name (str) – A name for the metric.
description (Optional[str] (default: None)) – A description of the metric.
baselines – The name of the baseline program(s) used for the error report. If None, use all baselines. If a string, use only that baseline. If a list, use only those baselines.
output (str) – The output to calculate the metric over. This is required, even if the program produces a single output.
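The role of output can be illustrated with a small stand-in class. This is a minimal sketch, not the library's implementation: it assumes compute_for_baseline looks up the same named output in both dictionaries and passes the two tables to func, which is consistent with the example above. SingleOutputMetricSketch is a hypothetical name, and plain Python lists stand in for Spark DataFrames (len() plays the role of DataFrame.count()).

```python
from typing import Any, Callable, Dict, List

# Plain Python lists stand in for Spark DataFrames in this sketch.
Table = List[Any]


class SingleOutputMetricSketch:
    """Hypothetical stand-in for CustomSingleOutputMetric (not the library's code)."""

    def __init__(self, func: Callable[[Table, Table], Any], *, name: str, output: str):
        self.func = func
        self.name = name
        self.output = output

    def compute_for_baseline(
        self, dp_outputs: Dict[str, Table], baseline_outputs: Dict[str, Table]
    ) -> Any:
        # Select the same named output on both sides, then hand the two tables to func.
        # A missing output name raises KeyError here.
        return self.func(dp_outputs[self.output], baseline_outputs[self.output])


def size_difference(dp: Table, baseline: Table) -> int:
    # len() plays the role of DataFrame.count().
    return len(baseline) - len(dp)


metric = SingleOutputMetricSketch(size_difference, name="Output size difference", output="O")
result = metric.compute_for_baseline(
    {"O": [5], "Other": [1, 2, 3]},
    {"O": [5, 6], "Other": []},
)
print(result)  # 1
```

Only the "O" tables reach func; the "Other" output is ignored, which is why output must be given even when a program has a single output.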

property output

Returns the name of the run output.

Return type: str

property func

Returns the function to be applied.

Return type: Callable

format(value)

Converts a value to a human-readable format.

Parameters: value (Any)

check_compatibility_with_program(program)

Checks if the metric is compatible with the program.

Parameters: program (Type[tmlt.analytics.program.SessionProgram])

compute_for_baseline(dp_outputs, baseline_outputs)

Returns the metric value given the DP outputs and the baseline outputs.

Parameters

dp_outputs (Dict[str, pyspark.sql.DataFrame])
baseline_outputs (Dict[str, pyspark.sql.DataFrame])

property name

Returns the name of the metric.

Return type: str

property description

Returns the description of the metric.

Return type: str

property baselines

Returns the baselines used for the metric.

Return type: Optional[Union[str, List[str]]]

__call__(dp_outputs, baseline_outputs)

Computes the given metric on the given DP and baseline outputs.

Parameters

dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs.

Return type

List[tmlt.analytics.metrics.MetricOutput]
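Because __call__ receives the outputs of several baselines while compute_for_baseline compares against one, a single-baseline metric presumably applies compute_for_baseline once per baseline. The sketch below shows that loop under stated assumptions: baseline_outputs is taken to be nested as baseline name, then output name, then table; filtering by the baselines property is omitted; and MetricOutputSketch and call_single_baseline_metric are hypothetical names whose fields do not reflect the real MetricOutput class. Plain lists stand in for DataFrames.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Table = List[Any]  # plain lists stand in for Spark DataFrames


@dataclass
class MetricOutputSketch:
    # Hypothetical record; the real MetricOutput fields are not documented here.
    name: str
    baseline: str
    value: Any


def call_single_baseline_metric(
    name: str,
    compute_for_baseline: Callable[[Dict[str, Table], Dict[str, Table]], Any],
    dp_outputs: Dict[str, Table],
    baseline_outputs: Dict[str, Dict[str, Table]],
) -> List[MetricOutputSketch]:
    # Assumed nesting: baseline name -> output name -> table.
    # Each baseline's outputs are compared against the same DP outputs.
    return [
        MetricOutputSketch(name, baseline, compute_for_baseline(dp_outputs, outputs))
        for baseline, outputs in baseline_outputs.items()
    ]


def size_difference_for_o(dp: Dict[str, Table], base: Dict[str, Table]) -> int:
    # Compare the row counts of the "O" output.
    return len(base["O"]) - len(dp["O"])


results = call_single_baseline_metric(
    "Output size difference",
    size_difference_for_o,
    {"O": [5]},
    {"baseline1": {"O": [5]}, "baseline2": {"O": [5, 6, 7]}},
)
print([(r.baseline, r.value) for r in results])  # [('baseline1', 0), ('baseline2', 2)]
```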

class CustomMultiBaselineMetric(output, func, *, name, description=None, baselines=None)

Bases: tmlt.analytics.metrics._base.MultiBaselineMetric

Wrapper to turn a function into a metric using DP outputs and multiple baselines’ outputs.

Note

This is only available in a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

Example

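>>> import pandas as pd
>>> from pyspark.sql import SparkSession
>>> from tmlt.analytics.metrics import AbsoluteError, CustomMultiBaselineMetric
>>> spark = SparkSession.builder.getOrCreate()
>>> dp_df = spark.createDataFrame(pd.DataFrame({"A": [5]}))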
>>> dp_outputs = {"O": dp_df}
>>> baseline_df1 = spark.createDataFrame(pd.DataFrame({"A": [5]}))
>>> baseline_df2 = spark.createDataFrame(pd.DataFrame({"A": [6]}))
>>> baseline_outputs = {
...     "O": {"baseline1": baseline_df1, "baseline2": baseline_df2}
... }
>>> _func = lambda dp_outputs, baseline_outputs: {
...     output_key: {
...         baseline_key: AbsoluteError(output_key).compute_on_scalar(
...             dp_output.first().A, baseline_output.first().A
...         )
...         for baseline_key, baseline_output
...         in baseline_outputs[output_key].items()
...     }
...     for output_key, dp_output in dp_outputs.items()
... }

>>> metric = CustomMultiBaselineMetric(
...     output="O",
...     func=_func,
...     name="Custom Metric",
...     description="Custom Description",
... )
>>> result = metric.compute_for_multiple_baselines(dp_outputs, baseline_outputs)
>>> result
{'O': {'baseline1': 0, 'baseline2': 1}}

Parameters

output (str)
func (Callable[[Dict[str, pyspark.sql.DataFrame], Dict[str, Dict[str, pyspark.sql.DataFrame]]], Any])
name (str)
description (Optional[str])
baselines (Optional[List[str]])

__init__(output, func, *, name, description=None, baselines=None)

Constructor.

Parameters

output (str) – The output to compute the metric for.
func (Callable[[Dict[str, DataFrame], Dict[str, Dict[str, DataFrame]]], Any]) – Function for computing a metric value from DP outputs and multiple baseline outputs.
name (str) – A name for the metric.
description (Optional[str] (default: None)) – A description of the metric.
baselines (Optional[List[str]] (default: None)) – The name of the baseline program(s) used for the error report. If None, use all baselines. If a string, use only that baseline. If a list, use only those baselines.
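The None / string / list rule for baselines can be sketched as a small filtering helper. This is an illustration under assumptions, not the library's code: filter_baselines is a hypothetical name, it operates on any dictionary keyed by baseline name, and raising KeyError for an unknown baseline name is an assumed behavior that the text above does not specify.

```python
from typing import Dict, List, Optional, TypeVar, Union

T = TypeVar("T")


def filter_baselines(
    baseline_outputs: Dict[str, T],
    baselines: Optional[Union[str, List[str]]] = None,
) -> Dict[str, T]:
    """Hypothetical helper applying the documented None / str / list rule."""
    if baselines is None:
        # None: keep every baseline.
        return dict(baseline_outputs)
    if isinstance(baselines, str):
        # A single string: keep only that baseline.
        baselines = [baselines]
    # A list: keep only the named baselines; an unknown name raises KeyError.
    return {name: baseline_outputs[name] for name in baselines}


available = {"baseline1": "outputs1", "baseline2": "outputs2", "baseline3": "outputs3"}
print(sorted(filter_baselines(available)))  # ['baseline1', 'baseline2', 'baseline3']
print(list(filter_baselines(available, "baseline2")))  # ['baseline2']
print(list(filter_baselines(available, ["baseline3", "baseline1"])))  # ['baseline3', 'baseline1']
```

Normalizing a single string into a one-element list keeps the string and list cases on one code path.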

property output

Returns the name of the run output.

Return type: str

property func

Returns the function to be applied.

Return type: Callable

format(value)

Converts a value to a human-readable format.

Parameters: value (Any)

check_compatibility_with_program(program)

Checks if the metric is compatible with the program.

Parameters: program (Type[tmlt.analytics.program.SessionProgram])

compute_for_multiple_baselines(dp_outputs, baseline_outputs)

Returns the metric value given the DP outputs and the outputs of multiple baselines.

Parameters

dp_outputs (Dict[str, pyspark.sql.DataFrame])
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]])

compute(dp_outputs, baseline_outputs)

Computes the given metric on the given DP and baseline outputs.

baseline_outputs is already filtered to include only the baselines that the metric uses.

Parameters

dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs, filtered to include only the baselines that the metric uses.

Return type

List[tmlt.analytics.metrics.MetricOutput]

property name

Returns the name of the metric.

Return type: str

property description

Returns the description of the metric.

Return type: str

property baselines

Returns the baselines used for the metric.

Return type: Optional[Union[str, List[str]]]

__call__(dp_outputs, baseline_outputs)

Computes the given metric on the given DP and baseline outputs.

Parameters

dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs.

Return type

List[tmlt.analytics.metrics.MetricOutput]

