_user_defined#

Metric functions for wrapping user-defined functions.

Classes#

CustomSingleOutputMetric

Wrapper to allow users to define a metric that operates on a single output table.

CustomMultiBaselineMetric

Wrapper to turn a function into a metric using DP outputs and multiple baselines’ outputs.

class CustomSingleOutputMetric(func, *, name, description=None, output)#

Bases: tmlt.analytics.metrics._base.SingleBaselineMetric

Wrapper to allow users to define a metric that operates on a single output table.

Turns a function that calculates error on two dataframes (one DP, one baseline) into a Metric.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

Example

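>>> import pandas as pd
>>> from pyspark.sql import DataFrame, SparkSession
>>> spark = SparkSession.builder.getOrCreate()
>>> dp_df = spark.createDataFrame(pd.DataFrame({"A": [5]}))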
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(pd.DataFrame({"A": [5]}))
>>> baseline_outputs = {"O": baseline_df}
>>> def size_difference(dp: DataFrame, baseline: DataFrame):
...     return baseline.count() - dp.count()
>>> metric = CustomSingleOutputMetric(
...     func=size_difference,
...     name="Output size difference",
...     description="Difference in number of rows.",
...     output="O",
... )
>>> result = metric.compute_for_baseline(dp_outputs, baseline_outputs)
>>> result
0
>>> metric.format(result)
'0'
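
To make the wrapping idea concrete, the following is a minimal pure-Python sketch, not the library's implementation. `SketchSingleOutputMetric` and the `Table` alias are hypothetical stand-ins: plain lists of rows replace Spark DataFrames so the snippet runs without Spark or Tumult Analytics. It assumes that computing the metric amounts to looking up the configured output on both sides and passing the two tables to the user function, and that formatting simply renders the value as a string.

```python
from typing import Any, Callable, Dict, List, Optional

# Stand-in for a Spark DataFrame: a list of row dictionaries (assumption).
Table = List[Dict[str, Any]]


class SketchSingleOutputMetric:
    """Hypothetical stand-in for a metric that wraps a user function."""

    def __init__(
        self,
        func: Callable[[Table, Table], Any],
        *,
        name: str,
        description: Optional[str] = None,
        output: str,
    ) -> None:
        self.func = func
        self.name = name
        self.description = description
        self.output = output

    def compute_for_baseline(
        self, dp_outputs: Dict[str, Table], baseline_outputs: Dict[str, Table]
    ) -> Any:
        # Pick the configured output on both sides, then apply the function.
        return self.func(dp_outputs[self.output], baseline_outputs[self.output])

    def format(self, value: Any) -> str:
        # Assumed behavior: render the value as a string.
        return str(value)


def size_difference(dp: Table, baseline: Table) -> int:
    """Difference in number of rows between the baseline and DP tables."""
    return len(baseline) - len(dp)


sketch_metric = SketchSingleOutputMetric(
    func=size_difference,
    name="Output size difference",
    description="Difference in number of rows.",
    output="O",
)
sketch_result = sketch_metric.compute_for_baseline(
    {"O": [{"A": 5}]}, {"O": [{"A": 5}]}
)
print(sketch_metric.format(sketch_result))
```

Because the user function only ever sees the two tables for one output, it can stay free of any knowledge about how outputs are named or stored.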
__init__(func, *, name, description=None, output)#

Constructor.

Parameters
  • func (Callable[[DataFrame, DataFrame], Any]) – Function for computing a metric value from DP outputs and a single baseline’s outputs.

  • name (str) – A name for the metric.

  • description (Optional[str], default: None) – A description of the metric.

  • baselines – The name of the baseline program(s) used for the error report. If None, use all baselines. If a string, use only that baseline. If a list, use only those baselines.

  • output (str) – The output to calculate the metric over. This is required, even if the program produces a single output.
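
The `*` in the constructor signature means that every parameter after `func` must be passed by keyword. The sketch below illustrates this with a hypothetical helper, `make_metric_config`, which only mirrors the parameter layout shown above and is not part of Tumult Analytics.

```python
from typing import Any, Callable, Optional


def make_metric_config(
    func: Callable[..., Any],
    *,
    name: str,
    description: Optional[str] = None,
    output: str,
) -> dict:
    """Hypothetical helper mirroring the constructor's parameter layout."""
    return {
        "func": func,
        "name": name,
        "description": description,
        "output": output,
    }


# Keyword arguments may be given in any order; description falls back to None.
config = make_metric_config(len, output="O", name="Row count")

# Passing name positionally is rejected by Python itself with a TypeError.
try:
    make_metric_config(len, "Row count", output="O")
    positional_error = None
except TypeError as exc:
    positional_error = str(exc)
```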

property output#

Returns the name of the run output.

Return type

str

property func#

Returns function to be applied.

Return type

Callable

format(value)#

Converts value to human-readable format.

Parameters

value (Any) – The metric value to convert.

check_compatibility_with_program(program)#

Checks if the metric is compatible with the program.

Parameters

program (Type[tmlt.analytics.program.SessionProgram]) – The program to check the metric against.

compute_for_baseline(dp_outputs, baseline_outputs)#

Returns the metric value given the DP outputs and the baseline outputs.

property name#

Returns the name of the metric.

Return type

str

property description#

Returns the description of the metric.

Return type

str

property baselines#

Returns the baselines used for the metric.

Return type

Optional[Union[str, List[str]]]

__call__(dp_outputs, baseline_outputs)#

Computes the given metric on the given DP and baseline outputs.

Return type

List[tmlt.analytics.metrics.MetricOutput]

class CustomMultiBaselineMetric(output, func, *, name, description=None, baselines=None)#

Bases: tmlt.analytics.metrics._base.MultiBaselineMetric

Wrapper to turn a function into a metric using DP outputs and multiple baselines’ outputs.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

Example

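>>> import pandas as pd
>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.getOrCreate()
>>> dp_df = spark.createDataFrame(pd.DataFrame({"A": [5]}))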
>>> dp_outputs = {"O": dp_df}
>>> baseline_df1 = spark.createDataFrame(pd.DataFrame({"A": [5]}))
>>> baseline_df2 = spark.createDataFrame(pd.DataFrame({"A": [6]}))
>>> baseline_outputs = {
...     "O": {"baseline1": baseline_df1, "baseline2": baseline_df2}
... }
>>> _func = lambda dp_outputs, baseline_outputs: {
...     output_key: {
...         baseline_key: AbsoluteError(output_key).compute_on_scalar(
...             dp_output.first().A, baseline_output.first().A
...         )
...         for baseline_key, baseline_output
...         in baseline_outputs[output_key].items()
...     }
...     for output_key, dp_output in dp_outputs.items()
... }
>>> metric = CustomMultiBaselineMetric(
...     output="O",
...     func=_func,
...     name="Custom Metric",
...     description="Custom Description",
... )
>>> result = metric.compute_for_multiple_baselines(dp_outputs, baseline_outputs)
>>> result
{'O': {'baseline1': 0, 'baseline2': 1}}
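
The function in the example returns a nested dictionary keyed first by output name and then by baseline name. The sketch below reproduces that shape with plain numbers in place of DataFrames so it runs without Spark. `absolute_error_by_baseline` is a hypothetical name, and `abs(dp - baseline)` stands in for `AbsoluteError(...).compute_on_scalar`, which is assumed here to return the absolute difference of its two arguments.

```python
from typing import Dict


def absolute_error_by_baseline(
    dp_values: Dict[str, float],
    baseline_values: Dict[str, Dict[str, float]],
) -> Dict[str, Dict[str, float]]:
    """Compute |dp - baseline| for every output and every baseline."""
    return {
        output_key: {
            baseline_key: abs(dp_value - baseline_value)
            for baseline_key, baseline_value in baseline_values[output_key].items()
        }
        for output_key, dp_value in dp_values.items()
    }


errors = absolute_error_by_baseline(
    {"O": 5},
    {"O": {"baseline1": 5, "baseline2": 6}},
)
print(errors)
```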
__init__(output, func, *, name, description=None, baselines=None)#

Constructor.

property output#

Returns the name of the run output.

Return type

str

property func#

Returns function to be applied.

Return type

Callable

format(value)#

Converts value to human-readable format.

Parameters

value (Any) – The metric value to convert.

check_compatibility_with_program(program)#

Checks if the metric is compatible with the program.

Parameters

program (Type[tmlt.analytics.program.SessionProgram]) – The program to check the metric against.

compute_for_multiple_baselines(dp_outputs, baseline_outputs)#

Returns the metric value given the DP and multiple baseline outputs.

compute(dp_outputs, baseline_outputs)#

Computes the given metric on the given DP and baseline outputs.

The baseline_outputs will already be filtered to only include the baselines that the metric is supposed to use.

Parameters
  • dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.

  • baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs, after filtering to only include the baselines that the metric is supposed to use.

Return type

List[tmlt.analytics.metrics.MetricOutput]
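
The filtering that happens before `compute` is called can be pictured with the following sketch. It is an illustration under assumptions, not the library's code: `filter_baselines` is a hypothetical helper, the nested dictionary is assumed to be keyed by output name and then by baseline name (as in the class example), and the handling of `None`, a string, and a list follows the description of the `baselines` setting.

```python
from typing import Any, Dict, List, Optional, Union


def filter_baselines(
    baseline_outputs: Dict[str, Dict[str, Any]],
    baselines: Optional[Union[str, List[str]]] = None,
) -> Dict[str, Dict[str, Any]]:
    """Keep only the requested baselines for every output."""
    if baselines is None:
        # None means "use all baselines": return a shallow copy unchanged.
        return {
            output_key: dict(per_baseline)
            for output_key, per_baseline in baseline_outputs.items()
        }
    # A single string selects one baseline; a list selects several.
    wanted = [baselines] if isinstance(baselines, str) else list(baselines)
    return {
        output_key: {
            baseline_key: value
            for baseline_key, value in per_baseline.items()
            if baseline_key in wanted
        }
        for output_key, per_baseline in baseline_outputs.items()
    }


all_outputs = {"O": {"baseline1": "df1", "baseline2": "df2", "baseline3": "df3"}}
only_first = filter_baselines(all_outputs, "baseline1")
first_and_third = filter_baselines(all_outputs, ["baseline1", "baseline3"])
everything = filter_baselines(all_outputs)
```

The strings `"df1"`, `"df2"`, and `"df3"` are placeholders for DataFrames; the helper never inspects the values, so any object works.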

property name#

Returns the name of the metric.

Return type

str

property description#

Returns the description of the metric.

Return type

str

property baselines#

Returns the baselines used for the metric.

Return type

Optional[Union[str, List[str]]]

__call__(dp_outputs, baseline_outputs)#

Computes the given metric on the given DP and baseline outputs.

Return type

List[tmlt.analytics.metrics.MetricOutput]