_user_defined

Metric functions for wrapping user-defined functions.

Classes

CustomSingleOutputMetric

Wrapper to allow users to define a metric that operates on a single output table.

CustomMultiBaselineMetric

Wrapper to turn a function into a metric using the DP outputs and multiple baselines’ outputs.

class CustomSingleOutputMetric(func, output, *, name, description=None, baselines=None)

Bases: tmlt.analytics.metrics._base.SingleBaselineMetric

Wrapper to allow users to define a metric that operates on a single output table.

Turns a function that calculates error on two dataframes (one DP, one baseline) into a Metric.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

Example

>>> dp_df = spark.createDataFrame(pd.DataFrame({"A": [5]}))
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(pd.DataFrame({"A": [5]}))
>>> baseline_outputs = {"O": baseline_df}
>>> def size_difference(dp_outputs: DataFrame, baseline_outputs: DataFrame):
...     return baseline_outputs.count() - dp_outputs.count()
>>> metric = CustomSingleOutputMetric(
...     func=size_difference,
...     name="Output size difference",
...     description="Difference in number of rows.",
...     output="O",
... )
>>> result = metric.compute_for_baseline(dp_outputs, baseline_outputs)
>>> result
0
>>> metric.format(result)
'0'
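
The wrapping pattern in the example above can be sketched without Spark. This is a minimal stand-in, not the library's implementation: `SingleOutputSketch` is a hypothetical class, and plain lists of rows stand in for DataFrames. It assumes only what the example shows, namely that the wrapper selects the named output from both dictionaries and passes the two tables to `func`.

```python
from typing import Any, Callable, Dict, List

Rows = List[dict]  # stand-in for a Spark DataFrame (assumption for illustration)


class SingleOutputSketch:
    """Hypothetical stand-in for a single-output metric wrapper."""

    def __init__(self, func: Callable[[Rows, Rows], Any], output: str, *, name: str):
        self.func = func
        self.output = output
        self.name = name

    def compute_for_baseline(
        self, dp_outputs: Dict[str, Rows], baseline_outputs: Dict[str, Rows]
    ) -> Any:
        # Select the one named output from each side, then apply the user function.
        return self.func(dp_outputs[self.output], baseline_outputs[self.output])


def size_difference(dp_rows: Rows, baseline_rows: Rows) -> int:
    # Row-count difference, mirroring the example's use of count().
    return len(baseline_rows) - len(dp_rows)


sketch = SingleOutputSketch(size_difference, "O", name="Output size difference")
sketch_result = sketch.compute_for_baseline({"O": [{"A": 5}]}, {"O": [{"A": 5}]})
```

Here `sketch_result` is 0 because both tables contain one row, which matches the result shown in the example.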

Parameters
  • func (Callable)

  • output (str)

  • name (str)

  • description (Optional[str])

  • baselines (Optional[Union[str, List[str]]])

__init__(func, output, *, name, description=None, baselines=None)

Constructor.

Parameters
  • func (Callable) – Function for computing a metric value from DP outputs and a single baseline’s outputs.

  • output (str) – The output to calculate the metric over. This is required, even if the program produces a single output.

  • name (str) – A name for the metric.

  • description (Optional[str], default: None) – A description of the metric.

  • baselines (Optional[Union[str, List[str]]], default: None) – The name of the baseline program(s) used for the error report. If None, use all baselines specified as custom baselines and baseline options on the tuner class; if no baselines are specified on the tuner class, use the default baseline. If a string, use only that baseline. If a list, use only those baselines.
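
The selection rules for `baselines` can be summarized in a short sketch. The names `resolve_baselines`, `configured`, and `"default"` are hypothetical and chosen for illustration; the library's internal logic and its default baseline name may differ.

```python
from typing import List, Optional, Union


def resolve_baselines(
    baselines: Optional[Union[str, List[str]]],
    configured: List[str],
    default: str = "default",  # hypothetical name for the default baseline
) -> List[str]:
    """Return the baseline names a metric would use under the rules above."""
    if baselines is None:
        # Use every configured baseline, or fall back to the default one.
        return list(configured) if configured else [default]
    if isinstance(baselines, str):
        # A single name selects only that baseline.
        return [baselines]
    # A list selects exactly the listed baselines.
    return list(baselines)
```

For instance, under these assumptions `resolve_baselines(None, [])` falls back to the default baseline, while `resolve_baselines("b1", ["b1", "b2"])` selects only `b1`.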

property output

Returns the name of the run output or view name.

Return type

str

property func

Returns function to be applied.

Return type

Callable

format(value)

Converts value to a human-readable format.

Parameters

value (Any)

check_compatibility_with_program(program, output_views)

Checks if the metric is compatible with the program.

compute_for_baseline(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)

Returns the metric value given the DP outputs and the baseline outputs.

property name

Returns the name of the metric.

Return type

str

property description

Returns the description of the metric.

Return type

str

property baselines

Returns the baselines used for the metric.

Return type

Optional[Union[str, List[str]]]

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)

Computes the given metric on the given DP and baseline outputs.

Parameters
  • dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.

  • baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • program_parameters (Optional[Dict[str, Any]]) – Optional program specific parameters used in error computation.

Return type

List[tmlt.analytics.metrics.MetricOutput]

class CustomMultiBaselineMetric(output, func, *, name, description=None, baselines=None)

Bases: tmlt.analytics.metrics._base.MultiBaselineMetric

Wrapper to turn a function into a metric using the DP outputs and multiple baselines’ outputs.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

Example

>>> dp_df = spark.createDataFrame(pd.DataFrame({"A": [5]}))
>>> dp_outputs = {"O": dp_df}
>>> baseline_df1 = spark.createDataFrame(pd.DataFrame({"A": [5]}))
>>> baseline_df2 = spark.createDataFrame(pd.DataFrame({"A": [6]}))
>>> baseline_outputs = {
...     "O": {"baseline1": baseline_df1, "baseline2": baseline_df2}
... }
>>> _func = lambda dp_outputs, baseline_outputs: {
...     output_key: {
...         baseline_key: AbsoluteError(output_key).compute_on_scalar(
...             dp_output.first().A, baseline_output.first().A
...         )
...         for baseline_key, baseline_output
...         in baseline_outputs[output_key].items()
...     }
...     for output_key, dp_output in dp_outputs.items()
... }
>>> metric = CustomMultiBaselineMetric(
...     output="O",
...     func=_func,
...     name="Custom Metric",
...     description="Custom Description",
... )
>>> result = metric.compute_for_multiple_baselines(dp_outputs, baseline_outputs)
>>> result
{'O': {'baseline1': 0, 'baseline2': 1}}
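
The nested comprehension in the example above can be traced with plain numbers. The sketch below is illustrative only: it replaces the Spark DataFrames with the single value each one holds in column A, and it uses Python's built-in `abs` in place of `AbsoluteError.compute_on_scalar`, on the assumption (consistent with the result shown) that the latter returns the absolute difference.

```python
from typing import Dict

# Plain numbers stand in for the single value in column A of each DataFrame.
dp_values: Dict[str, int] = {"O": 5}
baseline_values: Dict[str, Dict[str, int]] = {"O": {"baseline1": 5, "baseline2": 6}}


def absolute_errors(
    dp: Dict[str, int], baselines: Dict[str, Dict[str, int]]
) -> Dict[str, Dict[str, int]]:
    """Map each output, then each baseline, to |dp - baseline|."""
    return {
        output_key: {
            baseline_key: abs(dp_value - baseline_value)
            for baseline_key, baseline_value in baselines[output_key].items()
        }
        for output_key, dp_value in dp.items()
    }


errors = absolute_errors(dp_values, baseline_values)
print(errors)  # {'O': {'baseline1': 0, 'baseline2': 1}}
```

The result is keyed first by output name and then by baseline name, the same shape as the value returned in the example.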

Parameters
  • output (str)

  • func (Callable)

  • name (str)

  • description (Optional[str])

  • baselines (Optional[Union[str, List[str]]])

__init__(output, func, *, name, description=None, baselines=None)

Constructor.

Parameters
  • output (str) – The output to compute the metric for.

  • func (Callable) – Function for computing a metric value from DP outputs and multiple baseline outputs.

  • name (str) – A name for the metric.

  • description (Optional[str], default: None) – A description of the metric.

  • baselines (Optional[Union[str, List[str]]], default: None) – The name of the baseline program(s) used for the error report. If None, use all baselines specified as custom baselines and baseline options on the tuner class; if no baselines are specified on the tuner class, use the default baseline. If a string, use only that baseline. If a list, use only those baselines.

property output

Returns the name of the run output or view name.

Return type

str

property func

Returns function to be applied.

Return type

Callable

format(value)

Converts value to a human-readable format.

Parameters

value (Any)

check_compatibility_with_program(program, output_views)

Checks if the metric is compatible with the program.

compute_for_multiple_baselines(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)

Returns the metric value given the DP and multiple baseline outputs.

compute(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)

Computes the given metric on the given DP and baseline outputs.

The baseline_outputs will already be filtered to only include the baselines that the metric is supposed to use.

Parameters
  • dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.

  • baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs, after filtering to only include the baselines that the metric is supposed to use.

  • unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • program_parameters (Optional[Dict[str, Any]]) – Optional program specific parameters used in error computation.

Return type

List[tmlt.analytics.metrics.MetricOutput]
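
The filtering that `compute` relies on can be pictured with a small sketch. `filter_baselines` and `keep` are hypothetical names, and the nesting (output name first, then baseline name) is an assumption taken from the class example above; the structure the library actually passes in may differ.

```python
from typing import Any, Dict, List


def filter_baselines(
    baseline_outputs: Dict[str, Dict[str, Any]], keep: List[str]
) -> Dict[str, Dict[str, Any]]:
    """Keep only the selected baselines under each output name."""
    return {
        output_name: {
            baseline: table
            for baseline, table in per_baseline.items()
            if baseline in keep
        }
        for output_name, per_baseline in baseline_outputs.items()
    }


# Strings stand in for DataFrames here, purely for illustration.
filtered = filter_baselines(
    {"O": {"baseline1": "table1", "baseline2": "table2"}}, keep=["baseline2"]
)
```

Under these assumptions, `filtered` retains only the `baseline2` entry for output `O`.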

property name

Returns the name of the metric.

Return type

str

property description

Returns the description of the metric.

Return type

str

property baselines

Returns the baselines used for the metric.

Return type

Optional[Union[str, List[str]]]

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)

Computes the given metric on the given DP and baseline outputs.

Parameters
  • dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.

  • baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • program_parameters (Optional[Dict[str, Any]]) – Optional program specific parameters used in error computation.

Return type

List[tmlt.analytics.metrics.MetricOutput]