_absolute_error#
Metric functions relating to absolute error.
Classes#
AbsoluteError – Computes the absolute error between two scalar values.
QuantileAbsoluteError – Computes the quantile of the empirical absolute error.
MedianAbsoluteError – Computes the median absolute error.
- class AbsoluteError(output, column=None, *, name=None, description=None, baselines=None)#
Bases:
tmlt.analytics.metrics._base.ScalarMetric
Computes the absolute error between two scalar values.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
How it works:
The algorithm takes as input two single-row tables: one representing the differentially private (DP) output and the other representing the baseline output.
DP Table (dp): This table contains the output data generated by a differentially private mechanism.
Baseline Table (baseline): This table contains the output data generated by a non-private or baseline mechanism. It serves as a reference point for comparison with the DP output.
The scalar values are retrieved from these single-row dataframes. Both values are expected to be numeric (either integers or floats); if not, the algorithm raises a ValueError. The algorithm then computes the absolute error, calculated as the absolute difference between the DP and baseline values using the formula \(abs(dp - baseline)\).
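The computation described above can be sketched in plain Python. This is an illustrative helper under the stated assumptions (numeric inputs, ValueError otherwise), not the library's actual implementation:

```python
def absolute_error(dp_value, baseline_value):
    """Return abs(dp - baseline) for two numeric scalar values."""
    for value in (dp_value, baseline_value):
        # Both values must be ints or floats; otherwise raise a ValueError,
        # mirroring the behavior described above.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(
                f"Expected a numeric value, got {type(value).__name__}"
            )
    return abs(dp_value - baseline_value)
```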
Example
>>> dp_df = spark.createDataFrame(pd.DataFrame({"X": [5]}))
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(pd.DataFrame({"X": [6]}))
>>> baseline_outputs = {"O": baseline_df}
>>> metric = AbsoluteError(output="O")
>>> result = metric.compute_for_baseline(dp_outputs, baseline_outputs)
>>> result
1
>>> metric.format(result)
'1'
- __init__(output, column=None, *, name=None, description=None, baselines=None)#
Constructor.
- Parameters
- column (Optional[str], default: None) – The column to compute the absolute error over. If the given output has only one column, this argument may be omitted.
- name (Optional[str], default: None) – A name for the metric.
- description (Optional[str], default: None) – A description of the metric.
- baselines (Optional[List[str]], default: None) – The name of the baseline program(s) used for the error report. If None, use all baselines specified as custom baselines and baseline options on the tuner class; if no baselines are specified on the tuner class, use the default baseline. If a string, use only that baseline. If a list, use only those baselines.
- format(value)#
Returns a string representation of this object.
- compute_on_scalar(dp_value, baseline_value)#
Computes metric value from DP and baseline values.
- compute_for_baseline(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#
Returns the metric value given the DP outputs and the baseline outputs.
- Parameters
dp_outputs (Dict[str, pyspark.sql.DataFrame]) –
baseline_outputs (Dict[str, pyspark.sql.DataFrame]) –
unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) –
program_parameters (Optional[Dict[str, Any]]) –
- Return type
Any
- property baselines#
Returns the baselines used for the metric.
- __call__(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#
Computes the given metric on the given DP and baseline outputs.
- Parameters
dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs.
unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
program_parameters (Optional[Dict[str, Any]]) – Optional program specific parameters used in error computation.
- Return type
- class QuantileAbsoluteError(output, quantile, measure_column, join_columns, *, name=None, description=None, baselines=None)#
Bases:
tmlt.analytics.metrics._base.JoinedOutputMetric
Computes the quantile of the empirical absolute error.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
How it works:
The algorithm takes as input two tables: one representing the differentially private (DP) output and the other representing the baseline output.
DP Table (dp): This table contains the output data generated by a differentially private mechanism.
Baseline Table (baseline): This table contains the output data generated by a non-private or baseline mechanism. It serves as a reference point for comparison with the DP output.
The algorithm includes error handling to ensure the validity of the input data: it checks that the measure_column exists and is numeric.
The algorithm performs an inner join between the DP and baseline tables based on join_columns. This join must be one-to-one, with each row in the DP table matching exactly one row in the baseline table, and vice versa. This ensures a direct correspondence between the DP and baseline outputs for each entity, allowing for accurate comparison.
After performing the join, the algorithm computes the absolute error for each group. Absolute error is calculated as the absolute difference between the corresponding values in the DP and baseline outputs using the formula \(abs(dp - baseline)\).
The algorithm then calculates the n-th quantile of the absolute error across all groups.
The algorithm handles cases where the quantile computation may result in an empty column, returning a NaN (not a number) value in such scenarios.
Note
The provided algorithm assumes a one-to-one join scenario.
Nulls in the measure columns are dropped because the metric cannot handle null values, and the absolute error computation requires valid numeric values in both columns.
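The join-then-quantile procedure above can be sketched with pandas standing in for the Spark tables. This is a minimal illustration of the described steps (the function name and suffix handling are hypothetical, and the measure column is assumed to appear in both tables), not the library's internals:

```python
import pandas as pd

def quantile_abs_error(dp, baseline, measure_column, join_columns, quantile):
    # Inner join the DP and baseline outputs on the join columns
    # (assumed one-to-one, as the metric requires).
    joined = dp.merge(baseline, on=join_columns, suffixes=("_dp", "_baseline"))
    dp_col = joined[measure_column + "_dp"]
    base_col = joined[measure_column + "_baseline"]
    # Drop null measure values: the metric needs valid numbers on both sides.
    errors = (dp_col - base_col).abs().dropna()
    if errors.empty:
        return float("nan")  # empty column -> NaN, per the description above
    return float(errors.quantile(quantile))
```

For example, with the same data as the doctest below, the 0.5 quantile of the errors {50, 10, 0} is 10.0.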
Example
>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [50, 110, 100]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [100, 100, 100]
...         }
...     )
... )
>>> baseline_outputs = {"O": baseline_df}
>>> metric = QuantileAbsoluteError(
...     output="O",
...     quantile=0.5,
...     measure_column="X",
...     join_columns=["A"]
... )
>>> metric.quantile
0.5
>>> metric.join_columns
['A']
>>> result = metric.compute_for_baseline(dp_outputs, baseline_outputs)
>>> result
10.0
>>> metric.format(result)
'10.0'
Methods#
- quantile – Returns the quantile.
- measure_column – Returns the name of the column to compute the quantile of absolute error over.
- format – Returns a string representation of this object.
- compute_on_joined_output – Computes the quantile absolute error value from the combined dataframe.
- output – Returns the name of the run output or view name.
- join_columns – Returns the names of the join columns.
- check_join_key_uniqueness – Checks if the join keys uniquely identify rows in the joined DataFrame.
- compute_for_baseline – Computes the metric value.
- name – Returns the name of the metric.
- description – Returns the description of the metric.
- baselines – Returns the baselines used for the metric.
- __call__ – Computes the given metric on the given DP and baseline outputs.
- __init__(output, quantile, measure_column, join_columns, *, name=None, description=None, baselines=None)#
Constructor.
- Parameters
- measure_column (str) – The column to compute the quantile of absolute error over.
- quantile (float) – The quantile to calculate (between 0 and 1).
- name (Optional[str], default: None) – A name for the metric.
- description (Optional[str], default: None) – A description of the metric.
- baselines (Optional[List[str]], default: None) – The name of the baseline program(s) used for the error report. If None, use all baselines specified as custom baselines and baseline options on the tuner class; if no baselines are specified on the tuner class, use the default baseline. If a string, use only that baseline. If a list, use only those baselines.
- property measure_column#
Returns name of the column to compute the quantile of absolute error over.
- Return type
str
- format(value)#
Returns a string representation of this object.
- compute_on_joined_output(joined_output)#
Computes quantile absolute error value from combined dataframe.
- Parameters
joined_output (pyspark.sql.DataFrame) –
- check_join_key_uniqueness(joined_output)#
Check if the join keys uniquely identify rows in the joined DataFrame.
- Parameters
joined_output (pyspark.sql.DataFrame) –
- compute_for_baseline(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#
Computes metric value.
- Parameters
dp_outputs (Dict[str, pyspark.sql.DataFrame]) –
baseline_outputs (Dict[str, pyspark.sql.DataFrame]) –
unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) –
program_parameters (Optional[Dict[str, Any]]) –
- property baselines#
Returns the baselines used for the metric.
- __call__(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#
Computes the given metric on the given DP and baseline outputs.
- Parameters
dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs.
unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
program_parameters (Optional[Dict[str, Any]]) – Optional program specific parameters used in error computation.
- Return type
- class MedianAbsoluteError(output, measure_column, join_columns, *, name=None, description=None, baselines=None)#
Bases:
QuantileAbsoluteError
Computes the median absolute error.
Equivalent to QuantileAbsoluteError with quantile = 0.5.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
Example
>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [50, 110, 100]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [100, 100, 100]
...         }
...     )
... )
>>> baseline_outputs = {"O": baseline_df}
>>> metric = MedianAbsoluteError(
...     output="O",
...     measure_column="X",
...     join_columns=["A"]
... )
>>> metric.quantile
0.5
>>> metric.join_columns
['A']
>>> result = metric.compute_for_baseline(dp_outputs, baseline_outputs)
>>> result
10.0
>>> metric.format(result)
'10.0'
Methods#
- quantile – Returns the quantile.
- measure_column – Returns the name of the column to compute the quantile of absolute error over.
- format – Returns a string representation of this object.
- compute_on_joined_output – Computes the quantile absolute error value from the combined dataframe.
- output – Returns the name of the run output or view name.
- join_columns – Returns the names of the join columns.
- check_join_key_uniqueness – Checks if the join keys uniquely identify rows in the joined DataFrame.
- compute_for_baseline – Computes the metric value.
- name – Returns the name of the metric.
- description – Returns the description of the metric.
- baselines – Returns the baselines used for the metric.
- __call__ – Computes the given metric on the given DP and baseline outputs.
- __init__(output, measure_column, join_columns, *, name=None, description=None, baselines=None)#
Constructor.
- Parameters
- measure_column (str) – The column to compute the median of absolute error over.
- name (Optional[str], default: None) – A name for the metric.
- description (Optional[str], default: None) – A description of the metric.
- baselines (Optional[List[str]], default: None) – The name of the baseline program(s) used for the error report. If None, use all baselines specified as custom baselines and baseline options on the tuner class; if no baselines are specified on the tuner class, use the default baseline. If a string, use only that baseline. If a list, use only those baselines.
- property measure_column#
Returns name of the column to compute the quantile of absolute error over.
- Return type
str
- format(value)#
Returns a string representation of this object.
- compute_on_joined_output(joined_output)#
Computes quantile absolute error value from combined dataframe.
- Parameters
joined_output (pyspark.sql.DataFrame) –
- check_join_key_uniqueness(joined_output)#
Check if the join keys uniquely identify rows in the joined DataFrame.
- Parameters
joined_output (pyspark.sql.DataFrame) –
- compute_for_baseline(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#
Computes metric value.
- Parameters
dp_outputs (Dict[str, pyspark.sql.DataFrame]) –
baseline_outputs (Dict[str, pyspark.sql.DataFrame]) –
unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) –
program_parameters (Optional[Dict[str, Any]]) –
- property baselines#
Returns the baselines used for the metric.
- __call__(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#
Computes the given metric on the given DP and baseline outputs.
- Parameters
dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs.
unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
program_parameters (Optional[Dict[str, Any]]) – Optional program specific parameters used in error computation.
- Return type