QuantileAbsoluteError#

from tmlt.tune import QuantileAbsoluteError
class tmlt.tune.QuantileAbsoluteError(quantile, measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)#

Bases: JoinedOutputMetric

Computes the quantile of the empirical absolute error.

This metric matches values in measure_column between the DP output and the baseline output using an inner, 1-to-1 join on join_columns, computes the absolute error of each DP value relative to its matched baseline value, and returns the requested quantile of these absolute errors.

More formally, let \(J\) be all combinations of values of join_columns appearing in both the DP output and the baseline output. For all \(j \in J\), let \({DP}_j\) be the corresponding value of measure_column in the DP output, and \(B_j\) the corresponding value of measure_column in the baseline output. Let \(I\) be the set of indices \(i \in J\) such that \({DP}_i\) and \(B_i\) are valid numeric values (neither NaN nor null).

The quantile absolute error is defined as the smallest value \(q\) such that:

\[\text{card}\left( \left\{i \in I \text{ such that } \left|{DP}_i-B_i\right| \le q\right\} \right) \ge \texttt{quantile} \cdot \text{card}(I)\]

where \(\text{card}\) denotes the cardinality of a set. If \(I\) is empty, the metric returns NaN.
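
For intuition, here is a minimal NumPy sketch of this definition (not the library's implementation): it pairs the DP and baseline values, drops invalid pairs, and returns the smallest error \(q\) satisfying the inequality above.

import numpy as np

def quantile_abs_error(dp_values, baseline_values, quantile):
    # Absolute errors over the paired values, keeping only valid pairs (the set I above).
    errors = np.abs(np.asarray(dp_values, dtype=float) - np.asarray(baseline_values, dtype=float))
    errors = np.sort(errors[~np.isnan(errors)])
    if errors.size == 0:
        return float("nan")
    # Smallest q such that at least quantile * card(I) of the errors are <= q.
    k = max(int(np.ceil(quantile * errors.size)), 1)
    return float(errors[k - 1])

quantile_abs_error([50, 110, 100], [100, 100, 100], quantile=0.5)  # 10.0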

If grouping_columns is defined, then the DP output and the baseline output are both grouped by these columns, the quantile absolute error is calculated separately for each group, and the metric returns a DataFrame. Otherwise, the metric returns a single number.

In each group (or globally, if grouping_columns is None), each combination of values of join_columns must appear in at most one row of the DP output and at most one row of the baseline output; otherwise, the metric returns an error.
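
As a sketch, a grouped variant might be constructed as follows; the grouping column name ("month") is hypothetical and must be present in both the DP output and the baseline output.

metric = QuantileAbsoluteError(
    quantile=0.5,
    measure_column="X",
    join_columns=["A"],
    grouping_columns=["month"],  # hypothetical grouping column
)
# metric(dp_outputs, baseline_outputs).value is then a DataFrame containing one
# quantile absolute error per combination of values in grouping_columns.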

Note

This metric only measures error for rows that can be mapped 1-to-1 between the DP output and the baseline output (according to the values in join_columns). This ignores the error from rows that appear in only one of the two tables; to capture this kind of error, use SuppressionRate and/or SpuriousRate.

Example

>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [50, 110, 100]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [100, 100, 100]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = QuantileAbsoluteError(
...     quantile=0.5,
...     measure_column="X",
...     join_columns=["A"]
... )
>>> metric.quantile
0.5
>>> metric.join_columns
['A']
>>> result = metric(dp_outputs, baseline_outputs).value
>>> result
10
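In this example, the per-row absolute errors are 50, 10, and 0; the smallest q such that at least half of these errors are at most q is 10, so the metric returns 10.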
property quantile: float#

Returns the quantile.

compute_qae(joined_output, result_column_name)#

Computes the quantile absolute error from the grouped DataFrame.