QuantileAbsoluteError
from tmlt.tune import QuantileAbsoluteError
- class tmlt.tune.QuantileAbsoluteError(quantile, measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)
Bases:
JoinedOutputMetric
Computes the quantile of the empirical absolute error.
This metric matches values in measure_column between the DP output and the baseline output using an inner, 1-to-1 join on join_columns, then computes the absolute error of the DP values, and returns the requested quantile over these absolute errors.

More formally, let \(J\) be the set of all combinations of values of join_columns appearing in both the DP output and the baseline output. For all \(j \in J\), let \({DP}_j\) be the corresponding value of measure_column in the DP output, and \(B_j\) the corresponding value of measure_column in the baseline output. Let \(I\) be the set of indices \(i \in J\) such that \({DP}_i\) and \(B_i\) are valid numeric values (neither NaN nor null).

The quantile absolute error is defined as the smallest value \(q\) such that:

\[\text{card}\left( \left\{i \in I \text{ such that } \left|{DP}_i-B_i\right| \le q\right\} \right) \ge \texttt{quantile} \cdot \text{card}(I)\]

where \(\text{card}\) denotes the cardinality of a set. If \(I\) is empty, the metric returns NaN.
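The definition above can be traced by hand with a small pure-Python sketch. This is not the library's implementation (which operates on Spark DataFrames); the function name quantile_absolute_error and the dict-based inputs are hypothetical, chosen only for illustration. It also assumes that a quantile of 0 should return the smallest error, a case the definition leaves open.

```python
import math


def quantile_absolute_error(dp, baseline, quantile):
    """Illustrative sketch: dp and baseline map a join key to a measured value."""

    def is_valid(value):
        # Nulls (None) and NaN are excluded, mirroring the definition of I.
        return value is not None and not (
            isinstance(value, float) and math.isnan(value)
        )

    # J: join keys present in both outputs (inner join).
    shared_keys = dp.keys() & baseline.keys()

    # Absolute errors over I, sorted so the smallest qualifying q comes first.
    errors = sorted(
        abs(dp[key] - baseline[key])
        for key in shared_keys
        if is_valid(dp[key]) and is_valid(baseline[key])
    )
    if not errors:
        return math.nan  # I is empty

    # Smallest q such that card({i : error_i <= q}) >= quantile * card(I).
    threshold = quantile * len(errors)
    for count, error in enumerate(errors, start=1):
        if count >= threshold:
            return error
    return errors[-1]  # only reached if quantile > 1 (assumption: clamp to max)
```

With DP values 50, 110, 100 and baseline values 100, 100, 100, the sorted errors are 0, 10, 50; for quantile 0.5 the threshold is 1.5, so the second error, 10, is the result.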
If grouping_columns is defined, then the DP output and the baseline output are both grouped by these columns, the quantile absolute error is calculated separately for each group, and the metric returns a DataFrame. Otherwise, the metric returns a single number.

In each group (or globally if grouping_columns is None), each combination of values of join_columns must appear in at most one row of the DP output and at most one row of the baseline output. Otherwise, the metric returns an error.

Note
This metric only measures error for rows that can be mapped 1-to-1 between the DP output and the baseline output (according to the values in join_columns). This ignores the error from rows that appear in only one of the two tables; to capture this kind of error, use SuppressionRate and/or SpuriousRate.

Example
>>> import pandas as pd
>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.getOrCreate()
>>> # DP output: absolute errors against the baseline are 50, 10, and 0.
>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [50, 110, 100]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [100, 100, 100]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = QuantileAbsoluteError(
...     quantile=0.5,
...     measure_column="X",
...     join_columns=["A"]
... )
>>> metric.quantile
0.5
>>> metric.join_columns
['A']
>>> result = metric(dp_outputs, baseline_outputs).value
>>> result
10
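The grouped behavior and the 1-to-1 join requirement described earlier can be approximated in plain pandas. The sketch below is an assumption-laden illustration, not how the library computes the metric: grouped_qae, smallest_q, and the output column name qae are hypothetical names. It relies on the validate="one_to_one" option of pandas' merge to reject duplicate join keys, and it drops groups whose values are all invalid, which may differ from the library's handling.

```python
import math

import pandas as pd


def smallest_q(errors, quantile):
    """Smallest error q with count(errors <= q) >= quantile * len(errors)."""
    ordered = sorted(errors)
    if not ordered:
        return math.nan
    threshold = quantile * len(ordered)
    for count, error in enumerate(ordered, start=1):
        if count >= threshold:
            return error
    return ordered[-1]


def grouped_qae(dp_df, baseline_df, quantile, measure_column,
                join_columns, grouping_columns):
    """Hypothetical pandas sketch of a per-group quantile absolute error."""
    keys = list(grouping_columns) + list(join_columns)
    dp_col = measure_column + "_dp"
    base_col = measure_column + "_baseline"

    # Inner join; pandas raises MergeError if a key combination is duplicated
    # on either side, which mimics the 1-to-1 requirement within each group.
    joined = dp_df.merge(
        baseline_df,
        on=keys,
        how="inner",
        suffixes=("_dp", "_baseline"),
        validate="one_to_one",
    )

    # Keep only rows where both values are valid (no NaN or nulls).
    joined = joined.dropna(subset=[dp_col, base_col])
    joined = joined.assign(abs_error=(joined[dp_col] - joined[base_col]).abs())

    # One quantile absolute error per group, returned as a DataFrame.
    return (
        joined.groupby(list(grouping_columns))["abs_error"]
        .apply(lambda s: smallest_q(s.tolist(), quantile))
        .reset_index(name="qae")
    )
```

For instance, calling grouped_qae with grouping_columns=["G"] on tables that carry an extra column G yields one row per value of G, each holding that group's quantile absolute error.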
- compute_qae(joined_output, result_column_name)
Computes the quantile absolute error value from the grouped dataframe.