SuppressionRate#
from tmlt.tune import SuppressionRate
- class tmlt.tune.SuppressionRate(join_columns, *, name=None, description=None, baseline=None, output=None, grouping_columns=None)#
Bases: JoinedOutputMetric
Computes the fraction of values in the baseline output but not in the DP output.
This metric counts how many values of join_columns appear in the baseline output but not in the DP output (such values are called suppressed), and returns the ratio between this number and the total number of values of join_columns in the baseline output.

More formally, let \(s\) be the number of combinations of values of join_columns appearing in the baseline output but not in the DP output, and \(b\) be the number of combinations of values of join_columns appearing in the baseline output. The metric returns \(s/b\); if \(b = 0\), then \(s\) must also be 0, and the metric returns 0.

If grouping_columns is defined, then the DP output and the baseline output are both grouped by these columns, the suppression rate is calculated separately for each group, and the metric returns a DataFrame. Otherwise, the metric returns a single number.

In each group (or globally, if grouping_columns is None), each combination of values of join_columns must appear in at most one row of the DP output and of the baseline output; otherwise, the metric raises an error.
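The calculation above can be reproduced with a plain PySpark anti-join. The following is a minimal sketch of the formula, not the implementation used by tmlt.tune; the function name suppression_rate_sketch is hypothetical.

from pyspark.sql import DataFrame

def suppression_rate_sketch(
    dp_df: DataFrame, baseline_df: DataFrame, join_columns: list[str]
) -> float:
    # b: number of distinct combinations of join_columns in the baseline output.
    baseline_keys = baseline_df.select(join_columns).distinct()
    b = baseline_keys.count()
    # s: combinations present in the baseline output but absent from the DP
    # output (the suppressed values), found with a left anti-join.
    s = baseline_keys.join(
        dp_df.select(join_columns).distinct(), on=join_columns, how="left_anti"
    ).count()
    # If b == 0, s must also be 0, and the metric is defined to be 0.
    return s / b if b > 0 else 0.0

With a baseline whose keys are a1, a2, a3, b and a DP output whose keys are a1, a2, a3, c, this gives \(s = 1\) and \(b = 4\), so the rate is 0.25, matching the example below.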
Example

>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3", "c"],
...             "X": [50, 110, 100, 50]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3", "b"],
...             "X": [100, 100, 100, 50]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = SuppressionRate(
...     join_columns=["A"]
... )
>>> metric.join_columns
['A']
>>> metric(dp_outputs, baseline_outputs).value
0.25
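When grouping_columns is set, the same ratio is computed within each group and returned as a DataFrame. Below is a minimal sketch of that grouped variant under the same assumptions (a hypothetical helper, not the library's internals).

import pyspark.sql.functions as sf
from pyspark.sql import DataFrame

def grouped_suppression_rate_sketch(
    dp_df: DataFrame,
    baseline_df: DataFrame,
    join_columns: list[str],
    grouping_columns: list[str],
) -> DataFrame:
    keys = grouping_columns + join_columns
    baseline_keys = baseline_df.select(keys).distinct()
    dp_keys = dp_df.select(keys).distinct().withColumn("_present", sf.lit(1))
    # A baseline combination is suppressed if it has no match in the DP output.
    flagged = baseline_keys.join(dp_keys, on=keys, how="left")
    # Aggregate per group: suppressed count divided by total baseline count.
    return flagged.groupBy(grouping_columns).agg(
        (
            sf.sum(sf.when(sf.col("_present").isNull(), 1).otherwise(0))
            / sf.count(sf.lit(1))
        ).alias("suppression_rate")
    )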
- compute_suppression_rate(joined_output, result_column_name)#
Computes the suppression rate given the DP and baseline outputs.