JoinedOutputMetric
from tmlt.tune import JoinedOutputMetric
- class tmlt.tune.JoinedOutputMetric(name, func, join_columns, description=None, baseline=None, output=None, grouping_columns=None, measure_column=None, empty_value=None, join_how='inner', dropna_columns=None, indicator_column_name=None)
Bases: SingleOutputMetric
A metric computed from a join between a single DP output and a single baseline output.
The metric is defined using a function func. This function must have the following parameter:
- joined_output: A DataFrame created by joining the selected DP and baseline outputs.
It may also have the following optional parameters:
- result_column_name: If the function returns a DataFrame, the metric results should be in a column with this name.
- unprotected_inputs: A dictionary containing the program’s unprotected inputs.
- parameters: A dictionary containing the program’s parameters.
If the metric does not have grouping columns, the function must return a numeric value, a boolean, or a string. If the metric has grouping columns, then it must return a DataFrame. This DataFrame should contain the grouping columns, and exactly one additional column containing the metric value for each group. This column’s type should be numeric, boolean, or string.
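The return-value rules above can be sketched as two small checks. This is an illustration in plain Python, not the library's own validation: the helper names `is_valid_ungrouped_result` and `metric_column` are hypothetical, and the DataFrame is represented only by its list of column names.

```python
from numbers import Number
from typing import Any, List, Sequence


def is_valid_ungrouped_result(value: Any) -> bool:
    """Without grouping columns, the result must be numeric, boolean, or a string."""
    # bool is a subclass of int, so Number already covers it; it is listed for clarity.
    return isinstance(value, (Number, bool, str))


def metric_column(columns: Sequence[str], grouping_columns: Sequence[str]) -> str:
    """With grouping columns, the result needs them plus exactly one metric column.

    Hypothetical helper: returns the name of that single metric column.
    """
    missing: List[str] = [c for c in grouping_columns if c not in columns]
    if missing:
        raise ValueError(f"Result is missing grouping columns: {missing}")
    extra = [c for c in columns if c not in grouping_columns]
    if len(extra) != 1:
        raise ValueError(f"Expected exactly one metric column, found: {extra}")
    return extra[0]


ungrouped_ok = is_valid_ungrouped_result(0)
ungrouped_bad = is_valid_ungrouped_result([1, 2])
value_column = metric_column(["A", "error"], grouping_columns=["A"])
```

Here a result with columns `A` and `error`, grouped by `A`, has `error` as its metric column; a result with two non-grouping columns would be rejected.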
Example
>>> import pandas as pd
>>> from pyspark.sql import DataFrame, SparkSession
>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.functions import col
>>> spark = SparkSession.builder.getOrCreate()

>>> dp_df = spark.createDataFrame(pd.DataFrame([{"A": 1, "B": "a"}]))
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(pd.DataFrame([{"A": 5}]))
>>> baseline_outputs = {"default": {"O": baseline_df}}

>>> def size_difference(joined_output: DataFrame,
...                     result_column_name: str):
...     in_dp = (col("indicator") == "both") | (col("indicator") == "dp")
...     in_baseline = ((col("indicator") == "both") |
...                    (col("indicator") == "baseline"))
...     dp_count = sf.sum(sf.when(in_dp, sf.lit(1)).otherwise(0))
...     baseline_count = sf.sum(sf.when(in_baseline, sf.lit(1)).otherwise(0))
...     size_difference = joined_output.agg(
...         sf.abs(dp_count - baseline_count).alias(result_column_name)
...     )
...     return size_difference.head(1)[0][result_column_name]

>>> metric = JoinedOutputMetric(
...     func=size_difference,
...     name="Output size difference",
...     description="Difference in number of rows.",
...     join_columns=["A"],
...     join_how="outer",
...     indicator_column_name="indicator",
... )
>>> result = metric(dp_outputs, baseline_outputs).value
>>> result
0
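To see why the example evaluates to 0, it can help to trace the outer join by hand. The following is a minimal plain-Python sketch, not the library's implementation: rows are dictionaries, `outer_join_with_indicator` is a hypothetical stand-in for the join, and the indicator values `"dp"`, `"baseline"`, and `"both"` are taken from the example above. How the real join names overlapping non-key columns or fills missing values is not modeled here.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def outer_join_with_indicator(
    dp_rows: List[Row],
    baseline_rows: List[Row],
    join_columns: Sequence[str],
    indicator_column_name: str = "indicator",
) -> List[Row]:
    """Hypothetical stand-in: outer join on join_columns with a source indicator."""

    def key(row: Row) -> Tuple[Any, ...]:
        return tuple(row[c] for c in join_columns)

    # Assumption: join keys are unique within each input.
    dp_by_key = {key(r): r for r in dp_rows}
    baseline_by_key = {key(r): r for r in baseline_rows}

    joined: List[Row] = []
    for k in sorted(set(dp_by_key) | set(baseline_by_key)):
        in_dp, in_baseline = k in dp_by_key, k in baseline_by_key
        row: Row = dict(zip(join_columns, k))
        row.update(dp_by_key.get(k, {}))
        # Keep baseline values only for columns the DP row did not provide.
        for name, value in baseline_by_key.get(k, {}).items():
            row.setdefault(name, value)
        row[indicator_column_name] = (
            "both" if in_dp and in_baseline else "dp" if in_dp else "baseline"
        )
        joined.append(row)
    return joined


def size_difference(joined: List[Row], indicator_column_name: str = "indicator") -> int:
    """Absolute difference between the DP and baseline row counts."""
    dp_count = sum(r[indicator_column_name] in ("both", "dp") for r in joined)
    baseline_count = sum(r[indicator_column_name] in ("both", "baseline") for r in joined)
    return abs(dp_count - baseline_count)


joined = outer_join_with_indicator([{"A": 1, "B": "a"}], [{"A": 5}], ["A"])
difference = size_difference(joined)
print(difference)  # 0
```

The join keys 1 and 5 do not match, so the joined table has one row marked `"dp"` and one marked `"baseline"`. Each side contributes one row, and the difference in row counts is 0.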
- required_func_parameters()
Returns the required parameters to the metric function.
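As an illustration of how required and optional function parameters might be checked, here is a sketch using the standard `inspect` module. The helper `check_metric_signature` is hypothetical, the parameter names come from the class description above, and rejecting unrecognized parameter names is an assumption rather than documented behavior.

```python
import inspect
from typing import Callable, Set

# Names taken from the class description; the sets themselves are illustrative.
REQUIRED = {"joined_output"}
OPTIONAL = {"result_column_name", "unprotected_inputs", "parameters"}


def check_metric_signature(func: Callable) -> Set[str]:
    """Hypothetical helper: validate a metric function's parameter names."""
    names = set(inspect.signature(func).parameters)
    missing = REQUIRED - names
    if missing:
        raise ValueError(f"Metric function is missing parameters: {sorted(missing)}")
    # Assumption: names outside the documented sets are treated as errors.
    unexpected = names - REQUIRED - OPTIONAL
    if unexpected:
        raise ValueError(f"Unrecognized parameters: {sorted(unexpected)}")
    return names


def size_difference(joined_output, result_column_name):
    """Placeholder with the same signature as the example metric function."""
    return 0


accepted = check_metric_signature(size_difference)
```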
- check_join_key_uniqueness(joined_output)
Check if the join keys uniquely identify rows in the joined DataFrame.
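The idea behind a join-key uniqueness check can be sketched in plain Python by counting how often each key occurs. This is only an illustration on lists of dictionaries; the function name `join_keys_are_unique` is hypothetical, and the actual method operates on the joined DataFrame.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence


def join_keys_are_unique(rows: List[Dict[str, Any]], join_columns: Sequence[str]) -> bool:
    """Return True if no combination of join-column values appears more than once."""
    counts = Counter(tuple(row[c] for c in join_columns) for row in rows)
    return all(n == 1 for n in counts.values())


unique = join_keys_are_unique([{"A": 1}, {"A": 5}], ["A"])
duplicated = join_keys_are_unique(
    [{"A": 1, "B": "a"}, {"A": 1, "B": "b"}], ["A"]
)
```

In the second call, two rows share `A == 1`, so the key `A` alone does not identify a row; adding `B` to the join columns would make the keys unique again.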
- get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)
Returns values for the function’s parameters.
- metric_function_inputs_empty(function_params)
Determines if the given inputs are empty.
- Return type:
- __call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)
Computes the given metric on the given DP and baseline outputs.
- Parameters:
  - dp_outputs (Mapping[str, Optional[DataFrame]]) – The differentially private outputs of the program.
  - baseline_outputs (Mapping[str, Mapping[str, Optional[DataFrame]]]) – The outputs of the baseline programs.
  - unprotected_inputs (Optional[Mapping[str, DataFrame]]) – Optional public dataframes used in error computation.
  - parameters (Optional[Mapping[str, Any]]) – Optional program-specific parameters used in error computation.
- Return type: