_user_defined#
Metric functions for wrapping user-defined functions.
Classes#
CustomSingleOutputMetric | Wrapper to allow users to define a metric that operates on a single output table.
CustomMultiBaselineMetric | Wrapper to turn a function into a metric using DP and multiple baselines’ outputs.
- class CustomSingleOutputMetric(func, *, name, description=None, output)#
Bases: tmlt.analytics.metrics._base.SingleBaselineMetric
Wrapper to allow users to define a metric that operates on a single output table.
Turns a function that calculates error on two dataframes (one DP, one baseline) into a Metric.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
Example
>>> dp_df = spark.createDataFrame(pd.DataFrame({"A": [5]}))
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(pd.DataFrame({"A": [5]}))
>>> baseline_outputs = {"O": baseline_df}
>>> def size_difference(dp: DataFrame, baseline: DataFrame):
...     return baseline.count() - dp.count()
>>> metric = CustomSingleOutputMetric(
...     func=size_difference,
...     name="Output size difference",
...     description="Difference in number of rows.",
...     output="O",
... )
>>> result = metric.compute_for_baseline(dp_outputs, baseline_outputs)
>>> result
0
>>> metric.format(result)
'0'
- Parameters
func (Callable[[pyspark.sql.DataFrame, pyspark.sql.DataFrame], Any]) –
name (str) –
description (Optional[str]) –
output (str) –
- __init__(func, *, name, description=None, output)#
Constructor.
- Parameters
func (Callable[[DataFrame, DataFrame], Any]) – Function for computing a metric value from DP outputs and a single baseline’s outputs.
description (Optional[str], default: None) – A description of the metric.
baselines – The name of the baseline program(s) used for the error report. If None, use all baselines. If a string, use only that baseline. If a list, use only those baselines.
output (str) – The output to calculate the metric over. This is required, even if the program produces a single output.
- property func#
Returns function to be applied.
- Return type
Callable
- format(value)#
Converts value to human-readable format.
- Parameters
value (Any) –
- check_compatibility_with_program(program)#
Checks if the metric is compatible with the program.
- Parameters
program (Type[tmlt.analytics.program.SessionProgram]) –
- compute_for_baseline(dp_outputs, baseline_outputs)#
Returns the metric value given the DP outputs and the baseline outputs.
- Parameters
dp_outputs (Dict[str, pyspark.sql.DataFrame]) –
baseline_outputs (Dict[str, pyspark.sql.DataFrame]) –
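As a rough illustration of what this method does, the sketch below assumes (the documentation does not confirm this) that the metric looks up its configured output name in both dictionaries and passes the two tables to func. Plain lists of rows stand in for Spark DataFrames, and compute_for_single_output is a hypothetical helper, not part of Tumult Analytics.

```python
from typing import Any, Callable, Dict, List

# Stand-in for pyspark.sql.DataFrame (assumption: a list of row dicts).
Table = List[dict]


def compute_for_single_output(
    func: Callable[[Table, Table], Any],
    output: str,
    dp_outputs: Dict[str, Table],
    baseline_outputs: Dict[str, Table],
) -> Any:
    """Hypothetical helper: apply func to the one named output table."""
    # Select the same output from the DP run and from the baseline run.
    return func(dp_outputs[output], baseline_outputs[output])


def size_difference(dp: Table, baseline: Table) -> int:
    # Mirrors the documented example, with len() in place of DataFrame.count().
    return len(baseline) - len(dp)


dp_outputs = {"O": [{"A": 5}]}
baseline_outputs = {"O": [{"A": 5}, {"A": 7}]}
result = compute_for_single_output(
    size_difference, "O", dp_outputs, baseline_outputs
)
print(result)
```

Here the baseline table has one more row than the DP table, so the difference is positive. With real DataFrames the same function would call count() on each argument instead of len().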
- property baselines#
Returns the baselines used for the metric.
- __call__(dp_outputs, baseline_outputs)#
Computes the given metric on the given DP and baseline outputs.
- Parameters
dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs.
- Return type
- class CustomMultiBaselineMetric(output, func, *, name, description=None, baselines=None)#
Bases: tmlt.analytics.metrics._base.MultiBaselineMetric
Wrapper to turn a function into a metric using DP and multiple baselines’ outputs.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
Example
>>> dp_df = spark.createDataFrame(pd.DataFrame({"A": [5]}))
>>> dp_outputs = {"O": dp_df}
>>> baseline_df1 = spark.createDataFrame(pd.DataFrame({"A": [5]}))
>>> baseline_df2 = spark.createDataFrame(pd.DataFrame({"A": [6]}))
>>> baseline_outputs = {
...     "O": {"baseline1": baseline_df1, "baseline2": baseline_df2}
... }
>>> _func = lambda dp_outputs, baseline_outputs: {
...     output_key: {
...         baseline_key: AbsoluteError(output_key).compute_on_scalar(
...             dp_output.first().A, baseline_output.first().A
...         )
...         for baseline_key, baseline_output
...         in baseline_outputs[output_key].items()
...     }
...     for output_key, dp_output in dp_outputs.items()
... }
>>> metric = CustomMultiBaselineMetric(
...     output="O",
...     func=_func,
...     name="Custom Metric",
...     description="Custom Description",
... )
>>> result = metric.compute_for_multiple_baselines(dp_outputs, baseline_outputs)
>>> result
{'O': {'baseline1': 0, 'baseline2': 1}}
- Parameters
output (str) –
func (Callable[[Dict[str, pyspark.sql.DataFrame], Dict[str, Dict[str, pyspark.sql.DataFrame]]], Any]) –
name (str) –
description (Optional[str]) –
baselines (Optional[List[str]]) –
- __init__(output, func, *, name, description=None, baselines=None)#
Constructor.
- Parameters
func (Callable[[Dict[str, DataFrame], Dict[str, Dict[str, DataFrame]]], Any]) – Function for computing a metric value from DP outputs and multiple baseline outputs.
description (Optional[str], default: None) – A description of the metric.
baselines (Optional[List[str]], default: None) – The name of the baseline program(s) used for the error report. If None, use all baselines. If a string, use only that baseline. If a list, use only those baselines.
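The three cases for baselines (None, a string, a list) can be pictured with a small library-free sketch. It assumes the nesting used in the class example, where the outer key is the output name and the inner key is the baseline name. select_baselines is a hypothetical helper written for illustration and is not part of Tumult Analytics.

```python
from typing import Any, Dict, List, Optional, Union

Outputs = Dict[str, Dict[str, Any]]  # output name -> baseline name -> table


def select_baselines(
    baseline_outputs: Outputs,
    baselines: Optional[Union[str, List[str]]] = None,
) -> Outputs:
    """Hypothetical helper: keep only the requested baselines."""
    if baselines is None:
        # None: use every baseline that is present.
        return {out: dict(per_b) for out, per_b in baseline_outputs.items()}
    # A single string is treated as a one-element list.
    wanted = [baselines] if isinstance(baselines, str) else list(baselines)
    # Raises KeyError if a requested baseline is missing (a choice made
    # for this sketch, not documented library behavior).
    return {
        out: {name: per_b[name] for name in wanted}
        for out, per_b in baseline_outputs.items()
    }


outputs = {"O": {"baseline1": "table1", "baseline2": "table2"}}
all_kept = select_baselines(outputs)
one_kept = select_baselines(outputs, "baseline2")
some_kept = select_baselines(outputs, ["baseline1"])
print(all_kept, one_kept, some_kept)
```

Strings are used in place of DataFrames only to keep the sketch runnable without Spark.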
- property func#
Returns function to be applied.
- Return type
Callable
- format(value)#
Converts value to human-readable format.
- Parameters
value (Any) –
- check_compatibility_with_program(program)#
Checks if the metric is compatible with the program.
- Parameters
program (Type[tmlt.analytics.program.SessionProgram]) –
- compute_for_multiple_baselines(dp_outputs, baseline_outputs)#
Returns the metric value given the DP and multiple baseline outputs.
- Parameters
dp_outputs (Dict[str, pyspark.sql.DataFrame]) –
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) –
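To make the expected shape of func concrete, here is a minimal sketch of a multi-baseline function that works on plain numbers rather than DataFrames. It assumes the nesting from the class example (output name, then baseline name) and uses Python's built-in abs in place of AbsoluteError. absolute_errors is a hypothetical name introduced for illustration.

```python
from typing import Dict


def absolute_errors(
    dp_outputs: Dict[str, float],
    baseline_outputs: Dict[str, Dict[str, float]],
) -> Dict[str, Dict[str, float]]:
    """Hypothetical func: absolute error per output and per baseline."""
    return {
        output_key: {
            # Compare the DP value against each baseline's value in turn.
            baseline_key: abs(dp_value - baseline_value)
            for baseline_key, baseline_value
            in baseline_outputs[output_key].items()
        }
        for output_key, dp_value in dp_outputs.items()
    }


errors = absolute_errors(
    {"O": 5},
    {"O": {"baseline1": 5, "baseline2": 6}},
)
print(errors)
```

With real outputs, each scalar would first be extracted from its DataFrame, as the class example does with first().A.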
- compute(dp_outputs, baseline_outputs)#
Computes the given metric on the given DP and baseline outputs.
The baseline_outputs will already be filtered to only include the baselines that the metric is supposed to use.
- Parameters
dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs, after filtering to only include the baselines that the metric is supposed to use.
- Return type
- property baselines#
Returns the baselines used for the metric.
- __call__(dp_outputs, baseline_outputs)#
Computes the given metric on the given DP and baseline outputs.
- Parameters
dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs.
- Return type