metrics#
Metrics to measure the quality of program outputs.
Metrics are used in program evaluation and error reporting, where the goal is to compare the outputs of a program to a set of expected outputs to determine the error. They are designed to be flexible and extensible, allowing users to combine them in various ways and to define their own metrics.
A number of pre-built metrics are provided, for example QuantileAbsoluteError, HighRelativeErrorFraction, and SpuriousRate. Users can also define their own custom metrics using JoinedOutputMetric, SingleOutputMetric, or Metric.
Suppose we have a SessionProgram that has one protected input and produces one output: a count of the rows in the protected input, grouped by column A.
>>> class MinimalProgram(SessionProgram):
...     class ProtectedInputs:
...         protected_df: DataFrame  # DataFrame type annotation is required
...     class Outputs:
...         count_per_a: DataFrame  # required here too
...     def session_interaction(self, session: Session):
...         a_keyset = KeySet.from_dict({"A": [1, 2, 3, 4]})
...         count_query = QueryBuilder("protected_df").groupby(a_keyset).count()
...         budget = self.privacy_budget  # session.remaining_privacy_budget also works
...         count_per_a = session.evaluate(count_query, budget)
...         return {"count_per_a": count_per_a}
We can pass this information to the SessionProgramTuner class, which is what gives us access to error reports.
We can measure the error of the program by comparing the program output to the expected output. Suppose we want to use a built-in metric, the median absolute error (MedianAbsoluteError), and a custom metric, the root mean squared error. We need to instantiate the metrics and include them in the list assigned to the metrics class variable.
>>> protected_df = spark.createDataFrame(pd.DataFrame({"A": [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]}))
>>> class Tuner(SessionProgramTuner, program=MinimalProgram):
...     @joined_output_metric(name="root_mean_squared_error",
...         output="count_per_a",
...         join_columns=["A"],
...         description="Root mean squared error for column count of count_per_a")
...     def compute_rmse(
...         joined_output: DataFrame,
...         result_column_name: str,
...     ):
...         err = sf.col("count_dp") - sf.col("count_baseline")
...         rmse = joined_output.agg(
...             sf.sqrt(sf.avg(sf.pow(err, sf.lit(2)))).alias(result_column_name))
...         return rmse.head(1)[0][result_column_name]
...
...     metrics = [
...         MedianAbsoluteError(output="count_per_a", measure_column="count", join_columns=["A"]),
...     ]
>>> tuner = (
... Tuner.Builder()
... .with_privacy_budget(PureDPBudget(epsilon=1))
... .with_private_dataframe("protected_df", protected_df, AddOneRow())
... .build()
... )
Now that our SessionProgramTuner is initialized, we can get our very first error report by calling the error_report() method.
>>> error_report = tuner.error_report()
>>> error_report.dp_outputs["count_per_a"].show()
+---+-----+
| A|count|
+---+-----+
| 1| 2|
| 2| 2|
| 3| 5|
| 4| 3|
+---+-----+
>>> error_report.baseline_outputs["default"]["count_per_a"].show()
+---+-----+
| A|count|
+---+-----+
| 1| 1|
| 2| 2|
| 3| 3|
| 4| 4|
+---+-----+
>>> error_report.show()
Error report ran with budget PureDPBudget(epsilon=1) and no tunable parameters and no additional parameters
Metric results:
+---------+-------------------------+------------+-------------------------------------------------------------+
| Value | Metric | Baseline | Description |
+=========+=========================+============+=============================================================+
| 0 | mae | default | Median absolute error for column count of table count_per_a |
+---------+-------------------------+------------+-------------------------------------------------------------+
| 0 | root_mean_squared_error | default | Root mean squared error |
+---------+-------------------------+------------+-------------------------------------------------------------+
More illustrated examples of how to define and use metrics can be found in the Basics of error measurement and Specifying error metrics tutorials.
Classes#
- QuantileAbsoluteError: Computes the quantile of the empirical absolute error.
- MedianAbsoluteError: Computes the median absolute error.
- QuantileRelativeError: Computes the quantile of the empirical relative error.
- MedianRelativeError: Computes the median relative error.
- HighRelativeErrorFraction: Computes the fraction of groups with relative error above a threshold.
- HighRelativeErrorCount: Computes the count of groups with relative error above a threshold.
- SpuriousRate: Computes the fraction of groups in the DP output but not in the baseline output.
- SpuriousCount: Computes the number of groups in the DP output but not in the baseline output.
- Computes the fraction of groups in the baseline output but not in the DP output.
- Computes the count of groups in the baseline output but not in the DP output.
- Metric: A generic metric defined using a function.
- SingleOutputMetric: A metric computed from a single output table, defined using a function.
- JoinedOutputMetric: A metric computed from a join between a single DP and baseline output.
- The output of a Metric.
- An output of a SingleOutputMetric.
- The output of a JoinedOutputMetric.
- Returns the number of rows in the baseline output.
- Returns the number of rows released in the DP output.
- class QuantileAbsoluteError(quantile, measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)#
Bases:
tmlt.analytics.metrics._base.JoinedOutputMetric
Computes the quantile of the empirical absolute error.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
How it works:
1. The algorithm takes as input two tables: one representing the differentially private (DP) output and the other representing the baseline output.
   - DP Table (dp): This table contains the output data generated by a differentially private mechanism.
   - Baseline Table (baseline): This table contains the output data generated by a non-private or baseline mechanism. It serves as a reference point for comparison with the DP output.
2. The algorithm includes error handling to ensure the validity of the input data. It checks for the existence and numeric type of the measure_column.
3. The algorithm performs an inner join between the DP and baseline tables based on join_columns. This join must be one-to-one, with each row in the DP table matching exactly one row in the baseline table, and vice versa. This ensures that there is a direct correspondence between the DP and baseline outputs for each entity, allowing for accurate comparison.
4. After performing the join, the algorithm computes the absolute error for each group, calculated as the absolute difference between the corresponding values in the DP and baseline outputs using the formula \(abs(dp - baseline)\).
5. The algorithm then calculates the n-th quantile of the absolute error across all groups.
6. If the quantile computation would operate on an empty column, the algorithm returns NaN (not a number).
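To make these steps concrete, here is a minimal PySpark sketch of the computation described above. It illustrates the documented behavior under stated assumptions and is not the library's implementation; in particular, computing an exact quantile via approxQuantile with zero tolerance is an assumption.

from pyspark.sql import DataFrame, functions as sf

def quantile_abs_error(dp: DataFrame, baseline: DataFrame,
                       join_columns: list, measure_column: str,
                       quantile: float) -> float:
    # One-to-one inner join of the DP and baseline tables on the join columns.
    joined = (
        dp.withColumnRenamed(measure_column, "dp")
        .join(baseline.withColumnRenamed(measure_column, "baseline"), on=join_columns)
    )
    # Absolute error per group: abs(dp - baseline); rows with nulls are dropped.
    errors = joined.select(
        sf.abs(sf.col("dp") - sf.col("baseline")).alias("abs_err")
    ).na.drop()
    if errors.count() == 0:
        return float("nan")  # empty input yields NaN, as described above
    # relativeError=0.0 makes approxQuantile compute an exact quantile.
    return errors.approxQuantile("abs_err", [quantile], 0.0)[0]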
Note
The algorithm assumes a one-to-one join.
Nulls in the measure columns are dropped because the metric cannot handle null values; the absolute error computation requires valid numeric values in both columns.
Example
>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [50, 110, 100]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [100, 100, 100]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = QuantileAbsoluteError(
...     quantile=0.5,
...     measure_column="X",
...     join_columns=["A"]
... )
>>> metric.quantile
0.5
>>> metric.join_columns
['A']
>>> result = metric(dp_outputs, baseline_outputs).value
>>> result
10
- property indicator_column_name: str | None#
Returns the name of the indicator column.
- Return type:
Optional[str]
- property baseline: str | None#
Return the name of the baseline specified in the constructor (if any).
- Return type:
Optional[str]
- property output: str | None#
Return the name of the output specified in the constructor (if any).
- Return type:
Optional[str]
- property func: Callable#
Returns the function to be applied.
- Return type:
Callable
- property measure_column: str | None#
Returns the measure column (if any).
- Return type:
Optional[str]
- property empty_value: Any#
The value this metric will return when inputs are empty.
- Return type:
Any
- __init__(quantile, measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)#
Constructor.
- Parameters:
quantile (float) – The quantile to calculate (between 0 and 1).
measure_column (str) – The column to compute the quantile of absolute error over.
grouping_columns (Optional[List[str]]) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.
baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).
output (Optional[str]) – The name of the program output to be used for the metric. If None, the program must have only one output (which will be used).
- compute_qae(joined_output, result_column_name)#
Computes the quantile absolute error from the joined dataframe.
- Parameters:
joined_output (pyspark.sql.DataFrame)
result_column_name (str)
- required_func_parameters()#
Return the required parameters to the metric function.
- check_compatibility_with_outputs(outputs, output_type)#
Check that a particular set of outputs is compatible with the metric.
Should throw a ValueError if the metric is not compatible.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
output_type (str)
- check_join_key_uniqueness(joined_output)#
Check if the join keys uniquely identify rows in the joined DataFrame.
- Parameters:
joined_output (pyspark.sql.DataFrame)
- get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#
Returns values for the function’s parameters.
- Return type:
Dict[str, Any]
- metric_function_inputs_empty(function_params)#
Determines if the given inputs are empty.
- __call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#
Computes the given metric on the given DP and baseline outputs.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.
unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.
- get_baseline(baseline_outputs)#
Returns the name of the single baseline this metric will be applied to.
- get_output(outputs)#
Returns the name of the single output the metric will be applied to.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
- get_column_name_from_baselines(baseline_outputs)#
Get the result column name for a given set of outputs.
- check_compatibility_with_data(dp_outputs, baseline_outputs)#
Check that the outputs have all the structure the metric expects.
Should throw a ValueError if the metric is not compatible.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- optional_func_parameters()#
Return the optional parameters to the metric function.
- validate_result(result, baseline_outputs)#
Check that the metric result is an allowed type.
- Parameters:
result (Any)
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- class MedianAbsoluteError(measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)#
Bases:
QuantileAbsoluteError
Computes the median absolute error.
Equivalent to QuantileAbsoluteError with quantile = 0.5.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
Example
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [100, 100, 100]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = MedianAbsoluteError(
...     measure_column="X",
...     join_columns=["A"]
... )
>>> metric.quantile
0.5
>>> metric.join_columns
['A']
>>> result = metric(dp_outputs, baseline_outputs).value
>>> result
10
- property indicator_column_name: str | None#
Returns the name of the indicator column.
- Return type:
Optional[str]
- property baseline: str | None#
Return the name of the baseline specified in the constructor (if any).
- Return type:
Optional[str]
- property output: str | None#
Return the name of the output specified in the constructor (if any).
- Return type:
Optional[str]
- property func: Callable#
Returns the function to be applied.
- Return type:
Callable
- property measure_column: str | None#
Returns the measure column (if any).
- Return type:
Optional[str]
- property empty_value: Any#
The value this metric will return when inputs are empty.
- Return type:
Any
- __init__(measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)#
Constructor.
- Parameters:
measure_column (str) – The column to compute the quantile of absolute error over.
grouping_columns (Optional[List[str]]) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.
baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).
output (Optional[str]) – The name of the program output to be used for the metric. If None, the program must have only one output (which will be used).
- compute_qae(joined_output, result_column_name)#
Computes the quantile absolute error from the joined dataframe.
- Parameters:
joined_output (pyspark.sql.DataFrame)
result_column_name (str)
- required_func_parameters()#
Return the required parameters to the metric function.
- check_compatibility_with_outputs(outputs, output_type)#
Check that a particular set of outputs is compatible with the metric.
Should throw a ValueError if the metric is not compatible.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
output_type (str)
- check_join_key_uniqueness(joined_output)#
Check if the join keys uniquely identify rows in the joined DataFrame.
- Parameters:
joined_output (pyspark.sql.DataFrame)
- get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#
Returns values for the function’s parameters.
- Return type:
Dict[str, Any]
- metric_function_inputs_empty(function_params)#
Determines if the given inputs are empty.
- __call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#
Computes the given metric on the given DP and baseline outputs.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.
unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.
- get_baseline(baseline_outputs)#
Returns the name of the single baseline this metric will be applied to.
- get_output(outputs)#
Returns the name of the single output the metric will be applied to.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
- get_column_name_from_baselines(baseline_outputs)#
Get the result column name for a given set of outputs.
- check_compatibility_with_data(dp_outputs, baseline_outputs)#
Check that the outputs have all the structure the metric expects.
Should throw a ValueError if the metric is not compatible.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- optional_func_parameters()#
Return the optional parameters to the metric function.
- validate_result(result, baseline_outputs)#
Check that the metric result is an allowed type.
- Parameters:
result (Any)
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- class QuantileRelativeError(quantile, measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)#
Bases:
tmlt.analytics.metrics._base.JoinedOutputMetric
Computes the quantile of the empirical relative error.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
How it works:
1. The algorithm takes as input two tables: one representing the differentially private (DP) output and the other representing the baseline output.
   - DP Table (dp): This table contains the output data generated by a differentially private mechanism.
   - Baseline Table (baseline): This table contains the output data generated by a non-private or baseline mechanism. It serves as a reference point for comparison with the DP output.
2. The algorithm includes error handling to ensure the validity of the input data. It checks for the existence and numeric type of the measure_column.
3. The algorithm performs an inner join between the DP and baseline tables based on join_columns to produce the combined dataframe. This join must be one-to-one, with each row in the DP table matching exactly one row in the baseline table, and vice versa. This ensures that there is a direct correspondence between the DP and baseline outputs for each entity, allowing for accurate comparison.
4. After performing the join, the algorithm computes the relative error for each group: the absolute difference between the corresponding DP and baseline values divided by the baseline value, using the formula \(abs(dp - baseline) / baseline\). If the baseline value is zero, the relative error is infinity (\(∞\)) for non-zero differences and zero (\(0\)) for zero differences.
5. The algorithm then calculates the n-th quantile of the relative error across all groups.
6. If the quantile computation would operate on an empty column, the algorithm returns NaN (not a number).
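As a sketch of step 4, under the same assumptions as the QuantileAbsoluteError sketch (a one-to-one joined table with the measure column renamed to "dp" and "baseline"), the zero-baseline convention might be expressed like this; the quantile step afterwards is identical. This illustrates the documented behavior, not the library's internals.

from pyspark.sql import DataFrame, functions as sf

def relative_errors(joined: DataFrame) -> DataFrame:
    # Zero-baseline convention described above: 0 when dp == baseline == 0,
    # infinity for any non-zero difference over a zero baseline.
    rel = (
        sf.when(
            sf.col("baseline") != 0,
            sf.abs(sf.col("dp") - sf.col("baseline")) / sf.col("baseline"),
        )
        .when(sf.col("dp") == sf.col("baseline"), sf.lit(0.0))
        .otherwise(sf.lit(float("inf")))
    )
    return joined.select(rel.alias("rel_err"))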
Note
The algorithm assumes a one-to-one join.
Nulls in the measure columns are dropped because the metric cannot handle null values; the error computation requires valid numeric values in both columns.
Example
>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [50, 110, 100]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [100, 100, 100]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = QuantileRelativeError(
...     quantile=0.5,
...     measure_column="X",
...     join_columns=["A"]
... )
>>> metric.quantile
0.5
>>> metric.join_columns
['A']
>>> result = metric(dp_outputs, baseline_outputs).value
>>> result
0.1
- property indicator_column_name: str | None#
Returns the name of the indicator column.
- Return type:
Optional[str]
- property baseline: str | None#
Return the name of the baseline specified in the constructor (if any).
- Return type:
Optional[str]
- property output: str | None#
Return the name of the output specified in the constructor (if any).
- Return type:
Optional[str]
- property func: Callable#
Returns the function to be applied.
- Return type:
Callable
- property measure_column: str | None#
Returns the measure column (if any).
- Return type:
Optional[str]
- property empty_value: Any#
The value this metric will return when inputs are empty.
- Return type:
Any
- __init__(quantile, measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)#
Constructor.
- Parameters:
quantile (float) – The quantile to calculate (between 0 and 1).
measure_column (str) – The column to compute the quantile of relative error over.
grouping_columns (Optional[List[str]]) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.
baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).
output (Optional[str]) – The name of the program output to be used for the metric. If None, the program must have only one output (which will be used).
- compute_qre(joined_output, result_column_name)#
Computes the quantile relative error from the joined dataframe.
- Parameters:
joined_output (pyspark.sql.DataFrame)
result_column_name (str)
- required_func_parameters()#
Return the required parameters to the metric function.
- check_compatibility_with_outputs(outputs, output_type)#
Check that a particular set of outputs is compatible with the metric.
Should throw a ValueError if the metric is not compatible.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
output_type (str)
- check_join_key_uniqueness(joined_output)#
Check if the join keys uniquely identify rows in the joined DataFrame.
- Parameters:
joined_output (pyspark.sql.DataFrame)
- get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#
Returns values for the function’s parameters.
- Return type:
Dict[str, Any]
- metric_function_inputs_empty(function_params)#
Determines if the given inputs are empty.
- __call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#
Computes the given metric on the given DP and baseline outputs.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.
unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.
- get_baseline(baseline_outputs)#
Returns the name of the single baseline this metric will be applied to.
- get_output(outputs)#
Returns the name of the single output the metric will be applied to.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
- get_column_name_from_baselines(baseline_outputs)#
Get the result column name for a given set of outputs.
- check_compatibility_with_data(dp_outputs, baseline_outputs)#
Check that the outputs have all the structure the metric expects.
Should throw a ValueError if the metric is not compatible.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- optional_func_parameters()#
Return the optional parameters to the metric function.
- validate_result(result, baseline_outputs)#
Check that the metric result is an allowed type.
- Parameters:
result (Any)
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- class MedianRelativeError(measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)#
Bases:
QuantileRelativeError
Computes the median relative error.
Equivalent to QuantileRelativeError with quantile = 0.5.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
Example
>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [50, 110, 100]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [100, 100, 100]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = MedianRelativeError(
...     measure_column="X",
...     join_columns=["A"]
... )
>>> metric.quantile
0.5
>>> metric.join_columns
['A']
>>> result = metric(dp_outputs, baseline_outputs).value
>>> result
0.1
- property indicator_column_name: str | None#
Returns the name of the indicator column.
- Return type:
Optional[str]
- property baseline: str | None#
Return the name of the baseline specified in the constructor (if any).
- Return type:
Optional[str]
- property output: str | None#
Return the name of the output specified in the constructor (if any).
- Return type:
Optional[str]
- property func: Callable#
Returns the function to be applied.
- Return type:
Callable
- property measure_column: str | None#
Returns the measure column (if any).
- Return type:
Optional[str]
- property empty_value: Any#
The value this metric will return when inputs are empty.
- Return type:
Any
- __init__(measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)#
Constructor.
- Parameters:
measure_column (str) – The column to compute the median of relative error over.
grouping_columns (Optional[List[str]]) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.
baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).
output (Optional[str]) – The output to compute the metric for.
- compute_qre(joined_output, result_column_name)#
Computes the quantile relative error from the joined dataframe.
- Parameters:
joined_output (pyspark.sql.DataFrame)
result_column_name (str)
- required_func_parameters()#
Return the required parameters to the metric function.
- check_compatibility_with_outputs(outputs, output_type)#
Check that a particular set of outputs is compatible with the metric.
Should throw a ValueError if the metric is not compatible.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
output_type (str)
- check_join_key_uniqueness(joined_output)#
Check if the join keys uniquely identify rows in the joined DataFrame.
- Parameters:
joined_output (pyspark.sql.DataFrame)
- get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#
Returns values for the function’s parameters.
- Return type:
Dict[str, Any]
- metric_function_inputs_empty(function_params)#
Determines if the given inputs are empty.
- __call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#
Computes the given metric on the given DP and baseline outputs.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.
unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.
- get_baseline(baseline_outputs)#
Returns the name of the single baseline this metric will be applied to.
- get_output(outputs)#
Returns the name of the single output the metric will be applied to.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
- get_column_name_from_baselines(baseline_outputs)#
Get the result column name for a given set of outputs.
- check_compatibility_with_data(dp_outputs, baseline_outputs)#
Check that the outputs have all the structure the metric expects.
Should throw a ValueError if the metric is not compatible.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- optional_func_parameters()#
Return the optional parameters to the metric function.
- validate_result(result, baseline_outputs)#
Check that the metric result is an allowed type.
- Parameters:
result (Any)
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- class HighRelativeErrorFraction(relative_error_threshold, measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)#
Bases:
tmlt.analytics.metrics._base.JoinedOutputMetric
Computes the fraction of groups with relative error above a threshold.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
How it works:
1. The algorithm takes as input two tables: one representing the differentially private (DP) output and the other representing the baseline output.
   - DP Table (dp): This table contains the output data generated by a differentially private mechanism.
   - Baseline Table (baseline): This table contains the output data generated by a non-private or baseline mechanism. It serves as a reference point for comparison with the DP output.
2. The algorithm includes error handling to ensure the validity of the input data. It checks for the existence and numeric type of the measure_column.
3. The algorithm performs an inner join between the DP and baseline tables based on join_columns to produce the combined dataframe. This join must be one-to-one, with each row in the DP table matching exactly one row in the baseline table, and vice versa. This ensures that there is a direct correspondence between the DP and baseline outputs for each entity, allowing for accurate comparison.
4. After performing the join, the algorithm computes the relative error for each group: the absolute difference between the corresponding DP and baseline values divided by the baseline value, using the formula \(abs(dp - baseline) / baseline\). If the baseline value is zero, the relative error is infinity (\(∞\)) for non-zero differences and zero (\(0\)) for zero differences.
5. Next, the algorithm filters the relative error dataframe to keep only the data points whose relative error exceeds the specified threshold (relative_error_threshold). This threshold represents the maximum allowable relative error for a data point to be considered within acceptable bounds.
6. Finally, the algorithm calculates the high relative error fraction by dividing the count of data points with relative error exceeding the threshold by the total count of data points in the dataframe.
7. If the dataframe is empty after the relative error computation (i.e., it contains no data points), the algorithm returns NaN (not a number).
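Under the same assumptions as the sketches above (a dataframe of per-group relative errors in a "rel_err" column, as produced by the QuantileRelativeError sketch), the final two steps might look like this illustrative sketch:

from pyspark.sql import DataFrame, functions as sf

def high_rel_error_fraction(rel_errors: DataFrame, threshold: float) -> float:
    # rel_errors holds one row per group with a "rel_err" column.
    total = rel_errors.count()
    if total == 0:
        return float("nan")  # empty input yields NaN, as described above
    high = rel_errors.filter(sf.col("rel_err") > threshold).count()
    return high / total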
Note
The algorithm assumes a one-to-one join.
Nulls in the measure columns are dropped because the metric cannot handle null values; the error computation requires valid numeric values in both columns.
Example
>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [50, 110, 100]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [100, 100, 100]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = HighRelativeErrorFraction(
...     measure_column="X",
...     relative_error_threshold=0.25,
...     join_columns=["A"]
... )
>>> metric.relative_error_threshold
0.25
>>> metric.join_columns
['A']
>>> result = metric(dp_outputs, baseline_outputs).value
>>> result
0.333
- property indicator_column_name: str | None#
Returns the name of the indicator column.
- Return type:
Optional[str]
- property baseline: str | None#
Return the name of the baseline specified in the constructor (if any).
- Return type:
Optional[str]
- property output: str | None#
Return the name of the output specified in the constructor (if any).
- Return type:
Optional[str]
- property func: Callable#
Returns the function to be applied.
- Return type:
Callable
- property measure_column: str | None#
Returns the measure column (if any).
- Return type:
Optional[str]
- property empty_value: Any#
The value this metric will return when inputs are empty.
- Return type:
Any
- __init__(relative_error_threshold, measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)#
Constructor.
- Parameters:
relative_error_threshold (float) – The threshold for the relative error.
measure_column (str) – The column to compute relative error over.
grouping_columns (Optional[List[str]]) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.
baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).
output (Optional[str]) – The name of the program output to be used for the metric. If None, the program must have only one output (which will be used).
- compute_high_re(joined_output, result_column_name)#
Computes the fraction of groups whose relative error exceeds the threshold.
- Parameters:
joined_output (pyspark.sql.DataFrame)
result_column_name (str)
- required_func_parameters()#
Return the required parameters to the metric function.
- check_compatibility_with_outputs(outputs, output_type)#
Check that a particular set of outputs is compatible with the metric.
Should throw a ValueError if the metric is not compatible.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
output_type (str)
- check_join_key_uniqueness(joined_output)#
Check if the join keys uniquely identify rows in the joined DataFrame.
- Parameters:
joined_output (pyspark.sql.DataFrame)
- get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#
Returns values for the function’s parameters.
- Return type:
Dict[str, Any]
- metric_function_inputs_empty(function_params)#
Determines if the given inputs are empty.
- __call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#
Computes the given metric on the given DP and baseline outputs.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.
unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.
- get_baseline(baseline_outputs)#
Returns the name of the single baseline this metric will be applied to.
- get_output(outputs)#
Returns the name of the single output the metric will be applied to.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
- get_column_name_from_baselines(baseline_outputs)#
Get the result column name for a given set of outputs.
- check_compatibility_with_data(dp_outputs, baseline_outputs)#
Check that the outputs have all the structure the metric expects.
Should throw a ValueError if the metric is not compatible.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- optional_func_parameters()#
Return the optional parameters to the metric function.
- validate_result(result, baseline_outputs)#
Check that the metric result is an allowed type.
- Parameters:
result (Any)
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- class HighRelativeErrorCount(relative_error_threshold, measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)#
Bases:
tmlt.analytics.metrics._base.JoinedOutputMetric
Computes the count of groups with relative error above a threshold.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
How it works:
1. The algorithm takes as input two tables: one representing the differentially private (DP) output and the other representing the baseline output.
   - DP Table (dp): This table contains the output data generated by a differentially private mechanism.
   - Baseline Table (baseline): This table contains the output data generated by a non-private or baseline mechanism. It serves as a reference point for comparison with the DP output.
2. The algorithm includes error handling to ensure the validity of the input data. It checks for the existence and numeric type of the measure_column.
3. The algorithm performs an inner join between the DP and baseline tables based on join_columns to produce the combined dataframe. This join must be one-to-one, with each row in the DP table matching exactly one row in the baseline table, and vice versa. This ensures that there is a direct correspondence between the DP and baseline outputs for each entity, allowing for accurate comparison.
4. After performing the join, the algorithm computes the relative error for each group: the absolute difference between the corresponding DP and baseline values divided by the baseline value, using the formula \(abs(dp - baseline) / baseline\). If the baseline value is zero, the relative error is infinity (\(∞\)) for non-zero differences and zero (\(0\)) for zero differences.
5. Next, the algorithm filters the relative error dataframe to keep only the data points whose relative error exceeds the specified threshold (relative_error_threshold). This threshold represents the maximum allowable relative error for a data point to be considered within acceptable bounds.
6. Finally, the algorithm counts the number of rows whose relative error exceeds the threshold.
7. If the dataframe is empty after the relative error computation (i.e., it contains no data points), the algorithm returns NaN (not a number).
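The count variant differs from the HighRelativeErrorFraction sketch above only in its final step; the same assumptions apply (a "rel_err" column of per-group relative errors):

from pyspark.sql import DataFrame, functions as sf

def high_rel_error_count(rel_errors: DataFrame, threshold: float) -> float:
    # Returns NaN on empty input, otherwise the raw count above the threshold.
    if rel_errors.count() == 0:
        return float("nan")
    return rel_errors.filter(sf.col("rel_err") > threshold).count()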
Note
The algorithm assumes a one-to-one join.
Nulls in the measure columns are dropped because the metric cannot handle null values; the error computation requires valid numeric values in both columns.
Example
>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [50, 110, 100]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [100, 100, 100]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = HighRelativeErrorCount(
...     measure_column="X",
...     relative_error_threshold=0.25,
...     join_columns=["A"]
... )
>>> metric.relative_error_threshold
0.25
>>> metric.join_columns
['A']
>>> result = metric(dp_outputs, baseline_outputs).value
>>> result
1
- property indicator_column_name: str | None#
Returns the name of the indicator column.
- Return type:
Optional[str]
- property baseline: str | None#
Return the name of the baseline specified in the constructor (if any).
- Return type:
Optional[str]
- property output: str | None#
Return the name of the output specified in the constructor (if any).
- Return type:
Optional[str]
- property func: Callable#
Returns the function to be applied.
- Return type:
Callable
- property measure_column: str | None#
Returns the measure column (if any).
- Return type:
Optional[str]
- property empty_value: Any#
The value this metric will return when inputs are empty.
- Return type:
Any
- __init__(relative_error_threshold, measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)#
Constructor.
- Parameters:
relative_error_threshold (float) – The threshold for the relative error.
measure_column (str) – The column to compute relative error over.
grouping_columns (Optional[List[str]]) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.
baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).
output (Optional[str]) – The name of the program output to be used for the metric. If None, the program must have only one output (which will be used).
- compute_high_re(joined_output, result_column_name)#
Computes the count of groups whose relative error exceeds the threshold.
- Parameters:
joined_output (pyspark.sql.DataFrame)
result_column_name (str)
- required_func_parameters()#
Return the required parameters to the metric function.
- check_compatibility_with_outputs(outputs, output_type)#
Check that a particular set of outputs is compatible with the metric.
Should throw a ValueError if the metric is not compatible.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
output_type (str)
- check_join_key_uniqueness(joined_output)#
Check if the join keys uniquely identify rows in the joined DataFrame.
- Parameters:
joined_output (pyspark.sql.DataFrame)
- get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#
Returns values for the function’s parameters.
- Return type:
Dict[str, Any]
- metric_function_inputs_empty(function_params)#
Determines if the given inputs are empty.
- __call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#
Computes the given metric on the given DP and baseline outputs.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.
unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.
- get_baseline(baseline_outputs)#
Returns the name of the single baseline this metric will be applied to.
- get_output(outputs)#
Returns the name of the single output the metric will be applied to.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
- get_column_name_from_baselines(baseline_outputs)#
Get the result column name for a given set of outputs.
- check_compatibility_with_data(dp_outputs, baseline_outputs)#
Check that the outputs have all the structure the metric expects.
Should throw a ValueError if the metric is not compatible.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- optional_func_parameters()#
Return the optional parameters to the metric function.
- validate_result(result, baseline_outputs)#
Check that the metric result is an allowed type.
- Parameters:
result (Any)
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- class SpuriousRate(join_columns, *, name=None, description=None, baseline=None, output=None, grouping_columns=None)#
Bases:
tmlt.analytics.metrics._base.JoinedOutputMetric
Computes the fraction of groups in the DP output but not in the baseline output.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
Note
Below, released means that the group is in the DP output, and spurious means that the group is not in the baseline output.
How it works:
The algorithm operates on a single table, which must appear in both the DP and baseline outputs. It joins the DP version of that table to the baseline version of the table, and notes for each row whether it appears in the DP version, the baseline version, or both.
After performing the join, the algorithm computes the spurious rate by dividing the spurious released count by the total count of released data points (released_count), using the formula \(\text{spurious released count} / \text{released count}\). The result represents the proportion of released data points in the DP output that have no corresponding data points in the baseline output.
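A minimal PySpark sketch of this computation follows; it illustrates the described behavior, not the library's internals. On the example data below, it yields one spurious group ("c") out of four released groups, i.e. 0.25.

from pyspark.sql import DataFrame, functions as sf

def spurious_rate(dp: DataFrame, baseline: DataFrame, join_columns: list) -> float:
    # Outer join on the join columns, with indicators marking which side(s)
    # each group appears in.
    dp_keys = dp.select(join_columns).distinct().withColumn("in_dp", sf.lit(True))
    base_keys = (
        baseline.select(join_columns).distinct().withColumn("in_baseline", sf.lit(True))
    )
    joined = dp_keys.join(base_keys, on=join_columns, how="full_outer")
    # Released: present in the DP output. Spurious: released but absent from
    # the baseline output.
    released = joined.filter(sf.col("in_dp").isNotNull()).count()
    spurious = joined.filter(
        sf.col("in_dp").isNotNull() & sf.col("in_baseline").isNull()
    ).count()
    return spurious / released if released else float("nan")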
Example
>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3", "c"],
...             "X": [50, 110, 100, 50]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3", "b"],
...             "X": [100, 100, 100, 50]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = SpuriousRate(
...     join_columns=["A"]
... )
>>> metric.join_columns
['A']
>>> metric(dp_outputs, baseline_outputs).value
0.25
- property indicator_column_name: str | None#
Returns the name of the indicator column.
- Return type:
Optional[str]
- property baseline: str | None#
Return the name of the baseline specified in the constructor (if any).
- Return type:
Optional[str]
- property output: str | None#
Return the name of the output specified in the constructor (if any).
- Return type:
Optional[str]
- property func: Callable#
Returns the function to be applied.
- Return type:
Callable
- property measure_column: str | None#
Returns the measure column (if any).
- Return type:
Optional[str]
- property empty_value: Any#
The value this metric will return when inputs are empty.
- Return type:
Any
- __init__(join_columns, *, name=None, description=None, baseline=None, output=None, grouping_columns=None)#
Constructor.
- Parameters:
baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).
output (Optional[str]) – The output to compute the spurious rate for. If None, the tuner must have a single output (which will be used).
grouping_columns (Optional[List[str]]) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.
- compute_spurious_rate(joined_output, result_column_name)#
Computes the spurious rate given the DP and baseline outputs.
- Parameters:
joined_output (pyspark.sql.DataFrame)
result_column_name (str)
- required_func_parameters()#
Return the required parameters to the metric function.
- check_compatibility_with_outputs(outputs, output_type)#
Check that a particular set of outputs is compatible with the metric.
Should throw a ValueError if the metric is not compatible.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
output_type (str)
- check_join_key_uniqueness(joined_output)#
Check if the join keys uniquely identify rows in the joined DataFrame.
- Parameters:
joined_output (pyspark.sql.DataFrame)
- get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#
Returns values for the function’s parameters.
- Return type:
Dict[str, Any]
- metric_function_inputs_empty(function_params)#
Determines if the given inputs are empty.
- __call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#
Computes the given metric on the given DP and baseline outputs.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.
unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.
- Return type:
- get_baseline(baseline_outputs)#
Returns the name of the single baseline this metric will be applied to.
- Return type:
str
- get_output(outputs)#
Returns the name of the single output the metric will be applied to.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
- Return type:
str
- get_column_name_from_baselines(baseline_outputs)#
Get the result column name for a given set of outputs.
- check_compatibility_with_data(dp_outputs, baseline_outputs)#
Check that the outputs have all the structure the metric expects.
Should throw a ValueError if the metric is not compatible.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- optional_func_parameters()#
Return the optional parameters to the metric function.
- validate_result(result, baseline_outputs)#
Check that the metric result is an allowed type.
- Parameters:
result (Any)
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- class SpuriousCount(join_columns, *, name=None, description=None, baseline=None, output=None, grouping_columns=None)#
Bases:
tmlt.analytics.metrics._base.JoinedOutputMetric
Computes the number of groups in the DP output but not in the baseline output.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
Note
Below, released means that the group is in the DP output, and spurious means that the group is not in the baseline output.
How it works:
The algorithm operates on a single table, which must appear in both the DP and baseline outputs. It joins the DP version of that table to the baseline version of the table, and notes for each row whether it appears in the DP version, the baseline version, or both.
After performing the join, the algorithm counts the number of groups that appear only in the DP output (not the baseline), and returns that count.
Example
>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3", "c"],
...             "X": [50, 110, 100, 50]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3", "b"],
...             "X": [100, 100, 100, 50]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = SpuriousCount(
...     join_columns=["A"]
... )
>>> metric.join_columns
['A']
>>> metric(dp_outputs, baseline_outputs).value
1
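The join-and-count logic described above can be approximated in plain PySpark. The following is a rough sketch using the dp_df and baseline_df from the example; it is not the library’s actual implementation:

from pyspark.sql import functions as sf

# Tag each side, outer-join on the join columns, then count the rows that
# appear only on the DP side (i.e. the spurious groups).
joined = (
    dp_df.select("A").withColumn("in_dp", sf.lit(True))
    .join(
        baseline_df.select("A").withColumn("in_baseline", sf.lit(True)),
        on=["A"],
        how="outer",
    )
)
spurious_count = joined.filter(sf.col("in_baseline").isNull()).count()
# For the example data, spurious_count == 1 (the group "c").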
Properties:
- join_columns – Returns the name of the join columns.
- indicator_column_name – Returns the name of the indicator column.
- baseline – Return the name of the baseline specified in the constructor (if any).
- output – Return the name of the output specified in the constructor (if any).
- name – Returns the name of the metric.
- description – Returns the description of the metric.
- func – Returns function to be applied.
- grouping_columns – Returns the grouping columns.
- measure_column – Returns the measure column (if any).
- empty_value – The value this metric will return when inputs are empty.
Methods:
- count_spurious_rows – Computes spurious count given DP and baseline outputs.
- required_func_parameters – Return the required parameters to the metric function.
- check_compatibility_with_outputs – Check that a particular set of outputs is compatible with the metric.
- check_join_key_uniqueness – Check if the join keys uniquely identify rows in the joined DataFrame.
- get_parameter_values – Returns values for the function’s parameters.
- metric_function_inputs_empty – Determines if the given inputs are empty.
- __call__ – Computes the given metric on the given DP and baseline outputs.
- get_baseline – Returns the name of the single baseline this metric will be applied to.
- get_output – Returns the name of the single output the metric will be applied to.
- get_column_name_from_baselines – Get the result column name for a given set of outputs.
- check_compatibility_with_data – Check that the outputs have all the structure the metric expects.
- optional_func_parameters – Return the optional parameters to the metric function.
- validate_result – Check that the metric result is an allowed type.
- property indicator_column_name: str | None#
Returns the name of the indicator column.
- Return type:
Optional[str]
- property baseline: str | None#
Return the name of the baseline specified in the constructor (if any).
- Return type:
Optional[str]
- property output: str | None#
Return the name of the output specified in the constructor (if any).
- Return type:
Optional[str]
- property func: Callable#
Returns function to be applied.
- Return type:
Callable
- property measure_column: str | None#
Returns the measure column (if any).
- Return type:
Optional[str]
- property empty_value: Any#
The value this metric will return when inputs are empty.
- Return type:
Any
- __init__(join_columns, *, name=None, description=None, baseline=None, output=None, grouping_columns=None)#
Constructor.
- Parameters:
baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).
output (Optional[str]) – The output to compute the spurious count for. If None, the tuner must have a single output (which will be used).
grouping_columns (Optional[List[str]]) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.
- count_spurious_rows(joined_output, result_column_name)#
Computes spurious count given DP and baseline outputs.
- Parameters:
joined_output (pyspark.sql.DataFrame)
result_column_name (str)
- required_func_parameters()#
Return the required parameters to the metric function.
- check_compatibility_with_outputs(outputs, output_type)#
Check that a particular set of outputs is compatible with the metric.
Should throw a ValueError if the metric is not compatible.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
output_type (str)
- check_join_key_uniqueness(joined_output)#
Check if the join keys uniquely identify rows in the joined DataFrame.
- Parameters:
joined_output (pyspark.sql.DataFrame)
- get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#
Returns values for the function’s parameters.
- Return type:
Dict[str, Any]
- metric_function_inputs_empty(function_params)#
Determines if the given inputs are empty.
- __call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#
Computes the given metric on the given DP and baseline outputs.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.
unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.
- Return type:
- get_baseline(baseline_outputs)#
Returns the name of the single baseline this metric will be applied to.
- Return type:
str
- get_output(outputs)#
Returns the name of the single output the metric will be applied to.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
- Return type:
str
- get_column_name_from_baselines(baseline_outputs)#
Get the result column name for a given set of outputs.
- check_compatibility_with_data(dp_outputs, baseline_outputs)#
Check that the outputs have all the structure the metric expects.
Should throw a ValueError if the metric is not compatible.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- optional_func_parameters()#
Return the optional parameters to the metric function.
- validate_result(result, baseline_outputs)#
Check that the metric result is an allowed type.
- Parameters:
result (Any)
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- class SuppressionRate(join_columns, *, name=None, description=None, baseline=None, output=None, grouping_columns=None)#
Bases:
tmlt.analytics.metrics._base.JoinedOutputMetric
Computes the fraction of groups in the baseline output but not in the DP output.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
How it works:
The algorithm operates on a single table, which must appear in both the DP and baseline outputs. It joins the DP version of that table to the baseline version of the table, and notes for each row whether it appears in the DP version, the baseline version, or both.
After performing the join, the algorithm computes the suppression rate by dividing the count of rows that appear in the baseline but not the DP output by the count of baseline rows, using the formula \(\text{suppressed count} / \text{baseline count}\). The result represents the proportion of data points in the baseline output that have no corresponding data points in the DP output.
Example
>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3", "c"],
...             "X": [50, 110, 100, 50]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3", "b"],
...             "X": [100, 100, 100, 50]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = SuppressionRate(
...     join_columns=["A"]
... )
>>> metric.join_columns
['A']
>>> metric(dp_outputs, baseline_outputs).value
0.25
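As a rough sketch of the computation described above (again using dp_df and baseline_df from the example, and assuming the denominator is the number of baseline rows; this is not the library’s implementation):

from pyspark.sql import functions as sf

# Outer-join the two outputs on the join columns, count the baseline-only
# rows (the suppressed groups), and divide by the baseline row count.
joined = (
    dp_df.select("A").withColumn("in_dp", sf.lit(True))
    .join(
        baseline_df.select("A").withColumn("in_baseline", sf.lit(True)),
        on=["A"],
        how="outer",
    )
)
suppressed_count = joined.filter(sf.col("in_dp").isNull()).count()
suppression_rate = suppressed_count / baseline_df.count()
# For the example data: 1 / 4 == 0.25, matching the metric result above.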
Properties:
- join_columns – Returns the name of the join columns.
- indicator_column_name – Returns the name of the indicator column.
- baseline – Return the name of the baseline specified in the constructor (if any).
- output – Return the name of the output specified in the constructor (if any).
- name – Returns the name of the metric.
- description – Returns the description of the metric.
- func – Returns function to be applied.
- grouping_columns – Returns the grouping columns.
- measure_column – Returns the measure column (if any).
- empty_value – The value this metric will return when inputs are empty.
Methods:
- compute_suppression_rate – Computes suppression rate given DP and baseline outputs.
- required_func_parameters – Return the required parameters to the metric function.
- check_compatibility_with_outputs – Check that a particular set of outputs is compatible with the metric.
- check_join_key_uniqueness – Check if the join keys uniquely identify rows in the joined DataFrame.
- get_parameter_values – Returns values for the function’s parameters.
- metric_function_inputs_empty – Determines if the given inputs are empty.
- __call__ – Computes the given metric on the given DP and baseline outputs.
- get_baseline – Returns the name of the single baseline this metric will be applied to.
- get_output – Returns the name of the single output the metric will be applied to.
- get_column_name_from_baselines – Get the result column name for a given set of outputs.
- check_compatibility_with_data – Check that the outputs have all the structure the metric expects.
- optional_func_parameters – Return the optional parameters to the metric function.
- validate_result – Check that the metric result is an allowed type.
- property indicator_column_name: str | None#
Returns the name of the indicator column.
- Return type:
Optional[str]
- property baseline: str | None#
Return the name of the baseline specified in the constructor (if any).
- Return type:
Optional[str]
- property output: str | None#
Return the name of the output specified in the constructor (if any).
- Return type:
Optional[str]
- property func: Callable#
Returns function to be applied.
- Return type:
Callable
- property measure_column: str | None#
Returns the measure column (if any).
- Return type:
Optional[str]
- property empty_value: Any#
The value this metric will return when inputs are empty.
- Return type:
Any
- __init__(join_columns, *, name=None, description=None, baseline=None, output=None, grouping_columns=None)#
Constructor.
- Parameters:
baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).
output (Optional[str]) – Which output to compute the suppression rate for. If None, the tuner must have a single output (which will be used).
grouping_columns (Optional[List[str]]) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.
- compute_suppression_rate(joined_output, result_column_name)#
Computes suppression rate given DP and baseline outputs.
- Parameters:
joined_output (pyspark.sql.DataFrame)
result_column_name (str)
- required_func_parameters()#
Return the required parameters to the metric function.
- check_compatibility_with_outputs(outputs, output_type)#
Check that a particular set of outputs is compatible with the metric.
Should throw a ValueError if the metric is not compatible.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
output_type (str)
- check_join_key_uniqueness(joined_output)#
Check if the join keys uniquely identify rows in the joined DataFrame.
- Parameters:
joined_output (pyspark.sql.DataFrame)
- get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#
Returns values for the function’s parameters.
- Return type:
Dict[str, Any]
- metric_function_inputs_empty(function_params)#
Determines if the given inputs are empty.
- __call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#
Computes the given metric on the given DP and baseline outputs.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.
unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.
- Return type:
- get_baseline(baseline_outputs)#
Returns the name of the single baseline this metric will be applied to.
- Return type:
str
- get_output(outputs)#
Returns the name of the single output the metric will be applied to.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
- Return type:
str
- get_column_name_from_baselines(baseline_outputs)#
Get the result column name for a given set of outputs.
- check_compatibility_with_data(dp_outputs, baseline_outputs)#
Check that the outputs have all the structure the metric expects.
Should throw a ValueError if the metric is not compatible.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- optional_func_parameters()#
Return the optional parameters to the metric function.
- validate_result(result, baseline_outputs)#
Check that the metric result is an allowed type.
- Parameters:
result (Any)
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- class SuppressionCount(join_columns, *, name=None, description=None, baseline=None, output=None, grouping_columns=None)#
Bases:
tmlt.analytics.metrics._base.JoinedOutputMetric
Computes the count of groups in the baseline output but not in the DP output.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
How it works:
The algorithm operates on a single table, which must appear in both the DP and baseline outputs. It joins the DP version of that table to the baseline version of the table, and notes for each row whether it appears in the DP version, the baseline version, or both.
After performing the join, the algorithm computes the suppression count by counting the rows that appear in the baseline but not the DP output. The result is the number of data points in the baseline output that have no corresponding data points in the DP output.
Example
>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3", "c"],
...             "X": [50, 110, 100, 50]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3", "b"],
...             "X": [100, 100, 100, 50]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = SuppressionCount(
...     join_columns=["A"]
... )
>>> metric.join_columns
['A']
>>> metric(dp_outputs, baseline_outputs).value
1
Properties:
- join_columns – Returns the name of the join columns.
- indicator_column_name – Returns the name of the indicator column.
- baseline – Return the name of the baseline specified in the constructor (if any).
- output – Return the name of the output specified in the constructor (if any).
- name – Returns the name of the metric.
- description – Returns the description of the metric.
- func – Returns function to be applied.
- grouping_columns – Returns the grouping columns.
- measure_column – Returns the measure column (if any).
- empty_value – The value this metric will return when inputs are empty.
Methods:
- count_suppressed_rows – Counts the number of suppressed rows given DP and baseline outputs.
- required_func_parameters – Return the required parameters to the metric function.
- check_compatibility_with_outputs – Check that a particular set of outputs is compatible with the metric.
- check_join_key_uniqueness – Check if the join keys uniquely identify rows in the joined DataFrame.
- get_parameter_values – Returns values for the function’s parameters.
- metric_function_inputs_empty – Determines if the given inputs are empty.
- __call__ – Computes the given metric on the given DP and baseline outputs.
- get_baseline – Returns the name of the single baseline this metric will be applied to.
- get_output – Returns the name of the single output the metric will be applied to.
- get_column_name_from_baselines – Get the result column name for a given set of outputs.
- check_compatibility_with_data – Check that the outputs have all the structure the metric expects.
- optional_func_parameters – Return the optional parameters to the metric function.
- validate_result – Check that the metric result is an allowed type.
- property indicator_column_name: str | None#
Returns the name of the indicator column.
- Return type:
Optional[str]
- property baseline: str | None#
Return the name of the baseline specified in the constructor (if any).
- Return type:
Optional[str]
- property output: str | None#
Return the name of the output specified in the constructor (if any).
- Return type:
Optional[str]
- property func: Callable#
Returns function to be applied.
- Return type:
Callable
- property measure_column: str | None#
Returns the measure column (if any).
- Return type:
Optional[str]
- property empty_value: Any#
The value this metric will return when inputs are empty.
- Return type:
Any
- __init__(join_columns, *, name=None, description=None, baseline=None, output=None, grouping_columns=None)#
Constructor.
- Parameters:
baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).
output (Optional[str]) – Which output to compute the suppression count for. If None, the tuner must have a single output (which will be used).
grouping_columns (Optional[List[str]]) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.
- count_suppressed_rows(joined_output, result_column_name)#
Counts the number of suppressed rows given DP and baseline outputs.
- Parameters:
joined_output (pyspark.sql.DataFrame)
result_column_name (str)
- required_func_parameters()#
Return the required parameters to the metric function.
- check_compatibility_with_outputs(outputs, output_type)#
Check that a particular set of outputs is compatible with the metric.
Should throw a ValueError if the metric is not compatible.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
output_type (str)
- check_join_key_uniqueness(joined_output)#
Check if the join keys uniquely identify rows in the joined DataFrame.
- Parameters:
joined_output (pyspark.sql.DataFrame)
- get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#
Returns values for the function’s parameters.
- Return type:
Dict[str, Any]
- metric_function_inputs_empty(function_params)#
Determines if the given inputs are empty.
- __call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#
Computes the given metric on the given DP and baseline outputs.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.
unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.
- Return type:
- get_baseline(baseline_outputs)#
Returns the name of the single baseline this metric will be applied to.
- Return type:
str
- get_output(outputs)#
Returns the name of the single output the metric will be applied to.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
- Return type:
str
- get_column_name_from_baselines(baseline_outputs)#
Get the result column name for a given set of outputs.
- check_compatibility_with_data(dp_outputs, baseline_outputs)#
Check that the outputs have all the structure the metric expects.
Should throw a ValueError if the metric is not compatible.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- optional_func_parameters()#
Return the optional parameters to the metric function.
- validate_result(result, baseline_outputs)#
Check that the metric result is an allowed type.
- Parameters:
result (Any)
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- class Metric(name, func, description=None, grouping_columns=None, measure_column=None, empty_value=None)#
A generic metric defined using a function.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
This function (the func argument) must have the following parameters:
- dp_outputs: a dictionary of DataFrames containing the program’s outputs.
- baseline_outputs: a dictionary mapping baseline names to dictionaries of output DataFrames.
It may also have the following optional parameters:
- result_column_name: if the function returns a DataFrame, the metric results should be in a column with this name.
- unprotected_inputs: a dictionary containing the program’s unprotected inputs.
- parameters: a dictionary containing the program’s parameters.
If the metric does not have grouping columns, the function must return a numeric value, a boolean, or a string. If the metric has grouping columns, then it must return a DataFrame. This DataFrame should contain the grouping columns, and exactly one additional column containing the metric value for each group. This column’s type should be numeric, boolean, or string.
Example
>>> dp_df = spark.createDataFrame(pd.DataFrame({"A": [5]}))
>>> dp_outputs = {"O": dp_df}
>>> baseline_df1 = spark.createDataFrame(pd.DataFrame({"A": [5]}))
>>> baseline_df2 = spark.createDataFrame(pd.DataFrame({"A": [6]}))
>>> baseline_outputs = {
...     "baseline1": {"O": baseline_df1}, "baseline2": {"O": baseline_df2}
... }
>>> def size_difference(dp_outputs, baseline_outputs):
...     baseline_count = baseline_outputs["baseline1"]["O"].count()
...     return abs(baseline_count - dp_outputs["O"].count())
>>> metric = Metric(
...     func=size_difference,
...     name="Custom Metric",
...     description="Custom Description",
... )
>>> result = metric(dp_outputs, baseline_outputs)
>>> result.value
0
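The optional parameters argument can be used to make a metric configurable. In the following sketch, the "tolerance" key is a hypothetical program parameter, not something defined by the library:

def within_tolerance(dp_outputs, baseline_outputs, parameters):
    # Compare output sizes against a program-specific tolerance.
    diff = abs(
        baseline_outputs["baseline1"]["O"].count() - dp_outputs["O"].count()
    )
    return diff <= parameters["tolerance"]

metric = Metric(
    func=within_tolerance,
    name="Within tolerance",
    description="Whether output sizes differ by at most `tolerance` rows.",
)
result = metric(dp_outputs, baseline_outputs, parameters={"tolerance": 1})
# result.value is a boolean, one of the scalar result types allowed above.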
- property func: Callable#
Returns function to be applied.
- Return type:
Callable
- property measure_column: str | None#
Returns the measure column (if any).
- Return type:
Optional[str]
- property empty_value: Any#
The value this metric will return when inputs are empty.
- Return type:
Any
- __init__(name, func, description=None, grouping_columns=None, measure_column=None, empty_value=None)#
Constructor.
- Parameters:
name (str) – A name for the metric.
func (Union[Callable, staticmethod]) – The function that calculates the metric result. See the docstring for Metric for detail on the allowed input/output types of this function.
grouping_columns (Optional[List[str]]) – If specified, the metric should group the outputs by the given columns, and calculate the metric for each group.
measure_column (Optional[str]) – If specified, the column in the outputs to measure.
empty_value (Optional[Any]) – If all DP and baseline outputs are empty, the metric will return this value.
- required_func_parameters()#
Return the required parameters to the metric function.
- optional_func_parameters()#
Return the optional parameters to the metric function.
- check_compatibility_with_data(dp_outputs, baseline_outputs)#
Check that the outputs have all the structure the metric expects.
Should throw a ValueError if the metric is not compatible.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- get_column_name_from_baselines(baseline_outputs)#
Get the result column name for a given set of outputs.
- get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#
Returns values for the function’s parameters.
- Return type:
Dict[str, Any]
- validate_result(result, baseline_outputs)#
Check that the metric result is an allowed type.
- Parameters:
result (Any)
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- metric_function_inputs_empty(function_params)#
Determines if the inputs to the metric function are empty.
- __call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#
Computes the given metric on the given DP and baseline outputs.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.
unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.
- Return type:
- class SingleOutputMetric(name, func, description=None, baseline=None, output=None, grouping_columns=None, measure_column=None, empty_value=None)#
Bases:
Metric
A metric computed from a single output table, defined using a function.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
This metric is defined using a function func. This function must have the following parameters:
- dp_output: the chosen DP output DataFrame.
- baseline_output: the chosen baseline output DataFrame.
It may also have the following optional parameters:
- result_column_name: if the function returns a DataFrame, the metric results should be in a column with this name.
- unprotected_inputs: a dictionary containing the program’s unprotected inputs.
- parameters: a dictionary containing the program’s parameters.
If the metric does not have grouping columns, the function must return a numeric value, a boolean, or a string. If the metric has grouping columns, then it must return a DataFrame. This DataFrame should contain the grouping columns, and exactly one additional column containing the metric value for each group. This column’s type should be numeric, boolean, or string.
Example
>>> dp_df = spark.createDataFrame(pd.DataFrame({"A": [5]}))
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(pd.DataFrame({"A": [5]}))
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> def size_difference(dp_output: DataFrame, baseline_output: DataFrame):
...     return baseline_output.count() - dp_output.count()
>>> metric = SingleOutputMetric(
...     func=size_difference,
...     name="Output size difference",
...     description="Difference in number of rows.",
... )
>>> result = metric(dp_outputs, baseline_outputs).value
>>> result
0
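When grouping_columns is set, the function must instead return a DataFrame, as described above. Here is a minimal sketch; the column "B" is a hypothetical grouping column assumed to exist in both outputs:

from pyspark.sql import functions as sf

def size_difference_per_group(dp_output, baseline_output, result_column_name):
    # Return one row per group: the grouping column plus a single result
    # column named result_column_name, as required for grouped metrics.
    dp_counts = dp_output.groupBy("B").agg(sf.count("*").alias("dp_n"))
    baseline_counts = baseline_output.groupBy("B").agg(
        sf.count("*").alias("baseline_n")
    )
    return (
        dp_counts.join(baseline_counts, on=["B"], how="outer")
        .fillna(0, subset=["dp_n", "baseline_n"])
        .select(
            "B",
            (sf.col("baseline_n") - sf.col("dp_n")).alias(result_column_name),
        )
    )

metric = SingleOutputMetric(
    func=size_difference_per_group,
    name="Per-group size difference",
    grouping_columns=["B"],
)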
Properties:
- baseline – Return the name of the baseline specified in the constructor (if any).
- output – Return the name of the output specified in the constructor (if any).
- name – Returns the name of the metric.
- description – Returns the description of the metric.
- func – Returns function to be applied.
- grouping_columns – Returns the grouping columns.
- measure_column – Returns the measure column (if any).
- empty_value – The value this metric will return when inputs are empty.
Methods:
- get_baseline – Returns the name of the single baseline this metric will be applied to.
- get_output – Returns the name of the single output the metric will be applied to.
- get_column_name_from_baselines – Get the result column name for a given set of outputs.
- required_func_parameters – Return the required parameters to the metric function.
- check_compatibility_with_outputs – Check that a particular output is compatible with the metric.
- check_compatibility_with_data – Check that the outputs have all the structure the metric expects.
- get_parameter_values – Returns values for the function’s parameters.
- metric_function_inputs_empty – Determines if the inputs to the metric function are empty.
- __call__ – Computes the given metric on the given DP and baseline outputs.
- optional_func_parameters – Return the optional parameters to the metric function.
- validate_result – Check that the metric result is an allowed type.
- property baseline: str | None#
Return the name of the baseline specified in the constructor (if any).
- Return type:
Optional[str]
- property output: str | None#
Return the name of the output specified in the constructor (if any).
- Return type:
Optional[str]
- property func: Callable#
Returns function to be applied.
- Return type:
Callable
- property measure_column: str | None#
Returns the measure column (if any).
- Return type:
Optional[str]
- property empty_value: Any#
The value this metric will return when inputs are empty.
- Return type:
Any
- __init__(name, func, description=None, baseline=None, output=None, grouping_columns=None, measure_column=None, empty_value=None)#
Constructor.
- Parameters:
name (str) – A name for the metric.
func (Union[Callable, staticmethod]) – The function that calculates the metric result. See the docstring for SingleOutputMetric for detail on the allowed input/output types of this function.
baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).
output (Optional[str]) – The name of the program output to be used for the metric. If None, the program must have only one output (which will be used).
grouping_columns (Optional[List[str]]) – If specified, the metric should group the outputs by the given columns, and calculate the metric for each group.
measure_column (Optional[str]) – If specified, the column in the outputs to measure.
empty_value (Optional[Any]) – If all DP and baseline outputs are empty, the metric will return this value.
- get_baseline(baseline_outputs)#
Returns the name of the single baseline this metric will be applied to.
- Return type:
str
- get_output(outputs)#
Returns the name of the single output the metric will be applied to.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
- Return type:
str
- get_column_name_from_baselines(baseline_outputs)#
Get the result column name for a given set of outputs.
- required_func_parameters()#
Return the required parameters to the metric function.
- check_compatibility_with_outputs(outputs, output_type)#
Check that a particular output is compatible with the metric.
Should throw a ValueError if the metric is not compatible.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
output_type (str)
- check_compatibility_with_data(dp_outputs, baseline_outputs)#
Check that the outputs have all the structure the metric expects.
Should throw a ValueError if the metric is not compatible.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#
Returns values for the function’s parameters.
- Return type:
Dict[str, Any]
- metric_function_inputs_empty(function_params)#
Determines if the inputs to the metric function are empty.
- __call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#
Computes the given metric on the given DP and baseline outputs.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.
unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.
- Return type:
- optional_func_parameters()#
Return the optional parameters to the metric function.
- validate_result(result, baseline_outputs)#
Check that the metric result is an allowed type.
- Parameters:
result (Any)
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- class JoinedOutputMetric(name, func, join_columns, description=None, baseline=None, output=None, grouping_columns=None, measure_column=None, empty_value=None, join_how='inner', dropna_columns=None, indicator_column_name=None)#
Bases:
SingleOutputMetric
A metric computed from a join between a single DP and baseline output.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
The metric is defined using a function func. This function must have the following parameters:
- joined_output: a DataFrame created by joining the selected DP and baseline outputs.
It may also have the following optional parameters:
- result_column_name: if the function returns a DataFrame, the metric results should be in a column with this name.
- unprotected_inputs: a dictionary containing the program’s unprotected inputs.
- parameters: a dictionary containing the program’s parameters.
If the metric does not have grouping columns, the function must return a numeric value, a boolean, or a string. If the metric has grouping columns, then it must return a DataFrame. This DataFrame should contain the grouping columns, and exactly one additional column containing the metric value for each group. This column’s type should be numeric, boolean, or string.
Example
>>> dp_df = spark.createDataFrame(pd.DataFrame([{"A": 1, "B": "a"}]))
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(pd.DataFrame([{"A": 5}]))
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> def size_difference(joined_output: DataFrame,
...                     result_column_name: str):
...     in_dp = (col("indicator") == "both") | (col("indicator") == "dp")
...     in_baseline = ((col("indicator") == "both") |
...                    (col("indicator") == "baseline"))
...     dp_count = sf.sum(sf.when(in_dp, sf.lit(1)).otherwise(0))
...     baseline_count = sf.sum(sf.when(in_baseline, sf.lit(1)).otherwise(0))
...     size_difference = joined_output.agg(
...         sf.abs(dp_count - baseline_count).alias(result_column_name)
...     )
...     return size_difference.head(1)[0][result_column_name]
>>> metric = JoinedOutputMetric(
...     func=size_difference,
...     name="Output size difference",
...     description="Difference in number of rows.",
...     join_columns=["A"],
...     join_how="outer",
...     indicator_column_name="indicator",
... )
>>> result = metric(dp_outputs, baseline_outputs).value
>>> result
0
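Non-join columns that appear in both tables show up in the joined output with _dp and _baseline suffixes, which makes per-row error computations straightforward. Below is a sketch of a median-absolute-error metric over a hypothetical numeric column "X" (sf.percentile_approx requires Spark 3.1 or later):

from pyspark.sql import functions as sf

def median_abs_error(joined_output, result_column_name):
    # Compare the DP and baseline versions of column "X" row by row.
    abs_err = sf.abs(sf.col("X_dp") - sf.col("X_baseline"))
    result = joined_output.agg(
        sf.percentile_approx(abs_err, 0.5).alias(result_column_name)
    )
    return result.head(1)[0][result_column_name]

metric = JoinedOutputMetric(
    func=median_abs_error,
    name="Median absolute error of X",
    join_columns=["A"],
    # The default inner join keeps only groups present in both outputs;
    # dropna_columns could additionally drop rows with nulls in "X_dp".
)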
Properties:
- join_columns – Returns the name of the join columns.
- indicator_column_name – Returns the name of the indicator column.
- baseline – Return the name of the baseline specified in the constructor (if any).
- output – Return the name of the output specified in the constructor (if any).
- name – Returns the name of the metric.
- description – Returns the description of the metric.
- func – Returns function to be applied.
- grouping_columns – Returns the grouping columns.
- measure_column – Returns the measure column (if any).
- empty_value – The value this metric will return when inputs are empty.
Methods:
- required_func_parameters – Return the required parameters to the metric function.
- check_compatibility_with_outputs – Check that a particular set of outputs is compatible with the metric.
- check_join_key_uniqueness – Check if the join keys uniquely identify rows in the joined DataFrame.
- get_parameter_values – Returns values for the function’s parameters.
- metric_function_inputs_empty – Determines if the given inputs are empty.
- __call__ – Computes the given metric on the given DP and baseline outputs.
- get_baseline – Returns the name of the single baseline this metric will be applied to.
- get_output – Returns the name of the single output the metric will be applied to.
- get_column_name_from_baselines – Get the result column name for a given set of outputs.
- check_compatibility_with_data – Check that the outputs have all the structure the metric expects.
- optional_func_parameters – Return the optional parameters to the metric function.
- validate_result – Check that the metric result is an allowed type.
- Parameters:
name (str)
func (Union[Callable, staticmethod])
join_columns (List[str])
description (Optional[str])
baseline (Optional[str])
output (Optional[str])
grouping_columns (Optional[List[str]])
measure_column (Optional[str])
empty_value (Optional[Any])
join_how (str)
dropna_columns (Optional[List[str]])
indicator_column_name (Optional[str])
- property indicator_column_name: str | None#
Returns the name of the indicator column.
- Return type:
Optional[str]
- property baseline: str | None#
Return the name of the baseline specified in the constructor (if any).
- Return type:
Optional[str]
- property output: str | None#
Return the name of the output specified in the constructor (if any).
- Return type:
Optional[str]
- property func: Callable#
Returns function to be applied.
- Return type:
Callable
- property measure_column: str | None#
Returns the measure column (if any).
- Return type:
Optional[str]
- property empty_value: Any#
The value this metric will return when inputs are empty.
- Return type:
Any
- __init__(name, func, join_columns, description=None, baseline=None, output=None, grouping_columns=None, measure_column=None, empty_value=None, join_how='inner', dropna_columns=None, indicator_column_name=None)#
Constructor.
- Parameters:
name (str) – A name for the metric.
func (Union[Callable, staticmethod]) – The function that calculates the metric result. See the docstring for JoinedOutputMetric for detail on the allowed input/output types of this function.
baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).
output (Optional[str]) – The name of the program output to be used for the metric. If None, the program must have only one output (which will be used).
grouping_columns (Optional[List[str]]) – If specified, the metric should group the outputs by the given columns, and calculate the metric for each group.
measure_column (Optional[str]) – If specified, the column in the outputs to measure.
empty_value (Optional[Any]) – If all DP and baseline outputs are empty, the metric will return this value.
join_how (str) – The type of join to perform. Must be one of “left”, “right”, “inner”, “outer”. Defaults to “inner”.
dropna_columns (Optional[List[str]]) – If specified, rows with nulls in these columns will be dropped.
indicator_column_name (Optional[str]) – If specified, we will add a column with the specified name to the joined data that contains either “dp”, “baseline”, or “both” to indicate where the values in the row came from.
- required_func_parameters()#
Return the required parameters to the metric function.
- check_compatibility_with_outputs(outputs, output_type)#
Check that a particular set of outputs is compatible with the metric.
Should throw a ValueError if the metric is not compatible.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
output_type (str)
- check_join_key_uniqueness(joined_output)#
Check if the join keys uniquely identify rows in the joined DataFrame.
- Parameters:
joined_output (pyspark.sql.DataFrame)
- get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#
Returns values for the function’s parameters.
- Return type:
Dict[str, Any]
- metric_function_inputs_empty(function_params)#
Determines if the given inputs are empty.
- __call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#
Computes the given metric on the given DP and baseline outputs.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.
unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.
- Return type:
- get_baseline(baseline_outputs)#
Returns the name of the single baseline this metric will be applied to.
- Return type:
str
- get_output(outputs)#
Returns the name of the single output the metric will be applied to.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
- Return type:
str
- get_column_name_from_baselines(baseline_outputs)#
Get the result column name for a given set of outputs.
- check_compatibility_with_data(dp_outputs, baseline_outputs)#
Check that the outputs have all the structure the metric expects.
Should throw a ValueError if the metric is not compatible.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- optional_func_parameters()#
Return the optional parameters to the metric function.
- validate_result(result, baseline_outputs)#
Check that the metric result is an allowed type.
- Parameters:
result (Any)
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- class JoinedOutputMetricResult#
Bases:
SingleOutputMetricResult
The output of a JoinedOutputMetric with additional metadata.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
- value: Any#
The value of the metric applied to the program outputs.
- format_as_summary_row()#
Return a table row summarizing the metric result.
- Return type:
- result_column_name()#
Returns the name of the column containing the metric results.
Only relevant if value is a DataFrame.
- Return type:
str
- format_as_dataframe()#
Returns the results of this metric formatted as a DataFrame.
- Return type:
DataFrame
- class MetricResult#
An output of a Metric with additional metadata.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
- value: Any#
The value of the metric applied to the program outputs.
- format_as_summary_row()#
Return a table row summarizing the metric result.
- Return type:
- format_as_dataframe()#
Returns the results of this metric formatted as a DataFrame.
- Return type:
DataFrame
- class SingleOutputMetricResult#
Bases:
MetricResult
The output of a SingleOutputMetric with additional metadata.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
- value: Any#
The value of the metric applied to the program outputs.
- format_as_summary_row()#
Return a table row summarizing the metric result.
- Return type:
- result_column_name()#
Returns the name of the column containing the metric results.
Only relevant if value is a DataFrame.
- Return type:
str
- format_as_dataframe()#
Returns the results of this metric formatted as a DataFrame.
- Return type:
DataFrame
- class CountBaselineRows(*, name=None, description=None, baseline=None, output=None, grouping_columns=None)#
Bases:
tmlt.analytics.metrics._base.SingleOutputMetric
Returns the number of rows in the baseline output.
If grouped, will return a count for every group that appears in either the DP or baseline output.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
Example
>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [50, 110, 100]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3", "b"],
...             "X": [100, 100, 100, 50]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = CountBaselineRows()
>>> metric(dp_outputs, baseline_outputs).value
4
Properties:
- baseline – Return the name of the baseline specified in the constructor (if any).
- output – Return the name of the output specified in the constructor (if any).
- name – Returns the name of the metric.
- description – Returns the description of the metric.
- func – Returns function to be applied.
- grouping_columns – Returns the grouping columns.
- measure_column – Returns the measure column (if any).
- empty_value – The value this metric will return when inputs are empty.
Methods:
- count_baseline_rows – Counts the number of rows in the baseline output.
- get_baseline – Returns the name of the single baseline this metric will be applied to.
- get_output – Returns the name of the single output the metric will be applied to.
- get_column_name_from_baselines – Get the result column name for a given set of outputs.
- required_func_parameters – Return the required parameters to the metric function.
- check_compatibility_with_outputs – Check that a particular output is compatible with the metric.
- check_compatibility_with_data – Check that the outputs have all the structure the metric expects.
- get_parameter_values – Returns values for the function’s parameters.
- metric_function_inputs_empty – Determines if the inputs to the metric function are empty.
- __call__ – Computes the given metric on the given DP and baseline outputs.
- optional_func_parameters – Return the optional parameters to the metric function.
- validate_result – Check that the metric result is an allowed type.
- property baseline: str | None#
Return the name of the baseline specified in the constructor (if any).
- Return type:
Optional[str]
- property output: str | None#
Return the name of the output specified in the constructor (if any).
- Return type:
Optional[str]
- property func: Callable#
Returns function to be applied.
- Return type:
Callable
- property measure_column: str | None#
Returns the measure column (if any).
- Return type:
Optional[str]
- property empty_value: Any#
The value this metric will return when inputs are empty.
- Return type:
Any
- __init__(*, name=None, description=None, baseline=None, output=None, grouping_columns=None)#
Constructor.
- Parameters:
baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).
output (Optional[str]) – The output to count rows in. If None, the tuner must have a single output (which will be used).
grouping_columns (Optional[List[str]]) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.
- count_baseline_rows(dp_output, baseline_output, result_column_name)#
Counts the number of rows in the baseline output.
- Parameters:
dp_output (pyspark.sql.DataFrame)
baseline_output (pyspark.sql.DataFrame)
result_column_name (str)
- get_baseline(baseline_outputs)#
Returns the name of the single baseline this metric will be applied to.
- Return type:
str
- get_output(outputs)#
Returns the name of the single output the metric will be applied to.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
- Return type:
str
- get_column_name_from_baselines(baseline_outputs)#
Get the result column name for a given set of outputs.
- required_func_parameters()#
Return the required parameters to the metric function.
- check_compatibility_with_outputs(outputs, output_type)#
Check that a particular output is compatible with the metric.
Should throw a ValueError if the metric is not compatible.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
output_type (str)
- check_compatibility_with_data(dp_outputs, baseline_outputs)#
Check that the outputs have all the structure the metric expects.
Should throw a ValueError if the metric is not compatible.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#
Returns values for the function’s parameters.
- Return type:
Dict[str, Any]
- metric_function_inputs_empty(function_params)#
Determines if the inputs to the metric function are empty.
- __call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#
Computes the given metric on the given DP and baseline outputs.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.
unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.
- Return type:
- optional_func_parameters()#
Return the optional parameters to the metric function.
- validate_result(result, baseline_outputs)#
Check that the metric result is an allowed type.
- Parameters:
result (Any)
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- class CountReleasedRows(*, name=None, description=None, baseline=None, output=None, grouping_columns=None)#
Bases:
tmlt.analytics.metrics._base.SingleOutputMetric
Returns the number of rows released in the DP output.
If grouped, will return a count for every group that appears in either the DP or baseline output.
Note
This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.
Example
>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [50, 110, 100]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3", "b"],
...             "X": [100, 100, 100, 50]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = CountReleasedRows()
>>> metric(dp_outputs, baseline_outputs).value
3
Properties:
- baseline – Return the name of the baseline specified in the constructor (if any).
- output – Return the name of the output specified in the constructor (if any).
- name – Returns the name of the metric.
- description – Returns the description of the metric.
- func – Returns function to be applied.
- grouping_columns – Returns the grouping columns.
- measure_column – Returns the measure column (if any).
- empty_value – The value this metric will return when inputs are empty.
Methods:
- count_released_rows – Counts the number of released rows.
- get_baseline – Returns the name of the single baseline this metric will be applied to.
- get_output – Returns the name of the single output the metric will be applied to.
- get_column_name_from_baselines – Get the result column name for a given set of outputs.
- required_func_parameters – Return the required parameters to the metric function.
- check_compatibility_with_outputs – Check that a particular output is compatible with the metric.
- check_compatibility_with_data – Check that the outputs have all the structure the metric expects.
- get_parameter_values – Returns values for the function’s parameters.
- metric_function_inputs_empty – Determines if the inputs to the metric function are empty.
- __call__ – Computes the given metric on the given DP and baseline outputs.
- optional_func_parameters – Return the optional parameters to the metric function.
- validate_result – Check that the metric result is an allowed type.
- property baseline: str | None#
Return the name of the baseline specified in the constructor (if any).
- Return type:
Optional[str]
- property output: str | None#
Return the name of the output specified in the constructor (if any).
- Return type:
Optional[str]
- property func: Callable#
Returns function to be applied.
- Return type:
Callable
- property measure_column: str | None#
Returns the measure column (if any).
- Return type:
Optional[str]
- property empty_value: Any#
The value this metric will return when inputs are empty.
- Return type:
Any
- __init__(*, name=None, description=None, baseline=None, output=None, grouping_columns=None)#
Constructor.
- Parameters:
baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).
output (Optional[str]) – The output to count rows in. If None, the tuner must have a single output (which will be used).
grouping_columns (Optional[List[str]]) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.
- count_released_rows(dp_output, baseline_output, result_column_name)#
Counts the number of released rows.
- Parameters:
dp_output (pyspark.sql.DataFrame)
baseline_output (pyspark.sql.DataFrame)
result_column_name (str)
- get_baseline(baseline_outputs)#
Returns the name of the single baseline this metric will be applied to.
- Return type:
str
- get_output(outputs)#
Returns the name of the single output the metric will be applied to.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
- Return type:
str
- get_column_name_from_baselines(baseline_outputs)#
Get the result column name for a given set of outputs.
- required_func_parameters()#
Return the required parameters to the metric function.
- check_compatibility_with_outputs(outputs, output_type)#
Check that a particular output is compatible with the metric.
Should throw a ValueError if the metric is not compatible.
- Parameters:
outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
output_type (str)
- check_compatibility_with_data(dp_outputs, baseline_outputs)#
Check that the outputs have all the structure the metric expects.
Should throw a ValueError if the metric is not compatible.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])
- get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#
Returns values for the function’s parameters.
- Return type:
Dict[str, Any]
- metric_function_inputs_empty(function_params)#
Determines if the inputs to the metric function are empty.
- __call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#
Computes the given metric on the given DP and baseline outputs.
- Parameters:
dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.
unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.
- Return type:
- optional_func_parameters()#
Return the optional parameters to the metric function.
- validate_result(result, baseline_outputs)#
Check that the metric result is an allowed type.
- Parameters:
result (Any)
baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]])