Measuring accuracy#
Note
The features described on this page are only available in a paid version of the Tumult Platform. If you would like to hear more, please contact us at info@tmlt.io.
Metrics are used as part of a SessionProgramTuner to evaluate the accuracy of differentially private programs. A number of pre-built metrics are provided, for example QuantileAbsoluteError, HighRelativeErrorRate, and SpuriousRate. Users can also define their own custom metrics using JoinedOutputMetric, SingleOutputMetric, or Metric.
Suppose we have a SessionProgram that has one protected input and produces one output: a count of the rows in the protected input, grouped by the values of column A.
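The doctests on this page assume an active Spark session and a handful of imports. The setup below is a rough sketch: the pandas and PySpark imports are standard, but the Tumult module paths (shown only as comments) are assumptions that may not match your installation, so follow the import guidance shipped with your version of the platform.
>>> import pandas as pd
>>> import pyspark.sql.functions as sf
>>> from pyspark.sql import DataFrame, SparkSession
>>> spark = SparkSession.builder.getOrCreate()  # the examples refer to this as `spark`
>>> # Assumed (unverified) import locations for the Tumult classes used below:
>>> # from tmlt.analytics import AddOneRow, KeySet, PureDPBudget, QueryBuilder, Session
>>> # from tmlt.tune import (
>>> #     MedianAbsoluteError, SessionProgram, SessionProgramTuner, joined_output_metric,
>>> # )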
>>> class MinimalProgram(SessionProgram):
...     class ProtectedInputs:
...         protected_df: DataFrame  # DataFrame type annotation is required
...     class Outputs:
...         count_per_a: DataFrame  # required here too
...     def session_interaction(self, session: Session):
...         a_keyset = KeySet.from_dict({"A": [1, 2, 3, 4]})
...         count_query = QueryBuilder("protected_df").groupby(a_keyset).count()
...         budget = self.privacy_budget  # session.remaining_privacy_budget also works
...         count_per_a = session.evaluate(count_query, budget)
...         return {"count_per_a": count_per_a}
We can pass this information to the SessionProgramTuner class, which is what gives us access to error reports. We can measure the error of the program by comparing its output to a baseline, which is usually chosen to be the true (non-private) answer to each query. Suppose we want to use one built-in metric, the median absolute error (MedianAbsoluteError), and one custom metric, the root mean squared error. We include the built-in metric in the metrics class variable, and define our custom metric using a decorated method.
>>> protected_df = spark.createDataFrame(pd.DataFrame({"A": [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]}))
>>> class Tuner(SessionProgramTuner, program=MinimalProgram):
...     metrics = [
...         MedianAbsoluteError(output="count_per_a", measure_column="count", join_columns=["A"]),
...     ]
...
...     @joined_output_metric(
...         name="root_mean_squared_error",
...         output="count_per_a",
...         join_columns=["A"],
...         description="Root mean squared error for column count of count_per_a",
...     )
...     def compute_rmse(
...         joined_output: DataFrame,
...         result_column_name: str,
...     ):
...         # Compare the DP and baseline values of the count column
...         # (joined here as count_dp and count_baseline).
...         err = sf.col("count_dp") - sf.col("count_baseline")
...         rmse = joined_output.agg(
...             sf.sqrt(sf.avg(sf.pow(err, sf.lit(2)))).alias(result_column_name)
...         )
...         return rmse.head(1)[0][result_column_name]
>>> tuner = (
...     Tuner.Builder()
...     .with_privacy_budget(PureDPBudget(epsilon=1))
...     .with_private_dataframe("protected_df", protected_df, AddOneRow())
...     .build()
... )
Now that our SessionProgramTuner is initialized, we can get our very first error report by calling the error_report() method.
>>> error_report = tuner.error_report()
>>> error_report.dp_outputs["count_per_a"].show()
+---+-----+
| A|count|
+---+-----+
| 1| 2|
| 2| 2|
| 3| 5|
| 4| 3|
+---+-----+
>>> error_report.baseline_outputs["default"]["count_per_a"].show()
+---+-----+
| A|count|
+---+-----+
| 1| 1|
| 2| 2|
| 3| 3|
| 4| 4|
+---+-----+
>>> error_report.show()
Error report ran with budget PureDPBudget(epsilon=1) and no tunable parameters and no additional parameters
Metric results:
+---------+-------------------------+------------+--------------------------------------------------------------+
| Value   | Metric                  | Baseline   | Description                                                  |
+=========+=========================+============+==============================================================+
| 1       | mae                     | default    | Median absolute error for column count of table count_per_a |
+---------+-------------------------+------------+--------------------------------------------------------------+
| 1.22474 | root_mean_squared_error | default    | Root mean squared error for column count of count_per_a     |
+---------+-------------------------+------------+--------------------------------------------------------------+
More illustrated examples of how to define and use metrics can be found in the Basics of error measurement and Specifying error metrics tutorials.
Built-in metrics#
Built-in metrics implement commonly-used ways to measure accuracy. They only require users to specify their constructor arguments in order to be used in a SessionProgramTuner. The available built-in metrics are listed below, followed by a short configuration sketch.
- Computes the quantile of the empirical absolute error.
- Computes the median absolute error.
- Computes the quantile of the empirical relative error.
- Computes the median relative error.
- Computes the fraction of values whose relative error is above a fixed threshold.
- Computes the number of values whose relative error is above a fixed threshold.
- Computes the fraction of values in the DP output but not in the baseline output.
- Computes the number of values in the DP output but not in the baseline output.
- Computes the fraction of values in the baseline output but not in the DP output.
- Computes the count of values in the baseline output but not in the DP output.
- Computes the number of rows in the DP output.
- Computes the number of rows in the baseline output.
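As a rough sketch of how built-in metrics are configured, the snippet below adds two of them to the metrics class variable of a hypothetical TwoMetricTuner. The MedianAbsoluteError arguments mirror the earlier example; the SpuriousRate arguments and parameter names are assumptions, so consult the API reference for the actual constructor signatures.
>>> class TwoMetricTuner(SessionProgramTuner, program=MinimalProgram):
...     metrics = [
...         # Same constructor arguments as in the example above.
...         MedianAbsoluteError(
...             output="count_per_a", measure_column="count", join_columns=["A"]
...         ),
...         # Hypothetical: the parameter names for SpuriousRate are assumed here.
...         SpuriousRate(output="count_per_a", join_columns=["A"]),
...     ]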
Custom metrics#
Custom metrics allow users to implement arbitrary error measurement logic. They can be specified in a SessionProgramTuner using decorated methods, or in the metrics class variable using custom metric classes (as sketched after the list below).
- Decorator to define a generic Metric.
- Decorator to define a custom SingleOutputMetric.
- Decorator to define a custom JoinedOutputMetric.
- Metric: a generic metric defined using a function.
- SingleOutputMetric: a metric computed from a single output table, defined using a function.
- JoinedOutputMetric: a metric computed from a join between a single DP and baseline output.
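For illustration, here is a rough sketch of registering the same RMSE computation through the metrics class variable instead of a decorator. The JoinedOutputMetric constructor arguments shown here (in particular the func parameter name) and the ClassBasedTuner name are assumptions rather than the documented signature.
>>> def rmse(joined_output: DataFrame, result_column_name: str):
...     # Compare the DP and baseline values of the count column.
...     err = sf.col("count_dp") - sf.col("count_baseline")
...     return joined_output.agg(
...         sf.sqrt(sf.avg(sf.pow(err, sf.lit(2)))).alias(result_column_name)
...     ).head(1)[0][result_column_name]
>>> class ClassBasedTuner(SessionProgramTuner, program=MinimalProgram):
...     metrics = [
...         # Hypothetical constructor arguments; check the JoinedOutputMetric API.
...         JoinedOutputMetric(
...             func=rmse,
...             name="root_mean_squared_error",
...             output="count_per_a",
...             join_columns=["A"],
...             description="Root mean squared error for column count of count_per_a",
...         ),
...     ]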
Metric results#
These classes, common between built-in metrics and custom metrics, contain the results obtained when computing a metric on a single run of the program; a brief inspection sketch follows the list below.
- The output of a Metric.
- The output of a SingleOutputMetric.
- The output of a JoinedOutputMetric.
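As a final, heavily hedged sketch, metric results can typically be inspected programmatically from an error report. The attribute names used below (metric_results, name, value) are assumptions for illustration and may not match the actual API.
>>> # Hypothetical attribute names; consult the error report API for the real ones.
>>> for result in error_report.metric_results:
...     print(result.name, result.value)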