Measuring accuracy#
Note
The features described on this page are only available in a paid version of the Tumult Platform. If you would like to hear more, please contact us at info@tmlt.io.
Metrics are used as part of a SessionProgramTuner to evaluate the accuracy of differentially private programs. A number of pre-built metrics are provided, for example QuantileAbsoluteError, HighRelativeErrorRate, and SpuriousRate. Users can also define their own custom metrics using JoinedOutputMetric, SingleOutputMetric, or Metric.
Suppose we have a SessionProgram that has one protected input and produces one output: a count of the rows in the protected input, grouped by the values of column A.
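The doctests on this page assume an active Spark session and a handful of imports. The setup below is a rough sketch: the pandas and PySpark imports are standard, but the Tumult module paths (shown only as comments) are assumptions that may not match your installation, so follow the import guidance shipped with your version of the platform.
>>> import pandas as pd
>>> import pyspark.sql.functions as sf
>>> from pyspark.sql import DataFrame, SparkSession
>>> spark = SparkSession.builder.getOrCreate()  # the examples refer to this as `spark`
>>> # Assumed (unverified) import locations for the Tumult classes used below:
>>> # from tmlt.analytics import AddOneRow, KeySet, PureDPBudget, QueryBuilder, Session
>>> # from tmlt.tune import (
>>> #     MedianAbsoluteError, SessionProgram, SessionProgramTuner, joined_output_metric,
>>> # )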
>>> class MinimalProgram(SessionProgram):
...     class ProtectedInputs:
...         protected_df: DataFrame  # DataFrame type annotation is required
...     class Outputs:
...         count_per_a: DataFrame  # required here too
...     def session_interaction(self, session: Session):
...         a_keyset = KeySet.from_dict({"A": [1, 2, 3, 4]})
...         count_query = QueryBuilder("protected_df").groupby(a_keyset).count()
...         budget = self.privacy_budget  # session.remaining_privacy_budget also works
...         count_per_a = session.evaluate(count_query, budget)
...         return {"count_per_a": count_per_a}
We can pass this information to the SessionProgramTuner class, which is what gives us access to error reports. We can measure the error of the program by comparing its output to a baseline, which is usually chosen to be the true (non-private) answer to each query. Suppose we want to use one built-in metric, the median absolute error (MedianAbsoluteError), and one custom metric, the root mean squared error. We include the built-in metric in the metrics class variable, and define our custom metric using a decorated method.
>>> protected_df = spark.createDataFrame(pd.DataFrame({"A": [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]}))
>>> class Tuner(SessionProgramTuner, program=MinimalProgram):
...     metrics = [
...         MedianAbsoluteError(output="count_per_a", measure_column="count", join_columns=["A"]),
...     ]
...
...     @joined_output_metric(
...         name="root_mean_squared_error",
...         output="count_per_a",
...         join_columns=["A"],
...         description="Root mean squared error for column count of count_per_a",
...     )
...     def compute_rmse(
...         joined_output: DataFrame,
...         result_column_name: str,
...     ):
...         # Compare the DP and baseline values of the count column
...         # (joined here as count_dp and count_baseline).
...         err = sf.col("count_dp") - sf.col("count_baseline")
...         rmse = joined_output.agg(
...             sf.sqrt(sf.avg(sf.pow(err, sf.lit(2)))).alias(result_column_name)
...         )
...         return rmse.head(1)[0][result_column_name]
>>> tuner = (
...     Tuner.Builder()
...     .with_privacy_budget(PureDPBudget(epsilon=1))
...     .with_private_dataframe("protected_df", protected_df, AddOneRow())
...     .build()
... )
Now that our SessionProgramTuner is initialized, we can get our very first error report by calling the error_report() method.
>>> error_report = tuner.error_report()
>>> error_report.dp_outputs["count_per_a"].show()
+---+-----+
| A|count|
+---+-----+
| 1| 2|
| 2| 2|
| 3| 5|
| 4| 3|
+---+-----+
>>> error_report.baseline_outputs["default"]["count_per_a"].show()
+---+-----+
| A|count|
+---+-----+
| 1| 1|
| 2| 2|
| 3| 3|
| 4| 4|
+---+-----+
>>> error_report.show()
Error report ran with budget PureDPBudget(epsilon=1) and no tunable parameters and no additional parameters
Metric results:
+---------+-------------------------+------------+--------------------------------------------------------------+
| Value   | Metric                  | Baseline   | Description                                                  |
+=========+=========================+============+==============================================================+
| 1       | mae                     | default    | Median absolute error for column count of table count_per_a |
+---------+-------------------------+------------+--------------------------------------------------------------+
| 1.22474 | root_mean_squared_error | default    | Root mean squared error for column count of count_per_a     |
+---------+-------------------------+------------+--------------------------------------------------------------+
More illustrated examples of how to define and use metrics can be found in the Basics of error measurement and Specifying error metrics tutorials.
Built-in metrics#
Built-in metrics implement commonly-used ways to measure accuracy. They only require users to specify their constructor arguments in order to be used in a SessionProgramTuner. The available built-in metrics are listed below, followed by a short configuration sketch.
- Computes the quantile of the empirical absolute error.
- Computes the median absolute error.
- Computes the quantile of the empirical relative error.
- Computes the median relative error.
- Computes the fraction of values whose relative error is above a fixed threshold.
- Computes the number of values whose relative error is above a fixed threshold.
- Computes the fraction of values in the DP output but not in the baseline output.
- Computes the number of values in the DP output but not in the baseline output.
- Computes the fraction of values in the baseline output but not in the DP output.
- Computes the count of values in the baseline output but not in the DP output.
- Computes the number of rows in the DP output.
- Computes the number of rows in the baseline output.
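As a rough sketch of how built-in metrics are configured, the snippet below adds two of them to the metrics class variable of a hypothetical TwoMetricTuner. The MedianAbsoluteError arguments mirror the earlier example; the SpuriousRate arguments and parameter names are assumptions, so consult the API reference for the actual constructor signatures.
>>> class TwoMetricTuner(SessionProgramTuner, program=MinimalProgram):
...     metrics = [
...         # Same constructor arguments as in the example above.
...         MedianAbsoluteError(
...             output="count_per_a", measure_column="count", join_columns=["A"]
...         ),
...         # Hypothetical: the parameter names for SpuriousRate are assumed here.
...         SpuriousRate(output="count_per_a", join_columns=["A"]),
...     ]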
Custom metrics#
Custom metrics allow users to implement arbitrary error measurement logic. They can be specified in a SessionProgramTuner using decorated methods, or in the metrics class variable using custom metric classes (as sketched after the list below).
- Decorator to define a generic Metric.
- Decorator to define a custom SingleOutputMetric.
- Decorator to define a custom JoinedOutputMetric.
- Metric: a generic metric defined using a function.
- SingleOutputMetric: a metric computed from a single output table, defined using a function.
- JoinedOutputMetric: a metric computed from a join between a single DP and baseline output.
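For illustration, here is a rough sketch of registering the same RMSE computation through the metrics class variable instead of a decorator. The JoinedOutputMetric constructor arguments shown here (in particular the func parameter name) and the ClassBasedTuner name are assumptions rather than the documented signature.
>>> def rmse(joined_output: DataFrame, result_column_name: str):
...     # Compare the DP and baseline values of the count column.
...     err = sf.col("count_dp") - sf.col("count_baseline")
...     return joined_output.agg(
...         sf.sqrt(sf.avg(sf.pow(err, sf.lit(2)))).alias(result_column_name)
...     ).head(1)[0][result_column_name]
>>> class ClassBasedTuner(SessionProgramTuner, program=MinimalProgram):
...     metrics = [
...         # Hypothetical constructor arguments; check the JoinedOutputMetric API.
...         JoinedOutputMetric(
...             func=rmse,
...             name="root_mean_squared_error",
...             output="count_per_a",
...             join_columns=["A"],
...             description="Root mean squared error for column count of count_per_a",
...         ),
...     ]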
Metric results#
These classes, common between built-in metrics and custom metrics, contain the results obtained when computing a metric on a single run of the program; a brief inspection sketch follows the list below.
- The output of a Metric.
- The output of a SingleOutputMetric.
- The output of a JoinedOutputMetric.
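As a final, heavily hedged sketch, metric results can typically be inspected programmatically from an error report. The attribute names used below (metric_results, name, value) are assumptions for illustration and may not match the actual API.
>>> # Hypothetical attribute names; consult the error report API for the real ones.
>>> for result in error_report.metric_results:
...     print(result.name, result.value)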