metrics#

Metrics to measure the quality of program outputs. This module is only applicable to Analytics Pro.

Metrics are used in program evaluation and error reporting, where the goal is to compare the outputs of a program to a set of expected outputs and quantify the error. They are designed to be flexible and extensible, allowing users to combine them in various ways and define their own metrics.

A number of pre-built metrics are provided, for example QuantileAbsoluteError, HighRelativeErrorFraction, and SpuriousRate. Users can also define their own custom metrics using JoinedOutputMetric, SingleOutputMetric, or Metric.

Suppose we have a SessionProgram that has one protected input and produces one output: a count of the rows in the protected input, grouped by the values of column A.

>>> class MinimalProgram(SessionProgram):
...     class ProtectedInputs:
...         protected_df: DataFrame  # DataFrame type annotation is required
...     class Outputs:
...         count_per_a: DataFrame  # required here too
...     def session_interaction(self, session: Session):
...         a_keyset = KeySet.from_dict({"A": [1, 2, 3, 4]})
...         count_query = QueryBuilder("protected_df").groupby(a_keyset).count()
...         budget = self.privacy_budget  #  session.remaining_privacy_budget also works
...         count_per_a = session.evaluate(count_query, budget)
...         return {"count_per_a": count_per_a}

We can pass this information to the SessionProgramTuner class, which is what gives us access to error reports.

We can measure the error of the program by comparing the program output to the expected output. Suppose we want to use a built-in metric, the median absolute error (MedianAbsoluteError), and a custom metric, the root mean squared error. We need to instantiate the metrics and include them in the list of metrics assigned to the metrics class variable.

>>> protected_df = spark.createDataFrame(pd.DataFrame({"A": [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]}))
>>> class Tuner(SessionProgramTuner, program=MinimalProgram):
...     @joined_output_metric(name="root_mean_squared_error",
...             output="count_per_a",
...             join_columns=["A"],
...             description="Root mean squared error for column count of count_per_a")
...     def compute_rmse(
...         joined_output: DataFrame,
...         result_column_name: str,
...     ):
...         err = sf.col("count_dp") - sf.col("count_baseline")
...         rmse = joined_output.agg(
...             sf.sqrt(sf.avg(sf.pow(err, sf.lit(2)))).alias(result_column_name))
...         return rmse.head(1)[0][result_column_name]
...
...     metrics = [
...         MedianAbsoluteError(output="count_per_a", measure_column="count", join_columns=["A"]),
...     ]
>>> tuner = (
...    Tuner.Builder()
...    .with_privacy_budget(PureDPBudget(epsilon=1))
...    .with_private_dataframe("protected_df", protected_df, AddOneRow())
...    .build()
... )

Now that our SessionProgramTuner is initialized, we can get our very first error report by calling the error_report() method.

>>> error_report = tuner.error_report()
>>> error_report.dp_outputs["count_per_a"].show()  
+---+-----+
|  A|count|
+---+-----+
|  1|    2|
|  2|    2|
|  3|    5|
|  4|    3|
+---+-----+
>>> error_report.baseline_outputs["default"]["count_per_a"].show()  
+---+-----+
|  A|count|
+---+-----+
|  1|    1|
|  2|    2|
|  3|    3|
|  4|    4|
+---+-----+
>>> error_report.show()  
Error report ran with budget PureDPBudget(epsilon=1) and no tunable parameters and no additional parameters

Metric results:
+---------+-------------------------+------------+-------------------------------------------------------------+
|   Value | Metric                  | Baseline   | Description                                                 |
+=========+=========================+============+=============================================================+
|       0 | mae                     | default    | Median absolute error for column count of table count_per_a |
+---------+-------------------------+------------+-------------------------------------------------------------+
|       0 | root_mean_squared_error | default    | Root mean squared error for column count of count_per_a     |
+---------+-------------------------+------------+-------------------------------------------------------------+

More illustrated examples of how to define and use metrics can be found in the Basics of error measurement and Specifying error metrics tutorials.

Classes#

QuantileAbsoluteError

Computes the quantile of the empirical absolute error.

MedianAbsoluteError

Computes the median absolute error.

QuantileRelativeError

Computes the quantile of the empirical relative error.

MedianRelativeError

Computes the median relative error.

HighRelativeErrorFraction

Computes the fraction of groups with relative error above a threshold.

HighRelativeErrorCount

Computes the count of groups with relative error above a threshold.

SpuriousRate

Computes the fraction of groups in the DP output but not in the baseline output.

SpuriousCount

Computes the number of groups in the DP output but not in the baseline output.

SuppressionRate

Computes the fraction of groups in the baseline output but not in the DP output.

SuppressionCount

Computes the count of groups in the baseline output but not in the DP output.

Metric

A generic metric defined using a function.

SingleOutputMetric

A metric computed from a single output table, defined using a function.

JoinedOutputMetric

A metric computed from a join between a single DP and baseline output.

JoinedOutputMetricResult

The output of a JoinedOutputMetric with additional metadata.

MetricResult

An output of a Metric with additional metadata.

SingleOutputMetricResult

The output of a SingleOutputMetric with additional metadata.

CountBaselineRows

Returns the number of rows in the baseline output.

CountReleasedRows

Returns the number of rows released in the DP output.

class QuantileAbsoluteError(quantile, measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)#

Bases: tmlt.analytics.metrics._base.JoinedOutputMetric

Computes the quantile of the empirical absolute error.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

How it works:

  1. The algorithm takes as input two tables: one representing the differentially private (DP) output and the other representing the baseline output.

    DP Table (dp): This table contains the output data generated by a differentially private mechanism.

    Baseline Table (baseline): This table contains the output data generated by a non-private or baseline mechanism. It serves as a reference point for comparison with the DP output.

    The algorithm includes error handling to ensure the validity of the input data. It checks for the existence and numeric type of the measure_column.

    The algorithm performs an inner join between the DP and baseline tables based on join_columns. This join must be one-to-one, with each row in the DP table matching exactly one row in the baseline table, and vice versa. This ensures that there is a direct correspondence between the DP and baseline outputs for each entity, allowing for accurate comparison.

  2. After performing the join, the algorithm computes the absolute error for each group. Absolute error is calculated as the absolute difference between the corresponding values in the DP and baseline outputs using the formula \(abs(dp - baseline)\).

  3. The algorithm then calculates the requested quantile (given by the quantile parameter) of the absolute error across all groups.

    The algorithm handles cases where the quantile computation may result in an empty column, returning a NaN (not a number) value in such scenarios.

    Note

    • The algorithm assumes a one-to-one join between the DP and baseline outputs.

    • Nulls in the measure columns are dropped because the metric cannot handle null values, and the absolute error computation requires valid numeric values in both columns.

Example

>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [50, 110, 100]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [100, 100, 100]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = QuantileAbsoluteError(
...     quantile=0.5,
...     measure_column="X",
...     join_columns=["A"]
... )
>>> metric.quantile
0.5
>>> metric.join_columns
['A']
>>> result = metric(dp_outputs, baseline_outputs).value
>>> result
10
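
Here the absolute errors are \(|50 - 100| = 50\), \(|110 - 100| = 10\), and \(|100 - 100| = 0\), so their median is 10. As a cross-check, the same value can be recomputed by hand from the example's dp_df and baseline_df. This is an illustrative sketch, not the library's internal implementation; it assumes pyspark.sql.functions has been imported as sf:

>>> joined = dp_df.join(
...     baseline_df.withColumnRenamed("X", "X_baseline"), on="A")
>>> abs_err = joined.select(
...     sf.abs(sf.col("X") - sf.col("X_baseline")).alias("abs_err"))
>>> abs_err.approxQuantile("abs_err", [0.5], 0.0)
[10.0]
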
Properties#

quantile

Returns the quantile.

join_columns

Returns the name of the join columns.

indicator_column_name

Returns the name of the indicator column.

baseline

Return the name of the baseline specified in the constructor (if any).

output

Return the name of the output specified in the constructor (if any).

name

Returns the name of the metric.

description

Returns the description of the metric.

func

Returns function to be applied.

grouping_columns

Returns the grouping columns.

measure_column

Returns the measure column (if any).

empty_value

The value this metric will return when inputs are empty.

Methods#

compute_qae()

Computes quantile absolute error value from grouped dataframe.

required_func_parameters()

Return the required parameters to the metric function.

check_compatibility_with_outputs()

Check that a particular set of outputs is compatible with the metric.

check_join_key_uniqueness()

Check if the join keys uniquely identify rows in the joined DataFrame.

get_parameter_values()

Returns values for the function’s parameters.

metric_function_inputs_empty()

Determines if the given inputs are empty.

__call__()

Computes the given metric on the given DP and baseline outputs.

get_baseline()

Returns the name of the single baseline this metric will be applied to.

get_output()

Returns the name of the single output the metric will be applied to.

get_column_name_from_baselines()

Get the result column name for a given set of outputs.

check_compatibility_with_data()

Check that the outputs have all the structure the metric expects.

optional_func_parameters()

Return the optional parameters to the metric function.

validate_result()

Check that the metric result is an allowed type.

Parameters:
  • quantile (float)

  • measure_column (str)

  • join_columns (List[str])

  • grouping_columns (Optional[List[str]])

  • name (Optional[str])

  • description (Optional[str])

  • baseline (Optional[str])

  • output (Optional[str])

property quantile: float#

Returns the quantile.

Return type:

float

property join_columns: List[str]#

Returns the name of the join columns.

Return type:

List[str]

property indicator_column_name: str | None#

Returns the name of the indicator column.

Return type:

Optional[str]

property baseline: str | None#

Return the name of the baseline specified in the constructor (if any).

Return type:

Optional[str]

property output: str | None#

Return the name of the output specified in the constructor (if any).

Return type:

Optional[str]

property name: str#

Returns the name of the metric.

Return type:

str

property description: str#

Returns the description of the metric.

Return type:

str

property func: Callable#

Returns function to be applied.

Return type:

Callable

property grouping_columns: List[str]#

Returns the grouping columns.

Return type:

List[str]

property measure_column: str | None#

Returns the measure column (if any).

Return type:

Optional[str]

property empty_value: Any#

The value this metric will return when inputs are empty.

Return type:

Any

__init__(quantile, measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)#

Constructor.

Parameters:
  • quantile (float) – The quantile to calculate (between 0 and 1).

  • measure_column (str) – The column to compute the quantile of absolute error over.

  • join_columns (List[str]) – Columns to join on.

  • grouping_columns (Optional[List[str]]) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.

  • name (Optional[str]) – A name for the metric.

  • description (Optional[str]) – A description of the metric.

  • baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).

  • output (Optional[str]) – The name of the program output to be used for the metric. If None, the program must have only one output (which will be used).

compute_qae(joined_output, result_column_name)#

Computes quantile absolute error value from grouped dataframe.

required_func_parameters()#

Return the required parameters to the metric function.

check_compatibility_with_outputs(outputs, output_type)#

Check that a particular set of outputs is compatible with the metric.

Should throw a ValueError if the metric is not compatible.

check_join_key_uniqueness(joined_output)#

Check if the join keys uniquely identify rows in the joined DataFrame.

Parameters:

joined_output (pyspark.sql.DataFrame)

get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#

Returns values for the function’s parameters.

Return type:

Dict[str, Any]

metric_function_inputs_empty(function_params)#

Determines if the given inputs are empty.

Parameters:

function_params (Mapping[str, Any])

Return type:

bool

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters:
  • dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.

  • baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.

Return type:

JoinedOutputMetricResult

get_baseline(baseline_outputs)#

Returns the name of the single baseline this metric will be applied to.

Return type:

str

get_output(outputs)#

Returns the name of the single output the metric will be applied to.

Parameters:

outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])

Return type:

str

get_column_name_from_baselines(baseline_outputs)#

Get the result column name for a given set of outputs.

check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should throw a ValueError if the metric is not compatible.

optional_func_parameters()#

Return the optional parameters to the metric function.

validate_result(result, baseline_outputs)#

Check that the metric result is an allowed type.

class MedianAbsoluteError(measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)#

Bases: QuantileAbsoluteError

Computes the median absolute error.

Equivalent to QuantileAbsoluteError with quantile = 0.5.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

Example

>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [50, 110, 100]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [100, 100, 100]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = MedianAbsoluteError(
...     measure_column="X",
...     join_columns=["A"]
... )
>>> metric.quantile
0.5
>>> metric.join_columns
['A']
>>> result = metric(dp_outputs, baseline_outputs).value
>>> result
10
Properties#

quantile

Returns the quantile.

join_columns

Returns the name of the join columns.

indicator_column_name

Returns the name of the indicator column.

baseline

Return the name of the baseline specified in the constructor (if any).

output

Return the name of the output specified in the constructor (if any).

name

Returns the name of the metric.

description

Returns the description of the metric.

func

Returns function to be applied.

grouping_columns

Returns the grouping columns.

measure_column

Returns the measure column (if any).

empty_value

The value this metric will return when inputs are empty.

Methods#

compute_qae()

Computes quantile absolute error value from grouped dataframe.

required_func_parameters()

Return the required parameters to the metric function.

check_compatibility_with_outputs()

Check that a particular set of outputs is compatible with the metric.

check_join_key_uniqueness()

Check if the join keys uniquely identify rows in the joined DataFrame.

get_parameter_values()

Returns values for the function’s parameters.

metric_function_inputs_empty()

Determines if the given inputs are empty.

__call__()

Computes the given metric on the given DP and baseline outputs.

get_baseline()

Returns the name of the single baseline this metric will be applied to.

get_output()

Returns the name of the single output the metric will be applied to.

get_column_name_from_baselines()

Get the result column name for a given set of outputs.

check_compatibility_with_data()

Check that the outputs have all the structure the metric expects.

optional_func_parameters()

Return the optional parameters to the metric function.

validate_result()

Check that the metric result is an allowed type.

Parameters:
  • measure_column (str)

  • join_columns (List[str])

  • grouping_columns (Optional[List[str]])

  • name (Optional[str])

  • description (Optional[str])

  • baseline (Optional[str])

  • output (Optional[str])

property quantile: float#

Returns the quantile.

Return type:

float

property join_columns: List[str]#

Returns the name of the join columns.

Return type:

List[str]

property indicator_column_name: str | None#

Returns the name of the indicator column.

Return type:

Optional[str]

property baseline: str | None#

Return the name of the baseline specified in the constructor (if any).

Return type:

Optional[str]

property output: str | None#

Return the name of the output specified in the constructor (if any).

Return type:

Optional[str]

property name: str#

Returns the name of the metric.

Return type:

str

property description: str#

Returns the description of the metric.

Return type:

str

property func: Callable#

Returns function to be applied.

Return type:

Callable

property grouping_columns: List[str]#

Returns the grouping columns.

Return type:

List[str]

property measure_column: str | None#

Returns the measure column (if any).

Return type:

Optional[str]

property empty_value: Any#

The value this metric will return when inputs are empty.

Return type:

Any

__init__(measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)#

Constructor.

Parameters:
  • measure_column (str) – The column to compute the quantile of absolute error over.

  • join_columns (List[str]) – Columns to join on.

  • grouping_columns (Optional[List[str]]) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.

  • name (Optional[str]) – A name for the metric.

  • description (Optional[str]) – A description of the metric.

  • baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).

  • output (Optional[str]) – The name of the program output to be used for the metric. If None, the program must have only one output (which will be used).

compute_qae(joined_output, result_column_name)#

Computes quantile absolute error value from grouped dataframe.

required_func_parameters()#

Return the required parameters to the metric function.

check_compatibility_with_outputs(outputs, output_type)#

Check that a particular set of outputs is compatible with the metric.

Should throw a ValueError if the metric is not compatible.

check_join_key_uniqueness(joined_output)#

Check if the join keys uniquely identify rows in the joined DataFrame.

Parameters:

joined_output (pyspark.sql.DataFrame)

get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#

Returns values for the function’s parameters.

Return type:

Dict[str, Any]

metric_function_inputs_empty(function_params)#

Determines if the given inputs are empty.

Parameters:

function_params (Mapping[str, Any])

Return type:

bool

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters:
  • dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.

  • baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.

Return type:

JoinedOutputMetricResult

get_baseline(baseline_outputs)#

Returns the name of the single baseline this metric will be applied to.

Return type:

str

get_output(outputs)#

Returns the name of the single output the metric will be applied to.

Parameters:

outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])

Return type:

str

get_column_name_from_baselines(baseline_outputs)#

Get the result column name for a given set of outputs.

check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should throw a ValueError if the metric is not compatible.

optional_func_parameters()#

Return the optional parameters to the metric function.

validate_result(result, baseline_outputs)#

Check that the metric result is an allowed type.

class QuantileRelativeError(quantile, measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)#

Bases: tmlt.analytics.metrics._base.JoinedOutputMetric

Computes the quantile of the empirical relative error.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

How it works:

  1. The algorithm takes as input two tables: one representing the differentially private (DP) output and the other representing the baseline output.

    DP Table (dp): This table contains the output data generated by a differentially private mechanism.

    Baseline Table (baseline): This table contains the output data generated by a non-private or baseline mechanism. It serves as a reference point for comparison with the DP output.

    The algorithm includes error handling to ensure the validity of the input data. It checks for the existence and numeric type of the measure_column.

    The algorithm performs an inner join between the DP and baseline tables based on join_columns to produce the combined dataframe. This join must be one-to-one, with each row in the DP table matching exactly one row in the baseline table, and vice versa. This ensures that there is a direct correspondence between the DP and baseline outputs for each entity, allowing for accurate comparison.

  2. After performing the join, the algorithm computes the relative error for each group. Relative error is calculated as the absolute difference between the corresponding values in the DP and baseline outputs, divided by the baseline value, using the formula \(abs(dp - baseline) / baseline\). If the baseline value is zero, the relative error is infinity (\(∞\)) for non-zero differences and zero (\(0\)) for zero differences.

  3. The algorithm then calculates the requested quantile (given by the quantile parameter) of the relative error across all groups.

    The algorithm handles cases where the quantile computation may result in an empty column, returning a NaN (not a number) value in such scenarios.

    Note

    • The algorithm assumes a one-to-one join between the DP and baseline outputs.

    • Nulls in the measure columns are dropped because the metric cannot handle null values, and the absolute error computation requires valid numeric values in both columns.

Example

>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [50, 110, 100]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [100, 100, 100]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = QuantileRelativeError(
...     quantile=0.5,
...     measure_column="X",
...     join_columns=["A"]
... )
>>> metric.quantile
0.5
>>> metric.join_columns
['A']
>>> result = metric(dp_outputs, baseline_outputs).value
>>> result
0.1
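
Here the relative errors are \(|50 - 100| / 100 = 0.5\), \(|110 - 100| / 100 = 0.1\), and \(|100 - 100| / 100 = 0\), so their median is 0.1. As a cross-check, the value can be recomputed by hand from the example's dp_df and baseline_df, including the zero-baseline handling described above. This is an illustrative sketch, not the library's internal implementation; it assumes pyspark.sql.functions has been imported as sf:

>>> joined = dp_df.join(
...     baseline_df.withColumnRenamed("X", "X_baseline"), on="A")
>>> rel_err = joined.select(
...     sf.when(sf.col("X_baseline") != 0,
...             sf.abs(sf.col("X") - sf.col("X_baseline")) / sf.col("X_baseline"))
...     .when(sf.col("X") == sf.col("X_baseline"), sf.lit(0.0))
...     .otherwise(sf.lit(float("inf")))
...     .alias("rel_err"))
>>> rel_err.approxQuantile("rel_err", [0.5], 0.0)
[0.1]
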
Properties#

quantile

Returns the quantile.

join_columns

Returns the name of the join columns.

indicator_column_name

Returns the name of the indicator column.

baseline

Return the name of the baseline specified in the constructor (if any).

output

Return the name of the output specified in the constructor (if any).

name

Returns the name of the metric.

description

Returns the description of the metric.

func

Returns function to be applied.

grouping_columns

Returns the grouping columns.

measure_column

Returns the measure column (if any).

empty_value

The value this metric will return when inputs are empty.

Methods#

compute_qre()

Computes quantile relative error value from grouped dataframe.

required_func_parameters()

Return the required parameters to the metric function.

check_compatibility_with_outputs()

Check that a particular set of outputs is compatible with the metric.

check_join_key_uniqueness()

Check if the join keys uniquely identify rows in the joined DataFrame.

get_parameter_values()

Returns values for the function’s parameters.

metric_function_inputs_empty()

Determines if the given inputs are empty.

__call__()

Computes the given metric on the given DP and baseline outputs.

get_baseline()

Returns the name of the single baseline this metric will be applied to.

get_output()

Returns the name of the single output the metric will be applied to.

get_column_name_from_baselines()

Get the result column name for a given set of outputs.

check_compatibility_with_data()

Check that the outputs have all the structure the metric expects.

optional_func_parameters()

Return the optional parameters to the metric function.

validate_result()

Check that the metric result is an allowed type.

Parameters:
  • quantile (float)

  • measure_column (str)

  • join_columns (List[str])

  • grouping_columns (Optional[List[str]])

  • name (Optional[str])

  • description (Optional[str])

  • baseline (Optional[str])

  • output (Optional[str])

property quantile: float#

Returns the quantile.

Return type:

float

property join_columns: List[str]#

Returns the name of the join columns.

Return type:

List[str]

property indicator_column_name: str | None#

Returns the name of the indicator column.

Return type:

Optional[str]

property baseline: str | None#

Return the name of the baseline specified in the constructor (if any).

Return type:

Optional[str]

property output: str | None#

Return the name of the output specified in the constructor (if any).

Return type:

Optional[str]

property name: str#

Returns the name of the metric.

Return type:

str

property description: str#

Returns the description of the metric.

Return type:

str

property func: Callable#

Returns function to be applied.

Return type:

Callable

property grouping_columns: List[str]#

Returns the grouping columns.

Return type:

List[str]

property measure_column: str | None#

Returns the measure column (if any).

Return type:

Optional[str]

property empty_value: Any#

The value this metric will return when inputs are empty.

Return type:

Any

__init__(quantile, measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)#

Constructor.

Parameters:
  • quantile (float) – The quantile to calculate (between 0 and 1).

  • measure_column (str) – The column to compute the quantile of relative error over.

  • join_columns (List[str]) – Columns to join on.

  • grouping_columns (Optional[List[str]]) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.

  • name (Optional[str]) – A name for the metric.

  • description (Optional[str]) – A description of the metric.

  • baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).

  • output (Optional[str]) – The name of the program output to be used for the metric. If None, the program must have only one output (which will be used).

compute_qre(joined_output, result_column_name)#

Computes quantile relative error value from grouped dataframe.

required_func_parameters()#

Return the required parameters to the metric function.

check_compatibility_with_outputs(outputs, output_type)#

Check that a particular set of outputs is compatible with the metric.

Should throw a ValueError if the metric is not compatible.

check_join_key_uniqueness(joined_output)#

Check if the join keys uniquely identify rows in the joined DataFrame.

Parameters:

joined_output (pyspark.sql.DataFrame)

get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#

Returns values for the function’s parameters.

Return type:

Dict[str, Any]

metric_function_inputs_empty(function_params)#

Determines if the given inputs are empty.

Parameters:

function_params (Mapping[str, Any])

Return type:

bool

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters:
  • dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.

  • baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.

Return type:

JoinedOutputMetricResult

get_baseline(baseline_outputs)#

Returns the name of the single baseline this metric will be applied to.

Return type:

str

get_output(outputs)#

Returns the name of the single output the metric will be applied to.

Parameters:

outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])

Return type:

str

get_column_name_from_baselines(baseline_outputs)#

Get the result column name for a given set of outputs.

check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should throw a ValueError if the metric is not compatible.

optional_func_parameters()#

Return the optional parameters to the metric function.

validate_result(result, baseline_outputs)#

Check that the metric result is an allowed type.

class MedianRelativeError(measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)#

Bases: QuantileRelativeError

Computes the median relative error.

Equivalent to QuantileRelativeError with quantile = 0.5.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

Example

>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [50, 110, 100]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [100, 100, 100]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = MedianRelativeError(
...     measure_column="X",
...     join_columns=["A"]
... )
>>> metric.quantile
0.5
>>> metric.join_columns
['A']
>>> result = metric(dp_outputs, baseline_outputs).value
>>> result
0.1
Properties#

quantile

Returns the quantile.

join_columns

Returns the name of the join columns.

indicator_column_name

Returns the name of the indicator column.

baseline

Return the name of the baseline specified in the constructor (if any).

output

Return the name of the output specified in the constructor (if any).

name

Returns the name of the metric.

description

Returns the description of the metric.

func

Returns function to be applied.

grouping_columns

Returns the grouping columns.

measure_column

Returns the measure column (if any).

empty_value

The value this metric will return when inputs are empty.

Methods#

compute_qre()

Computes quantile relative error value from grouped dataframe.

required_func_parameters()

Return the required parameters to the metric function.

check_compatibility_with_outputs()

Check that a particular set of outputs is compatible with the metric.

check_join_key_uniqueness()

Check if the join keys uniquely identify rows in the joined DataFrame.

get_parameter_values()

Returns values for the function’s parameters.

metric_function_inputs_empty()

Determines if the given inputs are empty.

__call__()

Computes the given metric on the given DP and baseline outputs.

get_baseline()

Returns the name of the single baseline this metric will be applied to.

get_output()

Returns the name of the single output the metric will be applied to.

get_column_name_from_baselines()

Get the result column name for a given set of outputs.

check_compatibility_with_data()

Check that the outputs have all the structure the metric expects.

optional_func_parameters()

Return the optional parameters to the metric function.

validate_result()

Check that the metric result is an allowed type.

Parameters:
  • measure_column (str)

  • join_columns (List[str])

  • grouping_columns (Optional[List[str]])

  • name (Optional[str])

  • description (Optional[str])

  • baseline (Optional[str])

  • output (Optional[str])

property quantile: float#

Returns the quantile.

Return type:

float

property join_columns: List[str]#

Returns the name of the join columns.

Return type:

List[str]

property indicator_column_name: str | None#

Returns the name of the indicator column.

Return type:

Optional[str]

property baseline: str | None#

Return the name of the baseline specified in the constructor (if any).

Return type:

Optional[str]

property output: str | None#

Return the name of the output specified in the constructor (if any).

Return type:

Optional[str]

property name: str#

Returns the name of the metric.

Return type:

str

property description: str#

Returns the description of the metric.

Return type:

str

property func: Callable#

Returns function to be applied.

Return type:

Callable

property grouping_columns: List[str]#

Returns the grouping columns.

Return type:

List[str]

property measure_column: str | None#

Returns the measure column (if any).

Return type:

Optional[str]

property empty_value: Any#

The value this metric will return when inputs are empty.

Return type:

Any

__init__(measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)#

Constructor.

Parameters:
  • measure_column (str) – The column to compute the median of relative error over.

  • join_columns (List[str]) – Columns to join on.

  • grouping_columns (Optional[List[str]]) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.

  • name (Optional[str]) – A name for the metric.

  • description (Optional[str]) – A description of the metric.

  • baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).

  • output (Optional[str]) – The name of the program output to be used for the metric. If None, the program must have only one output (which will be used).

compute_qre(joined_output, result_column_name)#

Computes quantile relative error value from grouped dataframe.

required_func_parameters()#

Return the required parameters to the metric function.

check_compatibility_with_outputs(outputs, output_type)#

Check that a particular set of outputs is compatible with the metric.

Should throw a ValueError if the metric is not compatible.

check_join_key_uniqueness(joined_output)#

Check if the join keys uniquely identify rows in the joined DataFrame.

Parameters:

joined_output (pyspark.sql.DataFrame)

get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#

Returns values for the function’s parameters.

Return type:

Dict[str, Any]

metric_function_inputs_empty(function_params)#

Determines if the given inputs are empty.

Parameters:

function_params (Mapping[str, Any])

Return type:

bool

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters:
  • dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.

  • baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.

Return type:

JoinedOutputMetricResult

get_baseline(baseline_outputs)#

Returns the name of the single baseline this metric will be applied to.

Return type:

str

get_output(outputs)#

Returns the name of the single output the metric will be applied to.

Parameters:

outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])

Return type:

str

get_column_name_from_baselines(baseline_outputs)#

Get the result column name for a given set of outputs.

check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should throw a ValueError if the metric is not compatible.

optional_func_parameters()#

Return the optional parameters to the metric function.

validate_result(result, baseline_outputs)#

Check that the metric result is an allowed type.

class HighRelativeErrorFraction(relative_error_threshold, measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)#

Bases: tmlt.analytics.metrics._base.JoinedOutputMetric

Computes the fraction of groups with relative error above a threshold.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

How it works:

  1. The algorithm takes as input two tables: one representing the differentially private (DP) output and the other representing the baseline output.

    DP Table (dp): This table contains the output data generated by a differentially private mechanism.

    Baseline Table (baseline): This table contains the output data generated by a non-private or baseline mechanism. It serves as a reference point for comparison with the DP output.

    The algorithm includes error handling to ensure the validity of the input data. It checks for the existence and numeric type of the measure_column.

    The algorithm performs an inner join between the DP and baseline tables based on join_columns to produce the combined dataframe. This join must be one-to-one, with each row in the DP table matching exactly one row in the baseline table, and vice versa. This ensures that there is a direct correspondence between the DP and baseline outputs for each entity, allowing for accurate comparison.

  2. After performing the join, the algorithm computes the relative error for each group. Relative error is calculated as the absolute difference between the corresponding values in the DP and baseline outputs, divided by the baseline value, using the formula \(abs(dp - baseline) / baseline\). If the baseline value is zero, the relative error is infinity (\(∞\)) for non-zero differences and zero (\(0\)) for zero differences.

  3. Next, the algorithm filters the relative error dataframe to include only those data points where the relative error exceeds a specified threshold (relative_error_threshold). This threshold represents the maximum allowable relative error for a data point to be considered within acceptable bounds.

  4. Finally, the algorithm calculates the high relative error fraction by dividing the count of data points with relative error exceeding the threshold by the total count of data points in the dataframe.

    The algorithm handles cases where the resulting dataframe after relative error computation is empty (i.e., it contains no data points), returning a NaN (not a number) value in such scenarios.

    Note

    • The algorithm assumes a one-to-one join between the DP and baseline outputs.

    • Nulls in the measure columns are dropped because the metric cannot handle null values, and the absolute error computation requires valid numeric values in both columns.

Example

>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [50, 110, 100]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [100, 100, 100]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = HighRelativeErrorFraction(
...     measure_column="X",
...     relative_error_threshold=0.25,
...     join_columns=["A"]
... )
>>> metric.relative_error_threshold
0.25
>>> metric.join_columns
['A']
>>> result = metric(dp_outputs, baseline_outputs).value
>>> result
0.333
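
Only one of the three groups (a1, with relative error \(|50 - 100| / 100 = 0.5\)) exceeds the 0.25 threshold, so the fraction is \(1 / 3 ≈ 0.333\). As a cross-check, the value can be recomputed by hand from the example's dp_df and baseline_df. This is an illustrative sketch, not the library's internal implementation (all baseline values here are non-zero, so the zero-baseline handling is omitted); it assumes pyspark.sql.functions has been imported as sf:

>>> joined = dp_df.join(
...     baseline_df.withColumnRenamed("X", "X_baseline"), on="A")
>>> rel_err = joined.withColumn(
...     "rel_err", sf.abs(sf.col("X") - sf.col("X_baseline")) / sf.col("X_baseline"))
>>> high = rel_err.filter(sf.col("rel_err") > 0.25).count()
>>> round(high / rel_err.count(), 3)
0.333
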
Properties#

relative_error_threshold

Returns the relative error threshold.

join_columns

Returns the name of the join columns.

indicator_column_name

Returns the name of the indicator column.

baseline

Return the name of the baseline specified in the constructor (if any).

output

Return the name of the output specified in the constructor (if any).

name

Returns the name of the metric.

description

Returns the description of the metric.

func

Returns function to be applied.

grouping_columns

Returns the grouping columns.

measure_column

Returns the measure column (if any).

empty_value

The value this metric will return when inputs are empty.

Methods#

compute_high_re()

Computes high relative error fraction from grouped dataframe.

required_func_parameters()

Return the required parameters to the metric function.

check_compatibility_with_outputs()

Check that a particular set of outputs is compatible with the metric.

check_join_key_uniqueness()

Check if the join keys uniquely identify rows in the joined DataFrame.

get_parameter_values()

Returns values for the function’s parameters.

metric_function_inputs_empty()

Determines if the given inputs are empty.

__call__()

Computes the given metric on the given DP and baseline outputs.

get_baseline()

Returns the name of the single baseline this metric will be applied to.

get_output()

Returns the name of the single output the metric will be applied to.

get_column_name_from_baselines()

Get the result column name for a given set of outputs.

check_compatibility_with_data()

Check that the outputs have all the structure the metric expects.

optional_func_parameters()

Return the optional parameters to the metric function.

validate_result()

Check that the metric result is an allowed type.

Parameters:
  • relative_error_threshold (float)

  • measure_column (str)

  • join_columns (List[str])

  • grouping_columns (Optional[List[str]])

  • name (Optional[str])

  • description (Optional[str])

  • baseline (Optional[str])

  • output (Optional[str])

property relative_error_threshold: float#

Returns the relative error threshold.

Return type:

float

property join_columns: List[str]#

Returns the name of the join columns.

Return type:

List[str]

property indicator_column_name: str | None#

Returns the name of the indicator column.

Return type:

Optional[str]

property baseline: str | None#

Return the name of the baseline specified in the constructor (if any).

Return type:

Optional[str]

property output: str | None#

Return the name of the output specified in the constructor (if any).

Return type:

Optional[str]

property name: str#

Returns the name of the metric.

Return type:

str

property description: str#

Returns the description of the metric.

Return type:

str

property func: Callable#

Returns function to be applied.

Return type:

Callable

property grouping_columns: List[str]#

Returns the grouping columns.

Return type:

List[str]

property measure_column: str | None#

Returns the measure column (if any).

Return type:

Optional[str]

property empty_value: Any#

The value this metric will return when inputs are empty.

Return type:

Any

__init__(relative_error_threshold, measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)#

Constructor.

Parameters:
  • relative_error_threshold (float) – The threshold for the relative error.

  • measure_column (str) – The column to compute relative error over.

  • join_columns (List[str]) – Columns to join on.

  • grouping_columns (Optional[List[str]]) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.

  • name (Optional[str]) – A name for the metric.

  • description (Optional[str]) – A description of the metric.

  • baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).

  • output (Optional[str]) – The name of the program output to be used for the metric. If None, the program must have only one output (which will be used).

compute_high_re(joined_output, result_column_name)#

Computes high relative error fraction from grouped dataframe.

Parameters:

joined_output (pyspark.sql.DataFrame)

required_func_parameters()#

Return the required parameters to the metric function.

check_compatibility_with_outputs(outputs, output_type)#

Check that a particular set of outputs is compatible with the metric.

Should throw a ValueError if the metric is not compatible.

check_join_key_uniqueness(joined_output)#

Check if the join keys uniquely identify rows in the joined DataFrame.

Parameters:

joined_output (pyspark.sql.DataFrame)

get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#

Returns values for the function’s parameters.

Return type:

Dict[str, Any]

metric_function_inputs_empty(function_params)#

Determines if the given inputs are empty.

Parameters:

function_params (Mapping[str, Any])

Return type:

bool

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters:
  • dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.

  • baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.

Return type:

JoinedOutputMetricResult

get_baseline(baseline_outputs)#

Returns the name of the single baseline this metric will be applied to.

Return type:

str

get_output(outputs)#

Returns the name of the single output the metric will be applied to.

Parameters:

outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])

Return type:

str

get_column_name_from_baselines(baseline_outputs)#

Get the result column name for a given set of outputs.

check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should throw a ValueError if the metric is not compatible.

optional_func_parameters()#

Return the optional parameters to the metric function.

validate_result(result, baseline_outputs)#

Check that the metric result is an allowed type.

class HighRelativeErrorCount(relative_error_threshold, measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)#

Bases: tmlt.analytics.metrics._base.JoinedOutputMetric

Computes the count of groups with relative error above a threshold.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

How it works:

  1. The algorithm takes as input two tables: one representing the differentially private (DP) output and the other representing the baseline output.

    DP Table (dp): This table contains the output data generated by a differentially private mechanism.

    Baseline Table (baseline): This table contains the output data generated by a non-private or baseline mechanism. It serves as a reference point for comparison with the DP output.

    The algorithm includes error handling to ensure the validity of the input data. It checks for the existence and numeric type of the measure_column.

    The algorithm performs an inner join between the DP and baseline tables based on join_columns to produce the combined dataframe. This join must be one-to-one, with each row in the DP table matching exactly one row in the baseline table, and vice versa. This ensures that there is a direct correspondence between the DP and baseline outputs for each entity, allowing for accurate comparison.

  2. After performing the join, the algorithm computes the relative error for each group. Relative error is calculated as the absolute difference between the corresponding values in the DP and baseline outputs, divided by the baseline value, using the formula \(abs(dp - baseline) / baseline\). If the baseline value is zero, the relative error is infinity (\(∞\)) for non-zero differences and zero (\(0\)) for zero differences.

  3. Next, the algorithm filters the relative error dataframe to include only those data points where the relative error exceeds a specified threshold (relative_error_threshold). This threshold represents the maximum allowable relative error for a data point to be considered within acceptable bounds.

  4. Finally, the algorithm counts the number of rows whose relative error exceeds the threshold.

    The algorithm handles cases where the resulting dataframe after relative error computation is empty (i.e., it contains no data points), returning a NaN (not a number) value in such scenarios.

    Note

    • The algorithm assumes a one-to-one join between the DP and baseline outputs.

    • Nulls in the measure columns are dropped because the metric cannot handle null values, and the absolute error computation requires valid numeric values in both columns.

Example

>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [50, 110, 100]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [100, 100, 100]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = HighRelativeErrorCount(
...     measure_column="X",
...     relative_error_threshold=0.25,
...     join_columns=["A"]
... )
>>> metric.relative_error_threshold
0.25
>>> metric.join_columns
['A']
>>> result = metric(dp_outputs, baseline_outputs).value
>>> result
1
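
Here only group a1 has a relative error above the 0.25 threshold (\(|50 - 100| / 100 = 0.5\)), so the count is 1.
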
Properties#

relative_error_threshold

Returns the relative error threshold.

join_columns

Returns the name of the join columns.

indicator_column_name

Returns the name of the indicator column.

baseline

Return the name of the baseline specified in the constructor (if any).

output

Return the name of the output specified in the constructor (if any).

name

Returns the name of the metric.

description

Returns the description of the metric.

func

Returns function to be applied.

grouping_columns

Returns the grouping columns.

measure_column

Returns the measure column (if any).

empty_value

The value this metric will return when inputs are empty.

Methods#

compute_high_re()

Computes high relative error count from grouped dataframe.

required_func_parameters()

Return the required parameters to the metric function.

check_compatibility_with_outputs()

Check that a particular set of outputs is compatible with the metric.

check_join_key_uniqueness()

Check if the join keys uniquely identify rows in the joined DataFrame.

get_parameter_values()

Returns values for the function’s parameters.

metric_function_inputs_empty()

Determines if the given inputs are empty.

__call__()

Computes the given metric on the given DP and baseline outputs.

get_baseline()

Returns the name of the single baseline this metric will be applied to.

get_output()

Returns the name of the single output the metric will be applied to.

get_column_name_from_baselines()

Get the result column name for a given set of outputs.

check_compatibility_with_data()

Check that the outputs have all the structure the metric expects.

optional_func_parameters()

Return the optional parameters to the metric function.

validate_result()

Check that the metric result is an allowed type.

Parameters:
  • relative_error_threshold (float)

  • measure_column (str)

  • join_columns (List[str])

  • grouping_columns (Optional[List[str]])

  • name (Optional[str])

  • description (Optional[str])

  • baseline (Optional[str])

  • output (Optional[str])

property relative_error_threshold: float#

Returns the relative error threshold.

Return type:

float

property join_columns: List[str]#

Returns the name of the join columns.

Return type:

List[str]

property indicator_column_name: str | None#

Returns the name of the indicator column.

Return type:

Optional[str]

property baseline: str | None#

Return the name of the baseline specified in the constructor (if any).

Return type:

Optional[str]

property output: str | None#

Return the name of the output specified in the constructor (if any).

Return type:

Optional[str]

property name: str#

Returns the name of the metric.

Return type:

str

property description: str#

Returns the description of the metric.

Return type:

str

property func: Callable#

Returns function to be applied.

Return type:

Callable

property grouping_columns: List[str]#

Returns the grouping columns.

Return type:

List[str]

property measure_column: str | None#

Returns the measure column (if any).

Return type:

Optional[str]

property empty_value: Any#

The value this metric will return when inputs are empty.

Return type:

Any

__init__(relative_error_threshold, measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baseline=None, output=None)#

Constructor.

Parameters:
  • relative_error_threshold (float) – The threshold for the relative error.

  • measure_column (str) – The column to compute relative error over.

  • join_columns (List[str]) – Columns to join on.

  • grouping_columns (Optional[List[str]]) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.

  • name (Optional[str]) – A name for the metric.

  • description (Optional[str]) – A description of the metric.

  • baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).

  • output (Optional[str]) – The name of the program output to be used for the metric. If None, the program must have only one output (which will be used).

compute_high_re(joined_output, result_column_name)#

Computes the high relative error value from the joined dataframe.

Parameters:

joined_output (pyspark.sql.DataFrame)

required_func_parameters()#

Return the required parameters to the metric function.

check_compatibility_with_outputs(outputs, output_type)#

Check that a particular set of outputs is compatible with the metric.

Should throw a ValueError if the metric is not compatible.

check_join_key_uniqueness(joined_output)#

Check if the join keys uniquely identify rows in the joined DataFrame.

Parameters:

joined_output (pyspark.sql.DataFrame)

get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#

Returns values for the function’s parameters.

Return type:

Dict[str, Any]

metric_function_inputs_empty(function_params)#

Determines if the given inputs are empty.

Parameters:

function_params (Mapping[str, Any])

Return type:

bool

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters:
  • dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.

  • baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.

Return type:

JoinedOutputMetricResult

get_baseline(baseline_outputs)#

Returns the name of the single baseline this metric will be applied to.

Return type:

str

get_output(outputs)#

Returns the name of the single output the metric will be applied to.

Parameters:

outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])

Return type:

str

get_column_name_from_baselines(baseline_outputs)#

Get the result column name for a given set of outputs.

check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should throw a ValueError if the metric is not compatible.

optional_func_parameters()#

Return the optional parameters to the metric function.

validate_result(result, baseline_outputs)#

Check that the metric result is an allowed type.

class SpuriousRate(join_columns, *, name=None, description=None, baseline=None, output=None, grouping_columns=None)#

Bases: tmlt.analytics.metrics._base.JoinedOutputMetric

Computes the fraction of groups in the DP output but not in the baseline output.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

Note

Below, released means that the group is in the DP output, and spurious means that the group is not in the baseline output.

How it works:

  1. The algorithm operates on a single table, which must appear in both the DP and baseline outputs. It joins the DP version of that table to the baseline version of the table, and notes for each row whether it appears in the DP version, the baseline version, or both.

  2. After performing the join, the algorithm computes the spurious rate by dividing the spurious released count by the total count of released data points, using the formula \(\text{spurious released count} / \text{released count}\). The result represents the proportion of released data points in the DP output that have no corresponding data points in the baseline output. A plain-PySpark sketch of this computation follows the example below.

Example

>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3", "c"],
...             "X": [50, 110, 100, 50]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3", "b"],
...             "X": [100, 100, 100, 50]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = SpuriousRate(
...     join_columns=["A"]
... )
>>> metric.join_columns
['A']
>>> metric(dp_outputs, baseline_outputs).value
0.25
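
For intuition, the computation above is roughly equivalent to the following plain-PySpark sketch, which reuses dp_df and baseline_df from the example and assumes an outer join on the join columns with membership indicators:

>>> from pyspark.sql import functions as sf
>>> joined = (
...     dp_df.select("A").withColumn("in_dp", sf.lit(True))
...     .join(
...         baseline_df.select("A").withColumn("in_baseline", sf.lit(True)),
...         on="A",
...         how="outer",
...     )
... )
>>> released_count = joined.filter(sf.col("in_dp")).count()  # groups in the DP output
>>> spurious_count = joined.filter(
...     sf.col("in_dp") & sf.col("in_baseline").isNull()).count()  # DP-only groups
>>> spurious_count / released_count
0.25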
Properties#

join_columns

Returns the name of the join columns.

indicator_column_name

Returns the name of the indicator column.

baseline

Return the name of the baseline specified in the constructor (if any).

output

Return the name of the output specified in the constructor (if any).

name

Returns the name of the metric.

description

Returns the description of the metric.

func

Returns function to be applied.

grouping_columns

Returns the grouping columns.

measure_column

Returns the measure column (if any).

empty_value

The value this metric will return when inputs are empty.

Methods#

compute_spurious_rate()

Computes spurious rate given DP and baseline outputs.

required_func_parameters()

Return the required parameters to the metric function.

check_compatibility_with_outputs()

Check that a particular set of outputs is compatible with the metric.

check_join_key_uniqueness()

Check if the join keys uniquely identify rows in the joined DataFrame.

get_parameter_values()

Returns values for the function’s parameters.

metric_function_inputs_empty()

Determines if the given inputs are empty.

__call__()

Computes the given metric on the given DP and baseline outputs.

get_baseline()

Returns the name of the single baseline this metric will be applied to.

get_output()

Returns the name of the single output the metric will be applied to.

get_column_name_from_baselines()

Get the result column name for a given set of outputs.

check_compatibility_with_data()

Check that the outputs have all the structure the metric expects.

optional_func_parameters()

Return the optional parameters to the metric function.

validate_result()

Check that the metric result is an allowed type.

Parameters:
  • join_columns (List[str])

  • name (Optional[str])

  • description (Optional[str])

  • baseline (Optional[str])

  • output (Optional[str])

  • grouping_columns (Optional[List[str]])

property join_columns: List[str]#

Returns the name of the join columns.

Return type:

List[str]

property indicator_column_name: str | None#

Returns the name of the indicator column.

Return type:

Optional[str]

property baseline: str | None#

Return the name of the baseline specified in the constructor (if any).

Return type:

Optional[str]

property output: str | None#

Return the name of the output specified in the constructor (if any).

Return type:

Optional[str]

property name: str#

Returns the name of the metric.

Return type:

str

property description: str#

Returns the description of the metric.

Return type:

str

property func: Callable#

Returns function to be applied.

Return type:

Callable

property grouping_columns: List[str]#

Returns the grouping columns.

Return type:

List[str]

property measure_column: str | None#

Returns the measure column (if any).

Return type:

Optional[str]

property empty_value: Any#

The value this metric will return when inputs are empty.

Return type:

Any

__init__(join_columns, *, name=None, description=None, baseline=None, output=None, grouping_columns=None)#

Constructor.

Parameters:
  • join_columns (List[str]) – The columns to join on.

  • name (Optional[str]) – A name for the metric.

  • description (Optional[str]) – A description of the metric.

  • baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).

  • output (Optional[str]) – The output to compute the spurious rate for. If None, the tuner must have a single output (which will be used).

  • grouping_columns (Optional[List[str]]) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.

compute_spurious_rate(joined_output, result_column_name)#

Computes spurious rate given DP and baseline outputs.

Parameters:

joined_output (pyspark.sql.DataFrame)

required_func_parameters()#

Return the required parameters to the metric function.

check_compatibility_with_outputs(outputs, output_type)#

Check that a particular set of outputs is compatible with the metric.

Should throw a ValueError if the metric is not compatible.

check_join_key_uniqueness(joined_output)#

Check if the join keys uniquely identify rows in the joined DataFrame.

Parameters:

joined_output (pyspark.sql.DataFrame)

get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#

Returns values for the function’s parameters.

Return type:

Dict[str, Any]

metric_function_inputs_empty(function_params)#

Determines if the given inputs are empty.

Parameters:

function_params (Mapping[str, Any])

Return type:

bool

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters:
  • dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.

  • baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.

Return type:

JoinedOutputMetricResult

get_baseline(baseline_outputs)#

Returns the name of the single baseline this metric will be applied to.

Return type:

str

get_output(outputs)#

Returns the name of the single output the metric will be applied to.

Parameters:

outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])

Return type:

str

get_column_name_from_baselines(baseline_outputs)#

Get the result column name for a given set of outputs.

check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should throw a ValueError if the metric is not compatible.

optional_func_parameters()#

Return the optional parameters to the metric function.

validate_result(result, baseline_outputs)#

Check that the metric result is an allowed type.

class SpuriousCount(join_columns, *, name=None, description=None, baseline=None, output=None, grouping_columns=None)#

Bases: tmlt.analytics.metrics._base.JoinedOutputMetric

Computes the number of groups in the DP output but not in the baseline output.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

Note

Below, released means that the group is in the DP output, and spurious means that the group is not in the baseline output.

How it works:

  1. The algorithm operates on a single table, which must appear in both the DP and baseline outputs. It joins the DP version of that table to the baseline version of the table, and notes for each row whether it appears in the DP version, the baseline version, or both.

  2. After performing the join, the algorithm counts the number of groups that appear only in the DP output (not the baseline), and returns that count. An equivalent anti-join sketch follows the example below.

Example

>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3", "c"],
...             "X": [50, 110, 100, 50]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3", "b"],
...             "X": [100, 100, 100, 50]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = SpuriousCount(
...     join_columns=["A"]
... )
>>> metric.join_columns
['A']
>>> metric(dp_outputs, baseline_outputs).value
1
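
Equivalently, the spurious count can be sketched as a left anti-join from the DP output to the baseline output, reusing dp_df and baseline_df from the example:

>>> # Rows of dp_df whose join key has no match in baseline_df.
>>> dp_df.join(baseline_df.select("A"), on="A", how="left_anti").count()
1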
Properties#

join_columns

Returns the name of the join columns.

indicator_column_name

Returns the name of the indicator column.

baseline

Return the name of the baseline specified in the constructor (if any).

output

Return the name of the output specified in the constructor (if any).

name

Returns the name of the metric.

description

Returns the description of the metric.

func

Returns function to be applied.

grouping_columns

Returns the grouping columns.

measure_column

Returns the measure column (if any).

empty_value

The value this metric will return when inputs are empty.

Methods#

count_spurious_rows()

Computes spurious count given DP and baseline outputs.

required_func_parameters()

Return the required parameters to the metric function.

check_compatibility_with_outputs()

Check that a particular set of outputs is compatible with the metric.

check_join_key_uniqueness()

Check if the join keys uniquely identify rows in the joined DataFrame.

get_parameter_values()

Returns values for the function’s parameters.

metric_function_inputs_empty()

Determines if the given inputs are empty.

__call__()

Computes the given metric on the given DP and baseline outputs.

get_baseline()

Returns the name of the single baseline this metric will be applied to.

get_output()

Returns the name of the single output the metric will be applied to.

get_column_name_from_baselines()

Get the result column name for a given set of outputs.

check_compatibility_with_data()

Check that the outputs have all the structure the metric expects.

optional_func_parameters()

Return the optional parameters to the metric function.

validate_result()

Check that the metric result is an allowed type.

Parameters:
  • join_columns (List[str])

  • name (Optional[str])

  • description (Optional[str])

  • baseline (Optional[str])

  • output (Optional[str])

  • grouping_columns (Optional[List[str]])

property join_columns: List[str]#

Returns the name of the join columns.

Return type:

List[str]

property indicator_column_name: str | None#

Returns the name of the indicator column.

Return type:

Optional[str]

property baseline: str | None#

Return the name of the baseline specified in the constructor (if any).

Return type:

Optional[str]

property output: str | None#

Return the name of the output specified in the constructor (if any).

Return type:

Optional[str]

property name: str#

Returns the name of the metric.

Return type:

str

property description: str#

Returns the description of the metric.

Return type:

str

property func: Callable#

Returns function to be applied.

Return type:

Callable

property grouping_columns: List[str]#

Returns the grouping columns.

Return type:

List[str]

property measure_column: str | None#

Returns the measure column (if any).

Return type:

Optional[str]

property empty_value: Any#

The value this metric will return when inputs are empty.

Return type:

Any

__init__(join_columns, *, name=None, description=None, baseline=None, output=None, grouping_columns=None)#

Constructor.

Parameters:
  • join_columns (List[str]) – The columns to join on.

  • name (Optional[str]) – A name for the metric.

  • description (Optional[str]) – A description of the metric.

  • baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).

  • output (Optional[str]) – The output to compute the spurious count for. If None, the tuner must have a single output (which will be used).

  • grouping_columns (Optional[List[str]]) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.

count_spurious_rows(joined_output, result_column_name)#

Computes spurious count given DP and baseline outputs.

Parameters:

joined_output (pyspark.sql.DataFrame)

required_func_parameters()#

Return the required parameters to the metric function.

check_compatibility_with_outputs(outputs, output_type)#

Check that a particular set of outputs is compatible with the metric.

Should throw a ValueError if the metric is not compatible.

check_join_key_uniqueness(joined_output)#

Check if the join keys uniquely identify rows in the joined DataFrame.

Parameters:

joined_output (pyspark.sql.DataFrame)

get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#

Returns values for the function’s parameters.

Return type:

Dict[str, Any]

metric_function_inputs_empty(function_params)#

Determines if the given inputs are empty.

Parameters:

function_params (Mapping[str, Any])

Return type:

bool

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters:
  • dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.

  • baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.

Return type:

JoinedOutputMetricResult

get_baseline(baseline_outputs)#

Returns the name of the single baseline this metric will be applied to.

Return type:

str

get_output(outputs)#

Returns the name of the single output the metric will be applied to.

Parameters:

outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])

Return type:

str

get_column_name_from_baselines(baseline_outputs)#

Get the result column name for a given set of outputs.

check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should throw a ValueError if the metric is not compatible.

optional_func_parameters()#

Return the optional parameters to the metric function.

validate_result(result, baseline_outputs)#

Check that the metric result is an allowed type.

class SuppressionRate(join_columns, *, name=None, description=None, baseline=None, output=None, grouping_columns=None)#

Bases: tmlt.analytics.metrics._base.JoinedOutputMetric

Computes the fraction of groups in the baseline output but not in the DP output.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

How it works:

  1. The algorithm operates on a single table, which must appear in both the DP and baseline outputs. It joins the DP version of that table to the baseline version of the table, and notes for each row whether it appears in the DP version, the baseline version, or both.

  2. After performing the join, the algorithm computes the suppression rate by dividing the count of rows that appear in the baseline output but not the DP output by the total count of baseline rows, using the formula \(\text{suppressed count} / \text{baseline count}\). The result represents the proportion of data points in the baseline output that have no corresponding data points in the DP output. A plain-PySpark sketch of this computation follows the example below.

Example

>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3", "c"],
...             "X": [50, 110, 100, 50]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3", "b"],
...             "X": [100, 100, 100, 50]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = SuppressionRate(
...     join_columns=["A"]
... )
>>> metric.join_columns
['A']
>>> metric(dp_outputs, baseline_outputs).value
0.25
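
A plain-PySpark sketch of this computation, reusing dp_df and baseline_df from the example (and assuming, per the description above, that the denominator is the number of baseline groups; in this example the DP and baseline outputs happen to have the same number of groups):

>>> # Rows of baseline_df whose join key has no match in dp_df.
>>> suppressed = baseline_df.join(dp_df.select("A"), on="A", how="left_anti")
>>> suppressed.count() / baseline_df.count()
0.25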
Properties#

join_columns

Returns the name of the join columns.

indicator_column_name

Returns the name of the indicator column.

baseline

Return the name of the baseline specified in the constructor (if any).

output

Return the name of the output specified in the constructor (if any).

name

Returns the name of the metric.

description

Returns the description of the metric.

func

Returns function to be applied.

grouping_columns

Returns the grouping columns.

measure_column

Returns the measure column (if any).

empty_value

The value this metric will return when inputs are empty.

Methods#

compute_suppression_rate()

Computes suppression rate given DP and baseline outputs.

required_func_parameters()

Return the required parameters to the metric function.

check_compatibility_with_outputs()

Check that a particular set of outputs is compatible with the metric.

check_join_key_uniqueness()

Check if the join keys uniquely identify rows in the joined DataFrame.

get_parameter_values()

Returns values for the function’s parameters.

metric_function_inputs_empty()

Determines if the given inputs are empty.

__call__()

Computes the given metric on the given DP and baseline outputs.

get_baseline()

Returns the name of the single baseline this metric will be applied to.

get_output()

Returns the name of the single output the metric will be applied to.

get_column_name_from_baselines()

Get the result column name for a given set of outputs.

check_compatibility_with_data()

Check that the outputs have all the structure the metric expects.

optional_func_parameters()

Return the optional parameters to the metric function.

validate_result()

Check that the metric result is an allowed type.

Parameters:
  • join_columns (List[str])

  • name (Optional[str])

  • description (Optional[str])

  • baseline (Optional[str])

  • output (Optional[str])

  • grouping_columns (Optional[List[str]])

property join_columns: List[str]#

Returns the name of the join columns.

Return type:

List[str]

property indicator_column_name: str | None#

Returns the name of the indicator column.

Return type:

Optional[str]

property baseline: str | None#

Return the name of the baseline specified in the constructor (if any).

Return type:

Optional[str]

property output: str | None#

Return the name of the output specified in the constructor (if any).

Return type:

Optional[str]

property name: str#

Returns the name of the metric.

Return type:

str

property description: str#

Returns the description of the metric.

Return type:

str

property func: Callable#

Returns function to be applied.

Return type:

Callable

property grouping_columns: List[str]#

Returns the grouping columns.

Return type:

List[str]

property measure_column: str | None#

Returns the measure column (if any).

Return type:

Optional[str]

property empty_value: Any#

The value this metric will return when inputs are empty.

Return type:

Any

__init__(join_columns, *, name=None, description=None, baseline=None, output=None, grouping_columns=None)#

Constructor.

Parameters:
  • join_columns (List[str]) – The columns to join on.

  • name (Optional[str]) – A name for the metric.

  • description (Optional[str]) – A description of the metric.

  • baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).

  • output (Optional[str]) – Which output to compute the suppression rate for. If None, the tuner must have a single output (which will be used).

  • grouping_columns (Optional[List[str]]) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.

compute_suppression_rate(joined_output, result_column_name)#

Computes suppression rate given DP and baseline outputs.

Parameters:

joined_output (pyspark.sql.DataFrame)

required_func_parameters()#

Return the required parameters to the metric function.

check_compatibility_with_outputs(outputs, output_type)#

Check that a particular set of outputs is compatible with the metric.

Should throw a ValueError if the metric is not compatible.

check_join_key_uniqueness(joined_output)#

Check if the join keys uniquely identify rows in the joined DataFrame.

Parameters:

joined_output (pyspark.sql.DataFrame)

get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#

Returns values for the function’s parameters.

Return type:

Dict[str, Any]

metric_function_inputs_empty(function_params)#

Determines if the given inputs are empty.

Parameters:

function_params (Mapping[str, Any])

Return type:

bool

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters:
  • dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.

  • baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.

Return type:

JoinedOutputMetricResult

get_baseline(baseline_outputs)#

Returns the name of the single baseline this metric will be applied to.

Return type:

str

get_output(outputs)#

Returns the name of the single output the metric will be applied to.

Parameters:

outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])

Return type:

str

get_column_name_from_baselines(baseline_outputs)#

Get the result column name for a given set of outputs.

check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should throw a ValueError if the metric is not compatible.

optional_func_parameters()#

Return the optional parameters to the metric function.

validate_result(result, baseline_outputs)#

Check that the metric result is an allowed type.

class SuppressionCount(join_columns, *, name=None, description=None, baseline=None, output=None, grouping_columns=None)#

Bases: tmlt.analytics.metrics._base.JoinedOutputMetric

Computes the count of groups in the baseline output but not in the DP output.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

How it works:

  1. The algorithm operates on a single table, which must appear in both the DP and baseline outputs. It joins the DP version of that table to the baseline version of the table, and notes for each row whether it appears in the DP version, the baseline version, or both.

  2. After performing the join, the algorithm computes the suppression count by counting the rows that appear in the baseline output but not the DP output. The result is the number of data points in the baseline output that have no corresponding data points in the DP output. An equivalent anti-join sketch follows the example below.

Example

>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3", "c"],
...             "X": [50, 110, 100, 50]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3", "b"],
...             "X": [100, 100, 100, 50]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = SuppressionCount(
...     join_columns=["A"]
... )
>>> metric.join_columns
['A']
>>> metric(dp_outputs, baseline_outputs).value
1
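
As with SpuriousCount, the computation can be sketched as a left anti-join, this time from the baseline output to the DP output:

>>> # Rows of baseline_df whose join key has no match in dp_df.
>>> baseline_df.join(dp_df.select("A"), on="A", how="left_anti").count()
1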
Properties#

join_columns

Returns the name of the join columns.

indicator_column_name

Returns the name of the indicator column.

baseline

Return the name of the baseline specified in the constructor (if any).

output

Return the name of the output specified in the constructor (if any).

name

Returns the name of the metric.

description

Returns the description of the metric.

func

Returns function to be applied.

grouping_columns

Returns the grouping columns.

measure_column

Returns the measure column (if any).

empty_value

The value this metric will return when inputs are empty.

Methods#

count_suppressed_rows()

Counts the number of suppressed rows given DP and baseline outputs.

required_func_parameters()

Return the required parameters to the metric function.

check_compatibility_with_outputs()

Check that a particular set of outputs is compatible with the metric.

check_join_key_uniqueness()

Check if the join keys uniquely identify rows in the joined DataFrame.

get_parameter_values()

Returns values for the function’s parameters.

metric_function_inputs_empty()

Determines if the given inputs are empty.

__call__()

Computes the given metric on the given DP and baseline outputs.

get_baseline()

Returns the name of the single baseline this metric will be applied to.

get_output()

Returns the name of the single output the metric will be applied to.

get_column_name_from_baselines()

Get the result column name for a given set of outputs.

check_compatibility_with_data()

Check that the outputs have all the structure the metric expects.

optional_func_parameters()

Return the optional parameters to the metric function.

validate_result()

Check that the metric result is an allowed type.

Parameters:
  • join_columns (List[str])

  • name (Optional[str])

  • description (Optional[str])

  • baseline (Optional[str])

  • output (Optional[str])

  • grouping_columns (Optional[List[str]])

property join_columns: List[str]#

Returns the name of the join columns.

Return type:

List[str]

property indicator_column_name: str | None#

Returns the name of the indicator column.

Return type:

Optional[str]

property baseline: str | None#

Return the name of the baseline specified in the constructor (if any).

Return type:

Optional[str]

property output: str | None#

Return the name of the output specified in the constructor (if any).

Return type:

Optional[str]

property name: str#

Returns the name of the metric.

Return type:

str

property description: str#

Returns the description of the metric.

Return type:

str

property func: Callable#

Returns function to be applied.

Return type:

Callable

property grouping_columns: List[str]#

Returns the grouping columns.

Return type:

List[str]

property measure_column: str | None#

Returns the measure column (if any).

Return type:

Optional[str]

property empty_value: Any#

The value this metric will return when inputs are empty.

Return type:

Any

__init__(join_columns, *, name=None, description=None, baseline=None, output=None, grouping_columns=None)#

Constructor.

Parameters:
  • join_columns (List[str]) – The columns to join on.

  • name (Optional[str]) – A name for the metric.

  • description (Optional[str]) – A description of the metric.

  • baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).

  • output (Optional[str]) – Which output to compute the suppression rate for. If None, the tuner must have a single output (which will be used).

  • grouping_columns (Optional[List[str]]) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.

count_suppressed_rows(joined_output, result_column_name)#

Counts the number of suppressed rows given DP and baseline outputs.

Parameters:

joined_output (pyspark.sql.DataFrame)

required_func_parameters()#

Return the required parameters to the metric function.

check_compatibility_with_outputs(outputs, output_type)#

Check that a particular set of outputs is compatible with the metric.

Should throw a ValueError if the metric is not compatible.

check_join_key_uniqueness(joined_output)#

Check if the join keys uniquely identify rows in the joined DataFrame.

Parameters:

joined_output (pyspark.sql.DataFrame)

get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#

Returns values for the function’s parameters.

Return type:

Dict[str, Any]

metric_function_inputs_empty(function_params)#

Determines if the given inputs are empty.

Parameters:

function_params (Mapping[str, Any])

Return type:

bool

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters:
  • dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.

  • baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.

Return type:

JoinedOutputMetricResult

get_baseline(baseline_outputs)#

Returns the name of the single baseline this metric will be applied to.

Return type:

str

get_output(outputs)#

Returns the name of the single output the metric will be applied to.

Parameters:

outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])

Return type:

str

get_column_name_from_baselines(baseline_outputs)#

Get the result column name for a given set of outputs.

check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should throw a ValueError if the metric is not compatible.

optional_func_parameters()#

Return the optional parameters to the metric function.

validate_result(result, baseline_outputs)#

Check that the metric result is an allowed type.

class Metric(name, func, description=None, grouping_columns=None, measure_column=None, empty_value=None)#

A generic metric defined using a function.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

This function (the func argument) must have the following parameters:

  • dp_outputs: a dictionary of DataFrames containing the program’s outputs.

  • baseline_outputs: a dictionary mapping baseline names to dictionaries of output DataFrames.

It may also have the following optional parameters:

  • result_column_name: if the function returns a DataFrame, the metric results should be in a column with this name.

  • unprotected_inputs: A dictionary containing the program’s unprotected inputs.

  • parameters: A dictionary containing the program’s parameters.

If the metric does not have grouping columns, the function must return a numeric value, a boolean, or a string. If the metric has grouping columns, then it must return a DataFrame. This DataFrame should contain the grouping columns, and exactly one additional column containing the metric value for each group. This column’s type should be numeric, boolean, or string.

Example

>>> dp_df = spark.createDataFrame(pd.DataFrame({"A": [5]}))
>>> dp_outputs = {"O": dp_df}
>>> baseline_df1 = spark.createDataFrame(pd.DataFrame({"A": [5]}))
>>> baseline_df2 = spark.createDataFrame(pd.DataFrame({"A": [6]}))
>>> baseline_outputs = {
...    "baseline1": {"O": baseline_df1}, "baseline2": {"O": baseline_df2}
... }
>>> def size_difference(dp_outputs, baseline_outputs):
...     baseline_count = baseline_outputs["baseline1"]["O"].count()
...     return abs(baseline_count - dp_outputs["O"].count())
>>> metric = Metric(
...     func=size_difference,
...     name="Custom Metric",
...     description="Custom Description",
... )
>>> result = metric(dp_outputs, baseline_outputs)
>>> result.value
0
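
A function that declares the optional parameters argument receives the program parameters supplied to the metric call. A minimal sketch, reusing dp_outputs and baseline_outputs from the example above; the "scale" parameter is hypothetical and used purely for illustration:

>>> def scaled_size_difference(dp_outputs, baseline_outputs, parameters):
...     scale = parameters["scale"]  # hypothetical program parameter
...     baseline_count = baseline_outputs["baseline1"]["O"].count()
...     return scale * abs(baseline_count - dp_outputs["O"].count())
>>> scaled_metric = Metric(
...     func=scaled_size_difference,
...     name="Scaled size difference",
... )
>>> scaled_metric(dp_outputs, baseline_outputs, parameters={"scale": 100}).value
0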
Parameters:
  • name (str)

  • func (Union[Callable, staticmethod])

  • description (Optional[str])

  • grouping_columns (Optional[List[str]])

  • measure_column (Optional[str])

  • empty_value (Optional[Any])

property name: str#

Returns the name of the metric.

Return type:

str

property description: str#

Returns the description of the metric.

Return type:

str

property func: Callable#

Returns function to be applied.

Return type:

Callable

property grouping_columns: List[str]#

Returns the grouping columns.

Return type:

List[str]

property measure_column: str | None#

Returns the measure column (if any).

Return type:

Optional[str]

property empty_value: Any#

The value this metric will return when inputs are empty.

Return type:

Any

__init__(name, func, description=None, grouping_columns=None, measure_column=None, empty_value=None)#

Constructor.

Parameters:
  • name (str) – A name for the metric.

  • description (Optional[str]) – A description of the metric.

  • func (Union[Callable, staticmethod]) – The function that calculates the metric result. See the docstring for Metric for detail on the allowed input/output types of this function.

  • grouping_columns (Optional[List[str]]) – If specified, the metric should group the outputs by the given columns, and calculate the metric for each group.

  • measure_column (Optional[str]) – If specified, the column in the outputs to measure.

  • empty_value (Optional[Any]) – If all dp and baseline outputs are empty, the metric will return this value.

required_func_parameters()#

Return the required parameters to the metric function.

optional_func_parameters()#

Return the optional parameters to the metric function.

check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should throw a ValueError if the metric is not compatible.

get_column_name_from_baselines(baseline_outputs)#

Get the result column name for a given set of outputs.

get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#

Returns values for the function’s parameters.

Return type:

Dict[str, Any]

validate_result(result, baseline_outputs)#

Check that the metric result is an allowed type.

metric_function_inputs_empty(function_params)#

Determines if the inputs to the metric function are empty.

Parameters:

function_params (Mapping[str, Any])

Return type:

bool

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters:
  • dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.

  • baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.

Return type:

MetricResult

class SingleOutputMetric(name, func, description=None, baseline=None, output=None, grouping_columns=None, measure_column=None, empty_value=None)#

Bases: Metric

A metric computed from a single output table, defined using a function.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

This metric is defined using a function func. This function must have the following parameters:

  • dp_output: the chosen DP output DataFrame.

  • baseline_output: the chosen baseline output DataFrame.

It may also have the following optional parameters:

  • result_column_name: if the function returns a DataFrame, the metric results should be in a column with this name.

  • unprotected_inputs: A dictionary containing the program’s unprotected inputs.

  • parameters: A dictionary containing the program’s parameters.

If the metric does not have grouping columns, the function must return a numeric value, a boolean, or a string. If the metric has grouping columns, then it must return a DataFrame. This DataFrame should contain the grouping columns, and exactly one additional column containing the metric value for each group. This column’s type should be numeric, boolean, or string.

Example

>>> dp_df = spark.createDataFrame(pd.DataFrame({"A": [5]}))
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(pd.DataFrame({"A": [5]}))
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> def size_difference(dp_output: DataFrame, baseline_output: DataFrame):
...     return baseline_output.count() - dp_output.count()
>>> metric = SingleOutputMetric(
...     func=size_difference,
...     name="Output size difference",
...     description="Difference in number of rows.",
... )
>>> result = metric(dp_outputs, baseline_outputs).value
>>> result
0
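
The optional unprotected_inputs argument works the same way. A minimal sketch, reusing dp_outputs and baseline_outputs from the example above; the "public" input name is hypothetical and used purely for illustration:

>>> public_df = spark.createDataFrame(pd.DataFrame({"A": [5, 6]}))
>>> def fraction_of_public_size(dp_output, baseline_output, unprotected_inputs):
...     # "public" is a hypothetical unprotected-input name.
...     return dp_output.count() / unprotected_inputs["public"].count()
>>> ratio_metric = SingleOutputMetric(
...     func=fraction_of_public_size,
...     name="Fraction of public size",
... )
>>> ratio_metric(
...     dp_outputs, baseline_outputs, unprotected_inputs={"public": public_df}
... ).value
0.5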
Properties#

baseline

Return the name of the baseline specified in the constructor (if any).

output

Return the name of the output specified in the constructor (if any).

name

Returns the name of the metric.

description

Returns the description of the metric.

func

Returns function to be applied.

grouping_columns

Returns the grouping columns.

measure_column

Returns the measure column (if any).

empty_value

The value this metric will return when inputs are empty.

Methods#

get_baseline()

Returns the name of the single baseline this metric will be applied to.

get_output()

Returns the name of the single output the metric will be applied to.

get_column_name_from_baselines()

Get the result column name for a given set of outputs.

required_func_parameters()

Return the required parameters to the metric function.

check_compatibility_with_outputs()

Check that a particular output is compatible with the metric.

check_compatibility_with_data()

Check that the outputs have all the structure the metric expects.

get_parameter_values()

Returns values for the function’s parameters.

metric_function_inputs_empty()

Determines if the inputs to the metric function are empty.

__call__()

Computes the given metric on the given DP and baseline outputs.

optional_func_parameters()

Return the optional parameters to the metric function.

validate_result()

Check that the metric result is an allowed type.

Parameters:
  • name (str)

  • func (Union[Callable, staticmethod])

  • description (Optional[str])

  • baseline (Optional[str])

  • output (Optional[str])

  • grouping_columns (Optional[List[str]])

  • measure_column (Optional[str])

  • empty_value (Optional[Any])

property baseline: str | None#

Return the name of the baseline specified in the constructor (if any).

Return type:

Optional[str]

property output: str | None#

Return the name of the output specified in the constructor (if any).

Return type:

Optional[str]

property name: str#

Returns the name of the metric.

Return type:

str

property description: str#

Returns the description of the metric.

Return type:

str

property func: Callable#

Returns function to be applied.

Return type:

Callable

property grouping_columns: List[str]#

Returns the grouping columns.

Return type:

List[str]

property measure_column: str | None#

Returns the measure column (if any).

Return type:

Optional[str]

property empty_value: Any#

The value this metric will return when inputs are empty.

Return type:

Any

__init__(name, func, description=None, baseline=None, output=None, grouping_columns=None, measure_column=None, empty_value=None)#

Constructor.

Parameters:
  • name (str) – A name for the metric.

  • func (Union[Callable, staticmethod]) – The function that calculates the metric result. See the docstring for SingleOutputMetric for detail on the allowed input/output types of this function.

  • description (Optional[str]) – A description of the metric.

  • baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).

  • output (Optional[str]) – The name of the program output to be used for the metric. If None, the program must have only one output (which will be used).

  • grouping_columns (Optional[List[str]]) – If specified, the metric should group the outputs by the given columns, and calculate the metric for each group.

  • measure_column (Optional[str]) – If specified, the column in the outputs to measure.

  • empty_value (Optional[Any]) – If all dp and baseline outputs are empty, the metric will return this value.

get_baseline(baseline_outputs)#

Returns the name of the single baseline this metric will be applied to.

Return type:

str

get_output(outputs)#

Returns the name of the single output the metric will be applied to.

Parameters:

outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])

Return type:

str

get_column_name_from_baselines(baseline_outputs)#

Get the result column name for a given set of outputs.

required_func_parameters()#

Return the required parameters to the metric function.

check_compatibility_with_outputs(outputs, output_type)#

Check that a particular output is compatible with the metric.

Should throw a ValueError if the metric is not compatible.

check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should throw a ValueError if the metric is not compatible.

get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#

Returns values for the function’s parameters.

Return type:

Dict[str, Any]

metric_function_inputs_empty(function_params)#

Determines if the inputs to the metric function are empty.

Parameters:

function_params (Mapping[str, Any])

Return type:

bool

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters:
  • dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.

  • baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.

Return type:

SingleOutputMetricResult

optional_func_parameters()#

Return the optional parameters to the metric function.

validate_result(result, baseline_outputs)#

Check that the metric result is an allowed type.

class JoinedOutputMetric(name, func, join_columns, description=None, baseline=None, output=None, grouping_columns=None, measure_column=None, empty_value=None, join_how='inner', dropna_columns=None, indicator_column_name=None)#

Bases: SingleOutputMetric

A metric computed from a join between a single DP and baseline output.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

The metric is defined using a function func. This function must have the following parameters:

  • joined_output: A DataFrame created by joining the selected DP and baseline outputs.

It may also have the following optional parameters:

  • result_column_name: if the function returns a DataFrame, the metric results should be in a column with this name.

  • unprotected_inputs: A dictionary containing the program’s unprotected inputs.

  • parameters: A dictionary containing the program’s parameters.

If the metric does not have grouping columns, the function must return a numeric value, a boolean, or a string. If the metric has grouping columns, then it must return a DataFrame. This DataFrame should contain the grouping columns, and exactly one additional column containing the metric value for each group. This column’s type should be numeric, boolean, or string.

Example

>>> dp_df = spark.createDataFrame(pd.DataFrame([{"A": 1, "B": "a"}]))
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(pd.DataFrame([{"A": 5}]))
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> def size_difference(joined_output: DataFrame,
...                     result_column_name: str):
...     in_dp = (sf.col("indicator") == "both") | (sf.col("indicator") == "dp")
...     in_baseline = ((sf.col("indicator") == "both") |
...          (sf.col("indicator") == "baseline"))
...     dp_count = sf.sum(sf.when(in_dp, sf.lit(1)).otherwise(0))
...     baseline_count = sf.sum(sf.when(in_baseline, sf.lit(1)).otherwise(0))
...     size_difference = joined_output.agg(
...         sf.abs(dp_count - baseline_count).alias(result_column_name)
...     )
...     return size_difference.head(1)[0][result_column_name]
>>> metric = JoinedOutputMetric(
...     func=size_difference,
...     name="Output size difference",
...     description="Difference in number of rows.",
...     join_columns=["A"],
...     join_how="outer",
...     indicator_column_name="indicator",
... )
>>> result = metric(dp_outputs, baseline_outputs).value
>>> result
0
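
When grouping_columns is set, the function must return a DataFrame rather than a scalar. A sketch of a grouped joined-output metric, assuming that non-join columns in the joined table are suffixed with _dp and _baseline; the data and column names are purely illustrative:

>>> dp_df2 = spark.createDataFrame(
...     pd.DataFrame({"A": [1, 2], "B": ["x", "y"], "count": [3, 5]}))
>>> baseline_df2 = spark.createDataFrame(
...     pd.DataFrame({"A": [1, 2], "B": ["x", "y"], "count": [4, 5]}))
>>> def per_group_count_diff(joined_output: DataFrame, result_column_name: str):
...     # One row per group, with the metric value in result_column_name.
...     return joined_output.groupBy("B").agg(
...         sf.sum(sf.abs(sf.col("count_dp") - sf.col("count_baseline")))
...         .alias(result_column_name)
...     )
>>> grouped_metric = JoinedOutputMetric(
...     func=per_group_count_diff,
...     name="Per-group count difference",
...     join_columns=["A", "B"],
...     grouping_columns=["B"],
... )
>>> grouped_result = grouped_metric({"O": dp_df2}, {"default": {"O": baseline_df2}})

Here grouped_result.value should be a DataFrame with one row per value of "B", following the return contract described above.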
Properties#

join_columns

Returns the name of the join columns.

indicator_column_name

Returns the name of the indicator column.

baseline

Return the name of the baseline specified in the constructor (if any).

output

Return the name of the output specified in the constructor (if any).

name

Returns the name of the metric.

description

Returns the description of the metric.

func

Returns function to be applied.

grouping_columns

Returns the grouping columns.

measure_column

Returns the measure column (if any).

empty_value

The value this metric will return when inputs are empty.

Methods#

required_func_parameters()

Return the required parameters to the metric function.

check_compatibility_with_outputs()

Check that a particular set of outputs is compatible with the metric.

check_join_key_uniqueness()

Check if the join keys uniquely identify rows in the joined DataFrame.

get_parameter_values()

Returns values for the function’s parameters.

metric_function_inputs_empty()

Determines if the given inputs are empty.

__call__()

Computes the given metric on the given DP and baseline outputs.

get_baseline()

Returns the name of the single baseline this metric will be applied to.

get_output()

Returns the name of the single output the metric will be applied to.

get_column_name_from_baselines()

Get the result column name for a given set of outputs.

check_compatibility_with_data()

Check that the outputs have all the structure the metric expects.

optional_func_parameters()

Return the optional parameters to the metric function.

validate_result()

Check that the metric result is an allowed type.

Parameters:
  • name (str)

  • func (Union[Callable, staticmethod])

  • join_columns (List[str])

  • description (Optional[str])

  • baseline (Optional[str])

  • output (Optional[str])

  • grouping_columns (Optional[List[str]])

  • measure_column (Optional[str])

  • empty_value (Optional[Any])

  • join_how (str)

  • dropna_columns (Optional[List[str]])

  • indicator_column_name (Optional[str])

property join_columns: List[str]#

Returns the name of the join columns.

Return type:

List[str]

property indicator_column_name: str | None#

Returns the name of the indicator column.

Return type:

Optional[str]

property baseline: str | None#

Return the name of the baseline specified in the constructor (if any).

Return type:

Optional[str]

property output: str | None#

Return the name of the output specified in the constructor (if any).

Return type:

Optional[str]

property name: str#

Returns the name of the metric.

Return type:

str

property description: str#

Returns the description of the metric.

Return type:

str

property func: Callable#

Returns function to be applied.

Return type:

Callable

property grouping_columns: List[str]#

Returns the grouping columns.

Return type:

List[str]

property measure_column: str | None#

Returns the measure column (if any).

Return type:

Optional[str]

property empty_value: Any#

The value this metric will return when inputs are empty.

Return type:

Any

__init__(name, func, join_columns, description=None, baseline=None, output=None, grouping_columns=None, measure_column=None, empty_value=None, join_how='inner', dropna_columns=None, indicator_column_name=None)#

Constructor.

Parameters:
  • name (str) – A name for the metric.

  • func (Union[Callable, staticmethod]) – The function that calculates the metric result. See the docstring for JoinedOutputMetric for detail on the allowed input/output types of this function.

  • join_columns (List[str]) – The columns to join on.

  • description (Optional[str]) – A description of the metric.

  • baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).

  • output (Optional[str]) – The name of the program output to be used for the metric. If None, the program must have only one output (which will be used).

  • grouping_columns (Optional[List[str]]) – If specified, the metric should group the outputs by the given columns, and calculate the metric for each group.

  • measure_column (Optional[str]) – If specified, the column in the outputs to measure.

  • empty_value (Optional[Any]) – If all dp and baseline outputs are empty, the metric will return this value.

  • join_how (str) – The type of join to perform. Must be one of “left”, “right”, “inner”, “outer”. Defaults to “inner”.

  • dropna_columns (Optional[List[str]]) – If specified, rows with nulls in these columns will be dropped.

  • indicator_column_name (Optional[str]) – If specified, we will add a column with the specified name to the joined data that contains either “dp”, “baseline”, or “both” to indicate where the values in the row came from.

required_func_parameters()#

Return the required parameters to the metric function.

check_compatibility_with_outputs(outputs, output_type)#

Check that a particular set of outputs is compatible with the metric.

Should throw a ValueError if the metric is not compatible.

check_join_key_uniqueness(joined_output)#

Check if the join keys uniquely identify rows in the joined DataFrame.

Parameters:

joined_output (pyspark.sql.DataFrame)

get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#

Returns values for the function’s parameters.

Return type:

Dict[str, Any]

metric_function_inputs_empty(function_params)#

Determines if the given inputs are empty.

Parameters:

function_params (Mapping[str, Any])

Return type:

bool

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters:
  • dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.

  • baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.

Return type:

JoinedOutputMetricResult

get_baseline(baseline_outputs)#

Returns the name of the single baseline this metric will be applied to.

Return type:

str

get_output(outputs)#

Returns the name of the single output the metric will be applied to.

Parameters:

outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])

Return type:

str

get_column_name_from_baselines(baseline_outputs)#

Get the result column name for a given set of outputs.

check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should raise a ValueError if the metric is not compatible.

Parameters:
  • dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.

  • baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.

optional_func_parameters()#

Return the optional parameters to the metric function.

validate_result(result, baseline_outputs)#

Check that the metric result is an allowed type.

Parameters:
  • result – The metric result to validate.

  • baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.

class JoinedOutputMetricResult#

Bases: SingleOutputMetricResult

The output of a JoinedOutputMetric with additional metadata.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

name: str#

The name of the metric.

description: str#

The description of the metric.

baseline: str | List[str]#

The name of the baseline program(s) used to calculate this metric.

output: str | List[str]#

The name of the program output(s) used to calculate this metric.

value: Any#

The value of the metric applied to the program outputs.

grouping_columns: List[str]#

Grouping columns of the metric.

measure_column: str | None#

Measure column of the metric.

format_as_summary_row()#

Return a table row summarizing the metric result.

Return type:

pandas.DataFrame

result_column_name()#

Returns the name of the column containing the metric results.

Only relevant if value is a DataFrame.

Return type:

str

format_as_dataframe()#

Returns the results of this metric formatted as a DataFrame.

Return type:

pandas.DataFrame
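
Continuing the hedged sketch from __call__ above, the metadata attributes and formatting helpers can be read straight off the result object; the values in the comments are what that sketch would produce, not canonical output.

>>> result.name        # 'root_mean_squared_error'
>>> result.output      # 'count_per_a'
>>> result.baseline    # 'default'
>>> summary_row = result.format_as_summary_row()  # one-row pandas.DataFrame
>>> as_table = result.format_as_dataframe()       # full result as a pandas.DataFrame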

class MetricResult#

The output of a Metric with additional metadata.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

name: str#

The name of the metric.

description: str#

The description of the metric.

baseline: str | List[str]#

The name of the baseline program(s) used to calculate this metric.

output: str | List[str]#

The name of the program output(s) used to calculate this metric.

value: Any#

The value of the metric applied to the program outputs.

grouping_columns: List[str]#

Grouping columns of the metric.

measure_column: str | None#

Measure column of the metric.

format_as_summary_row()#

Return a table row summarizing the metric result.

Return type:

pandas.DataFrame

format_as_dataframe()#

Returns the results of this metric formatted as a DataFrame.

Return type:

pandas.DataFrame

result_column_name()#

Returns the name of the column containing the metric results.

Only relevant if value is a DataFrame.

Return type:

str

class SingleOutputMetricResult#

Bases: MetricResult

The output of a SingleOutputMetric with additional metadata.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

name: str#

The name of the metric.

description: str#

The description of the metric.

baseline: str | List[str]#

The name of the baseline program(s) used to calculate this metric.

output: str | List[str]#

The name of the program output(s) used to calculate this metric.

value: Any#

The value of the metric applied to the program outputs.

grouping_columns: List[str]#

Grouping columns of the metric.

measure_column: str | None#

Measure column of the metric.

format_as_summary_row()#

Return a table row summarizing the metric result.

Return type:

pandas.DataFrame

result_column_name()#

Returns the name of the column containing the metric results.

Only relevant if value is a DataFrame.

Return type:

str

format_as_dataframe()#

Returns the results of this metric formatted as a DataFrame.

Return type:

pandas.DataFrame

class CountBaselineRows(*, name=None, description=None, baseline=None, output=None, grouping_columns=None)#

Bases: tmlt.analytics.metrics._base.SingleOutputMetric

Returns the number of rows in the baseline output.

If grouped, will return a count for every group that appears in either the DP or baseline output.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

Example

>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [50, 110, 100]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3", "b"],
...             "X": [100, 100, 100, 50]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = CountBaselineRows()
>>> metric(dp_outputs, baseline_outputs).value
4
Properties#

baseline

Return the name of the baseline specified in the constructor (if any).

output

Return the name of the output specified in the constructor (if any).

name

Returns the name of the metric.

description

Returns the description of the metric.

func

Returns the function to be applied.

grouping_columns

Returns the grouping columns.

measure_column

Returns the measure column (if any).

empty_value

The value this metric will return when inputs are empty.

Methods#

count_baseline_rows()

Counts the number of rows in the baseline output.

get_baseline()

Returns the name of the single baseline this metric will be applied to.

get_output()

Returns the name of the single output the metric will be applied to.

get_column_name_from_baselines()

Get the result column name for a given set of outputs.

required_func_parameters()

Return the required parameters to the metric function.

check_compatibility_with_outputs()

Check that a particular set of outputs is compatible with the metric.

check_compatibility_with_data()

Check that the outputs have all the structure the metric expects.

get_parameter_values()

Returns values for the function’s parameters.

metric_function_inputs_empty()

Determines if the inputs to the metric function are empty.

__call__()

Computes the given metric on the given DP and baseline outputs.

optional_func_parameters()

Return the optional parameters to the metric function.

validate_result()

Check that the metric result is an allowed type.

Parameters:
  • name (Optional[str])

  • description (Optional[str])

  • baseline (Optional[str])

  • output (Optional[str])

  • grouping_columns (Optional[List[str]])

property baseline: str | None#

Return the name of the baseline specified in the constructor (if any).

Return type:

Optional[str]

property output: str | None#

Return the name of the output specified in the constructor (if any).

Return type:

Optional[str]

property name: str#

Returns the name of the metric.

Return type:

str

property description: str#

Returns the description of the metric.

Return type:

str

property func: Callable#

Returns the function to be applied.

Return type:

Callable

property grouping_columns: List[str]#

Returns the grouping columns.

Return type:

List[str]

property measure_column: str | None#

Returns the measure column (if any).

Return type:

Optional[str]

property empty_value: Any#

The value this metric will return when inputs are empty.

Return type:

Any

__init__(*, name=None, description=None, baseline=None, output=None, grouping_columns=None)#

Constructor.

Parameters:
  • name (Optional[str]) – A name for the metric.

  • description (Optional[str]) – A description of the metric.

  • baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).

  • output (Optional[str]) – The output to count rows in. If None, the tuner must have a single output (which will be used).

  • grouping_columns (Optional[List[str]]) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.
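
To make the grouped behavior concrete, here is a hedged sketch reusing the dp_outputs and baseline_outputs mappings from the example above: with grouping_columns set, value is a table with one count per group rather than a single number.

>>> grouped_metric = CountBaselineRows(grouping_columns=["A"])
>>> grouped_result = grouped_metric(dp_outputs, baseline_outputs)
>>> grouped_result.value  # a DataFrame with one baseline count per value of "A"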

count_baseline_rows(dp_output, baseline_output, result_column_name)#

Counts the number of rows in the baseline output.

Parameters:
  • dp_output (pyspark.sql.DataFrame)

  • baseline_output (pyspark.sql.DataFrame)

  • result_column_name (str)

get_baseline(baseline_outputs)#

Returns the name of the single baseline this metric will be applied to.

Return type:

str

get_output(outputs)#

Returns the name of the single output the metric will be applied to.

Parameters:

outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])

Return type:

str

get_column_name_from_baselines(baseline_outputs)#

Get the result column name for a given set of outputs.

required_func_parameters()#

Return the required parameters to the metric function.

check_compatibility_with_outputs(outputs, output_type)#

Check that a particular set of outputs is compatible with the metric.

Should raise a ValueError if the metric is not compatible.

Parameters:
  • outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])

  • output_type (str)

check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should raise a ValueError if the metric is not compatible.

Parameters:
  • dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.

  • baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.

get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#

Returns values for the function’s parameters.

Return type:

Dict[str, Any]

metric_function_inputs_empty(function_params)#

Determines if the inputs to the metric function are empty.

Parameters:

function_params (Mapping[str, Any])

Return type:

bool

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters:
  • dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.

  • baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.

Return type:

SingleOutputMetricResult

optional_func_parameters()#

Return the optional parameters to the metric function.

validate_result(result, baseline_outputs)#

Check that the metric result is an allowed type.

Parameters:
  • result – The metric result to validate.

  • baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.

class CountReleasedRows(*, name=None, description=None, baseline=None, output=None, grouping_columns=None)#

Bases: tmlt.analytics.metrics._base.SingleOutputMetric

Returns the number of rows released in the DP output.

If grouped, will return a count for every group that appears in either the DP or baseline output.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

Example

>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [50, 110, 100]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3", "b"],
...             "X": [100, 100, 100, 50]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = CountReleasedRows()
>>> metric(dp_outputs, baseline_outputs).value
3
Properties#

baseline

Return the name of the baseline specified in the constructor (if any).

output

Return the name of the output specified in the constructor (if any).

name

Returns the name of the metric.

description

Returns the description of the metric.

func

Returns the function to be applied.

grouping_columns

Returns the grouping columns.

measure_column

Returns the measure column (if any).

empty_value

The value this metric will return when inputs are empty.

Methods#

count_released_rows()

Counts the number of released rows.

get_baseline()

Returns the name of the single baseline this metric will be applied to.

get_output()

Returns the name of the single output the metric will be applied to.

get_column_name_from_baselines()

Get the result column name for a given set of outputs.

required_func_parameters()

Return the required parameters to the metric function.

check_compatibility_with_outputs()

Check that a particular set of outputs is compatible with the metric.

check_compatibility_with_data()

Check that the outputs have all the structure the metric expects.

get_parameter_values()

Returns values for the function’s parameters.

metric_function_inputs_empty()

Determines if the inputs to the metric function are empty.

__call__()

Computes the given metric on the given DP and baseline outputs.

optional_func_parameters()

Return the optional parameters to the metric function.

validate_result()

Check that the metric result is an allowed type.

Parameters:
  • name (Optional[str])

  • description (Optional[str])

  • baseline (Optional[str])

  • output (Optional[str])

  • grouping_columns (Optional[List[str]])

property baseline: str | None#

Return the name of the baseline specified in the constructor (if any).

Return type:

Optional[str]

property output: str | None#

Return the name of the output specified in the constructor (if any).

Return type:

Optional[str]

property name: str#

Returns the name of the metric.

Return type:

str

property description: str#

Returns the description of the metric.

Return type:

str

property func: Callable#

Returns the function to be applied.

Return type:

Callable

property grouping_columns: List[str]#

Returns the grouping columns.

Return type:

List[str]

property measure_column: str | None#

Returns the measure column (if any).

Return type:

Optional[str]

property empty_value: Any#

The value this metric will return when inputs are empty.

Return type:

Any

__init__(*, name=None, description=None, baseline=None, output=None, grouping_columns=None)#

Constructor.

Parameters:
  • name (Optional[str]) – A name for the metric.

  • description (Optional[str]) – A description of the metric.

  • baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).

  • output (Optional[str]) – The output to count rows in. If None, the tuner must have a single output (which will be used).

  • grouping_columns (Optional[List[str]]) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.
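
In practice, these row-count diagnostics are usually registered on a tuner rather than called by hand. A minimal sketch, assuming the MinimalProgram defined at the top of this page (the class name CountsTuner is hypothetical):

>>> class CountsTuner(SessionProgramTuner, program=MinimalProgram):
...     metrics = [
...         CountReleasedRows(output="count_per_a"),
...         CountBaselineRows(output="count_per_a"),
...     ]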

count_released_rows(dp_output, baseline_output, result_column_name)#

Counts the number of released rows.

Parameters:
  • dp_output (pyspark.sql.DataFrame)

  • baseline_output (pyspark.sql.DataFrame)

  • result_column_name (str)

get_baseline(baseline_outputs)#

Returns the name of the single baseline this metric will be applied to.

Return type:

str

get_output(outputs)#

Returns the name of the single output the metric will be applied to.

Parameters:

outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])

Return type:

str

get_column_name_from_baselines(baseline_outputs)#

Get the result column name for a given set of outputs.

required_func_parameters()#

Return the required parameters to the metric function.

check_compatibility_with_outputs(outputs, output_type)#

Check that a particular set of outputs is compatible with the metric.

Should raise a ValueError if the metric is not compatible.

Parameters:
  • outputs (Mapping[str, Optional[pyspark.sql.DataFrame]])

  • output_type (str)

check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should raise a ValueError if the metric is not compatible.

Parameters:
  • dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.

  • baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.

get_parameter_values(dp_outputs, baseline_outputs, unprotected_inputs, parameters)#

Returns values for the function’s parameters.

Return type:

Dict[str, Any]

metric_function_inputs_empty(function_params)#

Determines if the inputs to the metric function are empty.

Parameters:

function_params (Mapping[str, Any])

Return type:

bool

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters:
  • dp_outputs (Mapping[str, Optional[pyspark.sql.DataFrame]]) – The differentially private outputs of the program.

  • baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Mapping[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • parameters (Optional[Mapping[str, Any]]) – Optional program specific parameters used in error computation.

Return type:

SingleOutputMetricResult

optional_func_parameters()#

Return the optional parameters to the metric function.

validate_result(result, baseline_outputs)#

Check that the metric result is an allowed type.

Parameters:
  • result – The metric result to validate.

  • baseline_outputs (Mapping[str, Mapping[str, Optional[pyspark.sql.DataFrame]]]) – The outputs of the baseline programs.