_relative_error#

Metric functions relating to relative error.

Classes#

RelativeError

Computes the relative error between two scalar values.

QuantileRelativeError

Computes the quantile of the empirical relative error.

MedianRelativeError

Computes the median relative error.

HighRelativeErrorFraction

Computes the fraction of groups with relative error above a threshold.

class RelativeError(output, column=None, *, name=None, description=None, baselines=None)#

Bases: tmlt.analytics.metrics._base.ScalarMetric

Computes the relative error between two scalar values.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

How it works:

  1. The algorithm takes as input two single-row tables: one representing the differentially private (DP) output and the other representing the baseline output.

    DP Table (dp): This table contains the output data generated by a differentially private mechanism.

    Baseline Table (baseline): This table contains the output data generated by a non-private or baseline mechanism. It serves as a reference point for comparison with the DP output.

    The scalar values are retrieved from these single-row dataframes. Both values are expected to be numeric (either integers or floats). If not, the algorithm raises a ValueError.

  2. The algorithm computes the relative error: the absolute difference between the corresponding values in the DP and baseline outputs, divided by the baseline value, using the formula \(|dp - baseline| / baseline\). If the baseline value is zero, the relative error is \(\infty\) for a non-zero difference and \(0\) for a zero difference, as sketched below.
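
To make the zero-baseline rule concrete, here is a minimal Python sketch of this computation; relative_error is an illustrative helper, not part of the library's API:

>>> import math
>>> def relative_error(dp, baseline):
...     # abs(dp - baseline) / baseline, with the zero-baseline convention above.
...     diff = abs(dp - baseline)
...     if baseline == 0:
...         return 0.0 if diff == 0 else math.inf
...     return diff / baseline
>>> relative_error(5, 4)
0.25
>>> relative_error(1, 0)
inf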

Example

>>> dp_df = spark.createDataFrame(pd.DataFrame({"A": [5]}))
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(pd.DataFrame({"A": [5]}))
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = RelativeError(output="O")
>>> result = metric(dp_outputs, baseline_outputs)[0].value
>>> result
0.0
>>> metric.format(result)
'0.0'
Methods#

format()

Returns a string representation of this object.

format_as_table_row()

Return a table row summarizing the metric result.

compute_on_scalar()

Computes the metric value from the DP and baseline values.

output()

Returns the name of the run output or view name.

column()

Returns the name of the value column, if it is set.

check_compatibility_with_outputs()

Check that a particular set of outputs is compatible with the metric.

compute_for_baseline()

Returns the metric value given the DP outputs and the baseline outputs.

check_compatibility_with_data()

Check that the outputs have all the structure the metric expects.

name()

Returns the name of the metric.

description()

Returns the description of the metric.

baselines()

Returns the baselines used for the metric.

format_as_dataframe()

Returns the results of this metric formatted as a dataframe.

__call__()

Computes the given metric on the given DP and baseline outputs.

Parameters
  • output (str) –

  • column (Optional[str]) –

  • name (Optional[str]) –

  • description (Optional[str]) –

  • baselines (Optional[List[str]]) –

__init__(output, column=None, *, name=None, description=None, baselines=None)#

Constructor.

Parameters
  • output (str) – The output to compute the metric for.

  • column (Optional[str] (default: None)) – The column to compute the relative error over. If the given output has only one column, this argument may be omitted.

  • name (Optional[str] (default: None)) – A name for the metric.

  • description (Optional[str] (default: None)) – A description of the metric.

  • baselines (Optional[List[str]] (default: None)) – The name(s) of the baseline program(s) used for the error report. If None, use all baselines specified as custom baselines and baseline options on the tuner class; if no baselines are specified on the tuner class, use the default baseline. If a string, use only that baseline. If a list, use only those baselines.

format(value)#

Returns a string representation of this object.

format_as_table_row(result)#

Return a table row summarizing the metric result.

Parameters

result (tmlt.analytics.metrics._base.MetricResult) –

Return type

pandas.DataFrame

compute_on_scalar(dp_value, baseline_value)#

Computes the metric value from the DP and baseline values.

property output#

Returns the name of the run output or view name.

Return type

str

property column#

Returns the name of the value column, if it is set.

Return type

Optional[str]

check_compatibility_with_outputs(outputs, output_name)#

Check that a particular set of outputs is compatible with the metric.

Should throw a ValueError if the metric is not compatible.

Parameters
  • outputs (Dict[str, pyspark.sql.DataFrame]) –

  • output_name (str) –

compute_for_baseline(baseline_name, dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Returns the metric value given the DP outputs and the baseline outputs.

Parameters
  • baseline_name (str) –

  • dp_outputs (Dict[str, pyspark.sql.DataFrame]) –

  • baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) –

  • unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) –

  • program_parameters (Optional[Dict[str, Any]]) –

Return type

Any

check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should throw a ValueError if the metric is not compatible.

Parameters
  • dp_outputs (Dict[str, pyspark.sql.DataFrame]) –

  • baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) –

property name#

Returns the name of the metric.

Return type

str

property description#

Returns the description of the metric.

Return type

str

property baselines#

Returns the baselines used for the metric.

Return type

Optional[Union[str, List[str]]]

format_as_dataframe(result)#

Returns the results of this metric formatted as a dataframe.

Parameters

result (tmlt.analytics.metrics.MetricResult) –

Return type

MetricResultDataframe

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters
  • dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.

  • baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • program_parameters (Optional[Dict[str, Any]]) – Optional program specific parameters used in error computation.

Return type

List[tmlt.analytics.metrics.MetricResult]

class QuantileRelativeError(output, quantile, measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baselines=None)#

Bases: tmlt.analytics.metrics._base.MeasureColumnMetric

Computes the quantile of the empirical relative error.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

How it works:

  1. The algorithm takes as input two tables: one representing the differentially private (DP) output and the other representing the baseline output.

    DP Table (dp): This table contains the output data generated by a differentially private mechanism.

    Baseline Table (baseline): This table contains the output data generated by a non-private or baseline mechanism. It serves as a reference point for comparison with the DP output.

    The algorithm includes error handling to ensure the validity of the input data. It checks for the existence and numeric type of the measure_column.

    The algorithm performs an inner join between the DP and baseline tables on join_columns to produce the combined dataframe. This join must be one-to-one: each row in the DP table matches exactly one row in the baseline table, and vice versa. This guarantees a direct correspondence between the DP and baseline outputs for each entity, allowing for accurate comparison.

  2. After performing the join, the algorithm computes the relative error for each group. Relative error is the absolute difference between the corresponding values in the DP and baseline outputs, divided by the baseline value, using the formula \(|dp - baseline| / baseline\). If the baseline value is zero, the relative error is \(\infty\) for a non-zero difference and \(0\) for a zero difference.

  3. The algorithm then calculates the n-th quantile of the relative error across all groups (see the sketch after the note below).

    If the relative error column is empty, the quantile computation returns NaN (not a number).

    Note

    • The algorithm assumes a one-to-one join.

    • Nulls in the measure columns are dropped because the metric cannot handle null values, and the relative error computation requires valid numeric values in both columns.
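
The join and quantile steps can be reproduced with plain pandas and NumPy on the same data as the example below. This is a sketch of the computation described above (assuming the one-to-one join), not the library's internal implementation:

>>> import numpy as np
>>> dp = pd.DataFrame({"A": ["a1", "a2", "a3"], "X": [50, 110, 100]})
>>> baseline = pd.DataFrame({"A": ["a1", "a2", "a3"], "X": [100, 100, 100]})
>>> joined = dp.merge(baseline, on=["A"], suffixes=("_dp", "_baseline"))
>>> d = (joined["X_dp"] - joined["X_baseline"]).abs().to_numpy(dtype=float)
>>> b = joined["X_baseline"].to_numpy(dtype=float)
>>> # abs(dp - baseline) / baseline, with inf/0 for a zero baseline.
>>> rel_err = np.where(b != 0, d / np.where(b != 0, b, 1.0), np.where(d == 0, 0.0, np.inf))
>>> float(np.quantile(rel_err, 0.5))
0.1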

Example

>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [50, 110, 100]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [100, 100, 100]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = QuantileRelativeError(
...     output="O",
...     quantile=0.5,
...     measure_column="X",
...     join_columns=["A"]
... )
>>> metric.quantile
0.5
>>> metric.join_columns
['A']
>>> result = metric(dp_outputs, baseline_outputs)[0].value
>>> result
0.1
>>> metric.format(result)
'0.10'
Methods#

quantile()

Returns the quantile.

format()

Returns a string representation of this object.

format_as_table_row()

Return a table row summarizing the metric result.

compute_on_grouped_output()

Computes the quantile relative error value from the grouped dataframe.

measure_column()

Returns the name of the measure column.

check_compatibility_with_outputs()

Check that a particular set of outputs is compatible with the metric.

format_as_dataframe()

Returns the results of this metric formatted as a dataframe.

grouping_columns()

Returns the names of the grouping columns.

output()

Returns the name of the run output or view name.

join_columns()

Returns the names of the join columns.

indicator_column_name()

Returns the name of the indicator column.

check_join_key_uniqueness()

Check if the join keys uniquely identify rows in the joined DataFrame.

compute_for_baseline()

Computes the metric value.

check_compatibility_with_data()

Check that the outputs have all the structure the metric expects.

name()

Returns the name of the metric.

description()

Returns the description of the metric.

baselines()

Returns the baselines used for the metric.

__call__()

Computes the given metric on the given DP and baseline outputs.

Parameters
  • output (str) –

  • quantile (float) –

  • measure_column (str) –

  • join_columns (List[str]) –

  • grouping_columns (Optional[List[str]]) –

  • name (Optional[str]) –

  • description (Optional[str]) –

  • baselines (Optional[List[str]]) –

__init__(output, quantile, measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baselines=None)#

Constructor.

Parameters
  • output (str) – The output to compute the metric for.

  • quantile (float) – The quantile to calculate (between 0 and 1).

  • measure_column (str) – The column to compute the quantile of relative error over.

  • join_columns (List[str]) – Columns to join on.

  • grouping_columns (Optional[List[str]] (default: None)) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group and returned in a table, as illustrated below. If grouping_columns is None, the metric will be calculated over the whole output and returned as a single number.

  • name (Optional[str] (default: None)) – A name for the metric.

  • description (Optional[str] (default: None)) – A description of the metric.

  • baselines (Optional[List[str]] (default: None)) – The name(s) of the baseline program(s) used for the error report. If None, use all baselines specified as custom baselines and baseline options on the tuner class; if no baselines are specified on the tuner class, use the default baseline. If a string, use only that baseline. If a list, use only those baselines.
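
For instance, a grouped variant of the metric might be constructed as follows; the grouping column "B" is purely illustrative, and the property value shown assumes the constructor stores the list as given:

>>> grouped_metric = QuantileRelativeError(
...     output="O",
...     quantile=0.9,
...     measure_column="X",
...     join_columns=["A"],
...     grouping_columns=["B"],  # one metric value per group, returned as a table
... )
>>> grouped_metric.grouping_columns
['B']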

property quantile#

Returns the quantile.

Return type

float

format(value)#

Returns a string representation of this object.

format_as_table_row(result)#

Return a table row summarizing the metric result.

Parameters

result (tmlt.analytics.metrics._base.MetricResult) –

Return type

pandas.DataFrame

compute_on_grouped_output(grouped_output, baseline_name, unprotected_inputs=None, program_parameters=None)#

Computes the quantile relative error value from the grouped dataframe.

Parameters
  • grouped_output (pyspark.sql.DataFrame) –

  • baseline_name (str) –

  • unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) –

  • program_parameters (Optional[Dict[str, Any]]) –

property measure_column#

Returns the name of the measure column.

Return type

str

check_compatibility_with_outputs(outputs, output_name)#

Check that a particular set of outputs is compatible with the metric.

Should throw a ValueError if the metric is not compatible.

Parameters
  • outputs (Dict[str, pyspark.sql.DataFrame]) –

  • output_name (str) –

format_as_dataframe(result)#

Returns the results of this metric formatted as a dataframe.

Parameters

result (tmlt.analytics.metrics.MetricResult) –

Return type

pandas.DataFrame

property grouping_columns#

Returns the names of the grouping columns.

Return type

List[str]

property output#

Returns the name of the run output or view name.

Return type

str

property join_columns#

Returns the names of the join columns.

Return type

List[str]

property indicator_column_name#

Returns the name of the indicator column.

Return type

Optional[str]

check_join_key_uniqueness(joined_output)#

Check if the join keys uniquely identify rows in the joined DataFrame.

Parameters

joined_output (pyspark.sql.DataFrame) –

compute_for_baseline(baseline_name, dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Computes the metric value.

Parameters
  • baseline_name (str) –

  • dp_outputs (Dict[str, pyspark.sql.DataFrame]) –

  • baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) –

  • unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) –

  • program_parameters (Optional[Dict[str, Any]]) –

check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should throw a ValueError if the metric is not compatible.

Parameters
  • dp_outputs (Dict[str, pyspark.sql.DataFrame]) –

  • baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) –

property name#

Returns the name of the metric.

Return type

str

property description#

Returns the description of the metric.

Return type

str

property baselines#

Returns the baselines used for the metric.

Return type

Optional[Union[str, List[str]]]

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters
  • dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.

  • baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • program_parameters (Optional[Dict[str, Any]]) – Optional program specific parameters used in error computation.

Return type

List[tmlt.analytics.metrics.MetricResult]

class MedianRelativeError(output, measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baselines=None)#

Bases: QuantileRelativeError

Computes the median relative error.

Equivalent to QuantileRelativeError with quantile = 0.5.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

Example

>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [50, 110, 100]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [100, 100, 100]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = MedianRelativeError(
...     output="O",
...     measure_column="X",
...     join_columns=["A"]
... )
>>> metric.quantile
0.5
>>> metric.join_columns
['A']
>>> result = metric(dp_outputs, baseline_outputs)[0].value
>>> result
0.1
>>> metric.format(result)
'0.10'
Methods#

quantile()

Returns the quantile.

format()

Returns a string representation of this object.

format_as_table_row()

Return a table row summarizing the metric result.

compute_on_grouped_output()

Computes the quantile relative error value from the grouped dataframe.

measure_column()

Returns the name of the measure column.

check_compatibility_with_outputs()

Check that a particular set of outputs is compatible with the metric.

format_as_dataframe()

Returns the results of this metric formatted as a dataframe.

grouping_columns()

Returns the names of the grouping columns.

output()

Returns the name of the run output or view name.

join_columns()

Returns the names of the join columns.

indicator_column_name()

Returns the name of the indicator column.

check_join_key_uniqueness()

Check if the join keys uniquely identify rows in the joined DataFrame.

compute_for_baseline()

Computes the metric value.

check_compatibility_with_data()

Check that the outputs have all the structure the metric expects.

name()

Returns the name of the metric.

description()

Returns the description of the metric.

baselines()

Returns the baselines used for the metric.

__call__()

Computes the given metric on the given DP and baseline outputs.

Parameters
  • output (str) –

  • measure_column (str) –

  • join_columns (List[str]) –

  • grouping_columns (Optional[List[str]]) –

  • name (Optional[str]) –

  • description (Optional[str]) –

  • baselines (Optional[List[str]]) –

__init__(output, measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baselines=None)#

Constructor.

Parameters
  • output (str) – The output to compute the metric for.

  • measure_column (str) – The column to compute the median of relative error over.

  • join_columns (List[str]) – Columns to join on.

  • grouping_columns (Optional[List[str]] (default: None)) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group and returned in a table. If grouping_columns is None, the metric will be calculated over the whole output and returned as a single number.

  • name (Optional[str] (default: None)) – A name for the metric.

  • description (Optional[str] (default: None)) – A description of the metric.

  • baselines (Optional[List[str]] (default: None)) – The name(s) of the baseline program(s) used for the error report. If None, use all baselines specified as custom baselines and baseline options on the tuner class; if no baselines are specified on the tuner class, use the default baseline. If a string, use only that baseline. If a list, use only those baselines.

property quantile#

Returns the quantile.

Return type

float

format(value)#

Returns a string representation of this object.

format_as_table_row(result)#

Return a table row summarizing the metric result.

Parameters

result (tmlt.analytics.metrics._base.MetricResult) –

Return type

pandas.DataFrame

compute_on_grouped_output(grouped_output, baseline_name, unprotected_inputs=None, program_parameters=None)#

Computes the quantile relative error value from the grouped dataframe.

Parameters
  • grouped_output (pyspark.sql.DataFrame) –

  • baseline_name (str) –

  • unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) –

  • program_parameters (Optional[Dict[str, Any]]) –

property measure_column#

Returns the name of the measure column.

Return type

str

check_compatibility_with_outputs(outputs, output_name)#

Check that a particular set of outputs is compatible with the metric.

Should throw a ValueError if the metric is not compatible.

Parameters
  • outputs (Dict[str, pyspark.sql.DataFrame]) –

  • output_name (str) –

format_as_dataframe(result)#

Returns the results of this metric formatted as a dataframe.

Parameters

result (tmlt.analytics.metrics.MetricResult) –

Return type

pandas.DataFrame

property grouping_columns#

Returns the names of the grouping columns.

Return type

List[str]

property output#

Returns the name of the run output or view name.

Return type

str

property join_columns#

Returns the names of the join columns.

Return type

List[str]

property indicator_column_name#

Returns the name of the indicator column.

Return type

Optional[str]

check_join_key_uniqueness(joined_output)#

Check if the join keys uniquely identify rows in the joined DataFrame.

Parameters

joined_output (pyspark.sql.DataFrame) –

compute_for_baseline(baseline_name, dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Computes the metric value.

Parameters
  • baseline_name (str) –

  • dp_outputs (Dict[str, pyspark.sql.DataFrame]) –

  • baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) –

  • unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) –

  • program_parameters (Optional[Dict[str, Any]]) –

check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should throw a ValueError if the metric is not compatible.

Parameters
  • dp_outputs (Dict[str, pyspark.sql.DataFrame]) –

  • baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) –

property name#

Returns the name of the metric.

Return type

str

property description#

Returns the description of the metric.

Return type

str

property baselines#

Returns the baselines used for the metric.

Return type

Optional[Union[str, List[str]]]

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters
  • dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.

  • baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • program_parameters (Optional[Dict[str, Any]]) – Optional program specific parameters used in error computation.

Return type

List[tmlt.analytics.metrics.MetricResult]

class HighRelativeErrorFraction(output, relative_error_threshold, measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baselines=None)#

Bases: tmlt.analytics.metrics._base.MeasureColumnMetric

Computes the fraction of groups with relative error above a threshold.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

How it works:

  1. The algorithm takes as input two tables: one representing the differentially private (DP) output and the other representing the baseline output.

    DP Table (dp): This table contains the output data generated by a differentially private mechanism.

    Baseline Table (baseline): This table contains the output data generated by a non-private or baseline mechanism. It serves as a reference point for comparison with the DP output.

    The algorithm includes error handling to ensure the validity of the input data. It checks for the existence and numeric type of the measure_column.

    The algorithm performs an inner join between the DP and baseline tables on join_columns to produce the combined dataframe. This join must be one-to-one: each row in the DP table matches exactly one row in the baseline table, and vice versa. This guarantees a direct correspondence between the DP and baseline outputs for each entity, allowing for accurate comparison.

  2. After performing the join, the algorithm computes the relative error for each group. Relative error is the absolute difference between the corresponding values in the DP and baseline outputs, divided by the baseline value, using the formula \(|dp - baseline| / baseline\). If the baseline value is zero, the relative error is \(\infty\) for a non-zero difference and \(0\) for a zero difference.

  3. Next, the algorithm filters the relative error dataframe to include only those data points where the relative error exceeds a specified threshold (relative_error_threshold). This threshold represents the maximum allowable relative error for a data point to be considered within acceptable bounds.

  4. Finally, the algorithm calculates the high relative error fraction by dividing the number of data points whose relative error exceeds the threshold by the total number of data points in the dataframe (see the sketch after the note below).

    If the dataframe is empty after the relative error computation (i.e., it contains no data points), the metric returns NaN (not a number).

    Note

    • The algorithm assumes a one-to-one join.

    • Nulls in the measure columns are dropped because the metric cannot handle null values, and the relative error computation requires valid numeric values in both columns.
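
On the same data as the example below, steps 2 through 4 reduce to the following NumPy sketch, starting from the absolute differences produced by the one-to-one join; the metric in the example reports this value rounded to 0.333:

>>> import numpy as np
>>> d = np.array([50.0, 10.0, 0.0])      # abs(dp - baseline) for each joined row
>>> b = np.array([100.0, 100.0, 100.0])  # baseline values
>>> # abs(dp - baseline) / baseline, with inf/0 for a zero baseline.
>>> rel_err = np.where(b != 0, d / np.where(b != 0, b, 1.0), np.where(d == 0, 0.0, np.inf))
>>> float((rel_err > 0.25).mean())       # fraction above the 0.25 threshold
0.3333333333333333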

Example

>>> dp_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [50, 110, 100]
...         }
...     )
... )
>>> dp_outputs = {"O": dp_df}
>>> baseline_df = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "A": ["a1", "a2", "a3"],
...             "X": [100, 100, 100]
...         }
...     )
... )
>>> baseline_outputs = {"default": {"O": baseline_df}}
>>> metric = HighRelativeErrorFraction(
...     output="O",
...     measure_column="X",
...     relative_error_threshold=0.25,
...     join_columns=["A"]
... )
>>> metric.relative_error_threshold
0.25
>>> metric.join_columns
['A']
>>> result = metric(dp_outputs, baseline_outputs)[0].value
>>> result
0.333
>>> metric.format(result)
'0.33'
Methods#

relative_error_threshold()

Returns the relative error threshold.

format()

Returns a string representation of this object.

format_as_table_row()

Return a table row summarizing the metric result.

compute_on_grouped_output()

Computes the high relative error fraction from the grouped dataframe.

measure_column()

Returns the name of the measure column.

check_compatibility_with_outputs()

Check that a particular set of outputs is compatible with the metric.

format_as_dataframe()

Returns the results of this metric formatted as a dataframe.

grouping_columns()

Returns the names of the grouping columns.

output()

Returns the name of the run output or view name.

join_columns()

Returns the names of the join columns.

indicator_column_name()

Returns the name of the indicator column.

check_join_key_uniqueness()

Check if the join keys uniquely identify rows in the joined DataFrame.

compute_for_baseline()

Computes the metric value.

check_compatibility_with_data()

Check that the outputs have all the structure the metric expects.

name()

Returns the name of the metric.

description()

Returns the description of the metric.

baselines()

Returns the baselines used for the metric.

__call__()

Computes the given metric on the given DP and baseline outputs.

Parameters
  • output (str) –

  • relative_error_threshold (float) –

  • measure_column (str) –

  • join_columns (List[str]) –

  • grouping_columns (Optional[List[str]]) –

  • name (Optional[str]) –

  • description (Optional[str]) –

  • baselines (Optional[str]) –

__init__(output, relative_error_threshold, measure_column, join_columns, grouping_columns=None, *, name=None, description=None, baselines=None)#

Constructor.

Parameters
  • output (str) – The output to compute the metric for.

  • relative_error_threshold (float) – The threshold for the relative error.

  • measure_column (str) – The column to compute relative error over.

  • join_columns (List[str]) – Columns to join on.

  • grouping_columns (Optional[List[str]] (default: None)) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group and returned in a table. If grouping_columns is None, the metric will be calculated over the whole output and returned as a single number.

  • name (Optional[str] (default: None)) – A name for the metric.

  • description (Optional[str] (default: None)) – A description of the metric.

  • baselines (Optional[str] (default: None)) – The name(s) of the baseline program(s) used for the error report. If None, use all baselines specified as custom baselines and baseline options on the tuner class; if no baselines are specified on the tuner class, use the default baseline. If a string, use only that baseline. If a list, use only those baselines.

property relative_error_threshold#

Returns the relative error threshold.

Return type

float

format(value)#

Returns a string representation of this object.

format_as_table_row(result)#

Return a table row summarizing the metric result.

Parameters

result (tmlt.analytics.metrics._base.MetricResult) –

Return type

pandas.DataFrame

compute_on_grouped_output(grouped_output, baseline_name, unprotected_inputs=None, program_parameters=None)#

Computes the high relative error fraction from the grouped dataframe.

Parameters
  • grouped_output (pyspark.sql.DataFrame) –

  • baseline_name (str) –

  • unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) –

  • program_parameters (Optional[Dict[str, Any]]) –

property measure_column#

Returns the name of the measure column.

Return type

str

check_compatibility_with_outputs(outputs, output_name)#

Check that a particular set of outputs is compatible with the metric.

Should throw a ValueError if the metric is not compatible.

Parameters
  • outputs (Dict[str, pyspark.sql.DataFrame]) –

  • output_name (str) –

format_as_dataframe(result)#

Returns the results of this metric formatted as a dataframe.

Parameters

result (tmlt.analytics.metrics.MetricResult) –

Return type

pandas.DataFrame

property grouping_columns#

Returns the names of the grouping columns.

Return type

List[str]

property output#

Returns the name of the run output or view name.

Return type

str

property join_columns#

Returns the names of the join columns.

Return type

List[str]

property indicator_column_name#

Returns the name of the indicator column.

Return type

Optional[str]

check_join_key_uniqueness(joined_output)#

Check if the join keys uniquely identify rows in the joined DataFrame.

Parameters

joined_output (pyspark.sql.DataFrame) –

compute_for_baseline(baseline_name, dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Computes the metric value.

Parameters
  • baseline_name (str) –

  • dp_outputs (Dict[str, pyspark.sql.DataFrame]) –

  • baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) –

  • unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) –

  • program_parameters (Optional[Dict[str, Any]]) –

check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should throw a ValueError if the metric is not compatible.

Parameters
  • dp_outputs (Dict[str, pyspark.sql.DataFrame]) –

  • baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) –

property name#

Returns the name of the metric.

Return type

str

property description#

Returns the description of the metric.

Return type

str

property baselines#

Returns the baselines used for the metric.

Return type

Optional[Union[str, List[str]]]

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters
  • dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.

  • baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • program_parameters (Optional[Dict[str, Any]]) – Optional program specific parameters used in error computation.

Return type

List[tmlt.analytics.metrics.MetricResult]