_base#

Base classes for metrics.

Classes#

`Metric`	A generic metric.
`MetricResult`	An output of a Metric with additional metadata.
`MetricResultDataframe`	A table version of a metric’s results.
`SingleBaselineMetric`	Base class for metrics computed from DP outputs and a single baseline’s outputs.
`MultiBaselineMetric`	Base class for metrics computed from DP outputs and multiple baseline outputs.
`JoinedOutputMetric`	Base class for metrics computed from join between single DP and baseline output.
`GroupedMetric`	Base class for metrics that can be computed on each group in a joined output.
`MeasureColumnMetric`	Base class for metrics that are computed on a single measure column.
`ScalarMetric`	Base class for metrics computed from outputs containing only one value.

class Metric(name, description, baselines)#

Bases: abc.ABC

A generic metric.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

Parameters

name (str) –
description (str) –
baselines (Optional[Union[str, List[str]]]) –

__init__(name, description, baselines)#

Constructor.

Parameters

name (str) – A name for the metric.
description (str) – A description of the metric.
baselines (Union[str, List[str], None]) – The name of the baseline program(s) used for the error report. If None, use all baselines specified as custom baselines and baseline options on the tuner class. If no baselines are specified on the tuner class, use the default baseline. If a string, use only that baseline. If a list, use only those baselines.
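
The selection rules for baselines described above can be sketched in plain Python. This is an illustration only: the helper resolve_baselines, its configured_baselines argument, and the placeholder name "default" are hypothetical and are not part of the library.

```python
from typing import List, Optional, Union


def resolve_baselines(
    baselines: Optional[Union[str, List[str]]],
    configured_baselines: List[str],
    default_baseline: str = "default",  # hypothetical placeholder name
) -> List[str]:
    """Return the baseline names a metric would be evaluated against."""
    if baselines is None:
        # No explicit choice: use everything configured on the tuner,
        # or fall back to the default baseline if nothing is configured.
        return list(configured_baselines) or [default_baseline]
    if isinstance(baselines, str):
        # A single name selects exactly that baseline.
        return [baselines]
    # A list selects exactly the listed baselines.
    return list(baselines)
```

For example, resolve_baselines(None, []) falls back to the placeholder default, while resolve_baselines("custom", ["default", "custom"]) selects only "custom".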

property name#

Returns the name of the metric.

Return type: str

property description#

Returns the description of the metric.

Return type: str

property baselines#

Returns the baselines used for the metric.

Return type: Optional[Union[str, List[str]]]

abstract format(value)#

Converts value to human-readable format.

Parameters: value (Any) –

abstract format_as_table_row(result)#

Return a table row summarizing the metric result.

Parameters: result (tmlt.analytics.metrics.MetricResult) –
Return type: pandas.DataFrame

format_as_dataframe(result)#

Returns the results of this metric formatted as a dataframe.

Parameters: result (tmlt.analytics.metrics.MetricResult) –
Return type: MetricResultDataframe

abstract check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should throw a ValueError if the metric is not compatible.

Parameters

dp_outputs (Dict[str, pyspark.sql.DataFrame]) –
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) –
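
As a rough illustration of the kind of check an implementation might perform, the sketch below walks the DP outputs and every baseline's outputs and raises a ValueError when an expected table or column is missing. Lists of column names stand in for Spark DataFrames, and the helper name and the specific checks are assumptions, not the library's behavior.

```python
from typing import Dict, List

# Stand-in for a Spark DataFrame: only its column names matter here.
Columns = List[str]


def check_outputs_have_column(
    dp_outputs: Dict[str, Columns],
    baseline_outputs: Dict[str, Dict[str, Columns]],
    output_name: str,
    column: str,
) -> None:
    """Raise ValueError unless every source has `column` in `output_name`."""
    # baseline_outputs is nested: baseline name -> output name -> table.
    sources = [("DP", dp_outputs)]
    sources += [
        (f"baseline {name!r}", outs) for name, outs in baseline_outputs.items()
    ]
    for label, outputs in sources:
        if output_name not in outputs:
            raise ValueError(f"{label} outputs have no table named {output_name!r}")
        if column not in outputs[output_name]:
            raise ValueError(
                f"{label} output {output_name!r} has no column {column!r}"
            )
```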

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters

dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs.
unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
program_parameters (Optional[Dict[str, Any]]) – Optional program specific parameters used in error computation.

Return type

List[tmlt.analytics.metrics.MetricResult]

class MetricResult#

An output of a Metric with additional metadata.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

name :str#: The name of the metric.

description :str#: The description of the metric.

baseline :Union[str, List[str]]#: The name of the baseline program(s) used for the error report.

metric :Metric#: The metric that was used.

value :Any#: The value of the metric applied to the program outputs.

format_as_table_row()#

Return a table row summarizing the metric result.

Return type: pandas.DataFrame

format_as_dataframe()#

Returns the results of this metric formatted as a dataframe.

Return type: MetricResultDataframe

class MetricResultDataframe#

Bases: NamedTuple

A table version of a metric’s results.

df :pandas.DataFrame#: The results, formatted as a dataframe.

value_column :str#: The name of the column containing the metric value.

class SingleBaselineMetric(name, description, baselines)#

Bases: Metric, abc.ABC

Base class for metrics computed from DP outputs and a single baseline’s outputs.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

Subclasses of SingleBaselineMetric define a compute_for_baseline method, which maps the DP outputs and one baseline’s outputs to the metric value.
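
The subclassing pattern can be sketched with a toy class hierarchy. Everything here is illustrative: ToySingleBaselineMetric and RowCountDifference are hypothetical names, lists of dicts stand in for Spark DataFrames, and the way __call__ loops over baselines is an assumption about the general shape of the pattern rather than the library's implementation.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Table = List[dict]  # stand-in for a Spark DataFrame


class ToySingleBaselineMetric(ABC):
    """Illustrative only: one metric value is computed per baseline."""

    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def compute_for_baseline(
        self,
        baseline_name: str,
        dp_outputs: Dict[str, Table],
        baseline_outputs: Dict[str, Table],
    ) -> Any:
        """Map the DP outputs and one baseline's outputs to a value."""

    def __call__(
        self,
        dp_outputs: Dict[str, Table],
        baseline_outputs: Dict[str, Dict[str, Table]],
    ) -> List[dict]:
        # Each baseline is handled independently, yielding one result each.
        return [
            {
                "name": self.name,
                "baseline": baseline,
                "value": self.compute_for_baseline(baseline, dp_outputs, outputs),
            }
            for baseline, outputs in baseline_outputs.items()
        ]


class RowCountDifference(ToySingleBaselineMetric):
    """Difference in row count between DP and baseline 'counts' tables."""

    def compute_for_baseline(self, baseline_name, dp_outputs, baseline_outputs):
        return len(dp_outputs["counts"]) - len(baseline_outputs["counts"])
```

Calling an instance with DP outputs and two baselines returns two result records, one per baseline.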

Parameters

name (str) –
description (str) –
baselines (Optional[Union[str, List[str]]]) –

__init__(name, description, baselines)#

Constructor.

Parameters

name (str) – A name for the metric.
description (str) – A description of the metric.
baselines (Union[str, List[str], None]) – The name of the baseline program(s) used for the error report. If None, use all baselines specified as custom baselines and baseline options on the tuner class. If no baselines are specified on the tuner class, use the default baseline. If a string, use only that baseline. If a list, use only those baselines.

abstract check_compatibility_with_outputs(outputs, output_name)#

Check that a particular output is compatible with the metric.

Should throw a ValueError if the metric is not compatible.

Parameters

outputs (Dict[str, pyspark.sql.DataFrame]) –
output_name (str) –

check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should throw a ValueError if the metric is not compatible.

Parameters

dp_outputs (Dict[str, pyspark.sql.DataFrame]) –
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) –

property name#

Returns the name of the metric.

Return type: str

property description#

Returns the description of the metric.

Return type: str

property baselines#

Returns the baselines used for the metric.

Return type: Optional[Union[str, List[str]]]

abstract format(value)#

Converts value to human-readable format.

Parameters: value (Any) –

abstract format_as_table_row(result)#

Return a table row summarizing the metric result.

Parameters: result (tmlt.analytics.metrics.MetricResult) –
Return type: pandas.DataFrame

format_as_dataframe(result)#

Returns the results of this metric formatted as a dataframe.

Parameters: result (tmlt.analytics.metrics.MetricResult) –
Return type: MetricResultDataframe

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters

dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs.
unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
program_parameters (Optional[Dict[str, Any]]) – Optional program specific parameters used in error computation.

Return type

List[tmlt.analytics.metrics.MetricResult]

class MultiBaselineMetric(name, description, baselines)#

Bases: Metric, abc.ABC

Base class for metrics computed from DP outputs and multiple baseline outputs.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

Subclasses of MultiBaselineMetric define a compute_for_multiple_baselines method, which maps the DP outputs and a collection of outputs from several baselines to the metric value.

Parameters

name (str) –
description (str) –
baselines (Optional[Union[str, List[str]]]) –

__init__(name, description, baselines)#

Constructor.

Parameters

name (str) – A name for the metric.
description (str) – A description of the metric.
baselines (Union[str, List[str], None]) – The name of the baseline program(s) used for the error report. If None, use all baselines specified as custom baselines and baseline options on the tuner class. If no baselines are specified on the tuner class, use the default baseline. If a string, use only that baseline. If a list, use only those baselines.

compute(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Computes the given metric on the given DP and baseline outputs.

The baseline_outputs will already be filtered to only include the baselines that the metric is supposed to use.

Parameters

dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs, after filtering to only include the baselines that the metric is supposed to use.
unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
program_parameters (Optional[Dict[str, Any]]) – Optional program specific parameters used in error computation.

Return type

List[tmlt.analytics.metrics.MetricResult]
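
The filtering that happens before compute is called might look roughly like the following. This is a sketch under stated assumptions: filter_baseline_outputs is a hypothetical helper, and raising ValueError for an unknown baseline name is an assumed behavior that the library may handle differently.

```python
from typing import Dict, List, Optional, Union


def filter_baseline_outputs(
    baseline_outputs: Dict[str, dict],
    baselines: Optional[Union[str, List[str]]],
) -> Dict[str, dict]:
    """Keep only the baselines a metric asked for (all of them if None)."""
    if baselines is None:
        return dict(baseline_outputs)
    wanted = [baselines] if isinstance(baselines, str) else list(baselines)
    missing = [name for name in wanted if name not in baseline_outputs]
    if missing:
        # Assumed behavior: the library may report this differently.
        raise ValueError(f"Unknown baseline(s): {missing}")
    return {name: baseline_outputs[name] for name in wanted}
```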

property name#

Returns the name of the metric.

Return type: str

property description#

Returns the description of the metric.

Return type: str

property baselines#

Returns the baselines used for the metric.

Return type: Optional[Union[str, List[str]]]

abstract format(value)#

Converts value to human-readable format.

Parameters: value (Any) –

abstract format_as_table_row(result)#

Return a table row summarizing the metric result.

Parameters: result (tmlt.analytics.metrics.MetricResult) –
Return type: pandas.DataFrame

format_as_dataframe(result)#

Returns the results of this metric formatted as a dataframe.

Parameters: result (tmlt.analytics.metrics.MetricResult) –
Return type: MetricResultDataframe

abstract check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should throw a ValueError if the metric is not compatible.

Parameters

dp_outputs (Dict[str, pyspark.sql.DataFrame]) –
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) –

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters

dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs.
unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
program_parameters (Optional[Dict[str, Any]]) – Optional program specific parameters used in error computation.

Return type

List[tmlt.analytics.metrics.MetricResult]

class JoinedOutputMetric(output, join_columns, name, description, baselines, join_how='inner', dropna_columns=None, indicator_column_name=None)#

Bases: SingleBaselineMetric, abc.ABC

Base class for metrics computed from join between single DP and baseline output.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

Subclasses of JoinedOutputMetric define a compute_on_joined_output method, which takes a single dataframe and returns the metric value. That dataframe is the result of joining the DP and baseline output tables with the given name on the given join columns, using the join type given by join_how (an inner join by default).

Methods#
`output()`	Returns the name of the run output or view name.
`join_columns()`	Returns the names of the join columns.
`indicator_column_name()`	Returns the name of the indicator column.
`check_compatibility_with_outputs()`	Check that a particular set of outputs is compatible with the metric.
`check_join_key_uniqueness()`	Check if the join keys uniquely identify rows in the joined DataFrame.
`compute_for_baseline()`	Computes metric value.
`format_as_table_row()`	Return a table row summarizing the metric result.
`check_compatibility_with_data()`	Check that the outputs have all the structure the metric expects.
`name()`	Returns the name of the metric.
`description()`	Returns the description of the metric.
`baselines()`	Returns the baselines used for the metric.
`format()`	Converts value to human-readable format.
`format_as_dataframe()`	Returns the results of this metric formatted as a dataframe.
`__call__()`	Computes the given metric on the given DP and baseline outputs.

Parameters

output (str) –
join_columns (List[str]) –
name (str) –
description (str) –
baselines (Optional[Union[str, List[str]]]) –
join_how (str) –
dropna_columns (Optional[List[str]]) –
indicator_column_name (Optional[str]) –

__init__(output, join_columns, name, description, baselines, join_how='inner', dropna_columns=None, indicator_column_name=None)#

Constructor.

Parameters

output (str) – The output to compute the metric for.
join_columns (List[str]) – The columns to join on.
name (str) – A name for the metric.
description (str) – A description of the metric.
baselines (Union[str, List[str], None]) – The name of the baseline program(s) used for the error report. If None, use all baselines specified as custom baselines and baseline options on the tuner class. If no baselines are specified on the tuner class, use the default baseline. If a string, use only that baseline. If a list, use only those baselines.
join_how (str, default: 'inner') – The type of join to perform. Must be one of “left”, “right”, “inner”, “outer”.
dropna_columns (Optional[List[str]], default: None) – If specified, rows with nulls in these columns will be dropped.
indicator_column_name (Optional[str], default: None) – If specified, a column with this name is added to the joined data. It contains “dp”, “baseline”, or “both” to indicate where the values in the row came from.
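
The effect of join_how and indicator_column_name can be illustrated with a small pure-Python join. This is only a sketch: lists of dicts stand in for Spark DataFrames, join_outputs and its value_column argument are hypothetical, and the "_dp" / "_baseline" column suffixes are an assumed naming scheme, not necessarily the one the library uses.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


def join_outputs(
    dp_rows: List[Row],
    baseline_rows: List[Row],
    join_columns: List[str],
    value_column: str,
    join_how: str = "inner",
    indicator_column_name: Optional[str] = None,
) -> List[Row]:
    """Join DP and baseline rows on join_columns (keys unique per side)."""
    if join_how not in ("left", "right", "inner", "outer"):
        raise ValueError(f"Unsupported join type: {join_how!r}")

    def key(row: Row) -> tuple:
        return tuple(row[c] for c in join_columns)

    dp_by_key = {key(r): r for r in dp_rows}
    base_by_key = {key(r): r for r in baseline_rows}

    joined = []
    for k in sorted(set(dp_by_key) | set(base_by_key)):
        in_dp, in_base = k in dp_by_key, k in base_by_key
        source = "both" if in_dp and in_base else "dp" if in_dp else "baseline"
        # Rows present on only one side survive only for matching join types.
        if source == "dp" and join_how not in ("left", "outer"):
            continue
        if source == "baseline" and join_how not in ("right", "outer"):
            continue
        row = dict(zip(join_columns, k))
        # Assumed suffixes; the missing side is filled with None.
        row[f"{value_column}_dp"] = dp_by_key[k][value_column] if in_dp else None
        row[f"{value_column}_baseline"] = (
            base_by_key[k][value_column] if in_base else None
        )
        if indicator_column_name is not None:
            row[indicator_column_name] = source
        joined.append(row)
    return joined
```

With an inner join only keys present in both tables survive; with an outer join every key is kept, and the indicator column records which side each row came from.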

property output#

Returns the name of the run output or view name.

Return type: str

property join_columns#

Returns the names of the join columns.

Return type: List[str]

property indicator_column_name#

Returns the name of the indicator column.

Return type: Optional[str]

check_compatibility_with_outputs(outputs, output_name)#

Check that a particular set of outputs is compatible with the metric.

Should throw a ValueError if the metric is not compatible.

Parameters

outputs (Dict[str, pyspark.sql.DataFrame]) –
output_name (str) –

check_join_key_uniqueness(joined_output)#

Check if the join keys uniquely identify rows in the joined DataFrame.

Parameters: joined_output (pyspark.sql.DataFrame) –
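
A uniqueness check of this kind amounts to counting how often each join-key combination occurs. The sketch below uses a list of dicts in place of a Spark DataFrame; find_duplicate_join_keys is a hypothetical helper, and what the real method does when it finds duplicates is not stated here, so the sketch simply returns them.

```python
from collections import Counter
from typing import Dict, List


def find_duplicate_join_keys(
    rows: List[Dict[str, object]], join_columns: List[str]
) -> List[tuple]:
    """Return every join-key combination that appears in more than one row."""
    counts = Counter(tuple(row[c] for c in join_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]
```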

compute_for_baseline(baseline_name, dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Computes metric value.

Parameters

baseline_name (str) –
dp_outputs (Dict[str, pyspark.sql.DataFrame]) –
baseline_outputs (Dict[str, pyspark.sql.DataFrame]) –
unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) –
program_parameters (Optional[Dict[str, Any]]) –

format_as_table_row(result)#

Return a table row summarizing the metric result.

Parameters: result (MetricResult) –
Return type: pandas.DataFrame

check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should throw a ValueError if the metric is not compatible.

Parameters

dp_outputs (Dict[str, pyspark.sql.DataFrame]) –
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) –

property name#

Returns the name of the metric.

Return type: str

property description#

Returns the description of the metric.

Return type: str

property baselines#

Returns the baselines used for the metric.

Return type: Optional[Union[str, List[str]]]

abstract format(value)#

Converts value to human-readable format.

Parameters: value (Any) –

format_as_dataframe(result)#

Returns the results of this metric formatted as a dataframe.

Parameters: result (tmlt.analytics.metrics.MetricResult) –
Return type: MetricResultDataframe

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters

dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs.
unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
program_parameters (Optional[Dict[str, Any]]) – Optional program specific parameters used in error computation.

Return type

List[tmlt.analytics.metrics.MetricResult]

class GroupedMetric(output, join_columns, name, description, baselines, grouping_columns=None, join_how='inner', dropna_columns=None, indicator_column_name=None)#

Bases: JoinedOutputMetric, abc.ABC

Base class for metrics that can be computed on each group in a joined output.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

Subclasses of GroupedMetric define a compute_on_grouped_output method, which takes a single grouped dataframe and returns the metric value. The grouped dataframe is the result of joining the DP and baseline output tables with the given name on the given join columns, using the join type given by join_how (an inner join by default), and then grouping by the grouping columns.

Methods#
`grouping_columns()`	Returns the names of the grouping columns.
`check_compatibility_with_outputs()`	Check that a particular set of outputs is compatible with the metric.
`compute_on_grouped_output()`	Computes metric value from the joined, grouped DP and baseline output.
`format_as_table_row()`	Return a table row summarizing the metric result.
`format_as_dataframe()`	Returns the results of this metric formatted as a dataframe.
`output()`	Returns the name of the run output or view name.
`join_columns()`	Returns the names of the join columns.
`indicator_column_name()`	Returns the name of the indicator column.
`check_join_key_uniqueness()`	Check if the join keys uniquely identify rows in the joined DataFrame.
`compute_for_baseline()`	Computes metric value.
`check_compatibility_with_data()`	Check that the outputs have all the structure the metric expects.
`name()`	Returns the name of the metric.
`description()`	Returns the description of the metric.
`baselines()`	Returns the baselines used for the metric.
`format()`	Converts value to human-readable format.
`__call__()`	Computes the given metric on the given DP and baseline outputs.

Parameters

output (str) –
join_columns (List[str]) –
name (str) –
description (str) –
baselines (Optional[Union[str, List[str]]]) –
grouping_columns (Optional[List[str]]) –
join_how (str) –
dropna_columns (Optional[List[str]]) –
indicator_column_name (Optional[str]) –

__init__(output, join_columns, name, description, baselines, grouping_columns=None, join_how='inner', dropna_columns=None, indicator_column_name=None)#

Constructor.

Parameters

output (str) – The output to compute the metric for.
join_columns (List[str]) – The columns to join on.
name (str) – A name for the metric.
description (str) – A description of the metric.
baselines (Union[str, List[str], None]) – The name of the baseline program(s) used for the error report. If None, use all baselines specified as custom baselines and baseline options on the tuner class. If no baselines are specified on the tuner class, use the default baseline. If a string, use only that baseline. If a list, use only those baselines.
grouping_columns (Optional[List[str]], default: None) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.
join_how (str, default: 'inner') – The type of join to perform. Must be one of “left”, “right”, “inner”, “outer”.
dropna_columns (Optional[List[str]], default: None) – If specified, rows with nulls in these columns will be dropped.
indicator_column_name (Optional[str], default: None) – If specified, a column with this name is added to the joined data. It contains “dp”, “baseline”, or “both” to indicate where the values in the row came from.
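
The difference between a grouped and an ungrouped metric can be shown with a toy absolute-error computation over already-joined rows. This is a sketch only: abs_error_by_group is a hypothetical function, lists of dicts stand in for Spark data, and the count_dp / count_baseline column names are assumed for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Union


def abs_error_by_group(
    joined_rows: List[Dict[str, object]],
    grouping_columns: Optional[List[str]] = None,
) -> Union[int, Dict[tuple, int]]:
    """Sum |dp - baseline| per group, or over all rows if not grouping."""
    if not grouping_columns:
        # No grouping: a single number for the whole output.
        return sum(abs(r["count_dp"] - r["count_baseline"]) for r in joined_rows)
    # Grouping: one value per distinct combination of grouping columns.
    totals: Dict[tuple, int] = defaultdict(int)
    for r in joined_rows:
        group = tuple(r[c] for c in grouping_columns)
        totals[group] += abs(r["count_dp"] - r["count_baseline"])
    return dict(totals)
```

Without grouping columns the result is one number; with them, the result is a small table keyed by group.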

property grouping_columns#

Returns the names of the grouping columns.

Return type: List[str]

check_compatibility_with_outputs(outputs, output_name)#

Check that a particular set of outputs is compatible with the metric.

Should throw a ValueError if the metric is not compatible.

Parameters

outputs (Dict[str, pyspark.sql.DataFrame]) –
output_name (str) –

abstract compute_on_grouped_output(grouped_output, baseline_name, unprotected_inputs=None, program_parameters=None)#

Computes metric value from the joined, grouped DP and baseline output.

If grouping columns are empty, the grouped output will have one group that is the entire dataset.

Parameters

grouped_output (pyspark.sql.GroupedData) –
baseline_name (str) –
unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) –
program_parameters (Optional[Dict[str, Any]]) –

format_as_table_row(result)#

Return a table row summarizing the metric result.

Parameters: result (tmlt.analytics.metrics.MetricResult) –
Return type: pandas.DataFrame

format_as_dataframe(result)#

Returns the results of this metric formatted as a dataframe.

Parameters: result (tmlt.analytics.metrics.MetricResult) –
Return type: pandas.DataFrame

property output#

Returns the name of the run output or view name.

Return type: str

property join_columns#

Returns the names of the join columns.

Return type: List[str]

property indicator_column_name#

Returns the name of the indicator column.

Return type: Optional[str]

check_join_key_uniqueness(joined_output)#

Check if the join keys uniquely identify rows in the joined DataFrame.

Parameters: joined_output (pyspark.sql.DataFrame) –

compute_for_baseline(baseline_name, dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Computes metric value.

Parameters

baseline_name (str) –
dp_outputs (Dict[str, pyspark.sql.DataFrame]) –
baseline_outputs (Dict[str, pyspark.sql.DataFrame]) –
unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) –
program_parameters (Optional[Dict[str, Any]]) –

check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should throw a ValueError if the metric is not compatible.

Parameters

dp_outputs (Dict[str, pyspark.sql.DataFrame]) –
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) –

property name#

Returns the name of the metric.

Return type: str

property description#

Returns the description of the metric.

Return type: str

property baselines#

Returns the baselines used for the metric.

Return type: Optional[Union[str, List[str]]]

abstract format(value)#

Converts value to human-readable format.

Parameters: value (Any) –

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters

dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs.
unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
program_parameters (Optional[Dict[str, Any]]) – Optional program specific parameters used in error computation.

Return type

List[tmlt.analytics.metrics.MetricResult]

class MeasureColumnMetric(output, join_columns, measure_column, name, description, baselines, grouping_columns=None, join_how='inner', dropna_columns=None)#

Bases: GroupedMetric, abc.ABC

Base class for metrics that are computed on a single measure column.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

Methods#
`measure_column()`	Returns the name of the measure column.
`check_compatibility_with_outputs()`	Check that a particular set of outputs is compatible with the metric.
`format_as_table_row()`	Return a table row summarizing the metric result.
`format_as_dataframe()`	Returns the results of this metric formatted as a dataframe.
`grouping_columns()`	Returns the names of the grouping columns.
`compute_on_grouped_output()`	Computes metric value from the joined, grouped DP and baseline output.
`output()`	Returns the name of the run output or view name.
`join_columns()`	Returns the names of the join columns.
`indicator_column_name()`	Returns the name of the indicator column.
`check_join_key_uniqueness()`	Check if the join keys uniquely identify rows in the joined DataFrame.
`compute_for_baseline()`	Computes metric value.
`check_compatibility_with_data()`	Check that the outputs have all the structure the metric expects.
`name()`	Returns the name of the metric.
`description()`	Returns the description of the metric.
`baselines()`	Returns the baselines used for the metric.
`format()`	Converts value to human-readable format.
`__call__()`	Computes the given metric on the given DP and baseline outputs.

Parameters

output (str) –
join_columns (List[str]) –
measure_column (str) –
name (str) –
description (str) –
baselines (Optional[Union[str, List[str]]]) –
grouping_columns (Optional[List[str]]) –
join_how (str) –
dropna_columns (Optional[List[str]]) –

__init__(output, join_columns, measure_column, name, description, baselines, grouping_columns=None, join_how='inner', dropna_columns=None)#

Constructor.

Parameters

output (str) – The output to compute the metric for.
join_columns (List[str]) – The columns to join on.
measure_column (str) – The column the measure will be calculated on.
name (str) – A name for the metric.
description (str) – A description of the metric.
baselines (Union[str, List[str], None]) – The name of the baseline program(s) used for the error report. If None, use all baselines specified as custom baselines and baseline options on the tuner class. If no baselines are specified on the tuner class, use the default baseline. If a string, use only that baseline. If a list, use only those baselines.
grouping_columns (Optional[List[str]], default: None) – A set of columns that will be used to group the DP and baseline outputs. The error metric will be calculated for each group, and returned in a table. If grouping columns are None, the metric will be calculated over the whole output, and returned as a single number.
join_how (str, default: 'inner') – The type of join to perform. Must be one of “left”, “right”, “inner”, “outer”.
dropna_columns (Optional[List[str]], default: None) – If specified, rows with nulls in these columns will be dropped.

property measure_column#

Returns the name of the measure column.

Return type: str

check_compatibility_with_outputs(outputs, output_name)#

Check that a particular set of outputs is compatible with the metric.

Should throw a ValueError if the metric is not compatible.

Parameters

outputs (Dict[str, pyspark.sql.DataFrame]) –
output_name (str) –

format_as_table_row(result)#

Return a table row summarizing the metric result.

Parameters: result (MetricResult) –
Return type: pandas.DataFrame

format_as_dataframe(result)#

Returns the results of this metric formatted as a dataframe.

Parameters: result (tmlt.analytics.metrics.MetricResult) –
Return type: pandas.DataFrame

property grouping_columns#

Returns the names of the grouping columns.

Return type: List[str]

abstract compute_on_grouped_output(grouped_output, baseline_name, unprotected_inputs=None, program_parameters=None)#

Computes metric value from the joined, grouped DP and baseline output.

If grouping columns are empty, the grouped output will have one group that is the entire dataset.

Parameters

grouped_output (pyspark.sql.GroupedData) –
baseline_name (str) –
unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) –
program_parameters (Optional[Dict[str, Any]]) –

property output#

Returns the name of the run output or view name.

Return type: str

property join_columns#

Returns the names of the join columns.

Return type: List[str]

property indicator_column_name#

Returns the name of the indicator column.

Return type: Optional[str]

check_join_key_uniqueness(joined_output)#

Check if the join keys uniquely identify rows in the joined DataFrame.

Parameters: joined_output (pyspark.sql.DataFrame) –

compute_for_baseline(baseline_name, dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Computes metric value.

Parameters

baseline_name (str) –
dp_outputs (Dict[str, pyspark.sql.DataFrame]) –
baseline_outputs (Dict[str, pyspark.sql.DataFrame]) –
unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) –
program_parameters (Optional[Dict[str, Any]]) –

check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should throw a ValueError if the metric is not compatible.

Parameters

dp_outputs (Dict[str, pyspark.sql.DataFrame]) –
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) –

property name#

Returns the name of the metric.

Return type: str

property description#

Returns the description of the metric.

Return type: str

property baselines#

Returns the baselines used for the metric.

Return type: Optional[Union[str, List[str]]]

abstract format(value)#

Converts value to human-readable format.

Parameters: value (Any) –

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters

dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs.
unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
program_parameters (Optional[Dict[str, Any]]) – Optional program specific parameters used in error computation.

Return type

List[tmlt.analytics.metrics.MetricResult]

class ScalarMetric(output, name, description, column=None, baselines=None)#

Bases: SingleBaselineMetric, abc.ABC

Base class for metrics computed from outputs containing only one value.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

Subclasses of ScalarMetric define a compute_on_scalar method which takes two values, each one taken from the given column of the given output, and returns a metric value. The given output must contain a single row in both the DP and baseline outputs.
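
The two steps described here, pulling one value out of a single-row output and then comparing the DP and baseline values, can be sketched as follows. The helper extract_scalar and the absolute-error choice for compute_on_scalar are assumptions made for illustration, and a list of dicts stands in for a Spark DataFrame.

```python
from typing import Dict, List, Optional


def extract_scalar(rows: List[Dict[str, float]], column: Optional[str] = None) -> float:
    """Return the single value of a one-row table, optionally by column name."""
    if len(rows) != 1:
        raise ValueError(f"Expected exactly one row, got {len(rows)}")
    row = rows[0]
    if column is None:
        # The column may be omitted only when there is exactly one column.
        if len(row) != 1:
            raise ValueError("A column is required when the output has several columns")
        return next(iter(row.values()))
    if column not in row:
        raise ValueError(f"Column {column!r} not found in output")
    return row[column]


def compute_on_scalar(dp_value: float, baseline_value: float) -> float:
    """Example metric: absolute error between the DP and baseline values."""
    return abs(dp_value - baseline_value)
```

For instance, a DP total of 105 against a baseline total of 100 gives an absolute error of 5.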

Methods#
`output()`	Returns the name of the run output or view name.
`column()`	Returns the name of the value column, if it is set.
`check_compatibility_with_outputs()`	Check that a particular set of outputs is compatible with the metric.
`compute_for_baseline()`	Returns the metric value given the DP outputs and the baseline outputs.
`check_compatibility_with_data()`	Check that the outputs have all the structure the metric expects.
`name()`	Returns the name of the metric.
`description()`	Returns the description of the metric.
`baselines()`	Returns the baselines used for the metric.
`format()`	Converts value to human-readable format.
`format_as_table_row()`	Return a table row summarizing the metric result.
`format_as_dataframe()`	Returns the results of this metric formatted as a dataframe.
`__call__()`	Computes the given metric on the given DP and baseline outputs.

Parameters

output (str) –
name (str) –
description (str) –
column (Optional[str]) –
baselines (Optional[Union[str, List[str]]]) –

__init__(output, name, description, column=None, baselines=None)#

Constructor.

Parameters

output (str) – The output to compute the metric for.
column (Optional[str], default: None) – The column to take the value from. If the given output has only one column, this argument may be omitted.
name (str) – A name for the metric.
description (str) – A description of the metric.
baselines (Union[str, List[str], None], default: None) – The name of the baseline program(s) used for the error report. If None, use all baselines specified as custom baselines and baseline options on the tuner class. If no baselines are specified on the tuner class, use the default baseline. If a string, use only that baseline. If a list, use only those baselines.

property output#

Returns the name of the run output or view name.

Return type: str

property column#

Returns the name of the value column, if it is set.

Return type: Optional[str]

check_compatibility_with_outputs(outputs, output_name)#

Check that a particular set of outputs is compatible with the metric.

Should throw a ValueError if the metric is not compatible.

Parameters

outputs (Dict[str, pyspark.sql.DataFrame]) –
output_name (str) –

compute_for_baseline(baseline_name, dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Returns the metric value given the DP outputs and the baseline outputs.

Parameters

baseline_name (str) –
dp_outputs (Dict[str, pyspark.sql.DataFrame]) –
baseline_outputs (Dict[str, pyspark.sql.DataFrame]) –
unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) –
program_parameters (Optional[Dict[str, Any]]) –

Return type

Any

check_compatibility_with_data(dp_outputs, baseline_outputs)#

Check that the outputs have all the structure the metric expects.

Should throw a ValueError if the metric is not compatible.

Parameters

dp_outputs (Dict[str, pyspark.sql.DataFrame]) –
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) –

property name#

Returns the name of the metric.

Return type: str

property description#

Returns the description of the metric.

Return type: str

property baselines#

Returns the baselines used for the metric.

Return type: Optional[Union[str, List[str]]]

abstract format(value)#

Converts value to human-readable format.

Parameters: value (Any) –

abstract format_as_table_row(result)#

Return a table row summarizing the metric result.

Parameters: result (tmlt.analytics.metrics.MetricResult) –
Return type: pandas.DataFrame

format_as_dataframe(result)#

Returns the results of this metric formatted as a dataframe.

Parameters: result (tmlt.analytics.metrics.MetricResult) –
Return type: MetricResultDataframe

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters

dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.
baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs.
unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.
program_parameters (Optional[Dict[str, Any]]) – Optional program specific parameters used in error computation.

Return type

List[tmlt.analytics.metrics.MetricResult]
