_base#

Base classes for metrics.

Classes#

Metric

A generic metric.

MetricOutput

An output of a Metric with additional metadata.

SingleBaselineMetric

Base class for metrics computed from DP outputs and a single baseline’s outputs.

MultiBaselineMetric

Base class for metrics computed from DP outputs and multiple baseline outputs.

JoinedOutputMetric

Base class for metrics computed from a join between a single DP output and a baseline output.

ScalarMetric

Base class for metrics computed from outputs containing only one value.

class Metric(name, description, baselines)#

Bases: abc.ABC

A generic metric.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

Parameters
  • name (str) –

  • description (str) –

  • baselines (Optional[Union[str, List[str]]]) –

__init__(name, description, baselines)#

Constructor.

Parameters
  • name (str) – A name for the metric.

  • description (str) – A description of the metric.

  • baselines (Optional[Union[str, List[str]]]) – The name of the baseline program(s) used for the error report. If None, use all baselines specified as custom baselines and baseline options on the tuner class. If no baselines are specified on the tuner class, use the default baseline. If a string, use only that baseline. If a list, use only those baselines.
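
The selection rules for baselines can be sketched in plain Python. This is only an illustration of the behavior described above, not the library's code: resolve_baselines, the configured argument (standing in for the baselines set up on the tuner class), and the DEFAULT_BASELINE name are all hypothetical.

```python
from typing import List, Optional, Union

# Hypothetical name for the fallback baseline; the real name is not given here.
DEFAULT_BASELINE = "default"


def resolve_baselines(
    requested: Optional[Union[str, List[str]]],
    configured: List[str],
) -> List[str]:
    """Return the baseline names a metric would be evaluated against.

    `configured` stands in for the baselines set up on the tuner class.
    """
    if requested is None:
        # No explicit choice: every configured baseline, else the default.
        return list(configured) if configured else [DEFAULT_BASELINE]
    if isinstance(requested, str):
        # A single name selects only that baseline.
        return [requested]
    # A list selects exactly the named baselines.
    return list(requested)


print(resolve_baselines(None, ["exact", "sampled"]))
print(resolve_baselines(None, []))
print(resolve_baselines("exact", ["exact", "sampled"]))
```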

property name#

Returns the name of the metric.

Return type

str

property description#

Returns the description of the metric.

Return type

str

property baselines#

Returns the baselines used for the metric.

Return type

Optional[Union[str, List[str]]]

abstract format(value)#

Converts value to human-readable format.

Parameters

value (Any) –

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters
  • dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.

  • baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • program_parameters (Optional[Dict[str, Any]]) – Optional program-specific parameters used in error computation.

Return type

List[tmlt.analytics.metrics.MetricOutput]

class MetricOutput#

An output of a Metric with additional metadata.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

name :str#

The name of the metric.

description :str#

The description of the metric.

baseline :Union[str, List[str]]#

The name of the baseline program(s) used for the error report.

metric :Metric#

The metric that was used.

value :Any#

The value of the metric applied to the program outputs.

format()#

Returns a string representation of this object.

class SingleBaselineMetric(name, description, baselines)#

Bases: Metric, abc.ABC

Base class for metrics computed from DP outputs and a single baseline’s outputs.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

Subclasses of SingleBaselineMetric define a compute_for_baseline method that maps the DP outputs and one baseline’s outputs to the metric value.
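
The subclassing pattern can be sketched without the library. In this sketch SingleBaselineSketch and RowCountDifference are hypothetical stand-ins, tables are lists of row dicts rather than Spark DataFrames, and __call__ returns a plain dict keyed by baseline name instead of a list of MetricOutput objects.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

# Stand-in for a table: a list of row dicts (the real API uses DataFrames).
Table = List[Dict[str, Any]]


class SingleBaselineSketch(ABC):
    """Illustrative stand-in, not the library class."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def compute_for_baseline(
        self, dp_outputs: Dict[str, Table], baseline_outputs: Dict[str, Table]
    ) -> Any:
        """Map the DP outputs and ONE baseline's outputs to a metric value."""

    def __call__(
        self,
        dp_outputs: Dict[str, Table],
        baseline_outputs: Dict[str, Dict[str, Table]],
    ) -> Dict[str, Any]:
        # Evaluate the metric once per baseline, keyed by baseline name.
        return {
            baseline: self.compute_for_baseline(dp_outputs, outputs)
            for baseline, outputs in baseline_outputs.items()
        }


class RowCountDifference(SingleBaselineSketch):
    """Hypothetical metric: DP row count minus baseline row count."""

    def compute_for_baseline(self, dp_outputs, baseline_outputs):
        return len(dp_outputs["counts"]) - len(baseline_outputs["counts"])


dp = {"counts": [{"city": "A", "n": 10}, {"city": "B", "n": 7}]}
baselines = {
    "default": {
        "counts": [
            {"city": "A", "n": 11},
            {"city": "B", "n": 8},
            {"city": "C", "n": 1},
        ]
    },
    "sampled": {"counts": [{"city": "A", "n": 9}]},
}
result = RowCountDifference("row_count_diff")(dp, baselines)
print(result)
```

Note the nesting: dp_outputs maps output names to tables, while baseline_outputs adds an outer level keyed by baseline name.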

Parameters
  • name (str) –

  • description (str) –

  • baselines (Optional[Union[str, List[str]]]) –

__init__(name, description, baselines)#

Constructor.

Parameters
  • name (str) – A name for the metric.

  • description (str) – A description of the metric.

  • baselines (Optional[Union[str, List[str]]]) – The name of the baseline program(s) used for the error report. If None, use all baselines specified as custom baselines and baseline options on the tuner class. If no baselines are specified on the tuner class, use the default baseline. If a string, use only that baseline. If a list, use only those baselines.

property name#

Returns the name of the metric.

Return type

str

property description#

Returns the description of the metric.

Return type

str

property baselines#

Returns the baselines used for the metric.

Return type

Optional[Union[str, List[str]]]

abstract format(value)#

Converts value to human-readable format.

Parameters

value (Any) –

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters
  • dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.

  • baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • program_parameters (Optional[Dict[str, Any]]) – Optional program-specific parameters used in error computation.

Return type

List[tmlt.analytics.metrics.MetricOutput]

class MultiBaselineMetric(name, description, baselines)#

Bases: Metric, abc.ABC

Base class for metrics computed from DP outputs and multiple baseline outputs.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

Subclasses of MultiBaselineMetric define a compute_for_multiple_baselines method that maps the DP outputs and the outputs of several baselines to the metric value.
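
By contrast with a single-baseline metric, a multi-baseline metric sees all baseline outputs at once and produces one value. The sketch below illustrates that shape only: MultiBaselineSketch and ClosestBaseline are hypothetical stand-ins, and tables are lists of row dicts rather than Spark DataFrames.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

# Stand-in for a table: a list of row dicts (the real API uses DataFrames).
Table = List[Dict[str, Any]]


class MultiBaselineSketch(ABC):
    """Illustrative stand-in, not the library class."""

    @abstractmethod
    def compute_for_multiple_baselines(
        self,
        dp_outputs: Dict[str, Table],
        baseline_outputs: Dict[str, Dict[str, Table]],
    ) -> Any:
        """Map the DP outputs and ALL baseline outputs to one metric value."""

    def __call__(self, dp_outputs, baseline_outputs):
        # One evaluation over the whole collection of baselines.
        return self.compute_for_multiple_baselines(dp_outputs, baseline_outputs)


class ClosestBaseline(MultiBaselineSketch):
    """Hypothetical metric: the baseline whose total is nearest the DP total."""

    def compute_for_multiple_baselines(self, dp_outputs, baseline_outputs):
        dp_total = sum(row["n"] for row in dp_outputs["counts"])

        def gap(name: str) -> float:
            rows = baseline_outputs[name]["counts"]
            return abs(dp_total - sum(row["n"] for row in rows))

        return min(baseline_outputs, key=gap)


dp_outputs = {"counts": [{"n": 10}, {"n": 7}]}
baseline_outputs = {
    "exact": {"counts": [{"n": 11}, {"n": 7}]},
    "sampled": {"counts": [{"n": 12}]},
}
closest = ClosestBaseline()(dp_outputs, baseline_outputs)
print(closest)
```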

Parameters
  • name (str) –

  • description (str) –

  • baselines (Optional[Union[str, List[str]]]) –

__init__(name, description, baselines)#

Constructor.

Parameters
  • name (str) – A name for the metric.

  • description (str) – A description of the metric.

  • baselines (Optional[Union[str, List[str]]]) – The name of the baseline program(s) used for the error report. If None, use all baselines specified as custom baselines and baseline options on the tuner class. If no baselines are specified on the tuner class, use the default baseline. If a string, use only that baseline. If a list, use only those baselines.

compute(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Computes the given metric on the given DP and baseline outputs.

The baseline_outputs will already be filtered to only include the baselines that the metric is supposed to use.

Parameters
  • dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.

  • baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs, after filtering to only include the baselines that the metric is supposed to use.

  • unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • program_parameters (Optional[Dict[str, Any]]) – Optional program-specific parameters used in error computation.

Return type

List[tmlt.analytics.metrics.MetricOutput]

property name#

Returns the name of the metric.

Return type

str

property description#

Returns the description of the metric.

Return type

str

property baselines#

Returns the baselines used for the metric.

Return type

Optional[Union[str, List[str]]]

abstract format(value)#

Converts value to human-readable format.

Parameters

value (Any) –

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters
  • dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.

  • baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • program_parameters (Optional[Dict[str, Any]]) – Optional program-specific parameters used in error computation.

Return type

List[tmlt.analytics.metrics.MetricOutput]

class JoinedOutputMetric(output, join_columns, name, description, baselines)#

Bases: SingleBaselineMetric, abc.ABC

Base class for metrics computed from a join between a single DP output and a baseline output.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

Subclasses of JoinedOutputMetric define a compute_on_joined_output method which takes in a single dataframe, the result of joining the DP and baseline output tables with the given name on the given list of columns, and returns the metric value. The joined table is the result of performing an outer join between the DP and baseline tables on the given join columns.
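
To make the outer-join behavior concrete, here is a minimal pure-Python sketch. It is not the library's implementation: tables are lists of row dicts rather than Spark DataFrames, the "_dp" and "_baseline" column suffixes are an assumed naming scheme, and the mean-absolute-error metric is a hypothetical example of what a compute_on_joined_output method might return.

```python
from typing import Any, Dict, List, Tuple

# Stand-in for a table: a list of row dicts (the real API uses DataFrames).
Table = List[Dict[str, Any]]


def outer_join(dp: Table, baseline: Table, join_columns: List[str]) -> Table:
    """Full outer join of two tables on join_columns.

    Non-key columns get a "_dp" or "_baseline" suffix (assumed naming).
    A side with no matching row contributes None. Assumes the join
    columns uniquely identify rows on each side.
    """

    def index(table: Table) -> Dict[Tuple, Dict[str, Any]]:
        return {tuple(row[c] for c in join_columns): row for row in table}

    dp_rows, base_rows = index(dp), index(baseline)
    dp_cols = sorted({c for r in dp for c in r if c not in join_columns})
    base_cols = sorted({c for r in baseline for c in r if c not in join_columns})

    joined = []
    # dict.fromkeys de-duplicates keys while keeping first-seen order.
    for key in dict.fromkeys(list(dp_rows) + list(base_rows)):
        row = dict(zip(join_columns, key))
        for c in dp_cols:
            row[f"{c}_dp"] = dp_rows.get(key, {}).get(c)
        for c in base_cols:
            row[f"{c}_baseline"] = base_rows.get(key, {}).get(c)
        joined.append(row)
    return joined


def compute_on_joined_output(joined: Table) -> float:
    """Hypothetical metric: mean absolute error of "n", missing treated as 0."""
    errors = [abs((r["n_dp"] or 0) - (r["n_baseline"] or 0)) for r in joined]
    return sum(errors) / len(errors)


dp_table = [{"city": "A", "n": 10}, {"city": "B", "n": 7}]
baseline_table = [{"city": "A", "n": 12}, {"city": "C", "n": 3}]
joined = outer_join(dp_table, baseline_table, ["city"])
mae = compute_on_joined_output(joined)
print(joined)
print(mae)
```

Because the join is an outer join, rows present on only one side (cities B and C here) still appear in the joined table, so a metric has to decide how to treat the missing values.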

Parameters
  • output (str) –

  • join_columns (List[str]) –

  • name (str) –

  • description (str) –

  • baselines (Optional[Union[str, List[str]]]) –

__init__(output, join_columns, name, description, baselines)#

Constructor.

Parameters
  • output (str) – The output to compute the metric for.

  • join_columns (List[str]) – The columns to join on.

  • name (str) – A name for the metric.

  • description (str) – A description of the metric.

  • baselines (Optional[Union[str, List[str]]]) – The name of the baseline program(s) used for the error report. If None, use all baselines specified as custom baselines and baseline options on the tuner class. If no baselines are specified on the tuner class, use the default baseline. If a string, use only that baseline. If a list, use only those baselines.

property output#

Returns the name of the run output or view.

Return type

str

property join_columns#

Returns the names of the join columns.

Return type

List[str]

check_join_key_uniqueness(joined_output)#

Check if the join keys uniquely identify rows in the joined DataFrame.

Parameters

joined_output (pyspark.sql.DataFrame) –
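
A uniqueness check of this kind amounts to counting rows per join key. The sketch below shows the idea on a list of row dicts instead of a Spark DataFrame. The function name duplicated_join_keys is hypothetical, and returning the offending keys is a choice made for this example; how the real method reports a violation is not stated here.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def duplicated_join_keys(
    rows: List[Dict[str, Any]], join_columns: List[str]
) -> List[Tuple]:
    """Return every join-key tuple that appears in more than one row."""
    counts = Counter(tuple(row[c] for c in join_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


unique_rows = [{"city": "A", "year": 2020}, {"city": "A", "year": 2021}]
clashing_rows = unique_rows + [{"city": "A", "year": 2020}]

# ("A", 2020) appears twice in clashing_rows, so the keys are not unique.
print(duplicated_join_keys(unique_rows, ["city", "year"]))
print(duplicated_join_keys(clashing_rows, ["city", "year"]))
```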

compute_for_baseline(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Computes the metric value.

property name#

Returns the name of the metric.

Return type

str

property description#

Returns the description of the metric.

Return type

str

property baselines#

Returns the baselines used for the metric.

Return type

Optional[Union[str, List[str]]]

abstract format(value)#

Converts value to human-readable format.

Parameters

value (Any) –

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters
  • dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.

  • baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • program_parameters (Optional[Dict[str, Any]]) – Optional program-specific parameters used in error computation.

Return type

List[tmlt.analytics.metrics.MetricOutput]

class ScalarMetric(output, name, description, column=None, baselines=None)#

Bases: SingleBaselineMetric, abc.ABC

Base class for metrics computed from outputs containing only one value.

Note

This is only available on a paid version of Tumult Analytics. If you would like to hear more, please contact us at info@tmlt.io.

Subclasses of ScalarMetric define a compute_on_scalar method which takes two values, each one taken from the given column of the given output, and returns a metric value. The given output must contain a single row in both the DP and baseline outputs.
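
The single-row requirement and the optional column can be illustrated with a small sketch. It is not the library's code: extract_scalar is a hypothetical helper, tables are lists of row dicts rather than Spark DataFrames, and absolute error is just one example of what a compute_on_scalar method might return.

```python
from typing import Any, Dict, List, Optional

# Stand-in for a table: a list of row dicts (the real API uses DataFrames).
Table = List[Dict[str, Any]]


def extract_scalar(table: Table, column: Optional[str] = None) -> Any:
    """Pull the single value out of a one-row table.

    The column may be omitted only when the table has exactly one column.
    """
    if len(table) != 1:
        raise ValueError(f"expected exactly one row, got {len(table)}")
    row = table[0]
    if column is None:
        if len(row) != 1:
            raise ValueError("column is required when there are several columns")
        column = next(iter(row))
    return row[column]


def compute_on_scalar(dp_value: float, baseline_value: float) -> float:
    """Hypothetical metric: absolute error between the two values."""
    return abs(dp_value - baseline_value)


# One-column output: the column name can be left out.
dp_total = extract_scalar([{"total": 103.5}])
# Two-column output: the column must be named explicitly.
baseline_total = extract_scalar([{"total": 100.0, "rows": 4}], column="total")
error = compute_on_scalar(dp_total, baseline_total)
print(error)
```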

Parameters
  • output (str) –

  • name (str) –

  • description (str) –

  • column (Optional[str]) –

  • baselines (Optional[Union[str, List[str]]]) –

__init__(output, name, description, column=None, baselines=None)#

Constructor.

Parameters
  • output (str) – The output to compute the metric for.

  • column (Optional[str], default: None) – The column to take the value from. If the given output has only one column, this argument may be omitted.

  • name (str) – A name for the metric.

  • description (str) – A description of the metric.

  • baselines (Optional[Union[str, List[str]]], default: None) – The name of the baseline program(s) used for the error report. If None, use all baselines specified as custom baselines and baseline options on the tuner class. If no baselines are specified on the tuner class, use the default baseline. If a string, use only that baseline. If a list, use only those baselines.

property output#

Returns the name of the run output or view.

Return type

str

property column#

Returns the name of the value column, if it is set.

Return type

Optional[str]

compute_for_baseline(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Returns the metric value given the DP outputs and the baseline outputs.

Return type

Any

property name#

Returns the name of the metric.

Return type

str

property description#

Returns the description of the metric.

Return type

str

property baselines#

Returns the baselines used for the metric.

Return type

Optional[Union[str, List[str]]]

abstract format(value)#

Converts value to human-readable format.

Parameters

value (Any) –

__call__(dp_outputs, baseline_outputs, unprotected_inputs=None, program_parameters=None)#

Computes the given metric on the given DP and baseline outputs.

Parameters
  • dp_outputs (Dict[str, pyspark.sql.DataFrame]) – The differentially private outputs of the program.

  • baseline_outputs (Dict[str, Dict[str, pyspark.sql.DataFrame]]) – The outputs of the baseline programs.

  • unprotected_inputs (Optional[Dict[str, pyspark.sql.DataFrame]]) – Optional public dataframes used in error computation.

  • program_parameters (Optional[Dict[str, Any]]) – Optional program-specific parameters used in error computation.

Return type

List[tmlt.analytics.metrics.MetricOutput]