_measurement_visitor

Defines a visitor for creating noisy measurements from query expressions.

Classes

MeasurementVisitor

A visitor to create a measurement from a DP query expression.

class MeasurementVisitor(privacy_budget, stability, input_domain, input_metric, output_measure, default_mechanism, public_sources, catalog, table_constraints)

Bases: tmlt.analytics._query_expr_compiler._base_measurement_visitor.BaseMeasurementVisitor

A visitor to create a measurement from a DP query expression.
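
The method list below follows the visitor pattern: transformation expressions (such as Filter or Rename) raise an error when visited, while aggregation expressions are turned into measurements. The following is a minimal sketch of that dispatch, not the library's code. The classes `Filter`, `GroupByCount`, and `SketchMeasurementVisitor` are hypothetical stand-ins defined here for illustration, and the returned string stands in for a real measurement object.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Filter:
    """Hypothetical transformation node (a stand-in, not tmlt's class)."""

    child: Any
    condition: str

    def accept(self, visitor: "SketchMeasurementVisitor") -> Any:
        # Each node forwards itself to the matching visit_* method.
        return visitor.visit_filter(self)


@dataclass
class GroupByCount:
    """Hypothetical aggregation node (a stand-in, not tmlt's class)."""

    child: Any
    output_column: str = "count"

    def accept(self, visitor: "SketchMeasurementVisitor") -> Any:
        return visitor.visit_groupby_count(self)


class SketchMeasurementVisitor:
    """Toy visitor: only aggregation nodes can become measurements."""

    def visit_filter(self, expr: Filter) -> Any:
        # A transformation on its own does not produce a noisy output.
        raise NotImplementedError("Filter is a transformation, not a measurement")

    def visit_groupby_count(self, expr: GroupByCount) -> str:
        # A real visitor would build and return a measurement here.
        return f"measurement producing column {expr.output_column!r}"


visitor = SketchMeasurementVisitor()
query = GroupByCount(child=Filter(child="private_table", condition="age > 18"))
result = query.accept(visitor)
```

In this sketch, visiting the outer `GroupByCount` succeeds, while calling `accept` directly on the `Filter` node raises `NotImplementedError`.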

Methods

visit_get_groups()

Create a measurement from a GetGroups query expression.

visit_private_source()

Visit a PrivateSource query expression (raises an error).

visit_rename()

Visit a Rename query expression (raises an error).

visit_filter()

Visit a Filter query expression (raises an error).

visit_select()

Visit a Select query expression (raises an error).

visit_map()

Visit a Map query expression (raises an error).

visit_flat_map()

Visit a FlatMap query expression (raises an error).

visit_join_private()

Visit a JoinPrivate query expression (raises an error).

visit_join_public()

Visit a JoinPublic query expression (raises an error).

visit_replace_null_and_nan()

Visit a ReplaceNullAndNan query expression (raises an error).

visit_replace_infinity()

Visit a ReplaceInfinity query expression (raises an error).

visit_drop_null_and_nan()

Visit a DropNullAndNan query expression (raises an error).

visit_drop_infinity()

Visit a DropInfinity query expression (raises an error).

visit_enforce_constraint()

Visit an EnforceConstraint query expression (raises an error).

build_groupby_count()

Build a Measurement for a GroupByCount query.

visit_groupby_count()

Create a measurement from a GroupByCount query expression.

build_count_distinct_measurement()

Build a Measurement for a GroupByCountDistinct query.

visit_groupby_count_distinct()

Create a measurement from a GroupByCountDistinct query expression.

build_groupby_quantile()

Build a Measurement for a GroupByQuantile query.

visit_groupby_quantile()

Create a measurement from a GroupByQuantile query expression.

build_groupby_bounded_sum()

Build a Measurement for a GroupByBoundedSum query.

visit_groupby_bounded_sum()

Create a measurement from a GroupByBoundedSum query expression.

build_groupby_bounded_average()

Build a Measurement for a GroupByBoundedAverage query.

visit_groupby_bounded_average()

Create a measurement from a GroupByBoundedAverage query expression.

build_groupby_bounded_variance()

Build a Measurement for a GroupByBoundedVariance query.

visit_groupby_bounded_variance()

Create a measurement from a GroupByBoundedVariance query expression.

build_groupby_bounded_stdev()

Build a Measurement for a GroupByBoundedStdev query.

visit_groupby_bounded_stdev()

Create a measurement from a GroupByBoundedStdev query expression.

__init__(privacy_budget, stability, input_domain, input_metric, output_measure, default_mechanism, public_sources, catalog, table_constraints)

Constructor for MeasurementVisitor.

visit_get_groups(expr)

Create a measurement from a GetGroups query expression.

Parameters

expr (tmlt.analytics.query_expr.GetGroups) –

Return type

Tuple[tmlt.core.measurements.base.Measurement, tmlt.analytics._noise_info.NoiseInfo]

abstract visit_private_source(expr)

Visit a PrivateSource query expression (raises an error).

Parameters

expr (tmlt.analytics.query_expr.PrivateSource) –

Return type

Any

abstract visit_rename(expr)

Visit a Rename query expression (raises an error).

Parameters

expr (tmlt.analytics.query_expr.Rename) –

Return type

Any

abstract visit_filter(expr)

Visit a Filter query expression (raises an error).

Parameters

expr (tmlt.analytics.query_expr.Filter) –

Return type

Any

abstract visit_select(expr)

Visit a Select query expression (raises an error).

Parameters

expr (tmlt.analytics.query_expr.Select) –

Return type

Any

abstract visit_map(expr)

Visit a Map query expression (raises an error).

Parameters

expr (tmlt.analytics.query_expr.Map) –

Return type

Any

abstract visit_flat_map(expr)

Visit a FlatMap query expression (raises an error).

Parameters

expr (tmlt.analytics.query_expr.FlatMap) –

Return type

Any

abstract visit_join_private(expr)

Visit a JoinPrivate query expression (raises an error).

Parameters

expr (tmlt.analytics.query_expr.JoinPrivate) –

Return type

Any

abstract visit_join_public(expr)

Visit a JoinPublic query expression (raises an error).

Parameters

expr (tmlt.analytics.query_expr.JoinPublic) –

Return type

Any

abstract visit_replace_null_and_nan(expr)

Visit a ReplaceNullAndNan query expression (raises an error).

Parameters

expr (tmlt.analytics.query_expr.ReplaceNullAndNan) –

Return type

Any

abstract visit_replace_infinity(expr)

Visit a ReplaceInfinity query expression (raises an error).

Parameters

expr (tmlt.analytics.query_expr.ReplaceInfinity) –

Return type

Any

abstract visit_drop_null_and_nan(expr)

Visit a DropNullAndNan query expression (raises an error).

Parameters

expr (tmlt.analytics.query_expr.DropNullAndNan) –

Return type

Any

abstract visit_drop_infinity(expr)

Visit a DropInfinity query expression (raises an error).

Parameters

expr (tmlt.analytics.query_expr.DropInfinity) –

Return type

Any

abstract visit_enforce_constraint(expr)

Visit an EnforceConstraint query expression (raises an error).

Parameters

expr (tmlt.analytics.query_expr.EnforceConstraint) –

Return type

Any

build_groupby_count(input_domain, input_metric, stability, mechanism, budget, groupby, output_column)

Build a Measurement for a GroupByCount query.

Parameters
  • input_domain (tmlt.core.domains.spark_domains.SparkDataFrameDomain) –

  • input_metric (Union[tmlt.core.metrics.IfGroupedBy, tmlt.core.metrics.SymmetricDifference, tmlt.core.metrics.HammingDistance]) –

  • stability (Any) –

  • mechanism (tmlt.core.measurements.aggregations.NoiseMechanism) –

  • budget (tmlt.analytics.privacy_budget.PrivacyBudget) –

  • groupby (tmlt.core.transformations.spark_transformations.groupby.GroupBy) –

  • output_column (str) –

Return type

tmlt.core.measurements.base.Measurement
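
To make the roles of `stability`, `budget`, `groupby`, and `output_column` more concrete, here is a rough sketch of what a noisy grouped count computes. It is not the library's implementation. The function `sketch_noisy_groupby_count` and the helper `laplace_noise` are hypothetical, the budget is assumed to be a single pure-DP epsilon, the noise is assumed to be Laplace, and the scale `stability / epsilon` is the usual textbook calibration rather than a documented detail of this class.

```python
import random
from collections import Counter
from typing import Dict, Hashable, Iterable, Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sketch_noisy_groupby_count(
    keys: Iterable[Hashable],
    groups: Sequence[Hashable],
    epsilon: float,
    stability: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Dict[Hashable, float]:
    """Count records per group, then add Laplace noise to each count.

    `groups` plays the role of a fixed, public list of group keys: every
    listed group gets an output, even if no record has that key, and
    records whose key is not listed are ignored.
    """
    rng = rng or random.Random()
    true_counts = Counter(keys)
    # Assumed calibration: one record changes a count by at most `stability`.
    scale = stability / epsilon
    return {g: true_counts[g] + laplace_noise(scale, rng) for g in groups}


counts = sketch_noisy_groupby_count(
    keys=["a", "a", "b"],
    groups=["a", "b", "c"],
    epsilon=1e9,  # very large epsilon, so the noise is negligible
    rng=random.Random(0),
)
```

With such a large epsilon the outputs are very close to the true counts (2, 1, and 0). Smaller epsilon values give noisier results.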

visit_groupby_count(expr)

Create a measurement from a GroupByCount query expression.

Parameters

expr (tmlt.analytics.query_expr.GroupByCount) –

Return type

Tuple[tmlt.core.measurements.base.Measurement, tmlt.analytics._noise_info.NoiseInfo]

build_count_distinct_measurement(input_domain, input_metric, mechanism, stability, budget, groupby, output_column)

Build a Measurement for a GroupByCountDistinct query.

Parameters
  • input_domain (tmlt.core.domains.spark_domains.SparkDataFrameDomain) –

  • input_metric (Union[tmlt.core.metrics.IfGroupedBy, tmlt.core.metrics.SymmetricDifference, tmlt.core.metrics.HammingDistance]) –

  • mechanism (tmlt.core.measurements.aggregations.NoiseMechanism) –

  • stability (Any) –

  • budget (tmlt.analytics.privacy_budget.PrivacyBudget) –

  • groupby (tmlt.core.transformations.spark_transformations.groupby.GroupBy) –

  • output_column (str) –

Return type

tmlt.core.measurements.base.Measurement

visit_groupby_count_distinct(expr)

Create a measurement from a GroupByCountDistinct query expression.

Parameters

expr (tmlt.analytics.query_expr.GroupByCountDistinct) –

Return type

Tuple[tmlt.core.measurements.base.Measurement, tmlt.analytics._noise_info.NoiseInfo]

build_groupby_quantile(input_domain, input_metric, measure_column, quantile, lower, upper, stability, budget, groupby, output_column)

Build a Measurement for a GroupByQuantile query.

Parameters
  • input_domain (tmlt.core.domains.spark_domains.SparkDataFrameDomain) –

  • input_metric (Union[tmlt.core.metrics.IfGroupedBy, tmlt.core.metrics.SymmetricDifference, tmlt.core.metrics.HammingDistance]) –

  • measure_column (str) –

  • quantile (float) –

  • lower (Union[int, float]) –

  • upper (Union[int, float]) –

  • stability (Any) –

  • budget (tmlt.analytics.privacy_budget.PrivacyBudget) –

  • groupby (tmlt.core.transformations.spark_transformations.groupby.GroupBy) –

  • output_column (str) –

Return type

tmlt.core.measurements.base.Measurement

visit_groupby_quantile(expr)

Create a measurement from a GroupByQuantile query expression.

Parameters

expr (tmlt.analytics.query_expr.GroupByQuantile) –

Return type

Tuple[tmlt.core.measurements.base.Measurement, tmlt.analytics._noise_info.NoiseInfo]

build_groupby_bounded_sum(input_domain, input_metric, measure_column, lower, upper, stability, mechanism, budget, groupby, output_column)

Build a Measurement for a GroupByBoundedSum query.

Parameters
  • input_domain (tmlt.core.domains.spark_domains.SparkDataFrameDomain) –

  • input_metric (Union[tmlt.core.metrics.IfGroupedBy, tmlt.core.metrics.SymmetricDifference, tmlt.core.metrics.HammingDistance]) –

  • measure_column (str) –

  • lower (tmlt.core.utils.exact_number.ExactNumber) –

  • upper (tmlt.core.utils.exact_number.ExactNumber) –

  • stability (Any) –

  • mechanism (tmlt.core.measurements.aggregations.NoiseMechanism) –

  • budget (tmlt.analytics.privacy_budget.PrivacyBudget) –

  • groupby (tmlt.core.transformations.spark_transformations.groupby.GroupBy) –

  • output_column (str) –

Return type

tmlt.core.measurements.base.Measurement
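
The `lower` and `upper` parameters bound each record's contribution to the sum. The sketch below illustrates that idea under stated assumptions and is not the library's implementation: `sketch_noisy_bounded_sum` and `laplace_noise` are hypothetical names, the budget is assumed to be a pure-DP epsilon, and the Laplace scale `stability * max(|lower|, |upper|) / epsilon` is a common textbook calibration that the real measurement may not use.

```python
import random
from typing import Dict, Hashable, Iterable, Optional, Sequence, Tuple


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sketch_noisy_bounded_sum(
    rows: Iterable[Tuple[Hashable, float]],
    groups: Sequence[Hashable],
    lower: float,
    upper: float,
    epsilon: float,
    stability: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Dict[Hashable, float]:
    """Clamp each value to [lower, upper], sum per group, add Laplace noise."""
    rng = rng or random.Random()
    sums = {g: 0.0 for g in groups}
    for key, value in rows:
        if key in sums:
            # Clamping limits how much a single record can move the sum.
            sums[key] += min(max(value, lower), upper)
    # Assumed calibration: one record shifts a sum by at most max(|lower|, |upper|).
    scale = stability * max(abs(lower), abs(upper)) / epsilon
    return {g: s + laplace_noise(scale, rng) for g, s in sums.items()}


sums = sketch_noisy_bounded_sum(
    rows=[("a", 5), ("a", 50), ("b", -3)],
    groups=["a", "b"],
    lower=0,
    upper=10,
    epsilon=1e9,  # very large epsilon, so the noise is negligible
    rng=random.Random(0),
)
```

Here the value 50 is clamped to 10 and the value -3 is clamped to 0, so the noise-free sums would be 15 for group "a" and 0 for group "b".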

visit_groupby_bounded_sum(expr)

Create a measurement from a GroupByBoundedSum query expression.

Parameters

expr (tmlt.analytics.query_expr.GroupByBoundedSum) –

Return type

Tuple[tmlt.core.measurements.base.Measurement, tmlt.analytics._noise_info.NoiseInfo]

build_groupby_bounded_average(input_domain, input_metric, measure_column, lower, upper, stability, mechanism, budget, groupby, output_column)

Build a Measurement for a GroupByBoundedAverage query.

Parameters
  • input_domain (tmlt.core.domains.spark_domains.SparkDataFrameDomain) –

  • input_metric (Union[tmlt.core.metrics.IfGroupedBy, tmlt.core.metrics.SymmetricDifference, tmlt.core.metrics.HammingDistance]) –

  • measure_column (str) –

  • lower (tmlt.core.utils.exact_number.ExactNumber) –

  • upper (tmlt.core.utils.exact_number.ExactNumber) –

  • stability (Any) –

  • mechanism (tmlt.core.measurements.aggregations.NoiseMechanism) –

  • budget (tmlt.analytics.privacy_budget.PrivacyBudget) –

  • groupby (tmlt.core.transformations.spark_transformations.groupby.GroupBy) –

  • output_column (str) –

Return type

tmlt.core.measurements.base.Measurement

visit_groupby_bounded_average(expr)

Create a measurement from a GroupByBoundedAverage query expression.

Parameters

expr (tmlt.analytics.query_expr.GroupByBoundedAverage) –

Return type

Tuple[tmlt.core.measurements.base.Measurement, tmlt.analytics._noise_info.NoiseInfo]

build_groupby_bounded_variance(input_domain, input_metric, measure_column, lower, upper, stability, mechanism, budget, groupby, output_column)

Build a Measurement for a GroupByBoundedVariance query.

Parameters
  • input_domain (tmlt.core.domains.spark_domains.SparkDataFrameDomain) –

  • input_metric (Union[tmlt.core.metrics.IfGroupedBy, tmlt.core.metrics.SymmetricDifference, tmlt.core.metrics.HammingDistance]) –

  • measure_column (str) –

  • lower (tmlt.core.utils.exact_number.ExactNumber) –

  • upper (tmlt.core.utils.exact_number.ExactNumber) –

  • stability (Any) –

  • mechanism (tmlt.core.measurements.aggregations.NoiseMechanism) –

  • budget (tmlt.analytics.privacy_budget.PrivacyBudget) –

  • groupby (tmlt.core.transformations.spark_transformations.groupby.GroupBy) –

  • output_column (str) –

Return type

tmlt.core.measurements.base.Measurement

visit_groupby_bounded_variance(expr)

Create a measurement from a GroupByBoundedVariance query expression.

Parameters

expr (tmlt.analytics.query_expr.GroupByBoundedVariance) –

Return type

Tuple[tmlt.core.measurements.base.Measurement, tmlt.analytics._noise_info.NoiseInfo]

build_groupby_bounded_stdev(input_domain, input_metric, measure_column, lower, upper, stability, mechanism, budget, groupby, output_column)

Build a Measurement for a GroupByBoundedStdev query.

Parameters
  • input_domain (tmlt.core.domains.spark_domains.SparkDataFrameDomain) –

  • input_metric (Union[tmlt.core.metrics.IfGroupedBy, tmlt.core.metrics.SymmetricDifference, tmlt.core.metrics.HammingDistance]) –

  • measure_column (str) –

  • lower (tmlt.core.utils.exact_number.ExactNumber) –

  • upper (tmlt.core.utils.exact_number.ExactNumber) –

  • stability (Any) –

  • mechanism (tmlt.core.measurements.aggregations.NoiseMechanism) –

  • budget (tmlt.analytics.privacy_budget.PrivacyBudget) –

  • groupby (tmlt.core.transformations.spark_transformations.groupby.GroupBy) –

  • output_column (str) –

Return type

tmlt.core.measurements.base.Measurement

visit_groupby_bounded_stdev(expr)

Create a measurement from a GroupByBoundedStdev query expression.

Parameters

expr (tmlt.analytics.query_expr.GroupByBoundedSTDEV) –

Return type

Tuple[tmlt.core.measurements.base.Measurement, tmlt.analytics._noise_info.NoiseInfo]