_base_measurement_visitor
Defines a base class for building measurement visitors.
Classes
BaseMeasurementVisitor | A visitor to create a measurement from a query expression.
- class BaseMeasurementVisitor(privacy_budget, stability, input_domain, input_metric, output_measure, default_mechanism, public_sources, catalog, table_constraints)
Bases: tmlt.analytics.query_expr.QueryExprVisitor
A visitor to create a measurement from a query expression.
visit_private_source | Visit a PrivateSource query expression (raises an error).
visit_rename | Visit a Rename query expression (raises an error).
visit_filter | Visit a Filter query expression (raises an error).
visit_select | Visit a Select query expression (raises an error).
visit_map | Visit a Map query expression (raises an error).
visit_flat_map | Visit a FlatMap query expression (raises an error).
visit_join_private | Visit a JoinPrivate query expression (raises an error).
visit_join_public | Visit a JoinPublic query expression (raises an error).
visit_replace_null_and_nan | Visit a ReplaceNullAndNan query expression (raises an error).
visit_replace_infinity | Visit a ReplaceInfinity query expression (raises an error).
visit_drop_null_and_nan | Visit a DropNullAndNan query expression (raises an error).
visit_drop_infinity | Visit a DropInfinity query expression (raises an error).
visit_enforce_constraint | Visit an EnforceConstraint query expression (raises an error).
visit_get_groups | Visit a GetGroups query expression (raises an error).
build_groupby_count | Build a Measurement for a GroupByCount query.
visit_groupby_count | Create a measurement from a GroupByCount query expression.
build_count_distinct_measurement | Build a Measurement for a GroupByCountDistinct query.
visit_groupby_count_distinct | Create a measurement from a GroupByCountDistinct query expression.
build_groupby_quantile | Build a Measurement for a GroupByQuantile query.
visit_groupby_quantile | Create a measurement from a GroupByQuantile query expression.
build_groupby_bounded_sum | Build a Measurement for a GroupByBoundedSum query.
visit_groupby_bounded_sum | Create a measurement from a GroupByBoundedSum query expression.
build_groupby_bounded_average | Build a Measurement for a GroupByBoundedAverage query.
visit_groupby_bounded_average | Create a measurement from a GroupByBoundedAverage query expression.
build_groupby_bounded_variance | Build a Measurement for a GroupByBoundedVariance query.
visit_groupby_bounded_variance | Create a measurement from a GroupByBoundedVariance query expression.
build_groupby_bounded_stdev | Build a Measurement for a GroupByBoundedStdev query.
visit_groupby_bounded_stdev | Create a measurement from a GroupByBoundedStdev query expression.
- Parameters
privacy_budget (tmlt.analytics.privacy_budget.PrivacyBudget) –
stability (Any) –
input_domain (tmlt.core.domains.collections.DictDomain) –
input_metric (tmlt.core.metrics.DictMetric) –
output_measure (Union[tmlt.core.measures.PureDP, tmlt.core.measures.ApproxDP, tmlt.core.measures.RhoZCDP]) –
default_mechanism (tmlt.core.measurements.aggregations.NoiseMechanism) –
public_sources (Dict[str, pyspark.sql.DataFrame]) –
catalog (tmlt.analytics._catalog.Catalog) –
table_constraints (Dict[tmlt.analytics._table_identifier.Identifier, List[tmlt.analytics.constraints.Constraint]]) –
- __init__(privacy_budget, stability, input_domain, input_metric, output_measure, default_mechanism, public_sources, catalog, table_constraints)
Constructor for BaseMeasurementVisitor.
- Parameters
privacy_budget (tmlt.analytics.privacy_budget.PrivacyBudget) –
stability (Any) –
input_domain (tmlt.core.domains.collections.DictDomain) –
input_metric (tmlt.core.metrics.DictMetric) –
output_measure (Union[tmlt.core.measures.PureDP, tmlt.core.measures.ApproxDP, tmlt.core.measures.RhoZCDP]) –
default_mechanism (tmlt.core.measurements.aggregations.NoiseMechanism) –
public_sources (Dict[str, pyspark.sql.dataframe.DataFrame]) –
catalog (tmlt.analytics._catalog.Catalog) –
table_constraints (Dict[tmlt.analytics._table_identifier.Identifier, List[tmlt.analytics.constraints._base.Constraint]]) –
- abstract visit_private_source(expr)
Visit a PrivateSource query expression (raises an error).
- Parameters
- Return type
Any
- abstract visit_rename(expr)
Visit a Rename query expression (raises an error).
- Parameters
expr (tmlt.analytics.query_expr.Rename) –
- Return type
Any
- abstract visit_filter(expr)
Visit a Filter query expression (raises an error).
- Parameters
expr (tmlt.analytics.query_expr.Filter) –
- Return type
Any
- abstract visit_select(expr)
Visit a Select query expression (raises an error).
- Parameters
expr (tmlt.analytics.query_expr.Select) –
- Return type
Any
- abstract visit_map(expr)
Visit a Map query expression (raises an error).
- Parameters
expr (tmlt.analytics.query_expr.Map) –
- Return type
Any
- abstract visit_flat_map(expr)
Visit a FlatMap query expression (raises an error).
- Parameters
expr (tmlt.analytics.query_expr.FlatMap) –
- Return type
Any
- abstract visit_join_private(expr)
Visit a JoinPrivate query expression (raises an error).
- Parameters
- Return type
Any
- abstract visit_join_public(expr)
Visit a JoinPublic query expression (raises an error).
- Parameters
expr (tmlt.analytics.query_expr.JoinPublic) –
- Return type
Any
- abstract visit_replace_null_and_nan(expr)
Visit a ReplaceNullAndNan query expression (raises an error).
- Parameters
- Return type
Any
- abstract visit_replace_infinity(expr)
Visit a ReplaceInfinity query expression (raises an error).
- Parameters
- Return type
Any
- abstract visit_drop_null_and_nan(expr)
Visit a DropNullAndNan query expression (raises an error).
- Parameters
- Return type
Any
- abstract visit_drop_infinity(expr)
Visit a DropInfinity query expression (raises an error).
- Parameters
- Return type
Any
- abstract visit_enforce_constraint(expr)
Visit an EnforceConstraint query expression (raises an error).
- Parameters
- Return type
Any
- abstract visit_get_groups(expr)
Visit a GetGroups query expression (raises an error).
- Parameters
expr (tmlt.analytics.query_expr.GetGroups) –
- Return type
Any
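The split described above (abstract visit methods for transformation expressions, concrete ones for aggregations) can be illustrated with a small, self-contained sketch. Everything in it is a hypothetical stand-in: `Rename` and `GroupByCount` here are toy dataclasses rather than the `tmlt.analytics.query_expr` classes, the `accept` dispatch is an assumption about how a visitor is typically invoked, and `ToyBaseVisitor` is not the library's implementation.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class Rename:
    """Hypothetical stand-in for a transformation expression."""

    column_mapper: dict

    def accept(self, visitor: "ToyBaseVisitor") -> Any:
        # Double dispatch: the expression picks the matching visit method.
        return visitor.visit_rename(self)


@dataclass
class GroupByCount:
    """Hypothetical stand-in for an aggregation expression."""

    child: Rename
    output_column: str = "count"

    def accept(self, visitor: "ToyBaseVisitor") -> Any:
        return visitor.visit_groupby_count(self)


class ToyBaseVisitor(ABC):
    """Toy base class: transformations are abstract, aggregations are shared."""

    @abstractmethod
    def visit_rename(self, expr: Rename) -> Any:
        """Subclasses decide how (or whether) to handle transformations."""

    def visit_groupby_count(self, expr: GroupByCount) -> str:
        # Shared aggregation logic lives in the base class and delegates
        # the child expression back to the subclass.
        child_result = expr.child.accept(self)
        return f"count({child_result}) -> {expr.output_column}"


class ToyVisitor(ToyBaseVisitor):
    """Concrete subclass supplying the transformation handling."""

    def visit_rename(self, expr: Rename) -> str:
        return f"rename({sorted(expr.column_mapper.items())})"
```

With these toy classes, `GroupByCount(child=Rename({"a": "b"})).accept(ToyVisitor())` walks the child first and then applies the shared aggregation step, while instantiating `ToyBaseVisitor` directly fails because `visit_rename` is abstract.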
- build_groupby_count(input_domain, input_metric, stability, mechanism, budget, groupby, output_column)
Build a Measurement for a GroupByCount query.
- Parameters
input_domain (tmlt.core.domains.spark_domains.SparkDataFrameDomain) –
input_metric (Union[tmlt.core.metrics.IfGroupedBy, tmlt.core.metrics.SymmetricDifference, tmlt.core.metrics.HammingDistance]) –
stability (Any) –
mechanism (tmlt.core.measurements.aggregations.NoiseMechanism) –
budget (tmlt.analytics.privacy_budget.PrivacyBudget) –
groupby (tmlt.core.transformations.spark_transformations.groupby.GroupBy) –
output_column (str) –
- Return type
tmlt.core.measurements.base.Measurement
- visit_groupby_count(expr)
Create a measurement from a GroupByCount query expression.
- Parameters
- Return type
Tuple[tmlt.core.measurements.base.Measurement, tmlt.analytics._noise_info.NoiseInfo]
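The aggregation visit methods return a pair: a measurement plus a description of the noise it adds. The following is a minimal sketch of that shape in plain Python, not the library's code. The function `toy_groupby_count`, the dictionary used as noise information, and the choice of Laplace noise with scale `stability / epsilon` are all assumptions made for illustration.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple


def toy_groupby_count(
    groups: List[str],
    epsilon: float,
    stability: float = 1.0,
    seed: Optional[int] = None,
) -> Tuple[Callable[[List[str]], Dict[str, float]], Dict[str, object]]:
    """Return (measurement, noise_info) for a toy noisy group-by count."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if stability <= 0:
        raise ValueError("stability must be positive")

    # Assumed calibration: Laplace noise with scale stability / epsilon.
    scale = stability / epsilon
    rng = random.Random(seed)

    def laplace() -> float:
        # The difference of two i.i.d. exponentials with mean `scale`
        # follows a Laplace distribution with that scale.
        return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

    def measurement(rows: List[str]) -> Dict[str, float]:
        # Count only the known groups, so the output keys do not depend
        # on which values happen to appear in the data.
        counts = {group: 0 for group in groups}
        for row in rows:
            if row in counts:
                counts[row] += 1
        return {group: count + laplace() for group, count in counts.items()}

    noise_info = {"mechanism": "laplace", "scale": scale}
    return measurement, noise_info
```

Returning the noise description alongside the measurement lets a caller report the expected error without running the measurement or inspecting its internals.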
- build_count_distinct_measurement(input_domain, input_metric, mechanism, stability, budget, groupby, output_column)
Build a Measurement for a GroupByCountDistinct query.
- Parameters
input_domain (tmlt.core.domains.spark_domains.SparkDataFrameDomain) –
input_metric (Union[tmlt.core.metrics.IfGroupedBy, tmlt.core.metrics.SymmetricDifference, tmlt.core.metrics.HammingDistance]) –
mechanism (tmlt.core.measurements.aggregations.NoiseMechanism) –
stability (Any) –
budget (tmlt.analytics.privacy_budget.PrivacyBudget) –
groupby (tmlt.core.transformations.spark_transformations.groupby.GroupBy) –
output_column (str) –
- Return type
tmlt.core.measurements.base.Measurement
- visit_groupby_count_distinct(expr)
Create a measurement from a GroupByCountDistinct query expression.
- Parameters
- Return type
Tuple[tmlt.core.measurements.base.Measurement, tmlt.analytics._noise_info.NoiseInfo]
- build_groupby_quantile(input_domain, input_metric, measure_column, quantile, lower, upper, stability, budget, groupby, output_column)
Build a Measurement for a GroupByQuantile query.
- Parameters
input_domain (tmlt.core.domains.spark_domains.SparkDataFrameDomain) –
input_metric (Union[tmlt.core.metrics.IfGroupedBy, tmlt.core.metrics.SymmetricDifference, tmlt.core.metrics.HammingDistance]) –
measure_column (str) –
quantile (float) –
stability (Any) –
budget (tmlt.analytics.privacy_budget.PrivacyBudget) –
groupby (tmlt.core.transformations.spark_transformations.groupby.GroupBy) –
output_column (str) –
- Return type
tmlt.core.measurements.base.Measurement
- visit_groupby_quantile(expr)
Create a measurement from a GroupByQuantile query expression.
- Parameters
- Return type
Tuple[tmlt.core.measurements.base.Measurement, tmlt.analytics._noise_info.NoiseInfo]
- build_groupby_bounded_sum(input_domain, input_metric, measure_column, lower, upper, stability, mechanism, budget, groupby, output_column)
Build a Measurement for a GroupByBoundedSum query.
- Parameters
input_domain (tmlt.core.domains.spark_domains.SparkDataFrameDomain) –
input_metric (Union[tmlt.core.metrics.IfGroupedBy, tmlt.core.metrics.SymmetricDifference, tmlt.core.metrics.HammingDistance]) –
measure_column (str) –
lower (tmlt.core.utils.exact_number.ExactNumber) –
upper (tmlt.core.utils.exact_number.ExactNumber) –
stability (Any) –
mechanism (tmlt.core.measurements.aggregations.NoiseMechanism) –
budget (tmlt.analytics.privacy_budget.PrivacyBudget) –
groupby (tmlt.core.transformations.spark_transformations.groupby.GroupBy) –
output_column (str) –
- Return type
tmlt.core.measurements.base.Measurement
- visit_groupby_bounded_sum(expr)
Create a measurement from a GroupByBoundedSum query expression.
- Parameters
- Return type
Tuple[tmlt.core.measurements.base.Measurement, tmlt.analytics._noise_info.NoiseInfo]
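The `lower` and `upper` parameters of the bounded aggregations suggest the usual clamping approach: each value is clipped into `[lower, upper]` before summing, so one row's contribution is bounded. The sketch below shows that idea in plain Python under stated assumptions. The helpers `clamped_sum`, `laplace_scale`, and `toy_bounded_sum` are hypothetical, and the scale `max(|lower|, |upper|) / epsilon` assumes neighbouring datasets differ by adding or removing one row. It does not reproduce how the library builds its measurement.

```python
import random
from typing import Iterable, Optional


def clamped_sum(values: Iterable[float], lower: float, upper: float) -> float:
    """Sum the values after clipping each one into [lower, upper]."""
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    return sum(min(max(value, lower), upper) for value in values)


def laplace_scale(lower: float, upper: float, epsilon: float) -> float:
    """Assumed noise scale: largest single-row contribution over epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return max(abs(lower), abs(upper)) / epsilon


def toy_bounded_sum(
    values: Iterable[float],
    lower: float,
    upper: float,
    epsilon: float,
    seed: Optional[int] = None,
) -> float:
    """Clamped sum plus Laplace noise (illustrative only)."""
    total = clamped_sum(values, lower, upper)
    scale = laplace_scale(lower, upper, epsilon)
    if scale == 0:
        # Both bounds are zero, so every clamped value is zero and
        # no noise is needed.
        return float(total)
    rng = random.Random(seed)
    # The difference of two exponentials with mean `scale` is Laplace noise.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return total + noise
```

For example, `clamped_sum([-5, 3, 12], 0, 10)` clips the inputs to `0`, `3`, and `10` before adding them, so an outlier cannot shift the true sum by more than the bounds allow.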
- build_groupby_bounded_average(input_domain, input_metric, measure_column, lower, upper, stability, mechanism, budget, groupby, output_column)
Build a Measurement for a GroupByBoundedAverage query.
- Parameters
input_domain (tmlt.core.domains.spark_domains.SparkDataFrameDomain) –
input_metric (Union[tmlt.core.metrics.IfGroupedBy, tmlt.core.metrics.SymmetricDifference, tmlt.core.metrics.HammingDistance]) –
measure_column (str) –
lower (tmlt.core.utils.exact_number.ExactNumber) –
upper (tmlt.core.utils.exact_number.ExactNumber) –
stability (Any) –
mechanism (tmlt.core.measurements.aggregations.NoiseMechanism) –
budget (tmlt.analytics.privacy_budget.PrivacyBudget) –
groupby (tmlt.core.transformations.spark_transformations.groupby.GroupBy) –
output_column (str) –
- Return type
tmlt.core.measurements.base.Measurement
- visit_groupby_bounded_average(expr)
Create a measurement from a GroupByBoundedAverage query expression.
- Parameters
- Return type
Tuple[tmlt.core.measurements.base.Measurement, tmlt.analytics._noise_info.NoiseInfo]
- build_groupby_bounded_variance(input_domain, input_metric, measure_column, lower, upper, stability, mechanism, budget, groupby, output_column)
Build a Measurement for a GroupByBoundedVariance query.
- Parameters
input_domain (tmlt.core.domains.spark_domains.SparkDataFrameDomain) –
input_metric (Union[tmlt.core.metrics.IfGroupedBy, tmlt.core.metrics.SymmetricDifference, tmlt.core.metrics.HammingDistance]) –
measure_column (str) –
lower (tmlt.core.utils.exact_number.ExactNumber) –
upper (tmlt.core.utils.exact_number.ExactNumber) –
stability (Any) –
mechanism (tmlt.core.measurements.aggregations.NoiseMechanism) –
budget (tmlt.analytics.privacy_budget.PrivacyBudget) –
groupby (tmlt.core.transformations.spark_transformations.groupby.GroupBy) –
output_column (str) –
- Return type
tmlt.core.measurements.base.Measurement
- visit_groupby_bounded_variance(expr)
Create a measurement from a GroupByBoundedVariance query expression.
- Parameters
- Return type
Tuple[tmlt.core.measurements.base.Measurement, tmlt.analytics._noise_info.NoiseInfo]
- build_groupby_bounded_stdev(input_domain, input_metric, measure_column, lower, upper, stability, mechanism, budget, groupby, output_column)
Build a Measurement for a GroupByBoundedStdev query.
- Parameters
input_domain (tmlt.core.domains.spark_domains.SparkDataFrameDomain) –
input_metric (Union[tmlt.core.metrics.IfGroupedBy, tmlt.core.metrics.SymmetricDifference, tmlt.core.metrics.HammingDistance]) –
measure_column (str) –
lower (tmlt.core.utils.exact_number.ExactNumber) –
upper (tmlt.core.utils.exact_number.ExactNumber) –
stability (Any) –
mechanism (tmlt.core.measurements.aggregations.NoiseMechanism) –
budget (tmlt.analytics.privacy_budget.PrivacyBudget) –
groupby (tmlt.core.transformations.spark_transformations.groupby.GroupBy) –
output_column (str) –
- Return type
tmlt.core.measurements.base.Measurement
- visit_groupby_bounded_stdev(expr)
Create a measurement from a GroupByBoundedStdev query expression.
- Parameters
- Return type
Tuple[tmlt.core.measurements.base.Measurement, tmlt.analytics._noise_info.NoiseInfo]