aggregations
Derived measurements for computing noisy aggregates on Spark DataFrames.
Functions
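create_count_measurement | Returns a noisy count measurement.
create_count_distinct_measurement | Returns a noisy count_distinct measurement.
create_sum_measurement | Returns a noisy sum measurement.
create_average_measurement | Returns a noisy average measurement.
create_variance_measurement | Returns a noisy variance measurement.
create_standard_deviation_measurement | Returns a noisy standard deviation measurement.
create_quantile_measurement | Returns a noisy quantile measurement.
get_midpoint | Returns the midpoint of lower and upper.
create_partition_selection_measurement | Returns a partition selection measurement.
create_bounds_measurement | Returns a bounds measurement.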
- create_count_measurement(input_domain, input_metric, output_measure, d_out, noise_mechanism, d_in=1, groupby_transformation=None, count_column=None)
Returns a noisy count measurement.
This function constructs a measurement M with the following privacy contract: for any two inputs x, x’ that are d_in-close under the input_metric, M(x) and M(x’) are sampled from distributions that are d_out apart under the output_measure. The noise scale is computed for the specified noise_mechanism so that the stated privacy property is guaranteed.
Note
d_out is interpreted as the “epsilon” parameter if output_measure is PureDP, the “rho” parameter if output_measure is RhoZCDP, and (“epsilon”, “delta”) if output_measure is ApproxDP.
Note
ApproxDP budgets with delta>0 are not yet supported.
- Parameters:
input_domain (tmlt.core.domains.spark_domains.SparkDataFrameDomain) – Domain of input Spark DataFrames.
input_metric (Union[tmlt.core.metrics.SymmetricDifference, tmlt.core.metrics.HammingDistance, tmlt.core.metrics.IfGroupedBy]) – Distance metric on input DataFrames.
output_measure (Union[tmlt.core.measures.PureDP, tmlt.core.measures.ApproxDP, tmlt.core.measures.RhoZCDP]) – Desired privacy guarantee (one of PureDP, RhoZCDP, or ApproxDP).
d_out (tmlt.core.measures.PrivacyBudgetInput) – Desired distance between output distributions w.r.t. d_in. This is interpreted as “epsilon” if output_measure is PureDP, “rho” if it is RhoZCDP, and (“epsilon”, “delta”) if it is ApproxDP.
noise_mechanism (NoiseMechanism) – Noise mechanism to apply to count(s).
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under the input_metric. The returned measurement is guaranteed to have output distributions that are d_out apart for inputs that are d_in apart. Defaults to 1.
groupby_transformation (Optional[tmlt.core.transformations.spark_transformations.groupby.GroupBy]) – If provided, this measurement returns a DataFrame with noisy counts for each group obtained by applying the groupby transformation. Otherwise, this measurement outputs a single number: the noisy count.
count_column (Optional[str]) – If a groupby_transformation is provided, this is the column name to be used for counts in the DataFrame output by the measurement. If None, this column will be named “count”.
- Return type:
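The privacy contract above can be illustrated with a small standalone sketch. It is not the library's implementation: `noisy_count_sketch` and `laplace_noise` are hypothetical helpers, the input metric is assumed to be symmetric difference (so inputs that are d_in-close differ by at most d_in rows), the output measure is assumed to be PureDP, and floating-point Laplace noise is used only for readability.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample zero-mean Laplace noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count_sketch(rows, epsilon: float, d_in: int = 1, seed=None) -> float:
    """Hypothetical helper: exact count plus Laplace(d_in / epsilon) noise."""
    if epsilon <= 0 or d_in <= 0:
        raise ValueError("epsilon and d_in must be positive")
    rng = random.Random(seed)
    # Inputs that are d_in-close differ by at most d_in rows, so the exact
    # count changes by at most d_in. That bound is the query's sensitivity.
    return len(rows) + laplace_noise(d_in / epsilon, rng)


if __name__ == "__main__":
    print(noisy_count_sketch(list(range(100)), epsilon=1.0, seed=7))
```

A larger d_in or a smaller epsilon both widen the noise, which is the trade-off the d_in and d_out parameters express.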
- create_count_distinct_measurement(input_domain, input_metric, output_measure, d_out, noise_mechanism, d_in=1, groupby_transformation=None, count_column=None)
Returns a noisy count_distinct measurement.
This function constructs a measurement M with the following privacy contract: for any two inputs x, x’ that are d_in-close under the input_metric, M(x) and M(x’) are sampled from distributions that are d_out apart under the output_measure. The noise scale is computed for the specified noise_mechanism so that the stated privacy property is guaranteed.
Note
d_out is interpreted as the “epsilon” parameter if output_measure is PureDP, the “rho” parameter if output_measure is RhoZCDP, and (“epsilon”, “delta”) if output_measure is ApproxDP.
Note
ApproxDP budgets with delta>0 are not yet supported.
- Parameters:
input_domain (tmlt.core.domains.spark_domains.SparkDataFrameDomain) – Domain of input Spark DataFrames.
input_metric (Union[tmlt.core.metrics.SymmetricDifference, tmlt.core.metrics.HammingDistance, tmlt.core.metrics.IfGroupedBy]) – Distance metric on input DataFrames.
output_measure (Union[tmlt.core.measures.PureDP, tmlt.core.measures.ApproxDP, tmlt.core.measures.RhoZCDP]) – Desired privacy guarantee (one of PureDP, RhoZCDP, or ApproxDP).
d_out (tmlt.core.measures.PrivacyBudgetInput) – Desired distance between output distributions w.r.t. d_in. This is interpreted as “epsilon” if output_measure is PureDP, “rho” if it is RhoZCDP, and (“epsilon”, “delta”) if it is ApproxDP.
noise_mechanism (NoiseMechanism) – Noise mechanism to apply to count(s).
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under the input_metric. The returned measurement is guaranteed to have output distributions that are d_out apart for inputs that are d_in apart. Defaults to 1.
groupby_transformation (Optional[tmlt.core.transformations.spark_transformations.groupby.GroupBy]) – If provided, this measurement returns a DataFrame with noisy counts for each group obtained by applying the groupby transformation. Otherwise, this measurement outputs a single number: the noisy count of distinct items.
count_column (Optional[str]) – If a groupby_transformation is provided, this is the column name to be used for counts in the DataFrame output by the measurement. If None, this column will be named “count”.
- Return type:
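To make the grouped case concrete, here is a minimal sketch of a per-group noisy distinct count in plain Python. Everything in it is an assumption made for illustration: `grouped_noisy_count_distinct_sketch` is a hypothetical helper, rows are modelled as (group, value) pairs, the group keys are taken to be public, and Laplace noise with scale d_in / epsilon stands in for whatever the chosen noise_mechanism would add.

```python
import random
from collections import defaultdict


def grouped_noisy_count_distinct_sketch(rows, group_keys, epsilon, d_in=1, seed=None):
    """Hypothetical helper: noisy number of distinct values per public group key.

    rows is an iterable of (group, value) pairs; group_keys lists every group
    that must appear in the output, whether or not it has any rows.
    """
    rng = random.Random(seed)
    distinct = defaultdict(set)
    for group, value in rows:
        distinct[group].add(value)

    scale = d_in / epsilon
    result = {}
    for key in group_keys:
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        # Groups without rows still get a noisy zero, so the set of output
        # keys never depends on the data.
        result[key] = len(distinct.get(key, ())) + noise
    return result


if __name__ == "__main__":
    data = [("a", 1), ("a", 1), ("a", 2), ("b", 3)]
    print(grouped_noisy_count_distinct_sketch(data, ["a", "b", "c"], epsilon=1.0, seed=3))
```

The dictionary plays the role of the output DataFrame: one entry per group, with the value corresponding to the count column.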
- create_sum_measurement(input_domain, input_metric, output_measure, d_out, noise_mechanism, measure_column, lower, upper, d_in=1, groupby_transformation=None, sum_column=None)
Returns a noisy sum measurement.
This function constructs a measurement M with the following privacy contract: for any two inputs x, x’ that are d_in-close under the input_metric, M(x) and M(x’) are sampled from distributions that are d_out apart under the output_measure. The noise scale is computed for the specified noise_mechanism so that the stated privacy property is guaranteed.
Note
d_out is interpreted as the “epsilon” parameter if output_measure is PureDP, the “rho” parameter if output_measure is RhoZCDP, and (“epsilon”, “delta”) if output_measure is ApproxDP.
Note
ApproxDP budgets with delta>0 are not yet supported.
- Parameters:
input_domain (tmlt.core.domains.spark_domains.SparkDataFrameDomain) – Domain of input Spark DataFrames.
input_metric (Union[tmlt.core.metrics.SymmetricDifference, tmlt.core.metrics.HammingDistance, tmlt.core.metrics.IfGroupedBy]) – Distance metric on input DataFrames.
output_measure (Union[tmlt.core.measures.PureDP, tmlt.core.measures.ApproxDP, tmlt.core.measures.RhoZCDP]) – Desired privacy guarantee (one of PureDP, RhoZCDP, or ApproxDP).
d_out (tmlt.core.measures.PrivacyBudgetInput) – Desired distance between output distributions w.r.t. d_in. This is interpreted as “epsilon” if output_measure is PureDP, “rho” if it is RhoZCDP, and (“epsilon”, “delta”) if it is ApproxDP.
noise_mechanism (NoiseMechanism) – Noise mechanism to be applied to the sum(s).
measure_column (str) – Column to be summed.
lower (tmlt.core.utils.exact_number.ExactNumberInput) – Lower clipping bound on measure_column.
upper (tmlt.core.utils.exact_number.ExactNumberInput) – Upper clipping bound on measure_column.
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under the input_metric. The returned measurement is guaranteed to have output distributions that are d_out apart for inputs that are d_in apart. Defaults to 1.
groupby_transformation (Optional[tmlt.core.transformations.spark_transformations.groupby.GroupBy]) – If provided, this measurement returns a DataFrame with noisy sums for each group obtained by applying the groupby transformation. If None, this measurement outputs a single number: the noisy sum.
sum_column (Optional[str]) – If a groupby_transformation is supplied, this is the column name to be used for sums in the DataFrame output by the measurement. If None, this column will be named “sum(<measure_column>)”.
- Return type:
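The role of the lower and upper clipping bounds can be shown with a short sketch. It assumes symmetric difference as the input metric and PureDP as the output measure, and `noisy_sum_sketch` is a hypothetical helper rather than the library's code: clipping caps each row's contribution at max(|lower|, |upper|), which bounds how much d_in changed rows can move the sum.

```python
import random


def clip(value, lower, upper):
    """Clamp value into the closed interval [lower, upper]."""
    return min(max(value, lower), upper)


def noisy_sum_sketch(values, lower, upper, epsilon, d_in=1, seed=None):
    """Hypothetical helper: clipped sum plus Laplace noise scaled to its sensitivity."""
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    rng = random.Random(seed)
    clipped_total = sum(clip(v, lower, upper) for v in values)
    # Adding or removing one clipped row moves the sum by at most
    # max(|lower|, |upper|); d_in rows move it by at most d_in times that.
    sensitivity = d_in * max(abs(lower), abs(upper))
    if sensitivity == 0:
        return float(clipped_total)
    scale = sensitivity / epsilon
    return clipped_total + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


if __name__ == "__main__":
    # -5 is clipped to 0 and 100 is clipped to 10 before summing.
    print(noisy_sum_sketch([-5, 2, 100], lower=0, upper=10, epsilon=1.0, seed=1))
```

Tighter bounds mean less noise but more clipping bias, so the bounds are best chosen from prior knowledge of the column's range.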
- create_average_measurement(input_domain, input_metric, output_measure, d_out, noise_mechanism, measure_column, lower, upper, d_in=1, groupby_transformation=None, average_column=None, keep_intermediates=False, sum_column=None, count_column=None)
Returns a noisy average measurement.
This function constructs a measurement M with the following privacy contract: for any two inputs x, x’ that are d_in-close under the input_metric, M(x) and M(x’) are sampled from distributions that are d_out apart under the output_measure. The noise scale is computed for the specified noise_mechanism so that the stated privacy property is guaranteed.
Note
d_out is interpreted as the “epsilon” parameter if output_measure is PureDP, the “rho” parameter if output_measure is RhoZCDP, and (“epsilon”, “delta”) if output_measure is ApproxDP.
Note
ApproxDP budgets with delta>0 are not yet supported.
- Parameters:
input_domain (tmlt.core.domains.spark_domains.SparkDataFrameDomain) – Domain of input DataFrames.
input_metric (Union[tmlt.core.metrics.SymmetricDifference, tmlt.core.metrics.HammingDistance, tmlt.core.metrics.IfGroupedBy]) – Distance metric on input DataFrames.
output_measure (Union[tmlt.core.measures.PureDP, tmlt.core.measures.ApproxDP, tmlt.core.measures.RhoZCDP]) – Desired privacy guarantee (one of PureDP, RhoZCDP, or ApproxDP).
d_out (tmlt.core.measures.PrivacyBudgetInput) – Desired distance between output distributions w.r.t. d_in. This is interpreted as “epsilon” if output_measure is PureDP, “rho” if it is RhoZCDP, and (“epsilon”, “delta”) if it is ApproxDP.
noise_mechanism (NoiseMechanism) – Noise mechanism to apply.
measure_column (str) – Name of the column to compute the average of.
lower (tmlt.core.utils.exact_number.ExactNumberInput) – Lower clipping bound for measure_column.
upper (tmlt.core.utils.exact_number.ExactNumberInput) – Upper clipping bound for measure_column.
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under the input_metric. The returned measurement is guaranteed to have output distributions that are d_out apart for inputs that are d_in apart. Defaults to 1.
groupby_transformation (Optional[tmlt.core.transformations.spark_transformations.groupby.GroupBy]) – If provided, this measurement returns a DataFrame with noisy averages for each group obtained from the groupby transformation. If None, this measurement outputs a single number: the noisy average.
average_column (Optional[str]) – If a groupby_transformation is supplied, this is the column name to be used for the noisy average in the DataFrame output by the measurement. If None, this column will be named “avg(<measure_column>)”.
keep_intermediates (bool) – If True, intermediates (noisy sum of deviations and noisy count) will also be output in addition to the noisy average.
sum_column (Optional[str]) – If a groupby_transformation is supplied and keep_intermediates is True, this is the column name to be used for intermediate sums in the DataFrame output by the measurement. If None, this column will be named “sum(<measure_column>)”.
count_column (Optional[str]) – If a groupby_transformation is supplied and keep_intermediates is True, this is the column name to be used for intermediate counts in the DataFrame output by the measurement. If None, this column will be named “count”.
- Return type:
Union[tmlt.core.measurements.postprocess.PostProcess, tmlt.core.measurements.converters.PureDPToApproxDP]
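The keep_intermediates description mentions a noisy sum of deviations and a noisy count. One plausible way to turn those two intermediates into an average is sketched below. This is an assumption, not a statement of the library's post-processing: the deviations are taken relative to the midpoint of [lower, upper], `average_from_intermediates` is a hypothetical name, and both the fallback for a non-positive noisy count and the final clamp are choices made for this sketch.

```python
def average_from_intermediates(noisy_sod, noisy_count, lower, upper):
    """Hypothetical post-processing: midpoint + (sum of deviations / count)."""
    midpoint = (lower + upper) / 2
    if noisy_count <= 0:
        # Noise can push the count to zero or below; fall back to the midpoint
        # rather than dividing by a meaningless denominator.
        return float(midpoint)
    estimate = midpoint + noisy_sod / noisy_count
    # The true average of clipped values lies in [lower, upper], so clamping
    # the estimate never moves it away from the truth.
    return float(min(max(estimate, lower), upper))


if __name__ == "__main__":
    values = [1, 2, 3, 6]
    lower, upper = 0, 10
    midpoint = (lower + upper) / 2
    exact_sod = sum(v - midpoint for v in values)   # -8.0
    print(average_from_intermediates(exact_sod, len(values), lower, upper))
```

Summing deviations from the midpoint instead of raw values halves the worst-case contribution of a row, from max(|lower|, |upper|) to (upper - lower) / 2, which is why this decomposition is attractive for bounded data.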
- create_variance_measurement(input_domain, input_metric, output_measure, d_out, noise_mechanism, measure_column, lower, upper, d_in=1, groupby_transformation=None, variance_column=None, keep_intermediates=False, sum_of_deviations_column=None, sum_of_squared_deviations_column=None, count_column=None)
Returns a noisy variance measurement.
This function constructs a measurement M with the following privacy contract: for any two inputs x, x’ that are d_in-close under the input_metric, M(x) and M(x’) are sampled from distributions that are d_out apart under the output_measure. The noise scale is computed for the specified noise_mechanism so that the stated privacy property is guaranteed.
Note
d_out is interpreted as the “epsilon” parameter if output_measure is PureDP, the “rho” parameter if output_measure is RhoZCDP, and (“epsilon”, “delta”) if output_measure is ApproxDP.
Note
ApproxDP budgets with delta>0 are not yet supported.
- Parameters:
input_domain (tmlt.core.domains.spark_domains.SparkDataFrameDomain) – Domain of input DataFrames.
input_metric (Union[tmlt.core.metrics.SymmetricDifference, tmlt.core.metrics.HammingDistance, tmlt.core.metrics.IfGroupedBy]) – Distance metric on input DataFrames.
output_measure (Union[tmlt.core.measures.PureDP, tmlt.core.measures.ApproxDP, tmlt.core.measures.RhoZCDP]) – Desired privacy guarantee (one of PureDP, RhoZCDP, or ApproxDP).
d_out (tmlt.core.measures.PrivacyBudgetInput) – Desired distance between output distributions w.r.t. d_in. This is interpreted as “epsilon” if output_measure is PureDP, “rho” if it is RhoZCDP, and (“epsilon”, “delta”) if it is ApproxDP.
noise_mechanism (NoiseMechanism) – Noise mechanism to apply.
measure_column (str) – Name of the column to compute the variance of.
lower (tmlt.core.utils.exact_number.ExactNumberInput) – Lower clipping bound for measure_column.
upper (tmlt.core.utils.exact_number.ExactNumberInput) – Upper clipping bound for measure_column.
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under the input_metric. The returned measurement is guaranteed to have output distributions that are d_out apart for inputs that are d_in apart. Defaults to 1.
groupby_transformation (Optional[tmlt.core.transformations.spark_transformations.groupby.GroupBy]) – If provided, this measurement returns a DataFrame with a noisy variance for each group obtained from the groupby transformation. If None, this measurement outputs a single number: the noisy variance.
variance_column (Optional[str]) – If a groupby_transformation is supplied, this is the column name to be used for the noisy variance in the DataFrame output by the measurement. If None, this column will be named “var(<measure_column>)”.
keep_intermediates (bool) – If True, intermediates (noisy sum of deviations, noisy sum of squared deviations, and noisy count) will also be output in addition to the noisy variance.
sum_of_deviations_column (Optional[str]) – If a groupby_transformation is supplied and keep_intermediates is True, this is the column name to be used for intermediate sums of deviations in the DataFrame output by the measurement. If None, this column will be named “sod(<measure_column>)”.
sum_of_squared_deviations_column (Optional[str]) – If a groupby_transformation is supplied and keep_intermediates is True, this is the column name to be used for intermediate sums of squared deviations in the DataFrame output by the measurement. If None, this column will be named “sos(<measure_column>)”.
count_column (Optional[str]) – If a groupby_transformation is supplied and keep_intermediates is True, this is the column name to be used for intermediate counts in the DataFrame output by the measurement. If None, this column will be named “count”.
- Return type:
Union[tmlt.core.measurements.postprocess.PostProcess, tmlt.core.measurements.converters.PureDPToApproxDP]
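The three intermediates named above (sum of deviations, sum of squared deviations, count) are enough to recover a variance through the identity Var(x) = E[(x - m)^2] - (E[x - m])^2, which holds for any fixed reference point m. The sketch below assumes both kinds of deviation are measured from the same reference point; the source does not say how the library defines them, so `variance_from_intermediates`, the zero fallback, and the clamping range are all assumptions of this example.

```python
def variance_from_intermediates(noisy_sod, noisy_sos, noisy_count, lower, upper):
    """Hypothetical post-processing: sos/n - (sod/n)^2, clamped to a valid range."""
    if noisy_count <= 0:
        # A non-positive noisy count carries no usable information here.
        return 0.0
    mean_deviation = noisy_sod / noisy_count
    estimate = noisy_sos / noisy_count - mean_deviation ** 2
    # Values confined to [lower, upper] have variance at most
    # (upper - lower)^2 / 4 (Popoviciu's inequality), and never below zero.
    max_variance = (upper - lower) ** 2 / 4
    return float(min(max(estimate, 0.0), max_variance))


if __name__ == "__main__":
    values = [1, 2, 3, 6]
    lower, upper = 0, 10
    m = (lower + upper) / 2
    sod = sum(v - m for v in values)          # -8.0
    sos = sum((v - m) ** 2 for v in values)   # 30.0
    print(variance_from_intermediates(sod, sos, len(values), lower, upper))
```

With exact (noise-free) intermediates the function reproduces the population variance of the values, which makes it easy to sanity-check before noise is added.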
- create_standard_deviation_measurement(input_domain, input_metric, output_measure, d_out, noise_mechanism, measure_column, lower, upper, d_in=1, groupby_transformation=None, standard_deviation_column=None, keep_intermediates=False, sum_of_deviations_column=None, sum_of_squared_deviations_column=None, count_column=None)
Returns a noisy standard deviation measurement.
This function constructs a measurement M with the following privacy contract: for any two inputs x, x’ that are d_in-close under the input_metric, M(x) and M(x’) are sampled from distributions that are d_out apart under the output_measure. The noise scale is computed for the specified noise_mechanism so that the stated privacy property is guaranteed.
Note
d_out is interpreted as the “epsilon” parameter if output_measure is PureDP, the “rho” parameter if output_measure is RhoZCDP, and (“epsilon”, “delta”) if output_measure is ApproxDP.
Note
ApproxDP budgets with delta>0 are not yet supported.
- Parameters:
input_domain (tmlt.core.domains.spark_domains.SparkDataFrameDomain) – Domain of input DataFrames.
input_metric (Union[tmlt.core.metrics.SymmetricDifference, tmlt.core.metrics.HammingDistance, tmlt.core.metrics.IfGroupedBy]) – Distance metric on input DataFrames.
output_measure (Union[tmlt.core.measures.PureDP, tmlt.core.measures.ApproxDP, tmlt.core.measures.RhoZCDP]) – Desired privacy guarantee (one of PureDP, RhoZCDP, or ApproxDP).
d_out (tmlt.core.measures.PrivacyBudgetInput) – Desired distance between output distributions w.r.t. d_in. This is interpreted as “epsilon” if output_measure is PureDP, “rho” if it is RhoZCDP, and (“epsilon”, “delta”) if it is ApproxDP.
noise_mechanism (NoiseMechanism) – Noise mechanism to apply.
measure_column (str) – Name of the column to compute the standard deviation of.
lower (tmlt.core.utils.exact_number.ExactNumberInput) – Lower clipping bound for measure_column.
upper (tmlt.core.utils.exact_number.ExactNumberInput) – Upper clipping bound for measure_column.
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under the input_metric. The returned measurement is guaranteed to have output distributions that are d_out apart for inputs that are d_in apart. Defaults to 1.
groupby_transformation (Optional[tmlt.core.transformations.spark_transformations.groupby.GroupBy]) – If provided, this measurement returns a DataFrame with noisy standard deviations for each group obtained by applying the groupby transformation. If None, this measurement outputs a single number: the noisy standard deviation of measure_column.
standard_deviation_column (Optional[str]) – If a groupby_transformation is supplied, this is the column name to be used for the noisy standard deviation in the DataFrame output by the measurement. If None, this column will be named “stddev(<measure_column>)”.
keep_intermediates (bool) – If True, intermediates (noisy sum of deviations, noisy sum of squared deviations, and noisy count) will also be output in addition to the noisy standard deviation.
sum_of_deviations_column (Optional[str]) – If a groupby_transformation is supplied and keep_intermediates is True, this is the column name to be used for intermediate sums of deviations in the DataFrame output by the measurement. If None, this column will be named “sod(<measure_column>)”.
sum_of_squared_deviations_column (Optional[str]) – If a groupby_transformation is supplied and keep_intermediates is True, this is the column name to be used for intermediate sums of squared deviations in the DataFrame output by the measurement. If None, this column will be named “sos(<measure_column>)”.
count_column (Optional[str]) – If a groupby_transformation is supplied and keep_intermediates is True, this is the column name to be used for intermediate counts in the DataFrame output by the measurement. If None, this column will be named “count”.
- Return type:
Union[tmlt.core.measurements.postprocess.PostProcess, tmlt.core.measurements.converters.PureDPToApproxDP]
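The default column names listed above can be visualised with a small sketch that builds one output row as a dictionary. The column names come from the parameter descriptions; everything else is assumed for illustration, including the hypothetical helper `stddev_row_sketch`, the formula used to combine the intermediates, and the zero fallback when the noisy count is not positive.

```python
import math


def stddev_row_sketch(noisy_sod, noisy_sos, noisy_count, measure_column,
                      keep_intermediates=False):
    """Hypothetical helper: one output row, keyed by the documented default names."""
    if noisy_count <= 0:
        variance = 0.0
    else:
        mean_deviation = noisy_sod / noisy_count
        # Noise can make the raw estimate negative, so floor it at zero
        # before taking the square root.
        variance = max(0.0, noisy_sos / noisy_count - mean_deviation ** 2)

    row = {f"stddev({measure_column})": math.sqrt(variance)}
    if keep_intermediates:
        row[f"sod({measure_column})"] = noisy_sod
        row[f"sos({measure_column})"] = noisy_sos
        row["count"] = noisy_count
    return row


if __name__ == "__main__":
    print(stddev_row_sketch(-8.0, 30.0, 4, "X", keep_intermediates=True))
```

Because the standard deviation is computed only from already-noisy quantities, releasing the intermediates alongside it costs no additional privacy budget.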
- create_quantile_measurement(input_domain, input_metric, output_measure, d_out, measure_column, quantile, lower, upper, d_in=1, groupby_transformation=None, quantile_column=None)
Returns a noisy quantile measurement.
This function constructs a measurement M with the following privacy contract: for any two inputs x, x’ that are d_in-close under the input_metric, M(x) and M(x’) are sampled from distributions that are d_out apart under the output_measure.
Note
d_out is interpreted as the “epsilon” parameter if output_measure is PureDP, the “rho” parameter if output_measure is RhoZCDP, and (“epsilon”, “delta”) if output_measure is ApproxDP.
Note
ApproxDP budgets with delta>0 are not yet supported.
- Parameters:
input_domain (tmlt.core.domains.spark_domains.SparkDataFrameDomain) – Domain of input DataFrames.
input_metric (Union[tmlt.core.metrics.SymmetricDifference, tmlt.core.metrics.HammingDistance, tmlt.core.metrics.IfGroupedBy]) – Distance metric on input DataFrames.
output_measure (Union[tmlt.core.measures.PureDP, tmlt.core.measures.ApproxDP, tmlt.core.measures.RhoZCDP]) – Desired privacy guarantee (one of PureDP, RhoZCDP, or ApproxDP).
d_out (tmlt.core.measures.PrivacyBudgetInput) – Desired distance between output distributions w.r.t. d_in. This is interpreted as “epsilon” if output_measure is PureDP, “rho” if it is RhoZCDP, and (“epsilon”, “delta”) if it is ApproxDP.
measure_column (str) – Name of the column to compute the quantile of.
quantile (float) – The quantile to produce.
lower (Union[int, float]) – Lower clipping bound for measure_column.
upper (Union[int, float]) – Upper clipping bound for measure_column.
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under the input_metric. The returned measurement is guaranteed to have output distributions that are d_out apart for inputs that are d_in apart. Defaults to 1.
groupby_transformation (Optional[tmlt.core.transformations.spark_transformations.groupby.GroupBy]) – If provided, this measurement returns a DataFrame with noisy quantiles for each group obtained by applying the groupby transformation. If None, this measurement outputs a single number: the noisy quantile.
quantile_column (Optional[str]) – If a groupby_transformation is supplied, this is the column name to be used for the noisy quantile in the DataFrame output by the measurement. If None, this column will be named “q_(<quantile>)_(<measure_column>)”.
- Return type:
Union[tmlt.core.measurements.postprocess.PostProcess, tmlt.core.measurements.converters.PureDPToApproxDP]
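The source does not state which algorithm backs the quantile measurement. As background only, the sketch below shows one standard way to release a quantile under pure differential privacy: the exponential mechanism over the intervals between sorted, clipped values. The function name `noisy_quantile_sketch`, the rank-distance utility, and the epsilon / 2 scaling (utility sensitivity 1 under add/remove of a row) are assumptions of this example, not the library's documented behaviour.

```python
import math
import random


def noisy_quantile_sketch(values, quantile, lower, upper, epsilon, seed=None):
    """Hypothetical helper: exponential-mechanism quantile on [lower, upper]."""
    if not lower < upper:
        raise ValueError("lower must be strictly less than upper")
    rng = random.Random(seed)
    clipped = sorted(min(max(v, lower), upper) for v in values)
    points = [lower] + clipped + [upper]
    n = len(clipped)
    target_rank = quantile * n

    # Interval i spans points[i]..points[i + 1] and has i data points below it.
    # Its log-weight is log(width) minus a penalty for distance from the target rank.
    log_weights = []
    for i in range(n + 1):
        width = points[i + 1] - points[i]
        if width <= 0:
            log_weights.append(-math.inf)
        else:
            log_weights.append(math.log(width) - (epsilon / 2) * abs(i - target_rank))

    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_weights)
    weights = [math.exp(lw - top) for lw in log_weights]
    chosen = rng.choices(range(n + 1), weights=weights)[0]
    return rng.uniform(points[chosen], points[chosen + 1])


if __name__ == "__main__":
    data = list(range(1, 100))
    print(noisy_quantile_sketch(data, 0.5, lower=0, upper=100, epsilon=1.0, seed=11))
```

Unlike the count and sum sketches, no additive noise is involved, which is consistent with this function taking no noise_mechanism argument; the output is always a value inside [lower, upper].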
- get_midpoint(lower, upper, integer_midpoint=False)
Returns the midpoint of lower and upper.
If integer_midpoint is True, the midpoint is rounded to the nearest integer using round().
Examples
>>> get_midpoint(1, 2)
(1.5, 3/2)
>>> get_midpoint(1, 5)
(3.0, 3)
>>> get_midpoint("0.2", "0.3")
(0.25, 1/4)
>>> get_midpoint(1, 9, integer_midpoint=True)
(5, 5)
- Parameters:
lower (tmlt.core.utils.exact_number.ExactNumberInput) –
upper (tmlt.core.utils.exact_number.ExactNumberInput) –
integer_midpoint (bool) –
- Return type:
Tuple[Union[float, int], tmlt.core.utils.exact_number.ExactNumber]
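The examples show a pair: a plain Python number and an exact value. A rough standalone analogue can be written with the standard library's `fractions.Fraction` in place of ExactNumber. This is only a sketch of the idea; `midpoint_sketch` is a hypothetical function, and `Fraction` is a stand-in chosen for this example.

```python
from fractions import Fraction


def midpoint_sketch(lower, upper, integer_midpoint=False):
    """Hypothetical analogue: return (python_number, exact_fraction)."""
    # Fraction accepts ints and decimal strings such as "0.2" exactly;
    # passing the float 0.2 instead would capture its binary rounding error.
    exact = (Fraction(lower) + Fraction(upper)) / 2
    if integer_midpoint:
        rounded = round(exact)  # round() on a Fraction returns an int
        return rounded, Fraction(rounded)
    return float(exact), exact


if __name__ == "__main__":
    print(midpoint_sketch(1, 2))
    print(midpoint_sketch("0.2", "0.3"))
    print(midpoint_sketch(1, 9, integer_midpoint=True))
```

Passing bounds as strings, as in the "0.2" and "0.3" example, keeps the exact value free of floating-point error.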
- create_partition_selection_measurement(input_domain, epsilon, delta, d_in=1, count_column=None)
Returns a partition selection measurement.
A partition selection measurement created by this function will have a privacy guarantee such that measurement.privacy_function(d_in) = (epsilon, delta).
- Parameters:
input_domain (tmlt.core.domains.spark_domains.SparkDataFrameDomain) – Domain of the input Spark DataFrames. Input cannot contain floating point columns.
epsilon (tmlt.core.utils.exact_number.ExactNumberInput) – The epsilon portion of the (epsilon, delta) privacy budget that you want this measurement to satisfy.
delta (tmlt.core.utils.exact_number.ExactNumberInput) – The delta portion of the (epsilon, delta) privacy budget that you want this measurement to satisfy.
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – The given d_in such that measurement.privacy_function(d_in) = (epsilon, delta).
count_column (Optional[str]) – Column name for output group counts. If None, output column will be named “count”.
- Return type:
tmlt.core.measurements.spark_measurements.GeometricPartitionSelection
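The return type is named GeometricPartitionSelection, but the source gives no algorithmic detail. As general background, partition selection is commonly done by adding two-sided geometric noise to each observed group's count and releasing only groups whose noisy count reaches a threshold. The sketch below illustrates that pattern with hypothetical names. The `threshold` argument is supplied by the caller because deriving it from (epsilon, delta) is not covered by the source.

```python
import math
import random
from collections import Counter


def two_sided_geometric(epsilon, rng):
    """Sample integer noise with P(k) proportional to exp(-epsilon * |k|)."""
    def one_sided():
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
        return math.floor(-math.log(u) / epsilon)
    # The difference of two i.i.d. geometric variables is two-sided geometric.
    return one_sided() - one_sided()


def partition_selection_sketch(rows, epsilon, threshold, seed=None):
    """Hypothetical helper: keep groups whose noisy count reaches the threshold."""
    rng = random.Random(seed)
    counts = Counter(rows)
    selected = {}
    # Only groups that actually occur are considered; sorting fixes the
    # order in which noise is drawn so runs with a seed are repeatable.
    for key in sorted(counts):
        noisy = counts[key] + two_sided_geometric(epsilon, rng)
        if noisy >= threshold:
            selected[key] = noisy
    return selected


if __name__ == "__main__":
    data = ["a"] * 5 + ["b"]
    print(partition_selection_sketch(data, epsilon=1.0, threshold=3, seed=5))
```

Integer-valued noise keeps the released counts as integers, which may be related to the restriction that the input cannot contain floating point columns, though the source does not explain that restriction.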
- create_bounds_measurement(input_domain, input_metric, output_measure, d_out, measure_column, threshold, d_in=1, groupby_transformation=None, upper_bound_column=None, lower_bound_column=None)
Returns a bounds measurement.
The bounds measurement returns either a tuple of (lower_bound, upper_bound) if no groupby transformation is provided, or a DataFrame with one column for the lower bound and one column for the upper bound if a groupby transformation is provided.
The bounds measurement created by this function will have a privacy guarantee such that measurement.privacy_function(d_in) = d_out.
- Parameters:
input_domain (tmlt.core.domains.spark_domains.SparkDataFrameDomain) – Domain of the input Spark DataFrames.
input_metric (Union[tmlt.core.metrics.SymmetricDifference, tmlt.core.metrics.IfGroupedBy]) – Distance metric on input DataFrames.
output_measure (Union[tmlt.core.measures.PureDP, tmlt.core.measures.ApproxDP, tmlt.core.measures.RhoZCDP]) – Desired privacy guarantee.
d_out (tmlt.core.measures.PrivacyBudgetInput) – Desired distance between output distributions w.r.t. d_in. This is interpreted as “epsilon” if output_measure is PureDP, “rho” if it is RhoZCDP, and (“epsilon”, “delta”) if it is ApproxDP.
measure_column (str) – Column name to calculate the bounds for. The column must be an integer or floating point column.
threshold (float) – The threshold for the bound selection measurement.
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – The given d_in such that measurement.privacy_function(d_in) = d_out.
groupby_transformation (Optional[tmlt.core.transformations.spark_transformations.groupby.GroupBy]) – If provided, this measurement returns a DataFrame with noisy bounds for each group obtained by applying the groupby transformation. If None, this measurement outputs a single tuple: the noisy bounds.
upper_bound_column (Optional[str]) – If a groupby_transformation is supplied, this is the column name to be used for the upper bound in the DataFrame output by the measurement. If None, this column will be named “upper_bound(<measure_column>)”.
lower_bound_column (Optional[str]) – If a groupby_transformation is supplied, this is the column name to be used for the lower bound in the DataFrame output by the measurement. If None, this column will be named “lower_bound(<measure_column>)”.
- Return type:
Union[tmlt.core.measurements.postprocess.PostProcess, tmlt.core.measurements.converters.PureDPToApproxDP, tmlt.core.measurements.converters.PureDPToRhoZCDP]
Classes
NoiseMechanism | Enumerating noise mechanisms.
- class NoiseMechanism
Bases: enum.Enum
Enumerating noise mechanisms.
- check_output_measure(output_measure)
Checks if the specified output measure is supported.
- Parameters:
output_measure (Union[tmlt.core.measures.PureDP, tmlt.core.measures.RhoZCDP]) –
- Return type:
None
- supported_output_measure()
Returns a list of output measures supported by this noise mechanism.
- Return type:
List[Union[tmlt.core.measures.PureDP, tmlt.core.measures.RhoZCDP]]
- name()
The name of the Enum member.
- value()
The value of the Enum member.
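The two methods above follow a familiar pattern: each enum member knows which output measures it can serve, and a checker raises when asked for anything else. The sketch below shows that pattern in isolation. The member names, the mapping from mechanism to measure, the use of plain strings for measures, and the choice of ValueError are all assumptions made for this example; the source lists none of them.

```python
from enum import Enum
from typing import List


class NoiseMechanismSketch(Enum):
    """Hypothetical stand-in for an enum of noise mechanisms."""

    LAPLACE = 1
    DISCRETE_GAUSSIAN = 2

    def supported_output_measure(self) -> List[str]:
        """Return the measures this member is assumed to support."""
        # Assumed pairing for illustration: Laplace-style noise with pure DP,
        # Gaussian-style noise with zero-concentrated DP.
        if self is NoiseMechanismSketch.LAPLACE:
            return ["PureDP"]
        return ["RhoZCDP"]

    def check_output_measure(self, output_measure: str) -> None:
        """Raise if output_measure is not supported by this member."""
        if output_measure not in self.supported_output_measure():
            raise ValueError(
                f"{self.name} does not support output measure {output_measure}"
            )


if __name__ == "__main__":
    NoiseMechanismSketch.LAPLACE.check_output_measure("PureDP")  # passes silently
    print(NoiseMechanismSketch.DISCRETE_GAUSSIAN.supported_output_measure())
```

The `name` and `value` attributes used here are the standard ones every `enum.Enum` member provides.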