dataframe#

input_domain (PandasDataFrameDomain) – Input domain.
input_metric (Union[HammingDistance, SymmetricDifference]) – Input metric.
output_measure (Measure) – Output measure.
output_schema (StructType) – Spark StructType compatible with the output.

abstract __call__(df)#

Perform measurement.

Parameters: df (pandas.DataFrame)
Return type: pandas.DataFrame

privacy_function(d_in)#

Returns the smallest d_out satisfied by the measurement.

See the privacy and stability tutorial for more information.

Parameters: d_in (Any) – Distance between inputs under input_metric.
Raises: NotImplementedError – If not overridden.
Return type: Any

privacy_relation(d_in, d_out)#

Return True if close inputs produce close outputs.

See the privacy and stability tutorial for more information.

Parameters:

d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_measure.

Return type:

bool
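
The relationship between privacy_function() and privacy_relation() described above can be illustrated with a minimal sketch. ToyMeasurement is a hypothetical stand-in, not a Tumult Core class, and the rule "the relation holds when privacy_function(d_in) <= d_out" is an assumption about how the two methods typically fit together, not a statement of the library's implementation.

```python
from fractions import Fraction


class ToyMeasurement:
    """Hypothetical stand-in for a measurement; not part of Tumult Core."""

    def privacy_function(self, d_in):
        # Assumed toy rule: each unit of input distance costs 1/2 unit of d_out.
        return Fraction(d_in) * Fraction(1, 2)

    def privacy_relation(self, d_in, d_out):
        # The relation holds when the smallest achievable d_out fits in d_out.
        return self.privacy_function(d_in) <= Fraction(d_out)


toy = ToyMeasurement()
holds = toy.privacy_relation(2, 1)                 # smallest d_out is 1
fails = toy.privacy_relation(2, Fraction(1, 2))    # 1/2 is too small
```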

class AggregateByColumn(input_domain, column_to_aggregation, hint=None)#

Bases: Aggregate

Apply Aggregate measurements to columns of a Pandas DataFrame.

Parameters:

input_domain (tmlt.core.domains.pandas_domains.PandasDataFrameDomain)
column_to_aggregation (Mapping[str, tmlt.core.measurements.pandas_measurements.series.Aggregate])
hint (Optional[Callable[[tmlt.core.utils.exact_number.ExactNumberInput, tmlt.core.utils.exact_number.ExactNumberInput], Dict[str, tmlt.core.utils.exact_number.ExactNumberInput]]])

property column_to_aggregation: Dict[str, tmlt.core.measurements.pandas_measurements.series.Aggregate]#

Returns dictionary from column names to aggregation measurements.

Return type: Dict[str, tmlt.core.measurements.pandas_measurements.series.Aggregate]

property input_domain: tmlt.core.domains.pandas_domains.PandasDataFrameDomain#

Return input domain for the measurement.

Return type: tmlt.core.domains.pandas_domains.PandasDataFrameDomain

property output_schema: pyspark.sql.types.StructType#

Return the output schema.

Return type: pyspark.sql.types.StructType

property input_metric: tmlt.core.metrics.Metric#

Distance metric on input domain.

Return type: tmlt.core.metrics.Metric

property output_measure: tmlt.core.measures.Measure#

Distance measure on output.

Return type: tmlt.core.measures.Measure

property is_interactive: bool#

Returns True if and only if the measurement is interactive.

Return type: bool

__init__(input_domain, column_to_aggregation, hint=None)#

Constructor.

Parameters:

input_domain (PandasDataFrameDomain) – Input domain.
column_to_aggregation (Mapping[str, Aggregate]) – A dictionary mapping column names to aggregation measurements. The provided measurements must all have PureDP or all have RhoZCDP as their output_measure.
hint (Optional[Callable[[Union[ExactNumber, float, int, str, Fraction, Expr], Union[ExactNumber, float, int, str, Fraction, Expr]], Dict[str, Union[ExactNumber, float, int, str, Fraction, Expr]]]]) – An optional hint. A hint is only required if the privacy_function() of one or more of the measurements raises NotImplementedError. The hint takes the same arguments as privacy_relation() and should return a d_out for each aggregation to be composed, such that the d_outs sum to at most the d_out passed into the hint.
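
As an illustration of the hint contract described above, here is a minimal sketch of a hint that splits d_out evenly across the aggregated columns. The column names and the name even_split_hint are hypothetical, and Fraction stands in for the exact-number inputs; an even split is just one possible allocation, not a recommended one.

```python
from fractions import Fraction
from typing import Dict, Union

Number = Union[int, str, Fraction]

# Hypothetical column names, chosen only for this example.
COLUMNS = ["age", "income"]


def even_split_hint(d_in: Number, d_out: Number) -> Dict[str, Fraction]:
    """Return one d_out per column; the shares sum to exactly d_out."""
    # d_in is accepted to match the (d_in, d_out) signature but is unused here.
    share = Fraction(d_out) / len(COLUMNS)
    return {column: share for column in COLUMNS}


budgets = even_split_hint(1, Fraction(1, 2))
```

A function of this shape would be passed as the hint argument; whether each per-column d_out is actually large enough is still checked by each measurement's own privacy_relation().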

privacy_function(d_in)#

Returns the smallest d_out satisfied by the measurement.

The result is the sum, over all composed measurements, of each measurement’s privacy_function() evaluated on d_in.

Parameters: d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
Raises: NotImplementedError – If any of the measurements raise NotImplementedError.
Return type: tmlt.core.utils.exact_number.ExactNumber
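
The summation described above can be sketched in plain Python. This is not the library's code: composed_privacy_function and the per-column callables are hypothetical, and Fraction stands in for ExactNumber.

```python
from fractions import Fraction
from typing import Callable, Dict

PrivacyFunction = Callable[[Fraction], Fraction]


def composed_privacy_function(
    privacy_functions: Dict[str, PrivacyFunction], d_in: Fraction
) -> Fraction:
    """Sum each column's privacy function evaluated on the same d_in."""
    # A NotImplementedError raised by any callable propagates to the caller.
    return sum((f(d_in) for f in privacy_functions.values()), Fraction(0))


def unsupported(d_in: Fraction) -> Fraction:
    """Toy privacy function that has no closed form."""
    raise NotImplementedError("no privacy function for this measurement")


# Hypothetical per-column privacy functions for two toy measurements.
privacy_functions = {
    "age": lambda d_in: d_in * Fraction(1, 2),
    "income": lambda d_in: d_in * Fraction(1, 4),
}
total = composed_privacy_function(privacy_functions, Fraction(2))
```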

privacy_relation(d_in, d_out)#

Returns True only if outputs are close under close inputs.

Let d_outs be the d_out values returned by the privacy_function() of each composed measurement, or the d_outs returned by the hint if any of those calls raises NotImplementedError. Let total_d_out be the sum of d_outs.

This returns True if total_d_out <= d_out (the input argument) and each composed measurement satisfies its privacy_relation() from d_in to its d_out from d_outs.

Parameters:

d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
d_out (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between outputs under output_measure.

Return type:

bool
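
The two-part check described above (the budget sum, then each per-measurement relation) can be sketched as follows. All names here are hypothetical, the toy relations are made up for the example, and Fraction stands in for the exact-number types; it is a sketch of the stated logic, not the library's implementation.

```python
from fractions import Fraction
from typing import Callable, Dict

Relation = Callable[[Fraction, Fraction], bool]


def composed_privacy_relation(
    relations: Dict[str, Relation],
    d_outs: Dict[str, Fraction],
    d_in: Fraction,
    d_out: Fraction,
) -> bool:
    """Check the total budget, then each measurement's own relation."""
    total_d_out = sum(d_outs.values(), Fraction(0))
    if total_d_out > d_out:
        return False
    return all(relations[column](d_in, d_outs[column]) for column in relations)


def make_relation(scale: Fraction) -> Relation:
    """Toy relation: holds when d_in * scale <= d_out."""
    return lambda d_in, d_out: d_in * scale <= d_out


relations = {
    "age": make_relation(Fraction(1, 2)),
    "income": make_relation(Fraction(1, 4)),
}
# Per-column d_outs, as a privacy function or a hint might supply them.
d_outs = {"age": Fraction(1, 2), "income": Fraction(1, 4)}

within_budget = composed_privacy_relation(relations, d_outs, Fraction(1), Fraction(1))
over_budget = composed_privacy_relation(relations, d_outs, Fraction(1), Fraction(1, 2))
```

In the second call the per-column relations would each hold, but the d_outs sum to 3/4, which exceeds the d_out of 1/2, so the composed relation is False.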

__call__(df)#

Perform the aggregation.

Parameters: df (pandas.DataFrame) – The DataFrame to aggregate.
Return type: pandas.DataFrame
