series#

Measurements on Pandas Series.

Classes#

Aggregate

Aggregate a Pandas Series and produce a float or int.

NoisyQuantile

Estimates the quantile of a Pandas Series.

AddNoiseToSeries

A measurement that adds noise to each value in a pandas Series.

class Aggregate(input_domain, input_metric, output_measure, output_spark_type)#

Bases: tmlt.core.measurements.base.Measurement

Aggregate a Pandas Series and produce a float or int.

__init__(input_domain, input_metric, output_measure, output_spark_type)#

Constructor.

property output_spark_type#

Return the Spark type of the aggregated value.

Return type

pyspark.sql.types.DataType

abstract __call__(data)#

Perform measurement.

Parameters

data (pandas.Series) – The Series to aggregate.

Return type

Union[float, int]

property input_domain#

Return input domain for the measurement.

Return type

tmlt.core.domains.base.Domain

property input_metric#

Distance metric on input domain.

Return type

tmlt.core.metrics.Metric

property output_measure#

Distance measure on output.

Return type

tmlt.core.measures.Measure

property is_interactive#

Returns true iff the measurement is interactive.

Return type

bool

privacy_function(d_in)#

Returns the smallest d_out satisfied by the measurement.

See the privacy and stability tutorial for more information.

Parameters

d_in (Any) – Distance between inputs under input_metric.

Raises

NotImplementedError – If not overridden.

Return type

Any

privacy_relation(d_in, d_out)#

Return True if close inputs produce close outputs.

See the privacy and stability tutorial for more information.

Parameters
  • d_in (Any) – Distance between inputs under input_metric.

  • d_out (Any) – Distance between outputs under output_measure.

Return type

bool
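The two privacy methods can be read together: privacy_function gives the smallest d_out for a given d_in, and privacy_relation checks whether a proposed d_out is large enough. The sketch below illustrates that reading with exact fractions. It assumes the relation is obtained by comparing the privacy function's result against d_out, which this page does not state explicitly, and privacy_relation_sketch and example_privacy_function are hypothetical names, not part of tmlt.core.

```python
from fractions import Fraction


def privacy_relation_sketch(privacy_function, d_in, d_out):
    """Return True if d_out is at least the smallest d_out for d_in.

    Assumption: the relation is derived from the privacy function by a
    simple comparison. Hypothetical helper, not a tmlt.core API.
    """
    return privacy_function(d_in) <= Fraction(d_out)


# Hypothetical privacy function of the form epsilon * d_in, epsilon = 1/2.
epsilon = Fraction(1, 2)


def example_privacy_function(d_in):
    return epsilon * Fraction(d_in)


# With d_in = 2 the smallest d_out is 1, so d_out = 1 is enough
# and d_out = 1/2 is not.
holds = privacy_relation_sketch(example_privacy_function, 2, 1)
fails = privacy_relation_sketch(example_privacy_function, 2, Fraction(1, 2))
```

Exact fractions are used so that the boundary case d_out = 1 compares without floating-point rounding.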

class NoisyQuantile(input_domain, output_measure, quantile, lower, upper, epsilon)#

Bases: Aggregate

Estimates the quantile of a Pandas Series.

Methods#

quantile()

Returns the quantile to be computed.

lower()

Returns the lower clamping bound.

upper()

Returns the upper clamping bound.

epsilon()

Returns the PureDP privacy budget to be used for producing a quantile.

privacy_function()

Returns the smallest d_out satisfied by the measurement.

__call__()

Returns a differentially private answer (a float) to the quantile query.

output_spark_type()

Return the Spark type of the aggregated value.

input_domain()

Return input domain for the measurement.

input_metric()

Distance metric on input domain.

output_measure()

Distance measure on output.

is_interactive()

Returns true iff the measurement is interactive.

privacy_relation()

Return True if close inputs produce close outputs.

__init__(input_domain, output_measure, quantile, lower, upper, epsilon)#

Constructor.

property quantile#

Returns the quantile to be computed.

Return type

float

property lower#

Returns the lower clamping bound.

Return type

Union[float, int]

property upper#

Returns the upper clamping bound.

Return type

Union[float, int]

property epsilon#

Returns the PureDP privacy budget to be used for producing a quantile.

Return type

tmlt.core.utils.exact_number.ExactNumber

privacy_function(d_in)#

Returns the smallest d_out satisfied by the measurement.

This algorithm uses the exponential mechanism, so the same privacy analysis applies:

If the output measure is PureDP, returns

\(\epsilon \cdot d_{in}\)

If the output measure is RhoZCDP, returns

\(\frac{1}{8}(\epsilon \cdot d_{in})^2\)

where:

  • \(d_{in}\) is the input argument d_in

  • \(\epsilon\) is epsilon

See [CR21] for the zCDP privacy analysis.

Parameters

d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.

Return type

tmlt.core.utils.exact_number.ExactNumber
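As a worked instance of the two formulas given for privacy_function, the sketch below evaluates them with exact fractions. The helper names pure_dp_d_out and rho_zcdp_d_out are hypothetical and only mirror the arithmetic; they are not tmlt.core functions, and Python's Fraction stands in for ExactNumber.

```python
from fractions import Fraction


def pure_dp_d_out(epsilon, d_in):
    """epsilon * d_in, the PureDP case."""
    return Fraction(epsilon) * Fraction(d_in)


def rho_zcdp_d_out(epsilon, d_in):
    """(1/8) * (epsilon * d_in)^2, the RhoZCDP case."""
    return Fraction(1, 8) * (Fraction(epsilon) * Fraction(d_in)) ** 2


# Example values chosen for illustration: epsilon = 1/2, d_in = 2.
epsilon = Fraction(1, 2)
d_in = 2

pure = pure_dp_d_out(epsilon, d_in)   # (1/2) * 2 = 1
rho = rho_zcdp_d_out(epsilon, d_in)   # (1/8) * 1^2 = 1/8
```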

__call__(data)#

Returns a differentially private answer (a float) to the quantile query.

A description of the algorithm is available in a separate document (link pending, see #792).

Parameters

data (pandas.Series) – The Series on which to compute the quantile.

Return type

float
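For intuition about how an exponential-mechanism quantile can work, here is a textbook-style sketch using only the standard library. It is not the library's implementation. The function name exponential_mechanism_quantile, the rank-distance utility, the epsilon / 2 scaling, and the use of a plain list in place of a pandas Series are all assumptions made for illustration. Floating-point sampling like this is also not suitable for real privacy guarantees.

```python
import math
import random


def exponential_mechanism_quantile(values, quantile, lower, upper, epsilon, rng=None):
    """Illustrative sketch only; hypothetical helper, not a tmlt.core API."""
    rng = rng or random.Random()

    # Clamp every value to [lower, upper] and sort.
    clamped = sorted(min(max(v, lower), upper) for v in values)
    points = [lower] + clamped + [upper]
    n = len(clamped)

    # Interval i is [points[i], points[i + 1]]; exactly i values lie below it.
    # Score each interval by how close i is to the target rank, and weight it
    # by its width so that the final draw is uniform inside the interval.
    log_weights = []
    for i in range(n + 1):
        width = points[i + 1] - points[i]
        if width <= 0:
            log_weights.append(-math.inf)  # empty interval, never selected
            continue
        utility = -abs(i - quantile * n)
        log_weights.append(math.log(width) + (epsilon / 2) * utility)

    top = max(log_weights)
    if top == -math.inf:
        # All intervals are empty (lower == upper); only one answer exists.
        return float(lower)

    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(lw - top) for lw in log_weights]
    chosen = rng.choices(range(n + 1), weights=weights)[0]
    return rng.uniform(points[chosen], points[chosen + 1])


estimate = exponential_mechanism_quantile(
    [1, 2, 3, 4, 5], quantile=0.5, lower=0, upper=10, epsilon=1.0,
    rng=random.Random(0),
)
```

The clamping bounds matter because they define the range the answer is drawn from: the estimate always falls within [lower, upper], whatever the data contains.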

property output_spark_type#

Return the Spark type of the aggregated value.

Return type

pyspark.sql.types.DataType

property input_domain#

Return input domain for the measurement.

Return type

tmlt.core.domains.base.Domain

property input_metric#

Distance metric on input domain.

Return type

tmlt.core.metrics.Metric

property output_measure#

Distance measure on output.

Return type

tmlt.core.measures.Measure

property is_interactive#

Returns true iff the measurement is interactive.

Return type

bool

privacy_relation(d_in, d_out)#

Return True if close inputs produce close outputs.

See the privacy and stability tutorial for more information.

Parameters
  • d_in (Any) – Distance between inputs under input_metric.

  • d_out (Any) – Distance between outputs under output_measure.

Return type

bool

class AddNoiseToSeries(noise_measurement)#

Bases: tmlt.core.measurements.base.Measurement

A measurement that adds noise to each value in a pandas Series.

Parameters

noise_measurement (Union[tmlt.core.measurements.noise_mechanisms.AddLaplaceNoise, tmlt.core.measurements.noise_mechanisms.AddGeometricNoise, tmlt.core.measurements.noise_mechanisms.AddDiscreteGaussianNoise, tmlt.core.measurements.noise_mechanisms.AddGaussianNoise]) – Noise measurement to be applied to each element in the input pandas Series.

__init__(noise_measurement)#

Constructor.

Parameters

noise_measurement (Union[AddLaplaceNoise, AddGeometricNoise, AddDiscreteGaussianNoise, AddGaussianNoise]) – Noise measurement to be applied to each element in the input pandas Series.

property noise_measurement#

Returns measurement that adds noise to each number in pandas Series.

Return type

Union[tmlt.core.measurements.noise_mechanisms.AddLaplaceNoise, tmlt.core.measurements.noise_mechanisms.AddGeometricNoise, tmlt.core.measurements.noise_mechanisms.AddDiscreteGaussianNoise, tmlt.core.measurements.noise_mechanisms.AddGaussianNoise]

property input_domain#

Return input domain for the measurement.

Return type

tmlt.core.domains.pandas_domains.PandasSeriesDomain

property output_type#

Return the output data type after being used as a UDF.

Return type

pyspark.sql.types.DataType

privacy_function(d_in)#

Returns the smallest d_out satisfied by the measurement.

Parameters

d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.

Return type

tmlt.core.utils.exact_number.ExactNumber

__call__(values)#

Adds noise to each number in the input Series.

Parameters

values (pandas.Series) – The Series whose values will have noise added.

Return type

pandas.Series
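The idea of adding independent noise to each element can be sketched with the standard library alone. This is only an illustration under stated assumptions. The helper add_laplace_noise_to_each is a hypothetical name, a plain list stands in for the pandas Series to keep the example dependency-free, and floating-point Laplace sampling is used here purely for demonstration rather than as a stand-in for the library's noise mechanisms.

```python
import random


def add_laplace_noise_to_each(values, scale, rng=None):
    """Return a new list with independent Laplace(0, scale) noise per value.

    Hypothetical helper, not a tmlt.core API.
    """
    rng = rng or random.Random()
    rate = 1.0 / scale
    # The difference of two independent exponential draws with the same
    # rate follows a Laplace distribution with scale 1 / rate.
    return [v + rng.expovariate(rate) - rng.expovariate(rate) for v in values]


noisy = add_laplace_noise_to_each([10, 20, 30], scale=1.0, rng=random.Random(0))
```

Each element gets its own noise draw, so the output has the same length as the input and the noise on one element is independent of the noise on the others.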

property input_metric#

Distance metric on input domain.

Return type

tmlt.core.metrics.Metric

property output_measure#

Distance measure on output.

Return type

tmlt.core.measures.Measure

property is_interactive#

Returns true iff the measurement is interactive.

Return type

bool

privacy_relation(d_in, d_out)#

Return True if close inputs produce close outputs.

See the privacy and stability tutorial for more information.

Parameters
  • d_in (Any) – Distance between inputs under input_metric.

  • d_out (Any) – Distance between outputs under output_measure.

Return type

bool