series#
Measurements on Pandas Series.
Classes#
Aggregate | Aggregate a Pandas Series and produce a float or int. |
NoisyQuantile | Estimates the quantile of a Pandas Series. |
AddNoiseToSeries | A measurement that adds noise to each value in a pandas Series. |
- class Aggregate(input_domain, input_metric, output_measure, output_spark_type)#
Bases:
tmlt.core.measurements.base.Measurement
Aggregate a Pandas Series and produce a float or int.
- Parameters
input_domain (tmlt.core.domains.pandas_domains.PandasSeriesDomain) –
input_metric (Union[tmlt.core.metrics.HammingDistance, tmlt.core.metrics.SymmetricDifference]) –
output_measure (tmlt.core.measures.Measure) –
output_spark_type (pyspark.sql.types.DataType) –
- __init__(input_domain, input_metric, output_measure, output_spark_type)#
Constructor.
- Parameters
input_domain (PandasSeriesDomain) – Input domain. Must have type PandasSeriesDomain.
input_metric (Union[HammingDistance, SymmetricDifference]) – Input metric.
output_spark_type (DataType) – Spark DataType of the output. This is required to use this measurement within a udf.
- property output_spark_type(self)#
Return the Spark type of the aggregated value.
- Return type
- abstract __call__(self, data)#
Perform measurement.
- Parameters
data (pandas.Series) –
- Return type
- property input_domain(self)#
Return input domain for the measurement.
- Return type
- property input_metric(self)#
Distance metric on input domain.
- Return type
- property output_measure(self)#
Distance measure on output.
- Return type
- privacy_function(self, d_in)#
Returns the smallest d_out satisfied by the measurement.
See the privacy and stability tutorial for more information.
- Parameters
d_in (Any) – Distance between inputs under input_metric.
- Raises
NotImplementedError – If not overridden.
- Return type
Any
- privacy_relation(self, d_in, d_out)#
Return True if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_measure.
- Return type
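The link between privacy_function and privacy_relation can be illustrated with a minimal sketch. It assumes that the relation holds exactly when d_out is at least the smallest d_out reported by the privacy function, and it uses a hypothetical pure-DP mechanism whose privacy function is epsilon times d_in. The function names and the explicit `epsilon` argument are illustrative only and are not this library's API.

```python
from fractions import Fraction


def privacy_function(d_in: Fraction, epsilon: Fraction) -> Fraction:
    """Hypothetical privacy function: smallest d_out for a given d_in."""
    return epsilon * d_in


def privacy_relation(d_in: Fraction, d_out: Fraction, epsilon: Fraction) -> bool:
    """True when d_out is at least the smallest d_out satisfied for d_in."""
    return privacy_function(d_in, epsilon) <= d_out


eps = Fraction(1, 2)
print(privacy_relation(Fraction(1), Fraction(1, 2), eps))  # d_out equals the bound
print(privacy_relation(Fraction(2), Fraction(1, 2), eps))  # d_out is below the bound
```

Exact rational arithmetic is used here so that the comparison at the boundary is not affected by floating-point rounding.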
- class NoisyQuantile(input_domain, output_measure, quantile, lower, upper, epsilon)#
Bases:
Aggregate
Estimates the quantile of a Pandas Series.
Methods#
quantile | Returns the quantile to be computed. |
lower | Returns the lower clamping bound. |
upper | Returns the upper clamping bound. |
epsilon | Returns the PureDP privacy budget to be used for producing a quantile. |
privacy_function | Returns the smallest d_out satisfied by the measurement. |
__call__ | Return DP answer(float) to quantile query. |
output_spark_type | Return the Spark type of the aggregated value. |
input_domain | Return input domain for the measurement. |
input_metric | Distance metric on input domain. |
output_measure | Distance measure on output. |
is_interactive | Returns true iff the measurement is interactive. |
privacy_relation | Return True if close inputs produce close outputs. |
- Parameters
input_domain (tmlt.core.domains.pandas_domains.PandasSeriesDomain) –
output_measure (Union[tmlt.core.measures.PureDP, tmlt.core.measures.RhoZCDP]) –
quantile (float) –
epsilon (tmlt.core.utils.exact_number.ExactNumberInput) –
- __init__(input_domain, output_measure, quantile, lower, upper, epsilon)#
Constructor.
- Parameters
input_domain (PandasSeriesDomain) – Input domain. Must be PandasSeriesDomain.
output_measure (Union[PureDP, RhoZCDP]) – Output measure.
lower (Union[float, int]) – The lower clamping bound.
upper (Union[float, int]) – The upper clamping bound.
epsilon (Union[ExactNumber, float, int, str, Fraction, Expr]) – The pure-DP privacy parameter to use to produce the quantile.
- property epsilon(self)#
Returns the PureDP privacy budget to be used for producing a quantile.
- Return type
- privacy_function(self, d_in)#
Returns the smallest d_out satisfied by the measurement.
This algorithm uses the exponential mechanism, so it benefits from the same privacy analysis:
If the output measure is PureDP, returns \(\epsilon \cdot d_{in}\)
If the output measure is RhoZCDP, returns \(\frac{1}{8}(\epsilon \cdot d_{in})^2\)
where:
\(d_{in}\) is the input argument d_in
\(\epsilon\) is epsilon
See [CR21] for the zCDP privacy analysis.
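The two formulas above can be evaluated directly. The following is a minimal sketch, not the library's implementation: the function name is hypothetical, the output measure is represented by a plain string as a stand-in for the PureDP and RhoZCDP classes, and exact fractions stand in for the library's exact number type.

```python
from fractions import Fraction


def quantile_privacy_function(d_in, epsilon, output_measure):
    """Evaluate the smallest d_out using the formulas stated above.

    output_measure is the string "PureDP" or "RhoZCDP" (a stand-in for the
    actual measure objects).
    """
    d_in = Fraction(d_in)
    epsilon = Fraction(epsilon)
    if output_measure == "PureDP":
        # epsilon * d_in
        return epsilon * d_in
    if output_measure == "RhoZCDP":
        # (1/8) * (epsilon * d_in)^2
        return Fraction(1, 8) * (epsilon * d_in) ** 2
    raise ValueError(f"Unsupported output measure: {output_measure}")


print(quantile_privacy_function(2, 1, "PureDP"))   # 2
print(quantile_privacy_function(2, 1, "RhoZCDP"))  # 1/2
```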
- Parameters
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Return type
- __call__(self, data)#
Return a differentially private answer (float) to the quantile query.
See this document for a description of the algorithm.
- Parameters
data (pandas.Series) – The Series on which to compute the quantile.
- Return type
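The page states that the quantile is produced with the exponential mechanism but does not spell out the algorithm. As a rough illustration only, here is one common textbook formulation: clamp the data to [lower, upper], split that range into intervals between consecutive sorted values, weight each interval by its width times exp(-(epsilon / 2) * |rank - quantile * n|), pick an interval, and return a uniform draw from it. The function name, the scoring rule, and the use of floating-point sampling are assumptions; the library's actual scoring and sampling may differ.

```python
import math
import random


def noisy_quantile(values, quantile, lower, upper, epsilon, rng=None):
    """Textbook exponential-mechanism quantile (illustrative sketch)."""
    rng = rng or random.Random()
    # Clamp every value into [lower, upper], then sort.
    clamped = sorted(min(max(v, lower), upper) for v in values)
    # Interval i is [points[i], points[i + 1]]; exactly i values lie below it.
    points = [lower] + clamped + [upper]
    n = len(clamped)
    target_rank = quantile * n

    # Work in log space to avoid underflow for large rank distances.
    log_weights = []
    for i in range(n + 1):
        width = points[i + 1] - points[i]
        if width <= 0:
            log_weights.append(-math.inf)  # empty interval, never selected
        else:
            score = -abs(i - target_rank)
            log_weights.append(math.log(width) + (epsilon / 2) * score)

    top = max(log_weights)
    if top == -math.inf:
        # All intervals are empty, which only happens when lower == upper.
        return float(lower)

    weights = [math.exp(lw - top) for lw in log_weights]
    i = rng.choices(range(n + 1), weights=weights, k=1)[0]
    return rng.uniform(points[i], points[i + 1])


estimate = noisy_quantile([1, 2, 3, 4, 5], 0.5, lower=0, upper=10, epsilon=1.0)
print(estimate)
```

The result always lies within the clamping bounds, which is why the bounds are required up front.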
- property output_spark_type(self)#
Return the Spark type of the aggregated value.
- Return type
- property input_domain(self)#
Return input domain for the measurement.
- Return type
- property input_metric(self)#
Distance metric on input domain.
- Return type
- property output_measure(self)#
Distance measure on output.
- Return type
- privacy_relation(self, d_in, d_out)#
Return True if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_measure.
- Return type
- class AddNoiseToSeries(noise_measurement)#
Bases:
tmlt.core.measurements.base.Measurement
A measurement that adds noise to each value in a pandas Series.
- Parameters
noise_measurement (Union[tmlt.core.measurements.noise_mechanisms.AddLaplaceNoise, tmlt.core.measurements.noise_mechanisms.AddGeometricNoise, tmlt.core.measurements.noise_mechanisms.AddDiscreteGaussianNoise]) –
- __init__(noise_measurement)#
Constructor.
- Parameters
noise_measurement (Union[AddLaplaceNoise, AddGeometricNoise, AddDiscreteGaussianNoise]) – Noise measurement to be applied to each element in the input pandas Series.
- property noise_measurement(self)#
Returns measurement that adds noise to each number in pandas Series.
- property input_domain(self)#
Return input domain for the measurement.
- property output_type(self)#
Return the output data type after being used as a UDF.
- Return type
- privacy_function(self, d_in)#
Returns the smallest d_out satisfied by the measurement.
- Parameters
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Return type
- __call__(self, values)#
Adds noise to each number in the input Series.
- Parameters
values (pandas.Series) –
- Return type
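The element-wise behaviour described above can be sketched without the library. In this illustration a plain Python list stands in for the pandas Series, and a simple floating-point Laplace sampler stands in for the wrapped noise measurement; both function names and the `scale` parameter are hypothetical, and this sampler is not a substitute for the library's noise mechanisms.

```python
import random


def add_laplace_noise(value, scale, rng):
    """Add Laplace(0, scale) noise to one number.

    The difference of two independent exponentials with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return value + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def add_noise_to_values(values, scale, rng=None):
    """Apply the per-value noise function independently to every element."""
    rng = rng or random.Random()
    return [add_laplace_noise(v, scale, rng) for v in values]


noisy = add_noise_to_values([10.0, 20.0, 30.0], scale=0.5)
print(noisy)
```

Each element receives its own independent noise draw, so the output has the same length and order as the input.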
- property input_metric(self)#
Distance metric on input domain.
- Return type
- property output_measure(self)#
Distance measure on output.
- Return type
- privacy_relation(self, d_in, d_out)#
Return True if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_measure.
- Return type