QueryBuilder.average

from tmlt.analytics import QueryBuilder

QueryBuilder.average(column, low, high, name=None, mechanism=AverageMechanism.DEFAULT)

Returns an average query ready to be evaluated.

Note

If the column being measured contains NaN or null values, a drop_null_and_nan() query will be performed first. If the column being measured contains infinite values, a drop_infinity() query will be performed first.
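The effect of this preprocessing can be pictured with a plain-Python sketch. It does not call the library: `drop_unusable` is a hypothetical helper that mimics, on an ordinary list, what dropping null, NaN, and infinite values does to a column before it is averaged.

```python
import math


def drop_unusable(values):
    """Hypothetical helper: keep only finite, non-null numbers."""
    return [v for v in values if v is not None and math.isfinite(v)]


raw = [1.0, None, float("nan"), 2.0, float("inf"), 0.0]
clean = drop_unusable(raw)

print(clean)                    # [1.0, 2.0, 0.0]
print(sum(clean) / len(clean))  # 1.0
```

Only the three finite values contribute to the average; the dropped rows affect neither the sum nor the count.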

Note

Regarding the clamping bounds:

The values for low and high are a choice the caller must make.
All data will be clamped to lie within this range.
A narrower range requires less noise but clamps more values. A wider range preserves more of the data but requires more noise to be added to the result.
The clamping bounds are assumed to be public information. Avoid using the private data to set these values.

More information can be found in the Numerical aggregations tutorial.
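To make the trade-off concrete, here is a minimal sketch of clamping in plain Python, with no noise added. The helpers `clamp` and `clamped_average` are hypothetical names used only for illustration and are not part of the library.

```python
def clamp(value, low, high):
    """Force value into the closed interval [low, high]."""
    return max(low, min(high, value))


def clamped_average(values, low, high):
    """Average of the values after clamping each one to [low, high]."""
    clamped = [clamp(v, low, high) for v in values]
    return sum(clamped) / len(clamped)


values = [1, 0, 2, 50]  # 50 is an outlier

# Narrow bounds: the outlier is clamped to 2, so the result is biased low.
print(clamped_average(values, low=0, high=2))    # 1.25

# Wide bounds: nothing is clamped, but the range is 50 times wider.
print(clamped_average(values, low=0, high=100))  # 13.25
```

With the narrow bounds the outlier is distorted but the range has width 2; with the wide bounds every value is preserved but the range has width 100, and a differentially private average over that range would need correspondingly more noise.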

Example

>>> my_private_data.toPandas()
   A  B  X
0  0  1  0
1  1  0  1
2  1  2  1
>>> budget = PureDPBudget(float("inf"))
>>> sess = Session.from_dataframe(
...     privacy_budget=budget,
...     source_id="my_private_data",
...     dataframe=my_private_data,
...     protected_change=AddOneRow(),
... )
>>> # Building an average query
>>> query = (
...     QueryBuilder("my_private_data")
...     .average(column="B", low=0, high=2)
... )
>>> # Answering the query with infinite privacy budget
>>> answer = sess.evaluate(
...     query,
...     PureDPBudget(float("inf"))
... )
>>> answer.toPandas()
   B_average
0        1.0

Parameters:

column (str) – The column to compute the average over.
low (float) – The lower bound for clamping.
high (float) – The upper bound for clamping. Must be strictly greater than low.
name (Optional[str]) – The name to give the resulting aggregation column. Defaults to f"{column}_average".
mechanism (AverageMechanism) – Choice of noise mechanism. By default, the framework automatically selects an appropriate mechanism.

Return type:

Query

from tmlt.analytics import GroupedQueryBuilder

GroupedQueryBuilder.average(column, low, high, name=None, mechanism=AverageMechanism.DEFAULT)

Returns an average query, computed separately for each group, ready to be evaluated.

Note

If the column being measured contains NaN or null values, a drop_null_and_nan() query will be performed first. If the column being measured contains infinite values, a drop_infinity() query will be performed first.

Note

Regarding the clamping bounds:

The values for low and high are a choice the caller must make.
All data will be clamped to lie within this range.
A narrower range requires less noise but clamps more values. A wider range preserves more of the data but requires more noise to be added to the result.
The clamping bounds are assumed to be public information. Avoid using the private data to set these values.

More information can be found in the Numerical aggregations tutorial.

Example

>>> my_private_data.toPandas()
   A  B  X
0  0  1  0
1  1  0  1
2  1  2  1
>>> budget = PureDPBudget(float("inf"))
>>> sess = Session.from_dataframe(
...     privacy_budget=budget,
...     source_id="my_private_data",
...     dataframe=my_private_data,
...     protected_change=AddOneRow(),
... )
>>> # Building a groupby average query
>>> query = (
...     QueryBuilder("my_private_data")
...     .groupby(KeySet.from_dict({"A": ["0", "1"]}))
...     .average(column="B", low=0, high=2)
... )
>>> # Answering the query with infinite privacy budget
>>> answer = sess.evaluate(
...     query,
...     PureDPBudget(float("inf"))
... )
>>> answer.sort("A").toPandas()
   A  B_average
0  0        1.0
1  1        1.0
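The grouped result above can be checked by hand. The following sketch recomputes it in plain Python without any noise, which is what an infinite privacy budget amounts to here. It is an illustration only, not the library's implementation; the `rows` list is a hand-written copy of the example table, with column A assumed to hold strings to match the key set.

```python
from collections import defaultdict

# Hand-written copy of the example table.
rows = [
    {"A": "0", "B": 1, "X": 0},
    {"A": "1", "B": 0, "X": 1},
    {"A": "1", "B": 2, "X": 1},
]
keys = ["0", "1"]  # the public group keys, as in the KeySet above
low, high = 0, 2   # clamping bounds

# Collect the clamped B values for each public key.
groups = defaultdict(list)
for row in rows:
    if row["A"] in keys:
        groups[row["A"]].append(max(low, min(high, row["B"])))

# Average each non-empty group.
result = {k: sum(groups[k]) / len(groups[k]) for k in keys if groups[k]}
print(result)  # {'0': 1.0, '1': 1.0}
```

Group "0" contains the single value 1, and group "1" contains 0 and 2, so both averages are 1.0, matching the B_average column.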

Parameters:

column (str) – The column to compute the average over.
low (float) – The lower bound for clamping.
high (float) – The upper bound for clamping. Must be strictly greater than low.
name (Optional[str]) – The name to give the resulting aggregation column. Defaults to f"{column}_average".
mechanism (AverageMechanism) – Choice of noise mechanism. By default, the framework automatically selects an appropriate mechanism.

Return type:

Query

from tmlt.analytics import AverageMechanism

class tmlt.analytics.AverageMechanism(value)

Bases: Enum

Possible mechanisms for the average() aggregation.

Currently, the average() aggregation uses an additive noise mechanism to achieve differential privacy.

DEFAULT = 1: The framework automatically selects an appropriate mechanism. This choice might change over time as additional optimizations are added to the library.

LAPLACE = 2: Laplace and/or double-sided geometric noise is used, depending on the column type.

GAUSSIAN = 3: Discrete and/or continuous Gaussian noise is used, depending on the column type. Not compatible with pure DP.
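As a rough illustration of what "additive noise" means, the sketch below draws continuous Laplace and Gaussian noise with the standard library and adds it to an exact value. The helper names and the scale of 0.5 are arbitrary assumptions made for the example; the library calibrates its noise from the privacy budget and the clamping bounds, and may use discrete variants depending on the column type, none of which is modeled here.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def gaussian_noise(sigma, rng):
    """Sample a zero-mean Gaussian with standard deviation sigma."""
    return rng.gauss(0.0, sigma)


rng = random.Random(0)  # seeded only so the sketch is repeatable
true_average = 1.0

# Illustrative scales; a real mechanism derives them from the budget.
noisy_laplace = true_average + laplace_noise(scale=0.5, rng=rng)
noisy_gaussian = true_average + gaussian_noise(sigma=0.5, rng=rng)

print(noisy_laplace, noisy_gaussian)
```

Both samples are centered on the exact value. The two distributions differ in their tails and in which privacy definitions they can satisfy, which is why GAUSSIAN cannot be used with a pure DP budget.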
