QueryBuilder.count

from tmlt.analytics import QueryBuilder
QueryBuilder.count(name=None, mechanism=CountMechanism.DEFAULT)

Returns a count query ready to be evaluated.

Note

Differentially private counts may return values that a non-DP query could never produce, including negative values. You can enforce non-negativity after the query returns its results; see the example below.

Example

>>> from tmlt.analytics import AddOneRow, PureDPBudget, QueryBuilder, Session
>>> my_private_data.toPandas()
   A  B  X
0  0  1  0
1  1  0  1
2  1  2  1
>>> budget = PureDPBudget(float("inf"))
>>> sess = Session.from_dataframe(
...     privacy_budget=budget,
...     source_id="my_private_data",
...     dataframe=my_private_data,
...     protected_change=AddOneRow(),
... )
>>> # Building a count query
>>> query = (
...     QueryBuilder("my_private_data")
...     .count()
... )
>>> # Answering the query with infinite privacy budget
>>> answer = sess.evaluate(
...     query,
...     PureDPBudget(float("inf"))
... )
>>> answer.toPandas()
   count
0      3
>>> # Ensuring all results are non-negative
>>> import pyspark.sql.functions as sf
>>> answer = answer.withColumn(
...     "count",
...     sf.when(sf.col("count") < 0, 0).otherwise(sf.col("count")),
... )
>>> answer.toPandas()
   count
0      3

Parameters:
  • name (Optional[str]) – Name for the resulting aggregation column. Defaults to “count”.

  • mechanism (CountMechanism) – Choice of noise mechanism. By default, the framework automatically selects an appropriate mechanism.

Return type:

Query

from tmlt.analytics import GroupedQueryBuilder
GroupedQueryBuilder.count(name=None, mechanism=CountMechanism.DEFAULT)

Returns a GroupbyCountQuery that counts the rows in each group.

Example

>>> from tmlt.analytics import (
...     AddOneRow, KeySet, PureDPBudget, QueryBuilder, Session
... )
>>> my_private_data.toPandas()
   A  B  X
0  0  1  0
1  1  0  1
2  1  2  1
>>> budget = PureDPBudget(float("inf"))
>>> sess = Session.from_dataframe(
...     privacy_budget=budget,
...     source_id="my_private_data",
...     dataframe=my_private_data,
...     protected_change=AddOneRow(),
... )
>>> # Building a groupby count query
>>> query = (
...     QueryBuilder("my_private_data")
...     .groupby(KeySet.from_dict({"A": ["0", "1"]}))
...     .count()
... )
>>> # Answering the query with infinite privacy budget
>>> answer = sess.evaluate(
...     query,
...     PureDPBudget(float("inf"))
... )
>>> answer.sort("A").toPandas()
   A  count
0  0      1
1  1      2

Parameters:
  • name (Optional[str]) – Name for the resulting aggregation column. Defaults to “count”.

  • mechanism (CountMechanism) – Choice of noise mechanism. By default, the framework automatically selects an appropriate mechanism.

Return type:

GroupbyCountQuery
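
As a rough, non-private illustration of what the grouped count in the example above computes before any noise is added, the sketch below uses only the Python standard library. The `rows` list and the `grouped_count` helper are hypothetical stand-ins, not part of tmlt.analytics. It also assumes that every key listed in the KeySet appears in the output (with 0 when no rows match) and that rows whose key is not listed are dropped.

```python
from collections import Counter

# Hypothetical stand-in for the example table: one dict per row.
rows = [
    {"A": "0", "B": 1, "X": 0},
    {"A": "1", "B": 0, "X": 1},
    {"A": "1", "B": 2, "X": 1},
]


def grouped_count(rows, column, keys):
    """Count rows per key, reporting only the keys listed in `keys`."""
    # Rows whose key is not listed are ignored (assumed KeySet behavior).
    counts = Counter(row[column] for row in rows if row[column] in keys)
    # Every listed key appears in the result; a missing key counts as 0.
    return {key: counts[key] for key in keys}


exact = grouped_count(rows, "A", ["0", "1"])
with_empty_group = grouped_count(rows, "A", ["0", "1", "2"])
```

With an infinite privacy budget, as in the example, the evaluated query matches these exact counts; with a finite budget, noise would be added to each group's count independently.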

from tmlt.analytics import CountMechanism
class tmlt.analytics.CountMechanism(value)

Bases: Enum

Possible mechanisms for the count() aggregation.

Currently, the count() aggregation uses an additive noise mechanism to achieve differential privacy.

DEFAULT = 1

The framework automatically selects an appropriate mechanism. This choice might change over time as additional optimizations are added to the library.

LAPLACE = 2

Double-sided geometric noise, the integer-valued analogue of Laplace noise, is used.
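
To make the idea of double-sided geometric noise concrete, here is a minimal sketch using only the Python standard library. It is not the sampler used by the library: the function name `two_sided_geometric`, the `scale` parameter, and the choice `scale = 1 / epsilon` for a count with sensitivity 1 are assumptions made for illustration, and floating-point sampling like this is not suitable for production differential privacy.

```python
import math
import random


def two_sided_geometric(scale: float, rng: random.Random) -> int:
    """Sample integer noise with P[X = k] proportional to exp(-|k| / scale)."""
    if scale <= 0:
        raise ValueError("scale must be positive")

    def geometric() -> int:
        # Inverse-CDF sample of a geometric variable on {0, 1, 2, ...}
        # whose failure probability is exp(-1 / scale).
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is defined
        return int(math.floor(-scale * math.log(u)))

    # The difference of two i.i.d. geometric variables is two-sided geometric.
    return geometric() - geometric()


rng = random.Random(0)
true_count = 3
epsilon = 1.0  # illustrative privacy parameter

# Additive noise: each released value is the true count plus integer noise.
noisy_counts = [
    true_count + two_sided_geometric(1.0 / epsilon, rng) for _ in range(10)
]

# Post-processing: clamp negative results to zero.
clamped = [max(0, count) for count in noisy_counts]
```

Because the noise is symmetric around zero and unbounded, a small true count can produce a negative noisy count, which is why clamping after evaluation can be useful.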

GAUSSIAN = 3

The discrete Gaussian mechanism is used. This mechanism is not compatible with pure DP budgets such as PureDPBudget.
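
The incompatibility with pure DP can be seen from the shape of the noise distribution. The following is a sketch rather than a statement about the library's internals; the symbols $Z$ (the noise) and $\sigma$ (its scale) are introduced here for illustration. The discrete Gaussian is supported on the integers, and the ratio of the probabilities of adjacent noise values is:

```latex
\Pr[Z = z] = \frac{e^{-z^2 / (2\sigma^2)}}{\sum_{y \in \mathbb{Z}} e^{-y^2 / (2\sigma^2)}},
\quad z \in \mathbb{Z},
\qquad
\frac{\Pr[Z = z]}{\Pr[Z = z + 1]} = e^{(2z + 1) / (2\sigma^2)}.
```

Two datasets whose true counts differ by one shift the output distribution by one, so the ratio of output probabilities is exactly this quantity. It grows without bound as $z$ increases, so no finite $\varepsilon$ bounds it by $e^{\varepsilon}$, which is what pure DP would require. Under zero-concentrated DP the same noise on a count with sensitivity 1 does give a finite guarantee, with $\rho = 1 / (2\sigma^2)$.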