QueryBuilder.count
from tmlt.analytics import QueryBuilder
- QueryBuilder.count(name=None, mechanism=CountMechanism.DEFAULT)
Returns a count query ready to be evaluated.
Note
Differentially private counts may return values that a non-DP query could never produce, including negative values. You can enforce non-negativity once the query returns its results; see the example below.
Example
>>> my_private_data.toPandas()
   A  B  X
0  0  1  0
1  1  0  1
2  1  2  1
>>> budget = PureDPBudget(float("inf"))
>>> sess = Session.from_dataframe(
...     privacy_budget=budget,
...     source_id="my_private_data",
...     dataframe=my_private_data,
...     protected_change=AddOneRow(),
... )
>>> # Building a count query
>>> query = (
...     QueryBuilder("my_private_data")
...     .count()
... )
>>> # Answering the query with infinite privacy budget
>>> answer = sess.evaluate(
...     query,
...     PureDPBudget(float("inf"))
... )
>>> answer.toPandas()
   count
0      3
>>> # Ensuring all results are non-negative
>>> import pyspark.sql.functions as sf
>>> answer = answer.withColumn(
...     "count", sf.when(sf.col("count") < 0, 0).otherwise(
...         sf.col("count")
...     )
... )
>>> answer.toPandas()
   count
0      3
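With an infinite budget, the example above never actually produces a negative count. The following is a minimal standalone sketch of why clamping matters under a finite budget. It is not the library's implementation: the sampler `two_sided_geometric`, the value `epsilon = 0.5`, and the assumed sensitivity of 1 are illustrative choices, and floating-point sampling like this is not suitable for production differential privacy.

```python
import math
import random


def two_sided_geometric(epsilon: float, rng: random.Random) -> int:
    """Sample integer noise with P(Z = k) proportional to exp(-epsilon * |k|).

    Hypothetical helper for illustration; assumes sensitivity 1.
    """
    alpha = math.exp(-epsilon)

    def geometric() -> int:
        # Inverse-transform sample of the number of failures before the first
        # success, with success probability 1 - alpha. u lies in (0, 1].
        u = 1.0 - rng.random()
        return int(math.floor(math.log(u) / math.log(alpha)))

    # The difference of two i.i.d. geometric variables is two-sided geometric.
    return geometric() - geometric()


rng = random.Random(0)
true_count = 3  # same size as the three-row table in the example
noisy_counts = [true_count + two_sided_geometric(0.5, rng) for _ in range(10_000)]

# Clamping only touches already-released values, so it is pure post-processing.
clamped_counts = [max(0, c) for c in noisy_counts]

num_negative = sum(1 for c in noisy_counts if c < 0)
print(f"negative results before clamping: {num_negative}")
print(f"negative results after clamping: {sum(1 for c in clamped_counts if c < 0)}")
```

Because the true count is small relative to the noise scale, a noticeable share of the simulated results fall below zero before clamping. Clamping removes them at the cost of a small upward bias.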
- Parameters:
name (Optional[str]) – Name for the resulting aggregation column. Defaults to “count”.
mechanism (CountMechanism) – Choice of noise mechanism. By default, the framework automatically selects an appropriate mechanism.
- Return type:
from tmlt.analytics import GroupedQueryBuilder
- GroupedQueryBuilder.count(name=None, mechanism=CountMechanism.DEFAULT)
Returns a GroupedCountQuery with a count query.
Example
>>> my_private_data.toPandas()
   A  B  X
0  0  1  0
1  1  0  1
2  1  2  1
>>> budget = PureDPBudget(float("inf"))
>>> sess = Session.from_dataframe(
...     privacy_budget=budget,
...     source_id="my_private_data",
...     dataframe=my_private_data,
...     protected_change=AddOneRow(),
... )
>>> # Building a groupby count query
>>> query = (
...     QueryBuilder("my_private_data")
...     .groupby(KeySet.from_dict({"A": ["0", "1"]}))
...     .count()
... )
>>> # Answering the query with infinite privacy budget
>>> answer = sess.evaluate(
...     query,
...     PureDPBudget(float("inf"))
... )
>>> answer.sort("A").toPandas()
   A  count
0  0      1
1  1      2
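For intuition, the noise-free result of the grouped count above can be reproduced in plain Python. This sketch assumes, as the example output suggests, that the result has one row per key in the key set. The extra key `"2"` is hypothetical and is included only to show that a key with no matching records would get a count of zero before noise is added.

```python
from collections import Counter

# The three rows of my_private_data from the example, with A as a string column.
rows = [
    {"A": "0", "B": 1, "X": 0},
    {"A": "1", "B": 0, "X": 1},
    {"A": "1", "B": 2, "X": 1},
]

# Group-by keys; "2" is a hypothetical key that matches no rows.
keys = ["0", "1", "2"]

observed = Counter(row["A"] for row in rows)

# One entry per key in the key set, whether or not any row matches it.
grouped_counts = {key: observed.get(key, 0) for key in keys}
print(grouped_counts)  # {'0': 1, '1': 2, '2': 0}
```

Under a finite privacy budget, each of these per-key counts would have noise added independently, so the zero-count key could also come back non-zero.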
- Parameters:
name (Optional[str]) – Name for the resulting aggregation column. Defaults to “count”.
mechanism (CountMechanism) – Choice of noise mechanism. By default, the framework automatically selects an appropriate mechanism.
- Return type:
from tmlt.analytics import CountMechanism
- class tmlt.analytics.CountMechanism(value)
Bases: Enum
Possible mechanisms for the count() aggregation.
Currently, the count() aggregation uses an additive noise mechanism to achieve differential privacy.
- DEFAULT = 1
The framework automatically selects an appropriate mechanism. This choice might change over time as additional optimizations are added to the library.
- LAPLACE = 2
Double-sided geometric noise is used.
- GAUSSIAN = 3
The discrete Gaussian mechanism is used. Not compatible with pure DP.
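To compare the two noise distributions named above, the sketch below evaluates their probability mass functions on the integers. It is an illustration under stated assumptions, not the library's code: the function names, the choice `epsilon = 1.0` and `sigma = 1.0`, and the truncation of the discrete Gaussian normalizer to a finite range are all assumptions made for this example.

```python
import math


def two_sided_geometric_pmf(k: int, epsilon: float) -> float:
    """P(Z = k) for double-sided geometric noise with alpha = exp(-epsilon)."""
    alpha = math.exp(-epsilon)
    return (1 - alpha) / (1 + alpha) * alpha ** abs(k)


def discrete_gaussian_pmf(k: int, sigma: float, support: int = 200) -> float:
    """P(Z = k) for a discrete Gaussian, proportional to exp(-k^2 / (2 sigma^2)).

    The normalizing constant is approximated by summing over
    [-support, support]; the tail beyond that range is negligible here.
    """
    norm = sum(
        math.exp(-(j * j) / (2 * sigma**2)) for j in range(-support, support + 1)
    )
    return math.exp(-(k * k) / (2 * sigma**2)) / norm


# Print both pmfs side by side for a few integer noise values.
for k in range(-3, 4):
    geo = two_sided_geometric_pmf(k, epsilon=1.0)
    gauss = discrete_gaussian_pmf(k, sigma=1.0)
    print(f"k={k:+d}  geometric={geo:.4f}  gaussian={gauss:.4f}")
```

Both distributions are symmetric around zero and supported on the integers, so a noisy count stays an integer. The geometric tails decay exponentially in |k|, while the Gaussian tails decay faster, as exp(-k²/(2σ²)).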