Session.evaluate

from tmlt.analytics import Session
Session.evaluate(query_expr, privacy_budget)

Answers a query within the given privacy budget and returns a Spark dataframe.

The type of privacy budget must match the type the Session was initialized with. For example, you cannot evaluate a query with a RhoZCDPBudget if the Session was initialized with a PureDPBudget, or vice versa.
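
The type-matching rule can be sketched as a guard that runs before a query is submitted. This is a minimal illustration, not the library's implementation: the two budget classes below are stand-ins defined locally so the snippet runs without Spark, `check_budget_type` is a hypothetical helper, and the exception the library itself raises on a mismatch is not specified here.

```python
from dataclasses import dataclass


# Stand-in budget types for illustration only. In real code these would be
# the PureDPBudget and RhoZCDPBudget classes provided by tmlt.analytics.
@dataclass(frozen=True)
class PureDPBudget:
    epsilon: float


@dataclass(frozen=True)
class RhoZCDPBudget:
    rho: float


def check_budget_type(session_budget, query_budget):
    """Hypothetical helper: reject a query budget whose type differs
    from the type of budget the Session was initialized with."""
    if type(query_budget) is not type(session_budget):
        raise TypeError(
            f"Session uses {type(session_budget).__name__}, "
            f"but the query was given {type(query_budget).__name__}"
        )
    return query_budget


# A Session initialized with a pure-DP budget accepts pure-DP query budgets.
session_budget = PureDPBudget(epsilon=1.0)
accepted = check_budget_type(session_budget, PureDPBudget(epsilon=0.5))

# A zCDP budget against the same Session is rejected.
try:
    check_budget_type(session_budget, RhoZCDPBudget(rho=0.5))
    mismatch_rejected = False
except TypeError:
    mismatch_rejected = True
```

A check like this only mirrors the documented rule on the caller's side; the Session remains the authority on which budgets it accepts.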

Example

>>> sess.private_sources
['my_private_data']
>>> sess.get_column_types("my_private_data") 
{'A': ColumnType.VARCHAR, 'B': ColumnType.INTEGER, 'X': ColumnType.INTEGER}
>>> sess.remaining_privacy_budget
PureDPBudget(epsilon=1)
>>> # Evaluate Queries
>>> filter_query = QueryBuilder("my_private_data").filter("A > 0")
>>> count_query = filter_query.groupby(KeySet.from_dict({"X": [0, 1]})).count()
>>> count_answer = sess.evaluate(
...     query_expr=count_query,
...     privacy_budget=PureDPBudget(0.5),
... )
>>> sum_query = filter_query.sum(column="B", low=0, high=1)
>>> sum_answer = sess.evaluate(
...     query_expr=sum_query,
...     privacy_budget=PureDPBudget(0.5),
... )
>>> count_answer # TODO(#798): Seed randomness and change to toPandas()
DataFrame[X: bigint, count: bigint]
>>> sum_answer # TODO(#798): Seed randomness and change to toPandas()
DataFrame[B_sum: bigint]

Parameters:
  • query_expr (Query) – One query expression to answer.

  • privacy_budget (PrivacyBudget) – The privacy budget used for the query.

Return type:

Any
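
In the example above, the Session starts with PureDPBudget(epsilon=1) and each of the two queries is evaluated with PureDPBudget(0.5). The bookkeeping this implies can be sketched in plain Python. `EpsilonLedger` is a hypothetical stand-in, not part of tmlt.analytics, and it assumes that each evaluation deducts its epsilon from the remaining budget; the library's own accounting may differ in detail, for instance in rounding or in the exception raised.

```python
class EpsilonLedger:
    """Hypothetical stand-in for a Session's pure-DP budget bookkeeping."""

    def __init__(self, epsilon):
        self.remaining = epsilon

    def spend(self, epsilon):
        """Deduct epsilon from the remaining budget, refusing to overspend."""
        if epsilon > self.remaining:
            raise RuntimeError(
                f"requested epsilon={epsilon}, but only {self.remaining} remains"
            )
        self.remaining -= epsilon
        return self.remaining


ledger = EpsilonLedger(1.0)          # matches PureDPBudget(epsilon=1)
after_count = ledger.spend(0.5)      # the count query
after_sum = ledger.spend(0.5)        # the sum query

# With the budget exhausted, a further query cannot be answered.
try:
    ledger.spend(0.1)
    overspend_rejected = False
except RuntimeError:
    overspend_rejected = True
```

Under this assumption, checking sess.remaining_privacy_budget before each call to evaluate shows how much budget is still available for later queries.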