QueryBuilder.get_groups

from tmlt.analytics import QueryBuilder
QueryBuilder.get_groups(columns=None)

Returns a query that gets combinations of values in the listed columns.

Note

Because this query uses differential privacy, it will usually not return every combination of values present in the input columns, and it may return no results at all on datasets that have only a few rows for each combination of group keys.
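The behavior described in the note can be illustrated with a generic noisy-threshold ("stability-based histogram") mechanism: count the rows for each combination, add Laplace noise, and release only combinations whose noisy count clears a threshold chosen from (epsilon, delta). The sketch below is a simplified illustration under the assumption that each person contributes at most one row (as with AddOneRow). It is not Tumult Analytics' actual implementation, and `noisy_threshold_groups` is a hypothetical name.

```python
import math
import random
from collections import Counter
from typing import Hashable, Iterable, List, Optional


def noisy_threshold_groups(
    keys: Iterable[Hashable],
    epsilon: float,
    delta: float,
    rng: Optional[random.Random] = None,
) -> List[Hashable]:
    """Hypothetical toy mechanism: release a group key only if its
    Laplace-noised row count exceeds a (epsilon, delta)-derived threshold.

    Assumes one row per person (add/remove-one-row neighbors); this is an
    illustration, not the algorithm used by Tumult Analytics.
    """
    rng = rng or random.Random()
    scale = 1.0 / epsilon  # Laplace scale for a count with sensitivity 1
    # A group backed by a single row survives with probability at most delta:
    # P(1 + Lap(scale) > t) = 0.5 * exp(-(t - 1) / scale) = delta.
    threshold = 1.0 + scale * math.log(1.0 / (2.0 * delta))
    released = []
    for key, count in Counter(keys).items():
        # The difference of two i.i.d. exponentials with mean `scale`
        # is Laplace(0, scale).
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        if count + noise > threshold:
            released.append(key)
    return released


# Two large groups (as in the example above) plus one single-row group.
keys = [("0", 1, 0)] * 10000 + [("1", 2, 1)] * 10000 + [("2", 3, 2)]
print(noisy_threshold_groups(keys, epsilon=1.0, delta=1e-5, rng=random.Random(0)))
```

With epsilon = 1 and delta = 1e-5 the threshold is about 11.8, so the two groups with 10,000 rows are kept while the single-row group is dropped except with probability around 1e-5. On a dataset where every combination has only a handful of rows, the same rule can release nothing at all.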

Example

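>>> import pandas as pd
>>> from pyspark.sql import SparkSession
>>> from tmlt.analytics import AddOneRow, ApproxDPBudget, Session
>>> spark = SparkSession.builder.getOrCreate()
>>> my_private_data = spark.createDataFrame(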
...     pd.DataFrame(
...         [["0", 1, 0] for _ in range(10000)]
...         + [["1", 2, 1] for _ in range(10000)],
...         columns=["A", "B", "X"],
...     )
... )
>>> sess = Session.from_dataframe(
...     privacy_budget=ApproxDPBudget(1, 1e-5),
...     source_id="my_private_data",
...     dataframe=my_private_data,
...     protected_change=AddOneRow(),
... )
>>> # Building a get_groups query
>>> query = (
...     QueryBuilder("my_private_data")
...     .get_groups()
... )
>>> # Answering the query
>>> answer = sess.evaluate(
...     query,
...     sess.remaining_privacy_budget
... )
>>> answer.toPandas()
   A  B  X
0  0  1  0
1  1  2  1

Parameters:

columns (Optional[List[str]]) – Names of the columns whose combinations of values are returned. If the list is empty or None is provided, all of the columns in the table are used, excluding any column marked as a privacy ID in a table with a ProtectedChange of AddRowsWithID.
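The column-selection rule for `columns` can be mirrored in plain Python. The helper `candidate_groups` and its `all_columns` and `id_column` parameters are hypothetical and not part of Tumult Analytics. The helper computes the exact (non-private) set of value combinations, which the real query only releases after applying differential privacy.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple


def candidate_groups(
    rows: Sequence[Dict[str, object]],
    all_columns: List[str],
    columns: Optional[List[str]] = None,
    id_column: Optional[str] = None,
) -> Tuple[List[str], Set[tuple]]:
    """Hypothetical helper: resolve the grouping columns, then return the
    exact set of value combinations over them (no privacy applied)."""
    if columns:
        # A non-empty list is used as given.
        selected = list(columns)
    else:
        # None or []: every column except the privacy ID column, if any.
        selected = [c for c in all_columns if c != id_column]
    combos = {tuple(row[c] for c in selected) for row in rows}
    return selected, combos


rows = [{"A": "0", "B": 1, "X": 0}] * 3 + [{"A": "1", "B": 2, "X": 1}] * 2
print(candidate_groups(rows, ["A", "B", "X"], columns=["A"]))
```

Passing `columns=["A"]` restricts the combinations to the values of column A alone, while omitting `columns` groups on A, B, and X together, matching the three output columns in the example above.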

Return type:

Query