QueryBuilder.get_groups
from tmlt.analytics import QueryBuilder
- QueryBuilder.get_groups(columns=None)
Returns a query that gets combinations of values in the listed columns.
Note
Because this query uses differential privacy, its output may omit combinations of values that appear in the input dataset columns, and it may even return no results at all on datasets that have few rows for each combination of group keys.
Example
>>> my_private_data = spark.createDataFrame(
...     pd.DataFrame(
...         [["0", 1, 0] for _ in range(10000)]
...         + [["1", 2, 1] for _ in range(10000)],
...         columns=["A", "B", "X"],
...     )
... )
>>> sess = Session.from_dataframe(
...     privacy_budget=ApproxDPBudget(1, 1e-5),
...     source_id="my_private_data",
...     dataframe=my_private_data,
...     protected_change=AddOneRow(),
... )
>>> # Building a get_groups query
>>> query = (
...     QueryBuilder("my_private_data")
...     .get_groups()
... )
>>> # Answering the query
>>> answer = sess.evaluate(
...     query,
...     sess.remaining_privacy_budget
... )
>>> answer.toPandas()
   A  B  X
0  0  1  0
1  1  2  1
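The note above says that rare combinations can be missing from the result. The following is a minimal pure-Python sketch of one common way such behavior arises, assuming a noisy-count-and-threshold scheme. It is not Tumult Analytics code and makes no claim about how the library implements get_groups. The function name sketch_get_groups, its parameters, the threshold formula, and the extra single-row combination in the data are all hypothetical and chosen for illustration only.

```python
import math
import random
from collections import Counter


def sketch_get_groups(rows, columns, epsilon, delta, rng):
    """Illustrative only: release key combinations whose noisy count clears a threshold."""
    # Count how many rows carry each combination of values in `columns`.
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    # Assumed threshold: with Laplace noise of scale 1/epsilon, a combination
    # backed by a single row clears it with probability delta.
    scale = 1.0 / epsilon
    threshold = 1.0 + scale * math.log(1.0 / (2.0 * delta))
    released = []
    for key, count in sorted(counts.items()):
        # The difference of two exponential draws is Laplace-distributed.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        if count + noise > threshold:
            released.append(key)
    return released


rows = (
    [{"A": "0", "B": 1, "X": 0} for _ in range(10000)]
    + [{"A": "1", "B": 2, "X": 1} for _ in range(10000)]
    + [{"A": "2", "B": 3, "X": 1}]  # hypothetical combination backed by one row
)
groups = sketch_get_groups(
    rows, ["A", "B"], epsilon=1.0, delta=1e-5, rng=random.Random(0)
)
print(groups)
```

Under these assumptions, the two combinations backed by 10000 rows each sit far above the threshold (about 11.8 here) and are returned with overwhelming probability, while the single-row combination is dropped with probability 1 - delta. Passing only ["A", "B"] also shows the effect of listing columns: combinations are formed from the listed columns alone, so X does not appear in the keys.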
- Parameters:
columns (Optional[List[str]]) – Names of the columns whose combinations of values are returned. If empty or None, all of the columns in the table will be used, excluding any column marked as a privacy ID in a table with a ProtectedChange of AddRowsWithID.
- Return type: