QueryBuilder.flat_map_by_id

from tmlt.analytics import QueryBuilder
QueryBuilder.flat_map_by_id(f, new_column_types)

Applies a transformation to each group of records sharing an ID.

Transforms groups of records that all share a common ID into a new group of records with that same ID based on a user-provided function. The number of rows produced does not have to match the number of input rows. The ID column is automatically added to the output of f, but all other input columns are lost unless f copies them into its output.
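The grouping behavior described above can be modeled in plain Python. This is only an illustrative sketch, not how Analytics implements the transformation: `simulate_flat_map_by_id` is a hypothetical helper, it applies no privacy protections, and it assumes that each input row passed to f includes every column (the ID column among them).

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def simulate_flat_map_by_id(
    rows: List[Row],
    id_column: str,
    f: Callable[[List[Row]], List[Row]],
) -> List[Row]:
    """Hypothetical plain-Python model of the per-ID grouping semantics."""
    # Collect the rows belonging to each ID, in order of first appearance.
    groups: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[id_column]].append(row)

    output: List[Row] = []
    for id_value, group in groups.items():
        # f may return any number of rows for a group, including zero.
        for new_row in f(group):
            # The ID column is added back automatically; every other
            # input column is dropped unless f copied it into new_row.
            output.append({id_column: id_value, **new_row})
    return output


rows = [
    {"id": 0, "A": 1},
    {"id": 1, "A": 0},
    {"id": 1, "A": 1},
    {"id": 1, "A": 4},
]
result = simulate_flat_map_by_id(
    rows, "id", lambda group: [{"per_id_sum": sum(r["A"] for r in group)}]
)
print(result)
# [{'id': 0, 'per_id_sum': 1}, {'id': 1, 'per_id_sum': 5}]
```

Column A does not appear in the result because the function passed as f did not copy it into its output rows.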

Note that this transformation is only valid on tables with the AddRowsWithID protected change.

If you provide only a ColumnType for the new column types, Analytics assumes that all new columns created may contain null values (and that DECIMAL columns may contain NaN or infinite values).

Example

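>>> # Imports are assumed to follow the top-level path shown above.
>>> from tmlt.analytics import (
...     AddRowsWithID, ColumnDescriptor, ColumnType,
...     MaxRowsPerID, PureDPBudget, QueryBuilder, Session,
... )
>>> my_private_data.toPandas()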
  id  A
0  0  1
1  1  0
2  1  1
3  1  4
>>> budget = PureDPBudget(float("inf"))
>>> sess = Session.from_dataframe(
...     privacy_budget=budget,
...     source_id="my_private_data",
...     dataframe=my_private_data,
...     protected_change=AddRowsWithID("id"),
... )
>>> # flat_map_by_id pre-sums each ID's records before the total sum is
>>> # computed. This loses less data than truncating and clamping each
>>> # row individually, and it does not require adding more noise.
>>> query = (
...     QueryBuilder("my_private_data")
...     .flat_map_by_id(
...         lambda rows: [{"per_id_sum": sum(r["A"] for r in rows)}],
...         new_column_types={
...             "per_id_sum": ColumnDescriptor(
...                 ColumnType.INTEGER, allow_null=False,
...             )
...         },
...     )
...     .enforce(MaxRowsPerID(1))
...     .sum("per_id_sum", low=0, high=5, name="sum")
... )
>>> # Answering the query with infinite privacy budget
>>> answer = sess.evaluate(
...     query,
...     PureDPBudget(float("inf"))
... )
>>> answer.toPandas()
   sum
0    6
Parameters:
  • f (Callable[[List[Dict[str, Any]]], List[Dict[str, Any]]]) – The function to be applied to each group of rows. The function’s input is a list of dictionaries, each with one key/value pair per column. This function should return a list of dictionaries. Those dictionaries must each have one key/value pair for each column specified in new_column_types, and the values’ types must match the column types. The function must not have any side effects (in particular, it must not raise exceptions), and must be deterministic (running it multiple times on a fixed input should always return the same output).

  • new_column_types (Mapping[str, Union[str, ColumnType, ColumnDescriptor]]) – Mapping from column names to types for the new columns produced by f. Using ColumnDescriptor is preferred. Note that while the result of this transformation includes the ID column, the ID column must not be in new_column_types, and must not be included in the output rows from f.
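As a rough illustration of the requirements on f and new_column_types, the sketch below shows one way to write a function that never raises and always returns the same output for the same input. It also includes a small check, useful in tests, that the output rows contain exactly the declared columns and omit the ID column. Both `per_id_sum` and `output_matches_schema` are hypothetical names and not part of Analytics. Skipping non-integer values is only one possible policy for avoiding exceptions.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def per_id_sum(rows: List[Row]) -> List[Row]:
    """Sum column A for one ID without raising or touching outside state."""
    total = 0
    for row in rows:
        value = row.get("A")  # .get avoids a KeyError if the column is absent
        # Skip nulls and non-integer values instead of raising a TypeError.
        if isinstance(value, int) and not isinstance(value, bool):
            total += value
    # Exactly one key per declared new column, and no ID column.
    return [{"per_id_sum": total}]


def output_matches_schema(
    output: List[Row], new_column_names: Iterable[str], id_column: str
) -> bool:
    """Check that every output row has exactly the declared columns."""
    expected = set(new_column_names)
    return all(
        set(row) == expected and id_column not in row for row in output
    )


group = [{"id": 1, "A": 0}, {"id": 1, "A": 1}, {"id": 1, "A": None}]
out = per_id_sum(group)
print(out)  # [{'per_id_sum': 1}]
print(output_matches_schema(out, ["per_id_sum"], "id"))  # True
```

Running a check like this against sample groups before building the query can catch a mismatch between f and new_column_types early.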

Return type:

QueryBuilder