QueryBuilder.flat_map

from tmlt.analytics import QueryBuilder

QueryBuilder.flat_map(f, new_column_types, augment=False, grouping=False, max_rows=None, max_num_rows=None)

Applies a mapping function to each row, returning zero or more rows.

If the new column types are specified using ColumnType and not ColumnDescriptor, Tumult Analytics assumes that all new columns created may contain null values, and that DECIMAL columns may contain NaN or infinite values.

The max_rows argument is ignored if the table was initialized with the AddRowsWithID ProtectedChange. Otherwise, it is required, and it is enforced: if f returns more than max_rows rows for an input row, only the first max_rows rows are kept.

The Simple transformations and Doing more with privacy IDs tutorials contain illustrated examples of flat maps.

Example

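>>> from tmlt.analytics import (
...     AddOneRow, ColumnDescriptor, ColumnType, KeySet,
...     PureDPBudget, QueryBuilder, Session,
... )
>>> my_private_data.toPandas()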
   A  B  X
0  0  1  0
1  1  0  1
2  1  2  1
3  1  3  1
>>> budget = PureDPBudget(float("inf"))
>>> sess = Session.from_dataframe(
...     privacy_budget=budget,
...     source_id="my_private_data",
...     dataframe=my_private_data,
...     protected_change=AddOneRow(),
... )
>>> # Building a query with a flat map transformation
>>> query = (
...     QueryBuilder("my_private_data")
...     .flat_map(
...         lambda row: [{"i_B": i} for i in range(int(row["B"])+1)],
...         new_column_types={"i_B": ColumnDescriptor(
...             ColumnType.INTEGER, allow_null=False,
...         )},
...         augment=True,
...         grouping=False,
...         max_rows=3,
...     )
...     .groupby(KeySet.from_dict({"B": [0, 1, 2, 3]}))
...     .count()
... )
>>> # Answering the query with infinite privacy budget
>>> answer = sess.evaluate(
...     query,
...     PureDPBudget(float("inf"))
... )
>>> answer.sort("B").toPandas()
   B  count
0  0      1
1  1      2
2  2      3
3  3      3
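
The count of 3 for B = 3 in the output above follows from max_rows=3: the lambda returns four rows for that input row, and only the first three are kept. The plain-Python sketch below mimics the documented semantics (augment=True plus truncation to max_rows) on the same data, without Tumult Analytics and without noise. The helper simulate_flat_map is hypothetical and is not part of the library.

```python
from collections import Counter
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def simulate_flat_map(
    rows: List[Row],
    f: Callable[[Row], List[Row]],
    augment: bool,
    max_rows: int,
) -> List[Row]:
    """Hypothetical stand-in for the documented flat_map semantics."""
    out: List[Row] = []
    for row in rows:
        # Keep only the first max_rows rows produced for each input row.
        for new_row in f(row)[:max_rows]:
            # augment=True keeps the original columns next to the new ones.
            out.append({**row, **new_row} if augment else dict(new_row))
    return out


data = [
    {"A": 0, "B": 1, "X": 0},
    {"A": 1, "B": 0, "X": 1},
    {"A": 1, "B": 2, "X": 1},
    {"A": 1, "B": 3, "X": 1},
]

mapped = simulate_flat_map(
    data,
    lambda row: [{"i_B": i} for i in range(int(row["B"]) + 1)],
    augment=True,
    max_rows=3,
)
# Count output rows per value of B, as the groupby/count in the example does.
counts = dict(sorted(Counter(row["B"] for row in mapped).items()))
print(counts)  # {0: 1, 1: 2, 2: 3, 3: 3}
```

Without the truncation, the row with B = 3 would contribute four rows instead of three.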

Parameters:

f (Callable[[Dict[str, Any]], List[Dict[str, Any]]]) – The function to be applied to each row. The function’s input is a dictionary mapping each column name to its value for that row. The function should return a list of dictionaries. Those dictionaries should always have the same keys regardless of input, and their values should match the column types specified in new_column_types. The function should not have any side effects (in particular, f must not raise exceptions), and must be deterministic (running it multiple times on a fixed input should always return the same output).
new_column_types (Mapping[str, Union[str, ColumnType, ColumnDescriptor]]) – Mapping from column names to types, for new columns produced by f. Using ColumnDescriptor is preferred.
augment (bool) – If True, add the new columns to the existing dataframe (so new schema = old schema + schema_new_columns). If False, create a new dataframe with schema = schema_new_columns.
grouping (bool) – Whether this produces a new column that we want to group by. If True, any groupby aggregation following this query must include the new column as a groupby column. Only one new column is supported, and the new column must have distinct values for each input row.
max_rows (Optional[int]) – The enforced limit on the number of rows from each f(row). If f produces more rows than this, only the first max_rows rows will be in the output.
max_num_rows (Optional[int]) – Deprecated synonym for max_rows.
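
Because f must not raise exceptions, must be deterministic, and must always return dictionaries with the same keys, a mapping function that parses free-form values usually needs explicit guards. A minimal sketch, assuming a hypothetical string column named "tags" that holds comma-separated values (neither the column nor the helper split_tags comes from the library):

```python
from typing import Any, Dict, List


def split_tags(row: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one {"tag": ...} dictionary per distinct non-empty tag."""
    raw = row.get("tags")
    # Return zero rows instead of raising when the value is missing or not a string.
    if not isinstance(raw, str):
        return []
    # Sorting the de-duplicated tags makes the output order deterministic.
    tags = sorted({t.strip() for t in raw.split(",") if t.strip()})
    # Every dictionary has exactly the same key, whatever the input.
    return [{"tag": t} for t in tags]


print(split_tags({"tags": "b, a, b"}))  # [{'tag': 'a'}, {'tag': 'b'}]
print(split_tags({"tags": None}))       # []
```

The explicit sort matters here because max_rows keeps only the first rows returned for each input: if the order varied between runs, so would the truncated output. A function like this would be passed as f, with a new_column_types entry for "tag" such as ColumnDescriptor(ColumnType.VARCHAR, allow_null=False).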

Return type:

QueryBuilder