QueryBuilder.map

from tmlt.analytics import QueryBuilder
QueryBuilder.map(f, new_column_types, augment=False)

Applies a mapping function to each row.

If the new column types are specified using ColumnType rather than ColumnDescriptor, Tumult Analytics assumes that every new column may contain null values, and that DECIMAL columns may also contain NaN or infinite values.

An illustrated example can be found in the Simple transformations tutorial.
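
As a rough mental model of what "applies a mapping function to each row" means, the sketch below mimics the row-wise behaviour on a plain list of dictionaries. It does not use Tumult Analytics at all: map_rows is a hypothetical helper written only for illustration, and the real transformation operates on Spark data inside a query rather than on Python lists.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def map_rows(rows: List[Row], f: Callable[[Row], Row], augment: bool = False) -> List[Row]:
    """Hypothetical plain-Python stand-in for the row-wise behaviour of map."""
    if augment:
        # Keep the original columns and add the columns produced by f.
        # Note: this sketch does not model how the library treats name clashes.
        return [{**row, **f(row)} for row in rows]
    # Keep only the columns produced by f.
    return [f(row) for row in rows]


def double_b(row: Row) -> Row:
    # Same mapping as in the documented example: new = B * 2.
    return {"new": row["B"] * 2}


rows = [
    {"A": 0, "B": 1, "X": 0},
    {"A": 1, "B": 0, "X": 1},
    {"A": 1, "B": 2, "X": 1},
]

augmented = map_rows(rows, double_b, augment=True)
replaced = map_rows(rows, double_b, augment=False)

print(augmented[0])  # {'A': 0, 'B': 1, 'X': 0, 'new': 2}
print(replaced[0])   # {'new': 2}
```

With augment=True each output row keeps A, B, and X alongside new; with augment=False only new survives.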

Example

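>>> # Imports assumed to be available from the top-level package, like QueryBuilder above
>>> from tmlt.analytics import (
...     AddOneRow,
...     ColumnType,
...     KeySet,
...     PureDPBudget,
...     QueryBuilder,
...     Session,
... )
>>> # my_private_data is an existing Spark DataFrame
>>> my_private_data.toPandas()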
   A  B  X
0  0  1  0
1  1  0  1
2  1  2  1
>>> budget = PureDPBudget(float("inf"))
>>> sess = Session.from_dataframe(
...     privacy_budget=budget,
...     source_id="my_private_data",
...     dataframe=my_private_data,
...     protected_change=AddOneRow(),
... )
>>> # Building a query with a map transformation
>>> query = (
...     QueryBuilder("my_private_data")
...     .map(
...         lambda row: {"new": row["B"]*2},
...         new_column_types={"new": ColumnType.INTEGER},
...         augment=True
...     )
...     .groupby(KeySet.from_dict({"new": [0, 1, 2, 3, 4]}))
...     .count()
... )
>>> # Answering the query with infinite privacy budget
>>> answer = sess.evaluate(
...     query,
...     PureDPBudget(float("inf"))
... )
>>> answer.sort("new").toPandas()
   new  count
0    0      1
1    1      0
2    2      1
3    3      0
4    4      1

Parameters:
  • f (Callable[[Dict[str, Any]], Dict[str, Any]]) – The function to be applied to each row. Its input is a dictionary mapping each column name to that column's value in the row. It should return a dictionary that always has the same keys, regardless of input, and whose values match the column types specified in new_column_types. The function should not have any side effects; in particular, f must not raise exceptions.

  • new_column_types (Mapping[str, Union[ColumnDescriptor, ColumnType]]) – Mapping from column names to types, for new columns produced by f. Using ColumnDescriptor is preferred.

  • augment (bool) – If True, add the new columns to the existing dataframe (new schema = old schema + schema_new_columns). If False, produce a dataframe containing only the new columns (schema = schema_new_columns).

Return type:

QueryBuilder
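
The requirements on f listed under Parameters (same keys for every input, values matching the declared types, no exceptions) can be spot-checked on a few sample rows before the function is passed to map. The sketch below is one possible way to do that in plain Python. check_mapper is a hypothetical helper, not part of Tumult Analytics, and comparing against Python types such as int (instead of ColumnType values) is an assumption made to keep the example self-contained.

```python
from typing import Any, Callable, Dict, Iterable

Row = Dict[str, Any]


def check_mapper(
    f: Callable[[Row], Row],
    sample_rows: Iterable[Row],
    expected_types: Dict[str, type],
) -> None:
    """Hypothetical pre-flight check: raise if f breaks its contract on a sample row."""
    for row in sample_rows:
        # Pass a copy so that a misbehaving f cannot modify the sample data.
        out = f(dict(row))
        if set(out) != set(expected_types):
            raise ValueError(
                f"expected keys {sorted(expected_types)}, got {sorted(out)} for row {row}"
            )
        for name, py_type in expected_types.items():
            value = out[name]
            # None is tolerated here because columns declared with a bare
            # ColumnType are assumed to be nullable.
            if value is not None and not isinstance(value, py_type):
                raise TypeError(f"column {name!r} has value {value!r}, expected {py_type.__name__}")


sample_rows = [{"A": 0, "B": 1, "X": 0}, {"A": 1, "B": 0, "X": 1}]

# A well-behaved mapper passes silently.
check_mapper(lambda row: {"new": row["B"] * 2}, sample_rows, {"new": int})


def inconsistent(row: Row) -> Row:
    # Returns no keys at all when B is 0, which violates the "same keys" rule.
    return {"new": row["B"]} if row["B"] else {}


try:
    check_mapper(inconsistent, sample_rows, {"new": int})
except ValueError as err:
    print(f"rejected: {err}")
```

A check like this only covers the rows it is given, so it complements the contract rather than proving that f satisfies it.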