QueryBuilder.flat_map
from tmlt.analytics import QueryBuilder
- QueryBuilder.flat_map(f, new_column_types, augment=False, grouping=False, max_rows=None, max_num_rows=None)
Applies a mapping function to each row, returning zero or more rows.
If the new column types are specified using ColumnType and not ColumnDescriptor, Tumult Analytics assumes that all new columns created may contain null values, and that DECIMAL columns may contain NaN or infinite values.
The max_rows argument is ignored if the table was initialized with the AddRowsWithID ProtectedChange. Otherwise, it is required (and enforced).
The Simple transformations and Doing more with privacy IDs tutorials contain illustrated examples of flat maps.
Example
>>> my_private_data.toPandas()
   A  B  X
0  0  1  0
1  1  0  1
2  1  2  1
3  1  3  1
>>> budget = PureDPBudget(float("inf"))
>>> sess = Session.from_dataframe(
...     privacy_budget=budget,
...     source_id="my_private_data",
...     dataframe=my_private_data,
...     protected_change=AddOneRow(),
... )
>>> # Building a query with a flat map transformation
>>> query = (
...     QueryBuilder("my_private_data")
...     .flat_map(
...         lambda row: [{"i_B": i} for i in range(int(row["B"])+1)],
...         new_column_types={"i_B": ColumnDescriptor(
...             ColumnType.INTEGER, allow_null=False,
...         )},
...         augment=True,
...         grouping=False,
...         max_rows=3,
...     )
...     .groupby(KeySet.from_dict({"B": [0, 1, 2, 3]}))
...     .count()
... )
>>> # Answering the query with infinite privacy budget
>>> answer = sess.evaluate(
...     query,
...     PureDPBudget(float("inf"))
... )
>>> answer.sort("B").toPandas()
   B  count
0  0      1
1  1      2
2  2      3
3  3      3
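The counts in the example above can be reproduced with a plain-Python model of the transformation. This is only a sketch of the documented behavior (apply f to each row, keep at most max_rows outputs per input row, optionally merge each output with its input row). The helper name `flat_map_rows` is hypothetical and is not part of Tumult Analytics, and the sketch models neither noise nor privacy accounting.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def flat_map_rows(
    rows: List[Row],
    f: Callable[[Row], List[Row]],
    augment: bool = False,
    max_rows: Optional[int] = None,
) -> List[Row]:
    """Hypothetical model of the documented flat-map behavior (not library code)."""
    out: List[Row] = []
    for row in rows:
        new_rows = f(row)
        if max_rows is not None:
            # Only the first max_rows outputs of each f(row) are kept.
            new_rows = new_rows[:max_rows]
        for new_row in new_rows:
            # augment=True: old columns plus new columns; otherwise new columns only.
            out.append({**row, **new_row} if augment else dict(new_row))
    return out


# Same values as the example table above.
data = [
    {"A": 0, "B": 1, "X": 0},
    {"A": 1, "B": 0, "X": 1},
    {"A": 1, "B": 2, "X": 1},
    {"A": 1, "B": 3, "X": 1},
]

result = flat_map_rows(
    data,
    lambda row: [{"i_B": i} for i in range(int(row["B"]) + 1)],
    augment=True,
    max_rows=3,
)

# Count output rows per value of B, mirroring groupby("B").count().
counts = Counter(r["B"] for r in result)
print(sorted(counts.items()))  # [(0, 1), (1, 2), (2, 3), (3, 3)]
```

The row with B = 3 would produce four output rows, but max_rows=3 truncates it to three, which is why the last two groups have the same count.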
- Parameters:
  - f (Callable[[Dict[str, Any]], List[Dict[str, Any]]]) – The function to be applied to each row. The function’s input is a dictionary matching a column name to its value for that row. This function should return a list of dictionaries. Those dictionaries should always have the same keys regardless of input, and the values in those dictionaries should match the column types specified in new_column_types. The function should not have any side effects (in particular, f must not raise exceptions), and must be deterministic (running it multiple times on a fixed input should always return the same output).
  - new_column_types (Mapping[str, Union[str, ColumnType, ColumnDescriptor]]) – Mapping from column names to types, for new columns produced by f. Using ColumnDescriptor is preferred.
  - augment (bool) – If True, add new columns to the existing dataframe (so new schema = old schema + schema_new_columns). If False, make the new dataframe with schema = schema_new_columns.
  - grouping (bool) – Whether this produces a new column that we want to groupby. If True, this requires that any groupby aggregations following this query include the new column as a groupby column. Only one new column is supported, and the new column must have distinct values for each input row.
  - max_rows (Optional[int]) – The enforced limit on the number of rows from each f(row). If f produces more rows than this, only the first max_rows rows will be in the output.
  - max_num_rows (Optional[int]) – Deprecated synonym for max_rows.
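As a sketch of the requirements on f listed above, the function below always returns dictionaries with the same single key, guards against unexpected input rather than raising, and is deterministic. The column name `tags`, its comma-separated format, and the function name `split_tags` are assumptions made purely for illustration.

```python
from typing import Any, Dict, List


def split_tags(row: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical mapping function: one output row per comma-separated tag."""
    raw = row.get("tags")
    # Return zero rows for null or non-string input instead of raising.
    if not isinstance(raw, str):
        return []
    # Every output dictionary has exactly the key "tag", with a string value.
    return [{"tag": part.strip()} for part in raw.split(",") if part.strip()]


print(split_tags({"tags": "a, b,,c"}))  # [{'tag': 'a'}, {'tag': 'b'}, {'tag': 'c'}]
print(split_tags({"tags": None}))       # []
```

A function of this shape could presumably be passed as f with a new_column_types entry describing `tag` as a non-null string column; that pairing is not shown in the source and is left as an assumption here.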
- Return type: