QueryBuilder.flat_map
from tmlt.analytics import QueryBuilder
- QueryBuilder.flat_map(f, new_column_types, augment=False, grouping=False, max_rows=None, max_num_rows=None)
Applies a mapping function to each row, returning zero or more rows.
If the new column types are specified using ColumnType and not ColumnDescriptor, Tumult Analytics assumes that all new columns created may contain null values, and that DECIMAL columns may contain NaN or infinite values.
The max_rows argument is ignored if the table was initialized with the AddRowsWithID ProtectedChange. Otherwise, it is required (and enforced).
The Simple transformations and Doing more with privacy IDs tutorials contain illustrated examples of flat maps.
Example
>>> my_private_data.toPandas()
   A  B  X
0  0  1  0
1  1  0  1
2  1  2  1
3  1  3  1
>>> budget = PureDPBudget(float("inf"))
>>> sess = Session.from_dataframe(
...     privacy_budget=budget,
...     source_id="my_private_data",
...     dataframe=my_private_data,
...     protected_change=AddOneRow(),
... )
>>> # Building a query with a flat map transformation
>>> query = (
...     QueryBuilder("my_private_data")
...     .flat_map(
...         lambda row: [{"i_B": i} for i in range(int(row["B"])+1)],
...         new_column_types={"i_B": ColumnDescriptor(
...             ColumnType.INTEGER, allow_null=False,
...         )},
...         augment=True,
...         grouping=False,
...         max_rows=3,
...     )
...     .groupby(KeySet.from_dict({"B": [0, 1, 2, 3]}))
...     .count()
... )
>>> # Answering the query with infinite privacy budget
>>> answer = sess.evaluate(
...     query,
...     PureDPBudget(float("inf"))
... )
>>> answer.sort("B").toPandas()
   B  count
0  0      1
1  1      2
2  2      3
3  3      3
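To see where the counts in the example come from, the flat map can be modeled in plain Python. This is only a sketch of the semantics described on this page (apply f to each row, keep at most max_rows of its outputs, and merge each output with the input row when augment=True); it is not how Tumult Analytics implements the transformation. The helper name model_flat_map is hypothetical, and no noise is added, which mirrors the infinite-budget setting of the example.

```python
from collections import Counter
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def model_flat_map(
    rows: List[Row],
    f: Callable[[Row], List[Row]],
    augment: bool,
    max_rows: int,
) -> List[Row]:
    """Hypothetical plain-Python model of the documented flat_map semantics."""
    output: List[Row] = []
    for row in rows:
        # Only the first max_rows rows produced by f(row) are kept.
        new_rows = f(row)[:max_rows]
        for new_row in new_rows:
            # augment=True keeps the original columns next to the new ones.
            output.append({**row, **new_row} if augment else dict(new_row))
    return output


# Same data as my_private_data in the example above.
rows = [
    {"A": 0, "B": 1, "X": 0},
    {"A": 1, "B": 0, "X": 1},
    {"A": 1, "B": 2, "X": 1},
    {"A": 1, "B": 3, "X": 1},
]

mapped = model_flat_map(
    rows,
    lambda row: [{"i_B": i} for i in range(int(row["B"]) + 1)],
    augment=True,
    max_rows=3,
)

# Group by B and count, as the query does (without any noise).
counts = Counter(row["B"] for row in mapped)
print(sorted(counts.items()))  # [(0, 1), (1, 2), (2, 3), (3, 3)]
```

The row with B = 3 produces four rows from f, but max_rows=3 keeps only the first three, which is why its count is 3 rather than 4.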
- Parameters:
f (Callable[[Dict[str, Any]], List[Dict[str, Any]]]) – The function to be applied to each row. The function’s input is a dictionary matching a column name to its value for that row. This function should return a list of dictionaries. Those dictionaries should always have the same keys regardless of input, and the values in those dictionaries should match the column types specified in new_column_types. The function should not have any side effects (in particular, f must not raise exceptions), and must be deterministic (running it multiple times on a fixed input should always return the same output).
new_column_types (Mapping[str, Union[str, ColumnType, ColumnDescriptor]]) – Mapping from column names to types, for new columns produced by f. Using ColumnDescriptor is preferred.
augment (bool) – If True, add new columns to the existing dataframe (so new schema = old schema + schema_new_columns). If False, make the new dataframe with schema = schema_new_columns.
grouping (bool) – Whether this produces a new column that we want to groupby. If True, this requires that any groupby aggregations following this query include the new column as a groupby column. Only one new column is supported, and the new column must have distinct values for each input row.
max_rows (Optional[int]) – The enforced limit on the number of rows from each f(row). If f produces more rows than this, only the first max_rows rows will be in the output.
max_num_rows (Optional[int]) – Deprecated synonym for max_rows.
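The requirements on f listed above (the same keys for every input, no exceptions, deterministic output) can be spot-checked on a few sample rows before building a query. The sketch below is a hypothetical helper, check_flat_map_function, written in plain Python; it is not part of Tumult Analytics, and passing it on a sample does not prove that f behaves well on every possible row.

```python
from typing import Any, Callable, Dict, List, Set

Row = Dict[str, Any]


def check_flat_map_function(
    f: Callable[[Row], List[Row]],
    sample_rows: List[Row],
    expected_columns: Set[str],
) -> List[str]:
    """Hypothetical helper: spot-check f against the documented requirements."""
    problems: List[str] = []
    for row in sample_rows:
        try:
            # Call f twice on copies of the row so the results can be compared.
            first = f(dict(row))
            second = f(dict(row))
        except Exception as exc:  # f must not raise exceptions
            problems.append(f"{row}: raised {type(exc).__name__}")
            continue
        if first != second:
            problems.append(f"{row}: output differs between calls")
        for out_row in first:
            # Every output dictionary should have exactly the new column names.
            if set(out_row) != expected_columns:
                problems.append(f"{row}: unexpected keys {sorted(out_row)}")
    return problems


def unsafe(row: Row) -> List[Row]:
    # int(None) raises TypeError, so this breaks on null values of B.
    return [{"i_B": i} for i in range(int(row["B"]) + 1)]


def safe(row: Row) -> List[Row]:
    # Returning no rows for a null B avoids raising an exception.
    if row["B"] is None:
        return []
    return [{"i_B": i} for i in range(int(row["B"]) + 1)]


sample = [{"B": 0}, {"B": 2}, {"B": None}]
unsafe_problems = check_flat_map_function(unsafe, sample, {"i_B"})
safe_problems = check_flat_map_function(safe, sample, {"i_B"})
print(unsafe_problems)
print(safe_problems)
```

Including a row with a null value in the sample is a deliberate choice here: it is an easy case to overlook when writing f, and returning an empty list is one way to handle it without raising.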
- Return type: