QueryBuilder.drop_null_and_nan
from tmlt.analytics import QueryBuilder
QueryBuilder.drop_null_and_nan(columns)
Removes rows that contain a null or NaN value in any of the given columns.
Note
Null values cannot be dropped in the ID column of a table initialized with an AddRowsWithID ProtectedChange, nor in a column generated by a flat_map() with the grouping parameter set to True.
Warning
If null and NaN values are dropped from a column, then Analytics will raise an error if a KeySet that contains a null value is used for that column.
Example
>>> my_private_data.toPandas()
    A    B    X
0  a1  2.0  0.0
1  a1  NaN  1.1
2  a2  2.0  NaN
>>> budget = PureDPBudget(float("inf"))
>>> sess = Session.from_dataframe(
...     privacy_budget=budget,
...     source_id="my_private_data",
...     dataframe=my_private_data,
...     protected_change=AddOneRow(),
... )
>>> # Count query on the original data
>>> query = (
...     QueryBuilder("my_private_data")
...     .groupby(KeySet.from_dict({"A": ["a1", "a2"], "B": [None, 2]}))
...     .count()
... )
>>> # Answering the query with infinite privacy budget
>>> answer = sess.evaluate(
...     query,
...     PureDPBudget(float("inf"))
... )
>>> answer.sort("A", "B").toPandas()
    A    B  count
0  a1  NaN      1
1  a1  2.0      1
2  a2  NaN      0
3  a2  2.0      1
>>> # Building a query with a transformation
>>> query = (
...     QueryBuilder("my_private_data")
...     .drop_null_and_nan(columns=["B"])
...     .groupby(KeySet.from_dict({"A": ["a1", "a2"]}))
...     .count()
... )
>>> # Answering the query with infinite privacy budget
>>> answer = sess.evaluate(
...     query,
...     PureDPBudget(float("inf"))
... )
>>> answer.sort("A").toPandas()
    A  count
0  a1      1
1  a2      1
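The row filtering in the example can be checked by hand with a minimal pure-Python sketch. It is only an illustration of the semantics on the three example rows, not how Analytics implements the transformation: `is_null_or_nan` and `drop_null_and_nan_rows` are hypothetical helpers, the data is held as a list of dicts rather than a Spark DataFrame, and no noise is added (which corresponds to the infinite privacy budget used above).

```python
import math
from collections import Counter


def is_null_or_nan(value):
    """Return True if value is None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_null_and_nan_rows(rows, columns):
    """Hypothetical helper: keep rows with no null/NaN in the given columns."""
    return [
        row for row in rows
        if not any(is_null_or_nan(row[col]) for col in columns)
    ]


nan = float("nan")

# The same three rows as my_private_data in the example.
rows = [
    {"A": "a1", "B": 2.0, "X": 0.0},
    {"A": "a1", "B": nan, "X": 1.1},
    {"A": "a2", "B": 2.0, "X": nan},
]

# Only column B is inspected, so the row whose X is NaN is kept.
kept = drop_null_and_nan_rows(rows, columns=["B"])

# Un-noised group counts by A.
counts = Counter(row["A"] for row in kept)
print(dict(counts))
```

The row with a NaN in X survives because only B is listed in `columns`, which is why both `a1` and `a2` end up with a count of 1 in the second query.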