QueryBuilder.drop_null_and_nan

from tmlt.analytics import QueryBuilder
QueryBuilder.drop_null_and_nan(columns)

Removes rows that contain null or NaN values in the specified columns.
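For intuition, the row-filtering rule can be modeled in plain Python over a list of dictionaries. This is only an analogue, not how Analytics implements the transformation: it assumes rows are dicts with nulls represented as `None`, the helper names `is_null_or_nan` and `drop_null_and_nan_rows` are hypothetical, and Spark, schemas, and privacy accounting are ignored entirely. It follows the documented `columns` behavior, where `None` or an empty list means every column is checked. The data mirrors the example on this page; whether the missing `B` value is a null or a NaN makes no difference to this filter.

```python
import math
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def is_null_or_nan(value: Any) -> bool:
    """Return True if value is None (null) or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_null_and_nan_rows(rows: List[Row], columns: Optional[List[str]] = None) -> List[Row]:
    """Hypothetical helper: keep only rows with no null/NaN in the checked columns.

    If columns is None or an empty list, every column of each row is checked.
    """
    kept = []
    for row in rows:
        checked = columns if columns else list(row.keys())
        if not any(is_null_or_nan(row[name]) for name in checked):
            kept.append(row)
    return kept


data = [
    {"A": "a1", "B": 2.0, "X": 0.0},
    {"A": "a1", "B": None, "X": 1.1},
    {"A": "a2", "B": 2.0, "X": float("nan")},
]

# Only column B is checked: the second row is dropped.
print([r["A"] for r in drop_null_and_nan_rows(data, ["B"])])  # ['a1', 'a2']
# All columns are checked: the NaN in X also drops the third row.
print([r["A"] for r in drop_null_and_nan_rows(data)])  # ['a1']
```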

Note

Null values cannot be dropped in the ID column of a table initialized with an AddRowsWithID ProtectedChange, nor in a column generated by a flat_map() with the grouping parameter set to True.

Warning

If null and NaN values are dropped from a column, Analytics will raise an error if a KeySet containing a null value for that column is used afterwards.
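One way to picture this rule is as a schema check: once nulls and NaNs are removed from a column, that column is treated as unable to hold nulls, so a KeySet offering a null key for it no longer matches the table. The sketch below is a hypothetical plain-Python model of such a check, not Tumult Analytics code; `Column`, `after_drop`, and `validate_keys` are illustrative names, and the real library's error type and message may differ.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Column:
    """Hypothetical column descriptor recording whether null/NaN may appear."""
    allow_null: bool = True
    allow_nan: bool = True


Schema = Dict[str, Column]


def after_drop(schema: Schema, columns: Optional[List[str]]) -> Schema:
    """Model the schema after dropping null/NaN from the given columns (all if None/empty)."""
    targets = columns if columns else list(schema)
    return {
        name: replace(col, allow_null=False, allow_nan=False) if name in targets else col
        for name, col in schema.items()
    }


def validate_keys(schema: Schema, keys: Dict[str, List[Any]]) -> None:
    """Raise if a key column offers None for a table column that no longer allows nulls."""
    for name, values in keys.items():
        if None in values and not schema[name].allow_null:
            raise ValueError(
                f"KeySet column {name!r} contains null, "
                "but the table column does not allow nulls"
            )


schema = {"A": Column(), "B": Column(), "X": Column()}
dropped = after_drop(schema, ["B"])

# Accepted: no null key is offered for the dropped column B.
validate_keys(dropped, {"A": ["a1", "a2"]})

# Rejected: B had its nulls dropped, but the keys still include None.
try:
    validate_keys(dropped, {"A": ["a1", "a2"], "B": [None, 2.0]})
except ValueError as err:
    print(err)
```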

Example

>>> from tmlt.analytics import AddOneRow, KeySet, PureDPBudget, QueryBuilder, Session
>>> my_private_data.toPandas()
    A    B    X
0  a1  2.0  0.0
1  a1  NaN  1.1
2  a2  2.0  NaN
>>> budget = PureDPBudget(float("inf"))
>>> sess = Session.from_dataframe(
...     privacy_budget=budget,
...     source_id="my_private_data",
...     dataframe=my_private_data,
...     protected_change=AddOneRow(),
... )
>>> # Count query on the original data
>>> query = (
...     QueryBuilder("my_private_data")
...     .groupby(KeySet.from_dict({"A": ["a1", "a2"], "B": [None, 2]}))
...     .count()
... )
>>> # Answering the query with infinite privacy budget
>>> answer = sess.evaluate(
...     query,
...     PureDPBudget(float("inf"))
... )
>>> answer.sort("A", "B").toPandas()
    A    B  count
0  a1  NaN      1
1  a1  2.0      1
2  a2  NaN      0
3  a2  2.0      1
>>> # Count query after dropping rows with a null or NaN value in column B
>>> query = (
...     QueryBuilder("my_private_data")
...     .drop_null_and_nan(columns=["B"])
...     .groupby(KeySet.from_dict({"A": ["a1", "a2"]}))
...     .count()
... )
>>> # Answering the query with infinite privacy budget
>>> answer = sess.evaluate(
...     query,
...     PureDPBudget(float("inf"))
... )
>>> answer.sort("A").toPandas()
    A  count
0  a1      1
1  a2      1

Parameters:

columns (Optional[List[str]]) – A list of columns in which to look for null and NaN values. If None or an empty list, all columns are checked, so a row is dropped if any of its columns contains a null or NaN value.

Return type:

QueryBuilder