QueryBuilder.filter#

from tmlt.analytics import QueryBuilder
QueryBuilder.filter(condition)#

Filter rows matching a condition.

The condition parameter accepts the same syntax as in PySpark’s filter() method: valid expressions are those that can be used in a WHERE clause in Spark SQL. Examples of valid conditions include:

  • age < 42

  • age BETWEEN 17 AND 42

  • age < 42 OR (age < 60 AND gender IS NULL)

  • LENGTH(name) > 17

  • favorite_color IN ('blue', 'red')

  • date = '2022-03-14'

  • time < '2022-01-01T12:45:00'

Example

>>> my_private_data.toPandas()
   A  B  X
0  0  1  0
1  1  0  1
2  1  2  1
>>> budget = PureDPBudget(float("inf"))
>>> sess = Session.from_dataframe(
...     privacy_budget=budget,
...     source_id="my_private_data",
...     dataframe=my_private_data,
...     protected_change=AddOneRow(),
... )
>>> # Building a query with a filter transformation
>>> query = (
...     QueryBuilder("my_private_data")
...     .filter("A == '0'")
...     .count()
... )
>>> # Answering the query with infinite privacy budget
>>> answer = sess.evaluate(
...     query,
...     PureDPBudget(float("inf"))
... )
>>> answer.toPandas()
   count
0      1
Parameters:

condition (str) – A string of SQL expressions specifying the filter to apply to the data. For example, the string “A > B” matches rows where column A is greater than column B.

Return type:

QueryBuilder