QueryBuilder.filter
from tmlt.analytics import QueryBuilder
- QueryBuilder.filter(condition)
Filter rows matching a condition.
The condition parameter accepts the same syntax as in PySpark’s filter() method: valid expressions are those that can be used in a WHERE clause in Spark SQL. Examples of valid conditions include:
- age < 42
- age BETWEEN 17 AND 42
- age < 42 OR (age < 60 AND gender IS NULL)
- LENGTH(name) > 17
- favorite_color IN ('blue', 'red')
- date = '2022-03-14'
- time < '2022-01-01T12:45:00'
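As a rough, runnable illustration of how these conditions behave as WHERE clauses, the sketch below evaluates each example against a small hypothetical table using Python's built-in sqlite3 module. This is a swapped-in stand-in, not Spark SQL and not Tumult Analytics: SQLite's dialect differs from Spark SQL in general, and here the date and time columns are plain text compared as strings, whereas Spark would compare typed date and timestamp values. The table, its rows, and the helper matching_names are all hypothetical.

```python
import sqlite3

# Hypothetical toy table (not part of Tumult Analytics); column names follow
# the example conditions. date/time are stored as TEXT, so SQLite compares
# them as strings rather than as typed dates/timestamps.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (name TEXT, age INTEGER, gender TEXT,"
    " favorite_color TEXT, date TEXT, time TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("ann", 30, "F", "blue", "2022-03-14", "2022-01-01T09:00:00"),
        ("bob", 45, None, "green", "2022-03-15", "2022-01-01T13:00:00"),
        ("maximilian_alexander", 16, "M", "red", "2022-03-16", "2022-01-02T08:00:00"),
    ],
)


def matching_names(condition: str) -> list[str]:
    """Return the sorted names of rows for which `condition` holds as a WHERE clause."""
    # The condition string is interpolated verbatim to mirror how a filter
    # expression is used; do not do this with untrusted input.
    query = f"SELECT name FROM people WHERE {condition} ORDER BY name"
    return [name for (name,) in conn.execute(query)]


conditions = [
    "age < 42",
    "age BETWEEN 17 AND 42",
    "age < 42 OR (age < 60 AND gender IS NULL)",
    "LENGTH(name) > 17",
    "favorite_color IN ('blue', 'red')",
    "date = '2022-03-14'",
    "time < '2022-01-01T12:45:00'",
]
for condition in conditions:
    print(f"{condition!r}: {matching_names(condition)}")
```

With this data, the OR condition also matches bob (age 45, gender NULL) through its second branch, even though bob fails age < 42.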
Example
>>> my_private_data.toPandas()
   A  B  X
0  0  1  0
1  1  0  1
2  1  2  1
>>> budget = PureDPBudget(float("inf"))
>>> sess = Session.from_dataframe(
...     privacy_budget=budget,
...     source_id="my_private_data",
...     dataframe=my_private_data,
...     protected_change=AddOneRow(),
... )
>>> # Building a query with a filter transformation
>>> query = (
...     QueryBuilder("my_private_data")
...     .filter("A == '0'")
...     .count()
... )
>>> # Answering the query with infinite privacy budget
>>> answer = sess.evaluate(
...     query,
...     PureDPBudget(float("inf"))
... )
>>> answer.toPandas()
   count
0      1
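To see what the example query computes before any privacy mechanism is involved, the following sketch runs the same filter and count as an ordinary SQL query over a copy of the three rows shown above, again using sqlite3 as a stand-in for Spark rather than anything from Tumult Analytics. Storing column A as text is an assumption, made so that the quoted literal in "A == '0'" compares the same way as in the example.

```python
import sqlite3

# Copy of the three example rows; A is stored as TEXT (an assumption) so that
# the quoted literal '0' in the condition matches it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_private_data (A TEXT, B INTEGER, X INTEGER)")
conn.executemany(
    "INSERT INTO my_private_data VALUES (?, ?, ?)",
    [("0", 1, 0), ("1", 0, 1), ("1", 2, 1)],
)

# filter("A == '0'") followed by count(), computed exactly with no noise.
(true_count,) = conn.execute(
    "SELECT COUNT(*) FROM my_private_data WHERE A == '0'"
).fetchone()
print(true_count)
```

The exact count is 1, which agrees with the answer the example returns when evaluated with an infinite privacy budget.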
- Parameters:
condition (str) – A string of SQL expressions specifying the filter to apply to the data. For example, the string “A > B” matches rows where column A is greater than column B.
- Return type:
QueryBuilder
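A condition can reference more than one column, as in the “A > B” example, in which case it is evaluated row by row by comparing the two values in each row. The hypothetical sketch below shows this with sqlite3 standing in for Spark SQL and made-up integer data; it is not Tumult Analytics code.

```python
import sqlite3

# Made-up integer data; each row is kept only if its own A exceeds its own B.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (A INTEGER, B INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(3, 1), (2, 2), (0, 5), (7, 4)])

# "A > B" compares two columns of the same row, not a column with a constant.
matches = conn.execute("SELECT A, B FROM t WHERE A > B ORDER BY A").fetchall()
print(matches)
```

Rows where the two columns are equal, such as (2, 2), are excluded because the comparison is strict.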