QueryBuilder.quantile#

from tmlt.analytics import QueryBuilder

QueryBuilder.quantile(column, quantile, low, high, name=None)#

Returns a quantile query ready to be evaluated.

Note

If the column being measured contains NaN or null values, a drop_null_and_nan() query will be performed first. If the column being measured contains infinite values, a drop_infinity() query will be performed first.

Example

>>> my_private_data.toPandas()
   A  B  X
0  0  1  0
1  1  0  1
2  1  2  1
>>> budget = PureDPBudget(float("inf"))
>>> sess = Session.from_dataframe(
...     privacy_budget=budget,
...     source_id="my_private_data",
...     dataframe=my_private_data,
...     protected_change=AddOneRow(),
... )
>>> # Building a quantile query
>>> query = (
...     QueryBuilder("my_private_data")
...     .quantile(column="B", quantile=0.6, low=0, high=2)
... )
>>> # Answering the query with infinite privacy budget
>>> answer = sess.evaluate(
...     query,
...     PureDPBudget(float("inf"))
... )
>>> answer.toPandas() 
   B_quantile(0.6)
0         1.331107

Parameters:

column (str) – The column to compute the quantile over.
quantile (float) – A number between 0 and 1 specifying the quantile to compute. For example, 0.5 would compute the median.
low (float) – The lower bound for clamping.
high (float) – The upper bound for clamping. Must be such that low is less than high.
name (Optional[str]) – The name to give the resulting aggregation column. Defaults to f"{column}_quantile({quantile})".

Return type:

Query

from tmlt.analytics import GroupedQueryBuilder

GroupedQueryBuilder.quantile(column, quantile, low, high, name=None)#

Returns a Query with a quantile query.

Note

If the column being measured contains NaN or null values, a drop_null_and_nan() query will be performed first. If the column being measured contains infinite values, a drop_infinity() query will be performed first.

Example

>>> my_private_data.toPandas()
   A  B  X
0  0  1  0
1  1  0  1
2  1  2  1
>>> budget = PureDPBudget(float("inf"))
>>> sess = Session.from_dataframe(
...     privacy_budget=budget,
...     source_id="my_private_data",
...     dataframe=my_private_data,
...     protected_change=AddOneRow(),
... )
>>> # Building a groupby quantile query
>>> query = (
...     QueryBuilder("my_private_data")
...     .groupby(KeySet.from_dict({"A": ["0", "1"]}))
...     .quantile(column="B", quantile=0.6, low=0, high=2)
... )
>>> # Answering the query with infinite privacy budget
>>> answer = sess.evaluate(
...     query,
...     PureDPBudget(float("inf"))
... )
>>> answer.sort("A").toPandas() 
   A  B_quantile(0.6)
0  0         1.331107
1  1         1.331107

Parameters:

column (str) – The column to compute the quantile over.
quantile (float) – A number between 0 and 1 specifying the quantile to compute. For example, 0.5 would compute the median.
low (float) – The lower bound for clamping.
high (float) – The upper bound for clamping. Must be such that low is less than high.
name (Optional[str]) – The name to give the resulting aggregation column. Defaults to f"{column}_quantile({quantile})".

Return type:

Query

Tumult Platform

QueryBuilder.quantile#