QueryBuilder.bin_column#

from tmlt.analytics import QueryBuilder
QueryBuilder.bin_column(column, spec, name=None)#

Creates a new column by assigning the values in a given column to bins.

An illustrated example can be found in the Simple transformations tutorial.

Example

>>> my_private_data.toPandas()
   age  income
0   11       0
1   17       6
2   30      54
3   18      14
4   59     126
5   48     163
6   76     151
7   91      18
8   48      97
9   53      85
>>> from tmlt.analytics import BinningSpec
>>> sess = Session.from_dataframe(
...     PureDPBudget(float("inf")),
...     source_id="private_data",
...     dataframe=my_private_data,
...     protected_change=AddOneRow(),
... )
>>> age_binspec = BinningSpec(
...     [0, 18, 65, 100], include_both_endpoints=False
... )
>>> income_tax_rate_binspec = BinningSpec(
...     [0, 10, 40, 86, 165], names=[10, 12, 22, 24]
... )
>>> keys = KeySet.from_dict(
...     {
...         "age_binned": age_binspec.bins(),
...         "marginal_tax_rate": income_tax_rate_binspec.bins()
...     }
... )
>>> query = (
...     QueryBuilder("private_data")
...     .bin_column("age", age_binspec)
...     .bin_column(
...         "income", income_tax_rate_binspec, name="marginal_tax_rate"
...     )
...     .groupby(keys).count()
... )
>>> answer = sess.evaluate(query, PureDPBudget(float("inf")))
>>> answer.sort("age_binned", "marginal_tax_rate").toPandas()
   age_binned  marginal_tax_rate  count
0     (0, 18]                 10      2
1     (0, 18]                 12      1
2     (0, 18]                 22      0
3     (0, 18]                 24      0
4    (18, 65]                 10      0
5    (18, 65]                 12      0
6    (18, 65]                 22      2
7    (18, 65]                 24      3
8   (65, 100]                 10      0
9   (65, 100]                 12      1
10  (65, 100]                 22      0
11  (65, 100]                 24      1
Parameters:
  • column (str) – Name of the column used to assign bins.

  • spec (BinningSpec) – A BinningSpec that defines the binning operation to be performed.

  • name (Optional[str]) – The name of the column that will be created. If None (the default), the input column name with _binned appended to it.

Return type:

QueryBuilder