QueryBuilder.histogram
from tmlt.analytics import QueryBuilder
- QueryBuilder.histogram(column, bin_edges, name=None)
Returns a count query that computes the frequency of values in the specified column, grouped into the given bins.
Example
>>> import pandas as pd
>>> from pyspark.sql import SparkSession
>>> from tmlt.analytics import AddOneRow, BinningSpec, PureDPBudget, QueryBuilder, Session
>>> spark = SparkSession.builder.getOrCreate()
>>> private_data = spark.createDataFrame(
...     pd.DataFrame(
...         {
...             "income_thousands": [83, 85, 86, 73, 82, 95,
...                                  74, 92, 71, 86, 97]
...         }
...     )
... )
>>> session = Session.from_dataframe(
...     privacy_budget=PureDPBudget(epsilon=float('inf')),
...     source_id="private_data",
...     dataframe=private_data,
...     protected_change=AddOneRow(),
... )
>>> income_binspec = BinningSpec(
...     bin_edges=[i for i in range(70, 110, 10)],
...     include_both_endpoints=False
... )
>>> binned_income_count_query = (
...     QueryBuilder("private_data")
...     .histogram("income_thousands", income_binspec, "income_binned")
... )
>>> binned_income_counts = session.evaluate(
...     binned_income_count_query,
...     privacy_budget=PureDPBudget(epsilon=10),
... )
>>> print(binned_income_counts.sort("income_binned").toPandas())
   income_binned  count
0       (70, 80]      3
1       (80, 90]      5
2      (90, 100]      3
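The binning in the example can be modeled in plain Python to check the noise-free counts that the query approximates. This is a hypothetical sketch, not the Tumult Analytics implementation: bin_label and output_column_name are made-up helpers, and the sketch assumes sorted numeric edges with right-closed bins and include_both_endpoints=False, matching the example's BinningSpec. It follows the documented rules for bin_edges (out-of-range, None, and NaN values map to None) and for name (the default appends _binned to the input column name).

```python
import bisect
import math
from collections import Counter
from typing import List, Optional, Sequence, Union

Number = Union[int, float]


def bin_label(value: Optional[Number], edges: Sequence[Number]) -> Optional[str]:
    """Hypothetical helper: map a value to a right-closed bin "(lo, hi]".

    Assumes sorted edges and include_both_endpoints=False, as in the
    example. Out-of-range values, None, and NaN all map to None.
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    # bisect_left gives the first index i with edges[i] >= value,
    # so the value lies in (edges[i - 1], edges[i]].
    i = bisect.bisect_left(edges, value)
    if i == 0 or i == len(edges):
        return None  # at or below the lowest edge, or above the highest
    return f"({edges[i - 1]}, {edges[i]}]"


def output_column_name(column: str, name: Optional[str] = None) -> str:
    """Hypothetical helper for the documented default: append "_binned"."""
    return name if name is not None else f"{column}_binned"


# Noise-free counts for the example data; the real query adds noise to these.
income_thousands: List[Number] = [83, 85, 86, 73, 82, 95, 74, 92, 71, 86, 97]
edges = list(range(70, 110, 10))  # [70, 80, 90, 100]
true_counts = Counter(bin_label(v, edges) for v in income_thousands)

print(output_column_name("income_thousands"))
# key=str keeps sorting safe even if a None label were present.
for label in sorted(true_counts, key=str):
    print(label, true_counts[label])
```

Running it prints income_thousands_binned followed by counts of 3, 5, and 3 for the three bins, the same values shown in the example output. A value of exactly 70 maps to None here because the lowest edge is excluded when include_both_endpoints=False.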
- Parameters:
  - column (str) – Name of the column used to assign bins.
  - bin_edges (Union[Sequence[TypeVar(BinT, str, Union[int, float], date, datetime)], BinningSpec]) – The bin edges for the histogram, provided either as a BinningSpec or as a list of supported data types. Values outside the range of the provided bins, None values, and NaN values are all mapped to None (null in Spark).
  - name (Optional[str]) – The name of the column that will be created. If None (the default), the name is the input column name with _binned appended to it.
- Return type: