binning_spec#
A BinningSpec defines a binning operation on a column.
Data#
- BinT#
The type of the value being binned.
- BinNameT#
The type of the bin name column.
Classes#
- class BinningSpec(bin_edges, names=None, right=True, include_both_endpoints=True, nan_bin=None)#
Bases: Generic[BinT, BinNameT]
A spec object defining an operation where values are assigned to bins.
A BinningSpec divides values into bins based on a list of bin edges, for use with the bin_column() method. All supported data types can be binned using a BinningSpec.
Values outside the range of the provided bins and None values are all mapped to None (null in Spark), as are NaN values by default. Bin names are generated based on the bin edges, but custom names can be provided.
By default, the right edge of each bin is included in that bin: using edges [0, 5, 10] will lead to bins [0, 5] and (5, 10]. To include the left edge instead, set the right parameter to False.
Examples
>>> spec = BinningSpec([0, 5, 10])
>>> spec.bins()
['[0, 5]', '(5, 10]']
>>> spec(0)
'[0, 5]'
>>> spec(5)
'[0, 5]'
>>> spec(6)
'(5, 10]'
>>> spec(10)
'(5, 10]'
>>> spec(11) is None
True
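The edge rules described above can be sketched in plain Python. This is an illustrative sketch only, not the library's implementation: `assign_bin_index` is a hypothetical helper, and it assumes numeric, non-NaN values and strictly ascending edges.

```python
from bisect import bisect_left, bisect_right
from typing import Optional, Sequence


def assign_bin_index(
    value: float,
    edges: Sequence[float],
    right: bool = True,
    include_both_endpoints: bool = True,
) -> Optional[int]:
    """Return the index of the bin containing value, or None if it has no bin.

    Hypothetical helper that mirrors the documented edge rules; it is not
    part of the library.
    """
    lowest, highest = edges[0], edges[-1]
    if value < lowest or value > highest:
        # Out-of-range values are not assigned to any bin.
        return None
    if value == lowest:
        # The lowest edge is in the first bin if bins are left-closed,
        # or if both outer endpoints are included.
        return 0 if (not right or include_both_endpoints) else None
    if value == highest:
        # The highest edge is in the last bin if bins are right-closed,
        # or if both outer endpoints are included.
        return len(edges) - 2 if (right or include_both_endpoints) else None
    if right:
        # Bins look like (a, b]: an interior edge belongs to the bin on its left.
        return bisect_left(edges, value) - 1
    # Bins look like [a, b): an interior edge belongs to the bin on its right.
    return bisect_right(edges, value) - 1
```

With edges [0, 5, 10] and the defaults, the value 5 lands in bin 0; with right=False it lands in bin 1 instead.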
- Parameters:
- property input_type: tmlt.analytics._schema.ColumnType#
Return the ColumnType of the column this binning can be applied to.
- Return type:
- property column_descriptor: tmlt.analytics._schema.ColumnDescriptor#
Return the ColumnDescriptor that results from applying this binning.
- Return type:
- __init__(bin_edges, names=None, right=True, include_both_endpoints=True, nan_bin=None)#
Initialize a BinningSpec.
- Parameters:
bin_edges (Sequence[TypeVar(BinT, str, Union[int, float], date, datetime)]) – A list of the bin edges, sorted in ascending order.
names (Optional[Sequence[Optional[TypeVar(BinNameT, str, int, float, date, datetime)]]]) – If given, used as the names of bins. Must be one element shorter than bin_edges. Duplicate values are allowed, which will place non-contiguous ranges of values into the same bin. Note that while using floats and timestamps as bin names is allowed here, grouping on the resulting column is not allowed.
right (bool) – When True, the right edge of each bin is included in that bin; otherwise, the left edge is. Defaults to True.
include_both_endpoints (bool) – When True, the outer edges of both the first and last bins will be included in their respective bins; when False, these edges are treated the same as the other bins, i.e. only one will be included based on how right is set. Defaults to True.
nan_bin (Optional[TypeVar(BinNameT, str, int, float, date, datetime)]) – If binning over a float-valued column, all NaNs will be placed in a bin with this name. The default value, None, causes these values to be placed in the same bin with out-of-range and null values.
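To illustrate how duplicate names and nan_bin interact, here is a minimal plain-Python sketch. It is not the library's code: `name_for_value` is a hypothetical helper that assumes float inputs and the default settings right=True and include_both_endpoints=True.

```python
import math
from bisect import bisect_left
from typing import Optional, Sequence


def name_for_value(
    value: Optional[float],
    edges: Sequence[float],
    names: Sequence[Optional[str]],
    nan_bin: Optional[str] = None,
) -> Optional[str]:
    """Map a value to a custom bin name (hypothetical helper, defaults only)."""
    if len(names) != len(edges) - 1:
        raise ValueError("names must be one element shorter than edges")
    if value is None:
        # Null inputs always map to the null bin.
        return None
    if math.isnan(value):
        # NaNs go to nan_bin; when nan_bin is None they share the null bin.
        return nan_bin
    if value < edges[0] or value > edges[-1]:
        # Out-of-range values map to the null bin.
        return None
    # Right-closed bins; max() keeps the lowest edge in the first bin.
    index = max(bisect_left(edges, value) - 1, 0)
    return names[index]


edges = [0.0, 10.0, 20.0, 30.0]
# Reusing "outer" places two non-contiguous ranges in the same bin.
names = ["outer", "inner", "outer"]
```

Here the values 5.0 and 25.0 both map to "outer", while float("nan") maps to None unless a nan_bin such as "missing" is supplied.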
- __eq__(other)#
Adds equality comparison to the BinningSpec class.
- Parameters:
other (Any) –
- bins(include_null=False)#
Return a list of all the bin names that could result from the binning.
The returned list is guaranteed to contain unique elements, even if multiple bins were mapped to the same name. The NaN bin, if one was specified, is included. If include_null is true, the null bin is included as well; by default, it is not included.
- Parameters:
include_null (bool) –
- Return type:
List[Optional[BinNameT]]
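The uniqueness guarantee can be pictured with a short plain-Python sketch. `unique_bin_names` is a hypothetical helper, and the ordering it produces (first-seen names, then the NaN bin, then the null bin) is an assumption made for illustration; the documentation above does not specify the order.

```python
from typing import List, Optional, Sequence


def unique_bin_names(
    names: Sequence[Optional[str]],
    nan_bin: Optional[str] = None,
    include_null: bool = False,
) -> List[Optional[str]]:
    """Collect distinct bin names (hypothetical helper; ordering is assumed)."""
    # dict keys preserve insertion order, so duplicate names collapse
    # to their first occurrence.
    seen = dict.fromkeys(name for name in names if name is not None)
    if nan_bin is not None:
        # The NaN bin is listed whenever one was specified.
        seen.setdefault(nan_bin)
    result: List[Optional[str]] = list(seen)
    if include_null:
        # The null bin appears only on request.
        result.append(None)
    return result
```

For names ["outer", "inner", "outer"], this yields two entries rather than three, since both "outer" ranges share one bin.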
- __call__(val)#
Given a value to bin, return its bin name.
In most cases this method only needs to be used internally, but it can be called on its own to test the binning that will be performed.
- Parameters:
val (Optional[BinT]) – The value to be assigned to a bin.
- Return type:
Any