_schema#

Schema management for private and public tables.

The schema records the column types of the underlying table, so that column information can be converted between its Analytics, Python, and Spark representations.

Functions#

column_type_to_py_type()

Converts a ColumnType to a Python type.

analytics_to_py_types()

Returns the mapping from column names to supported Python types.

analytics_to_spark_schema()

Convert an Analytics schema to a Spark schema.

analytics_to_spark_columns_descriptor()

Convert a schema in Analytics representation to a Spark columns descriptor.

spark_schema_to_analytics_columns()

Convert Spark schema to Analytics columns.

spark_dataframe_domain_to_analytics_columns()

Convert a Spark dataframe domain to Analytics columns.

column_type_to_py_type(column_type)#

Converts a ColumnType to a Python type.

Parameters

column_type (ColumnType) –

Return type

type
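As a rough illustration of this kind of conversion, the sketch below uses a stand-in enum and a lookup table. The names DemoColumnType and demo_column_type_to_py_type are hypothetical, and the specific Python types chosen for each member are an assumption based on the member descriptions on this page, not a copy of the library's implementation.

```python
import datetime
from enum import Enum


class DemoColumnType(Enum):
    """Stand-in for ColumnType (hypothetical; values are placeholders)."""

    INTEGER = "INTEGER"
    DECIMAL = "DECIMAL"
    VARCHAR = "VARCHAR"
    DATE = "DATE"
    TIMESTAMP = "TIMESTAMP"


# Assumed mapping from column type to Python type.
_DEMO_PY_TYPES = {
    DemoColumnType.INTEGER: int,
    DemoColumnType.DECIMAL: float,
    DemoColumnType.VARCHAR: str,
    DemoColumnType.DATE: datetime.date,
    DemoColumnType.TIMESTAMP: datetime.datetime,
}


def demo_column_type_to_py_type(column_type: DemoColumnType) -> type:
    """Look up the Python type assumed to correspond to a column type."""
    return _DEMO_PY_TYPES[column_type]


py_type = demo_column_type_to_py_type(DemoColumnType.DECIMAL)
```

A dictionary lookup keeps the mapping in one place, and an unknown column type fails with a KeyError instead of returning a wrong type.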

analytics_to_py_types(analytics_schema)#

Returns the mapping from column names to supported Python types.

Parameters

analytics_schema (Schema) –

Return type

Dict[str, type]

analytics_to_spark_schema(analytics_schema)#

Convert an Analytics schema to a Spark schema.

Parameters

analytics_schema (Schema) –

Return type

pyspark.sql.types.StructType

analytics_to_spark_columns_descriptor(analytics_schema)#

Convert a schema in Analytics representation to a Spark columns descriptor.

Parameters

analytics_schema (Schema) –

Return type

tmlt.core.domains.spark_domains.SparkColumnsDescriptor

spark_schema_to_analytics_columns(spark_schema)#

Convert Spark schema to Analytics columns.

Parameters

spark_schema (pyspark.sql.types.StructType) –

Return type

Dict[str, ColumnDescriptor]

spark_dataframe_domain_to_analytics_columns(domain)#

Convert a Spark dataframe domain to Analytics columns.

Parameters

domain (tmlt.core.domains.base.Domain) –

Return type

Dict[str, ColumnDescriptor]

Classes#

ColumnType

The supported SQL92 column types for Analytics data.

ColumnDescriptor

Information about a column.

Schema

Schema class describing the column information of the data.

class ColumnType#

Bases: enum.Enum

The supported SQL92 column types for Analytics data.

INTEGER#

Integer column type.

DECIMAL#

Floating-point column type.

VARCHAR#

String column type.

DATE#

Date column type.

TIMESTAMP#

Timestamp column type.

name()#

The name of the Enum member.

value()#

The value of the Enum member.
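Because ColumnType derives from enum.Enum, its members behave like any other Enum member. The sketch below shows this with a stand-in enum; DemoColumnType is hypothetical, and its auto() values are placeholders, not the values the library assigns.

```python
from enum import Enum, auto


class DemoColumnType(Enum):
    """Stand-in for ColumnType (hypothetical; values are placeholders)."""

    INTEGER = auto()
    DECIMAL = auto()
    VARCHAR = auto()
    DATE = auto()
    TIMESTAMP = auto()


member = DemoColumnType.VARCHAR
member_name = member.name                 # name of the member as a string
looked_up = DemoColumnType[member_name]   # look a member up by its name
all_names = [m.name for m in DemoColumnType]  # members iterate in definition order
```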

class ColumnDescriptor#

Information about a column.

ColumnDescriptors have the following attributes:

column_type#

A ColumnType, specifying what type this column has.

allow_null#

bool. If True, this column allows null values.

allow_nan#

bool. If True, this column allows NaN values.

allow_inf#

bool. If True, this column allows infinite values.
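The four attributes above can be pictured as a small immutable record. The sketch below is a stand-in, not the library class: DemoColumnType and DemoColumnDescriptor are hypothetical names, and the False defaults for the three flags are an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class DemoColumnType(Enum):
    """Stand-in for ColumnType (hypothetical; values are placeholders)."""

    INTEGER = "INTEGER"
    DECIMAL = "DECIMAL"
    VARCHAR = "VARCHAR"


@dataclass(frozen=True)
class DemoColumnDescriptor:
    """Stand-in record holding a column type plus three permission flags."""

    column_type: DemoColumnType
    allow_null: bool = False  # assumed default
    allow_nan: bool = False   # assumed default
    allow_inf: bool = False   # assumed default


# A floating-point column that may contain NaN, but not nulls or infinities.
price = DemoColumnDescriptor(DemoColumnType.DECIMAL, allow_nan=True)
```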

class Schema(column_descs, grouping_column=None, id_column=None, id_space=None, default_allow_null=False, default_allow_nan=False, default_allow_inf=False)#

Bases: collections.abc.Mapping

Schema class describing the column information of the data.

The following SQL92 types are currently supported:

INTEGER, DECIMAL, VARCHAR, DATE, TIMESTAMP

Methods#

columns()

Return the names of the columns in the schema.

column_descs()

Returns a mapping from column name to column descriptor.

column_types()

Returns a mapping from column name to column type.

grouping_column()

Returns the optional column that must be grouped by.

id_column()

Return the ID column for this schema, if one exists.

id_space()

Return the ID space for this schema.

__eq__()

Returns True if schemas are equal.

__getitem__()

Returns the column descriptor for the given column.

__iter__()

Return an iterator over the columns in the schema.

__len__()

Return the number of columns in the schema.

get()

D.get(k[,d]) -> D[k] if k in D, else d. d defaults to None.

keys()

D.keys() -> a set-like object providing a view on D’s keys

items()

D.items() -> a set-like object providing a view on D’s items

values()

D.values() -> an object providing a view on D’s values

__init__(column_descs, grouping_column=None, id_column=None, id_space=None, default_allow_null=False, default_allow_nan=False, default_allow_inf=False)#

Constructor.

Parameters
  • column_descs (Mapping[str, Union[str, ColumnType, ColumnDescriptor]]) – Mapping from column names to supported types.

  • grouping_column (Optional[str] (default: None)) – Optional column that must be grouped by in this query.

  • id_column (Optional[str] (default: None)) – The ID column on this table, if one exists.

  • id_space (Optional[str] (default: None)) – The ID space for this table, if one exists.

  • default_allow_null (bool (default: False)) – When a ColumnType or string is used as a value in the column_descs mapping, the column allows null values if default_allow_null is True.

  • default_allow_nan (bool (default: False)) – When a ColumnType or string is used as a value in the column_descs mapping, the column allows NaN values if default_allow_nan is True.

  • default_allow_inf (bool (default: False)) – When a ColumnType or string is used as a value in the column_descs mapping, the column allows infinite values if default_allow_inf is True.
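One way to read the default_allow_* parameters is as fallbacks that apply only when a column is given as a bare type instead of a full descriptor. The sketch below illustrates that reading with stand-in classes. The names DemoColumnType, DemoColumnDescriptor, and normalize_column_descs are hypothetical. Two things are assumptions: that strings are resolved by enum member name, and that the defaults apply uniformly to every column type. The library's actual constructor may differ on both.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Mapping, Union


class DemoColumnType(Enum):
    """Stand-in for ColumnType (hypothetical; values are placeholders)."""

    INTEGER = "INTEGER"
    DECIMAL = "DECIMAL"
    VARCHAR = "VARCHAR"


@dataclass(frozen=True)
class DemoColumnDescriptor:
    """Stand-in for ColumnDescriptor (hypothetical)."""

    column_type: DemoColumnType
    allow_null: bool = False
    allow_nan: bool = False
    allow_inf: bool = False


def normalize_column_descs(
    column_descs: Mapping[str, Union[str, DemoColumnType, DemoColumnDescriptor]],
    default_allow_null: bool = False,
    default_allow_nan: bool = False,
    default_allow_inf: bool = False,
) -> Dict[str, DemoColumnDescriptor]:
    """Turn every value into a descriptor, applying defaults to bare types."""
    normalized = {}
    for name, value in column_descs.items():
        if isinstance(value, DemoColumnDescriptor):
            # Explicit descriptors are kept unchanged; defaults do not apply.
            normalized[name] = value
            continue
        # Assumption: strings name an enum member, e.g. "INTEGER".
        column_type = DemoColumnType[value] if isinstance(value, str) else value
        normalized[name] = DemoColumnDescriptor(
            column_type,
            allow_null=default_allow_null,
            allow_nan=default_allow_nan,
            allow_inf=default_allow_inf,
        )
    return normalized


descs = normalize_column_descs(
    {
        "age": "INTEGER",
        "name": DemoColumnType.VARCHAR,
        "score": DemoColumnDescriptor(DemoColumnType.DECIMAL, allow_nan=True),
    },
    default_allow_null=True,
)
```

In this sketch "age" and "name" pick up allow_null=True from the default, while "score" keeps exactly the flags given in its descriptor.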

property columns#

Return the names of the columns in the schema.

property column_descs#

Returns a mapping from column name to column descriptor.

Return type

Dict[str, ColumnDescriptor]

property column_types#

Returns a mapping from column name to column type.

Return type

Dict[str, str]

property grouping_column#

Returns the optional column that must be grouped by.

Return type

Optional[str]

property id_column#

Return the ID column for this schema, if one exists.

Return type

Optional[str]

property id_space#

Return the ID space for this schema.

Return type

Optional[str]

__eq__(other)#

Returns True if schemas are equal.

Parameters

other (object) – Schema to check against.

Return type

bool

__getitem__(column)#

Returns the column descriptor for the given column.

Parameters

column (str) – The column to get the data type for.

Return type

ColumnDescriptor

__iter__()#

Return an iterator over the columns in the schema.

Return type

Iterator[str]

__len__()#

Return the number of columns in the schema.

Return type

int

get(key, default=None)#

D.get(k[,d]) -> D[k] if k in D, else d. d defaults to None.

keys()#

D.keys() -> a set-like object providing a view on D’s keys

items()#

D.items() -> a set-like object providing a view on D’s items

values()#

D.values() -> an object providing a view on D’s values
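The get(), keys(), items(), and values() methods come from the collections.abc.Mapping base class, which derives them from __getitem__, __iter__, and __len__. The sketch below shows the mechanism with a stand-in class. DemoSchema is hypothetical and stores plain strings where the real Schema stores column descriptors.

```python
from collections.abc import Mapping


class DemoSchema(Mapping):
    """Stand-in mapping from column names to descriptions (hypothetical)."""

    def __init__(self, column_descs):
        self._column_descs = dict(column_descs)

    def __getitem__(self, column):
        return self._column_descs[column]

    def __iter__(self):
        return iter(self._column_descs)

    def __len__(self):
        return len(self._column_descs)


schema = DemoSchema({"name": "VARCHAR", "age": "INTEGER"})

column_names = list(schema.keys())    # provided by Mapping
missing = schema.get("height")        # falls back to None for unknown columns
pairs = dict(schema.items())          # provided by Mapping
has_age = "age" in schema             # Mapping also supplies __contains__
```

Only the three dunder methods are written by hand; the rest of the read-only dictionary interface is inherited.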