_coerce_spark_schema#
Logic for coercing Spark dataframes into forms usable by Tumult Analytics.
Data#
- SUPPORTED_SPARK_TYPES#
Set of Spark data types supported by Tumult Analytics.
Support for Spark data types in Analytics is currently as follows:
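| Type | Supported |
|---|---|
| LongType | yes |
| IntegerType | yes, by coercion to LongType |
| DoubleType | yes |
| FloatType | yes, by coercion to DoubleType |
| StringType | yes |
| DateType | yes |
| TimestampType | yes |
| Other Spark types | no |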
Columns with unsupported types must be dropped or converted to supported ones before loading the data into Analytics.
- TYPE_COERCION_MAP :Dict[pyspark.sql.types.DataType, pyspark.sql.types.DataType]#
Mapping describing how Spark’s data types are coerced by Tumult Analytics.
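As a rough illustration of how such a mapping behaves, the sketch below models it with plain strings in place of `pyspark.sql.types` instances. The coercion targets (LongType, DoubleType) come from the support table; the source types `IntegerType` and `FloatType`, the name `COERCIONS_SKETCH`, and the helper `coerced_type` are assumptions made for this example and are not part of Tumult Analytics.

```python
from typing import Dict

# Plain strings stand in for pyspark.sql.types instances in this sketch.
# The targets LongType and DoubleType are taken from the support table;
# the source types IntegerType and FloatType are assumptions.
COERCIONS_SKETCH: Dict[str, str] = {
    "IntegerType": "LongType",
    "FloatType": "DoubleType",
}


def coerced_type(type_name: str) -> str:
    """Return the type a column ends up with (hypothetical helper).

    A type listed in the mapping is replaced by its coercion target;
    any other type is returned unchanged.
    """
    return COERCIONS_SKETCH.get(type_name, type_name)


print(coerced_type("IntegerType"))  # LongType
print(coerced_type("LongType"))     # LongType
```

Types that need no coercion simply map to themselves, so a lookup with a default is enough to model the mapping.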
Functions#
- coerce_spark_schema_or_fail(dataframe)#
Returns a new DataFrame where all column data types are supported.
- In particular, this function raises an error:
  - if dataframe contains a column type not listed in SUPPORTED_SPARK_TYPES
  - if dataframe contains a column named “” (the empty string)
- This function returns a DataFrame where all column types are coerced according to TYPE_COERCION_MAP if necessary.
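The checks described above can be sketched in plain Python, with a schema modelled as a dict from column name to type name instead of a real Spark DataFrame. This is a simplified model, not the library's implementation: the function name `coerce_schema_or_fail`, its parameters, the use of `ValueError`, and the order of the checks are all assumptions made for illustration.

```python
from typing import Dict, Mapping, Set


def coerce_schema_or_fail(
    schema: Mapping[str, str],
    supported: Set[str],
    coercions: Mapping[str, str],
) -> Dict[str, str]:
    """Hypothetical model of the validation and coercion steps.

    schema maps column names to type names. The exception type
    (ValueError) and the check order are assumptions.
    """
    # Reject a column whose name is the empty string.
    if "" in schema:
        raise ValueError('Column named "" (the empty string) is not allowed')

    # Reject any column whose type is not in the supported set.
    unsupported = {col: typ for col, typ in schema.items() if typ not in supported}
    if unsupported:
        raise ValueError(f"Unsupported column types: {unsupported}")

    # Coerce where the mapping lists a target; keep other types as they are.
    return {col: coercions.get(typ, typ) for col, typ in schema.items()}


supported = {"LongType", "DoubleType", "IntegerType"}
coercions = {"IntegerType": "LongType"}
print(coerce_schema_or_fail({"age": "IntegerType", "score": "DoubleType"}, supported, coercions))
# {'age': 'LongType', 'score': 'DoubleType'}
```

The sketch returns a new dict and leaves its input untouched, mirroring the description that a new DataFrame is returned.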
- Parameters
dataframe (pyspark.sql.DataFrame) –
- Return type