_coerce_spark_schema

Logic for coercing Spark dataframes into forms usable by Tumult Analytics.

Data

SUPPORTED_SPARK_TYPES

Set of Spark data types supported by Tumult Analytics.

Support for Spark data types in Analytics is currently as follows:

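=================  ==============================
Type               Supported
=================  ==============================
LongType           yes
IntegerType        yes, by coercion to LongType
DoubleType         yes
FloatType          yes, by coercion to DoubleType
StringType         yes
DateType           yes
TimestampType      yes
Other Spark types  no
=================  ==============================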

Columns with unsupported types must be dropped or converted to supported ones before loading the data into Analytics.

TYPE_COERCION_MAP: Dict[pyspark.sql.types.DataType, pyspark.sql.types.DataType]

Mapping describing how Spark’s data types are coerced by Tumult Analytics.

Functions

coerce_spark_schema_or_fail(dataframe)

Returns a new DataFrame where all column data types are supported.

This function raises an error:
  • if dataframe contains a column whose type is not listed in SUPPORTED_SPARK_TYPES
  • if dataframe contains a column named “” (the empty string)

Otherwise, it returns a DataFrame in which each column type is coerced according to TYPE_COERCION_MAP where necessary.

Parameters

dataframe (pyspark.sql.DataFrame) – The DataFrame to check and coerce.

Return type

pyspark.sql.DataFrame