rename#
Transformations for renaming Spark DataFrame columns.
See the architecture overview for more information.
Classes#
Rename: Rename one or more columns in a Spark DataFrame.
- class Rename(input_domain, metric, rename_mapping)#
Bases:
tmlt.core.transformations.base.Transformation
Rename one or more columns in a Spark DataFrame.
Example
>>> # Example input
>>> print_sdf(spark_dataframe)
    A   B
0  a1  b1
1  a2  b1
2  a3  b2
3  a3  b2
>>> rename_b_to_c = Rename(
...     input_domain=SparkDataFrameDomain(
...         {
...             "A": SparkStringColumnDescriptor(),
...             "B": SparkStringColumnDescriptor(),
...         }
...     ),
...     metric=SymmetricDifference(),
...     rename_mapping={"B": "C"},
... )
>>> # Apply transformation to data
>>> renamed_spark_dataframe = rename_b_to_c(spark_dataframe)
>>> print_sdf(renamed_spark_dataframe)
    A   C
0  a1  b1
1  a2  b1
2  a3  b2
3  a3  b2
- Transformation Contract:
Input domain - SparkDataFrameDomain
Output domain - SparkDataFrameDomain
Input metric - SymmetricDifference, HammingDistance, or IfGroupedBy
Output metric - SymmetricDifference, HammingDistance, or IfGroupedBy. Matches the input metric, unless the input metric is IfGroupedBy and the grouping column is renamed.
>>> rename_b_to_c.input_domain
SparkDataFrameDomain(schema={'A': SparkStringColumnDescriptor(allow_null=False), 'B': SparkStringColumnDescriptor(allow_null=False)})
>>> rename_b_to_c.output_domain
SparkDataFrameDomain(schema={'A': SparkStringColumnDescriptor(allow_null=False), 'C': SparkStringColumnDescriptor(allow_null=False)})
>>> rename_b_to_c.input_metric
SymmetricDifference()
>>> rename_b_to_c.output_metric
SymmetricDifference()
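The contract above can be pictured with plain Python dictionaries. The sketch below is not Tumult Core code: `rename_schema` and `rename_grouping_column` are hypothetical helpers, the schema is modelled as a dict from column name to a type label, and the rule that a renamed grouping column simply carries its new name into the output metric is an assumption rather than something this page states.

```python
from typing import Dict


def rename_schema(schema: Dict[str, str], rename_mapping: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `schema` with column names replaced per `rename_mapping`.

    Hypothetical helper: columns missing from the mapping keep their names,
    and column order is preserved.
    """
    return {rename_mapping.get(name, name): descriptor for name, descriptor in schema.items()}


def rename_grouping_column(grouping_column: str, rename_mapping: Dict[str, str]) -> str:
    """Assumed rule: a grouping column follows its rename, otherwise it is unchanged."""
    return rename_mapping.get(grouping_column, grouping_column)


input_schema = {"A": "string", "B": "string"}
output_schema = rename_schema(input_schema, {"B": "C"})
print(output_schema)  # {'A': 'string', 'C': 'string'}
print(rename_grouping_column("B", {"B": "C"}))  # C
```

Only the column names change in this model; the descriptors attached to each column are carried over untouched, which mirrors the input_domain and output_domain shown in the doctest.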
- Stability Guarantee:
Rename's stability_function() returns d_in.
>>> rename_b_to_c.stability_function(1)
1
>>> rename_b_to_c.stability_function(2)
2
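One way to see why the stability function can return d_in unchanged is to check that renaming columns does not change the distance between two datasets. The following is a minimal pure-Python sketch, not the library's implementation: rows are modelled as dicts, and `rename_rows` and `symmetric_difference` are hypothetical helpers that treat each dataset as a multiset of rows.

```python
from collections import Counter
from typing import Dict, List

Row = Dict[str, str]


def rename_rows(rows: List[Row], rename_mapping: Dict[str, str]) -> List[Row]:
    """Rename the keys of every row; unmapped columns keep their names."""
    return [{rename_mapping.get(k, k): v for k, v in row.items()} for row in rows]


def symmetric_difference(left: List[Row], right: List[Row]) -> int:
    """Count rows (with multiplicity) that appear in one list but not the other."""

    def as_counter(rows: List[Row]) -> Counter:
        return Counter(tuple(sorted(row.items())) for row in rows)

    a, b = as_counter(left), as_counter(right)
    return sum(((a - b) + (b - a)).values())


df1 = [{"A": "a1", "B": "b1"}, {"A": "a2", "B": "b1"}, {"A": "a3", "B": "b2"}]
df2 = [{"A": "a1", "B": "b1"}, {"A": "a3", "B": "b2"}, {"A": "a3", "B": "b2"}]

d_in = symmetric_difference(df1, df2)
d_out = symmetric_difference(rename_rows(df1, {"B": "C"}), rename_rows(df2, {"B": "C"}))
print(d_in, d_out)  # 2 2
```

Renaming maps distinct rows to distinct rows, so every row that differed before the rename still differs afterwards and no new differences appear.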
- Parameters
input_domain (tmlt.core.domains.spark_domains.SparkDataFrameDomain) –
metric (Union[tmlt.core.metrics.SymmetricDifference, tmlt.core.metrics.HammingDistance, tmlt.core.metrics.IfGroupedBy]) –
- __init__(input_domain, metric, rename_mapping)#
Constructor.
- Parameters
input_domain (SparkDataFrameDomain) – Domain of input DataFrame.
metric (Union[SymmetricDifference, HammingDistance, IfGroupedBy]) – Distance metric for input DataFrames.
rename_mapping (Dict[str, str]) – Dictionary from existing column names to target column names.
- property rename_mapping#
Returns mapping from old column names to new column names.
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the architecture overview for more information.
- Parameters
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Return type
- __call__(sdf)#
Renames columns.
- Parameters
sdf (pyspark.sql.DataFrame) –
- Return type
- property input_domain#
Return input domain for the transformation.
- Return type
- property input_metric#
Distance metric on input domain.
- Return type
- property output_domain#
Return output domain for the transformation.
- Return type
- property output_metric#
Distance metric on output domain.
- Return type
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type
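The relationship between stability_function and stability_relation can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it assumes the relation holds exactly when d_out is at least stability_function(d_in), and it uses plain integers in place of the library's exact number inputs.

```python
def stability_function(d_in: int) -> int:
    """Rename's stability function as documented on this page: returns d_in unchanged."""
    return d_in


def stability_relation(d_in: int, d_out: int) -> bool:
    """Assumed relation: holds whenever d_out is at least stability_function(d_in)."""
    return stability_function(d_in) <= d_out


print(stability_relation(1, 1))  # True
print(stability_relation(1, 5))  # True
print(stability_relation(2, 1))  # False
```

Under this assumption, stability_function(d_in) is the smallest d_out for which the relation is true, which matches the description given for stability_function.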
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
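Chaining with the `|` operator can be illustrated with a toy class. `ToyTransformation` is a hypothetical stand-in, not a Tumult Core class: it models columns as a list of names, omits any compatibility checks between the chained components, and assumes that the stability function of a chain is the composition of the individual stability functions.

```python
from typing import Callable, List


class ToyTransformation:
    """Hypothetical stand-in for a transformation: a function plus a stability function."""

    def __init__(self, fn: Callable[[List[str]], List[str]], stability: Callable[[int], int]):
        self.fn = fn
        self.stability = stability

    def __call__(self, data: List[str]) -> List[str]:
        return self.fn(data)

    def __or__(self, other: "ToyTransformation") -> "ToyTransformation":
        # Run self first, then other; stability functions are composed in the same order.
        return ToyTransformation(
            lambda data: other(self(data)),
            lambda d_in: other.stability(self.stability(d_in)),
        )


rename_b_to_c = ToyTransformation(
    lambda cols: ["C" if c == "B" else c for c in cols], lambda d_in: d_in
)
rename_c_to_d = ToyTransformation(
    lambda cols: ["D" if c == "C" else c for c in cols], lambda d_in: d_in
)

chained = rename_b_to_c | rename_c_to_d
print(chained(["A", "B"]))  # ['A', 'D']
print(chained.stability(3))  # 3
```

Because both toy renames have an identity stability function, the chained object does too.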