rename#

Transformations for renaming Spark DataFrame columns.

See the architecture overview for more information.

Classes#

Rename

Rename one or more columns in a Spark DataFrame.

class Rename(input_domain, metric, rename_mapping)#

Bases: tmlt.core.transformations.base.Transformation

Rename one or more columns in a Spark DataFrame.

Example

>>> # Example input
>>> print_sdf(spark_dataframe)
    A   B
0  a1  b1
1  a2  b1
2  a3  b2
3  a3  b2
>>> rename_b_to_c = Rename(
...     input_domain=SparkDataFrameDomain(
...         {
...             "A": SparkStringColumnDescriptor(),
...             "B": SparkStringColumnDescriptor(),
...         }
...     ),
...     metric=SymmetricDifference(),
...     rename_mapping={"B": "C"},
... )
>>> # Apply transformation to data
>>> renamed_spark_dataframe = rename_b_to_c(spark_dataframe)
>>> print_sdf(renamed_spark_dataframe)
    A   C
0  a1  b1
1  a2  b1
2  a3  b2
3  a3  b2

Transformation Contract:

>>> rename_b_to_c.input_domain
SparkDataFrameDomain(schema={'A': SparkStringColumnDescriptor(allow_null=False), 'B': SparkStringColumnDescriptor(allow_null=False)})
>>> rename_b_to_c.output_domain
SparkDataFrameDomain(schema={'A': SparkStringColumnDescriptor(allow_null=False), 'C': SparkStringColumnDescriptor(allow_null=False)})
>>> rename_b_to_c.input_metric
SymmetricDifference()
>>> rename_b_to_c.output_metric
SymmetricDifference()

Stability Guarantee:

Rename’s stability_function() returns d_in.

>>> rename_b_to_c.stability_function(1)
1
>>> rename_b_to_c.stability_function(2)
2
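
The guarantee holds because renaming relabels the fields of every row without adding, dropping, or merging rows, so the symmetric difference between two inputs is carried over unchanged. The sketch below checks this on plain Python data rather than on Spark DataFrames. `rename_rows` and `symmetric_difference` are hypothetical helpers written for this illustration and are not part of tmlt.core.

```python
from collections import Counter


def rename_rows(rows, rename_mapping):
    """Rename the keys of each row (a dict) according to rename_mapping."""
    return [
        {rename_mapping.get(col, col): value for col, value in row.items()}
        for row in rows
    ]


def symmetric_difference(rows_a, rows_b):
    """Size of the multiset symmetric difference between two lists of rows."""
    count_a = Counter(tuple(sorted(row.items())) for row in rows_a)
    count_b = Counter(tuple(sorted(row.items())) for row in rows_b)
    # Counter subtraction keeps only positive counts, so this sums the
    # rows that appear more often on one side than on the other.
    return sum(((count_a - count_b) + (count_b - count_a)).values())


x = [{"A": "a1", "B": "b1"}, {"A": "a2", "B": "b1"}, {"A": "a3", "B": "b2"}]
y = [{"A": "a1", "B": "b1"}, {"A": "a3", "B": "b2"}, {"A": "a3", "B": "b2"}]

d_in = symmetric_difference(x, y)
d_out = symmetric_difference(
    rename_rows(x, {"B": "C"}), rename_rows(y, {"B": "C"})
)
print(d_in, d_out)  # both distances are equal
```
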
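Parameters
  • input_domain (SparkDataFrameDomain) – Domain of the input DataFrames.

  • metric – Metric used as both input_metric and output_metric (SymmetricDifference() in the example above).

  • rename_mapping (Dict[str, str]) – Mapping from old column names to new column names.

__init__(input_domain, metric, rename_mapping)#

Constructor. Takes the same parameters as the class.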
property rename_mapping#

Returns mapping from old column names to new column names.

Return type

Dict[str, str]
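
As a rough illustration of how such a mapping relates the input schema to the output schema (compare input_domain and output_domain in the example above), here is a minimal sketch on plain dictionaries. `rename_schema` is a hypothetical helper, the string `"string"` stands in for a column descriptor, and the two validation checks are assumptions made for this sketch. This page does not say how Rename itself handles unknown or colliding column names.

```python
def rename_schema(schema, rename_mapping):
    """Return a new schema dict with columns renamed, preserving column order."""
    missing = set(rename_mapping) - set(schema)
    if missing:
        # Assumption for this sketch: renaming an unknown column is an error.
        raise ValueError(f"Columns not in schema: {sorted(missing)}")
    renamed = {rename_mapping.get(col, col): desc for col, desc in schema.items()}
    if len(renamed) != len(schema):
        # Assumption for this sketch: two columns may not end up with the same name.
        raise ValueError("Renaming would produce duplicate column names")
    return renamed


input_schema = {"A": "string", "B": "string"}
output_schema = rename_schema(input_schema, {"B": "C"})
print(output_schema)  # {'A': 'string', 'C': 'string'}
```
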

stability_function(d_in)#

Returns the smallest d_out satisfied by the transformation.

See the architecture overview for more information.

Parameters

d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.

Return type

tmlt.core.utils.exact_number.ExactNumber

__call__(sdf)#

Renames columns.

Parameters

sdf (pyspark.sql.DataFrame) – Spark DataFrame whose columns are to be renamed.

Return type

pyspark.sql.DataFrame

property input_domain#

Return input domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property input_metric#

Distance metric on input domain.

Return type

tmlt.core.metrics.Metric

property output_domain#

Return output domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property output_metric#

Distance metric on output domain.

Return type

tmlt.core.metrics.Metric

stability_relation(d_in, d_out)#

Returns True only if close inputs produce close outputs.

See the privacy and stability tutorial for more information.

Parameters
  • d_in (Any) – Distance between inputs under input_metric.

  • d_out (Any) – Distance between outputs under output_metric.

Return type

bool
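
One way to read the relation alongside stability_function() is sketched below, assuming the relation holds exactly when the smallest valid d_out for a given d_in does not exceed the d_out supplied. This page does not state that derivation, so treat it as an assumption. The functions are standalone stand-ins, not the tmlt.core methods, and `Fraction` is used here only as an exact-arithmetic substitute for ExactNumber.

```python
from fractions import Fraction


def stability_function(d_in):
    """Stand-in for Rename's documented behavior: return d_in unchanged."""
    return Fraction(d_in)


def stability_relation(d_in, d_out):
    """Assumed relation: True when the smallest valid d_out is at most d_out."""
    return stability_function(d_in) <= Fraction(d_out)


checks = [
    stability_relation(1, 1),  # tight bound
    stability_relation(1, 2),  # looser bound than needed
    stability_relation(2, 1),  # bound too small
]
print(checks)  # [True, True, False]
```
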

__or__(other: Transformation) → Transformation#
__or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement

Return this transformation chained with another component.
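
The toy class below sketches what chaining with `|` could look like, assuming the left operand is applied first and its output is fed to the right operand. `ToyTransformation` is a hypothetical stand-in operating on a single dict row. It does not model domains, metrics, or stability, and it is not how tmlt.core implements chaining.

```python
class ToyTransformation:
    """Minimal callable wrapper that supports chaining with the | operator."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def __call__(self, data):
        return self.func(data)

    def __or__(self, other):
        # Assumed order: run self first, then pass the result to other.
        return ToyTransformation(
            f"{self.name} | {other.name}",
            lambda data: other(self(data)),
        )


rename_b_to_c = ToyTransformation(
    "rename",
    lambda row: {("C" if col == "B" else col): value for col, value in row.items()},
)
select_c = ToyTransformation("select", lambda row: {"C": row["C"]})

pipeline = rename_b_to_c | select_c
result = pipeline({"A": "a1", "B": "b1"})
print(pipeline.name, result)  # rename | select {'C': 'b1'}
```
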