add_remove_keys#

Transformations that transform dictionaries using AddRemoveKeys.

Note that several of the transformations in dictionary also support AddRemoveKeys. In particular

The transformations defined in this module are required because AugmentDictTransformation is not stable under AddRemoveKeys for all transformations.

Consider the following example:

>>> # Create transformation
>>> input_domain = SparkDataFrameDomain(
...     {
...         "A": SparkStringColumnDescriptor(),
...         "B": SparkStringColumnDescriptor(),
...     }
... )
>>> input_metric = IfGroupedBy("A", SymmetricDifference())
>>> truncate = LimitRowsPerGroup(
...     input_domain=input_domain,
...     output_metric=SymmetricDifference(),
...     grouping_column="A",
...     threshold=1,
... )
>>> rename = Rename(
...     input_domain=input_domain,
...     metric=SymmetricDifference(),
...     rename_mapping={"A": "C", "B": "D"},
... )
>>> create_unique_column = AddUniqueColumn(
...     input_domain=rename.output_domain,
...     column="A",
... )
>>> transformation = truncate | rename | create_unique_column
>>> # Create data
>>> x1 = spark.createDataFrame(
...     [["a", "1"], ["b", "2"], ["c", "3"]], ["A", "B"]
... )
>>> x2 = spark.createDataFrame(
...     [["b", "2"], ["c", "3"]], ["A", "B"]
... )
>>> print_sdf(x1)
   A  B
0  a  1
1  b  2
2  c  3
>>> print_sdf(x2)
   A  B
0  b  2
1  c  3
>>> y1 = transformation(x1)
>>> y2 = transformation(x2)
>>> print_sdf(y1)  # Note that the values below are in fact unique (after the 5B226)
   C  D                           A
0  a  1  5B2261222C2231222C2231225D
1  b  2  5B2262222C2232222C2231225D
2  c  3  5B2263222C2233222C2231225D
>>> print_sdf(y2)
   C  D                           A
0  b  2  5B2262222C2232222C2231225D
1  c  3  5B2263222C2233222C2231225D
>>> # Check stability
>>> input_metric.distance(x1, x2, input_domain)
1
>>> input_metric.distance(y1, y2, transformation.output_domain)
1
>>> # Check stability as if it was Augmented using AugmentDictTransformation
>>> dict_x1 = {"start": x1}
>>> dict_x2 = {"start": x2}
>>> dict_y1 = {"start": x1, "end": y1}
>>> dict_y2 = {"start": x2, "end": y2}
>>> dict_input_domain = DictDomain({"start": input_domain})
>>> dict_input_metric = AddRemoveKeys({"start": "A"})
>>> dict_output_domain = DictDomain(
...     {
...         "start": input_domain,
...         "end": transformation.output_domain
...     }
... )
>>> dict_output_metric = AddRemoveKeys({"start": "A", "end": "A"})
>>> # Naively you would expect the stability to be 1, but in this example it is 2
>>> dict_input_metric.distance(dict_x1, dict_x2, dict_input_domain)
1
>>> dict_output_metric.distance(dict_y1, dict_y2, dict_output_domain)
2

Conceptually, the transformation in the example above changes the meaning of the key column. The column “A” in the input data is not the same as the column “A” in the output data, so removing one value, “a”, from the input dictionary removes both “a” and “a,1” from the output dictionary.
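The count in the example can be reproduced with a simplified pure-Python model of the metric. The sketch below assumes that the distance counts the key values whose associated rows, pooled across all DataFrames in the dictionary, differ between the two inputs. The function `add_remove_keys_distance`, the helper `transform`, and the comma-joined stand-in for the unique column are hypothetical illustrations, not the library's implementation.

```python
from collections import Counter


def add_remove_keys_distance(d1, d2, key_columns):
    """Count key values whose pooled rows differ between two dictionaries.

    Simplified model: each "DataFrame" is a list of row dicts, and
    key_columns maps each dictionary key to the name of its key column.
    """

    def rows_by_key(d):
        grouped = {}
        for name, rows in d.items():
            column = key_columns[name]
            for row in rows:
                record = (name, tuple(sorted(row.items())))
                grouped.setdefault(row[column], Counter())[record] += 1
        return grouped

    g1, g2 = rows_by_key(d1), rows_by_key(d2)
    return sum(1 for k in set(g1) | set(g2) if g1.get(k) != g2.get(k))


def transform(rows):
    # Stand-in for truncate | rename | create_unique_column: the new "A"
    # column is derived from the whole row, not copied from the old "A".
    return [{"C": r["A"], "D": r["B"], "A": f'{r["A"]},{r["B"]}'} for r in rows]


x1 = [{"A": "a", "B": "1"}, {"A": "b", "B": "2"}, {"A": "c", "B": "3"}]
x2 = x1[1:]

d_in = add_remove_keys_distance({"start": x1}, {"start": x2}, {"start": "A"})
d_out = add_remove_keys_distance(
    {"start": x1, "end": transform(x1)},
    {"start": x2, "end": transform(x2)},
    {"start": "A", "end": "A"},
)
print(d_in, d_out)
```

In this model the input distance is 1 (only the key “a” differs), while the output distance is 2 because the derived key “a,1” differs as well.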

Classes#

TransformValue

Base class transforming a specified key using an existing transformation.

LimitRowsPerGroupValue

Applies a LimitRowsPerGroup to the specified key.

LimitKeysPerGroupValue

Applies a LimitKeysPerGroup to the specified key.

LimitRowsPerKeyPerGroupValue

Applies a LimitRowsPerKeyPerGroup to the specified key.

FilterValue

Applies a Filter to create a new element from specified value.

PublicJoinValue

Applies a PublicJoin to create a new element from specified value.

FlatMapValue

Applies a FlatMap to create a new element from specified value.

MapValue

Applies a Map to create a new element from specified value.

DropInfsValue

Applies a DropInfs to create a new element from specified value.

DropNaNsValue

Applies a DropNaNs to create a new element from specified value.

DropNullsValue

Applies a DropNulls to create a new element from specified value.

ReplaceInfsValue

Applies a ReplaceInfs to create a new element from specified value.

ReplaceNaNsValue

Applies a ReplaceNaNs to create a new element from specified value.

ReplaceNullsValue

Applies a ReplaceNulls to create a new element from specified value.

PersistValue

Applies a Persist to create a new element from specified value.

UnpersistValue

Applies an Unpersist to create a new element from specified value.

SparkActionValue

Applies a SparkAction to create a new element from specified value.

RenameValue

Applies a Rename to create a new element from specified value.

SelectValue

Applies a Select to create a new element from specified value.

class TransformValue(input_domain, input_metric, transformation, key, new_key)#

Bases: tmlt.core.transformations.base.Transformation

Base class transforming a specified key using an existing transformation.

Subclass this class to claim that a kind of Transformation (like Filter) can be applied to a DataFrame, with its output added to the input dictionary, without violating the closeness of neighboring dataframes under AddRemoveKeys.

NOTE: This class cannot be instantiated directly.
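The augmentation that subclasses perform can be pictured with a small pure-Python model. The helper `augment_with` below is hypothetical and works on plain lists rather than Spark DataFrames; it only illustrates that the input entries are kept and one new entry is added under `new_key`.

```python
def augment_with(data, key, new_key, transform):
    """Return a copy of data with transform(data[key]) stored under new_key."""
    if new_key in data:
        raise ValueError(f"new_key {new_key!r} is already present in the input")
    augmented = dict(data)  # shallow copy, so the input dictionary is not mutated
    augmented[new_key] = transform(data[key])
    return augmented


data = {"start": [1, 2, 3, 4]}
result = augment_with(
    data, "start", "evens", lambda rows: [r for r in rows if r % 2 == 0]
)
print(result)
```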

Methods#

transformation()

Returns the transformation that will be applied to create the new element.

key()

Returns the key for the DataFrame to transform.

new_key()

Returns the new key for the transformed DataFrame.

stability_function()

Returns the smallest d_out satisfied by the transformation.

__call__()

Returns a new dictionary augmented with the transformed DataFrame.

input_domain()

Return input domain for the transformation.

input_metric()

Distance metric on input domain.

output_domain()

Return output domain for the transformation.

output_metric()

Distance metric on output domain.

stability_relation()

Returns True only if close inputs produce close outputs.

__or__()

Return this transformation chained with another component.

__init__(input_domain, input_metric, transformation, key, new_key)#

Constructor.

Parameters
  • input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.

  • input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.

  • transformation (Transformation) – The DataFrame to DataFrame transformation to apply. Input and output metric must both be IfGroupedBy(column, SymmetricDifference()) using the same column.

  • key (Any) – The key for the DataFrame to transform.

  • new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.

property transformation#

Returns the transformation that will be applied to create the new element.

Return type

tmlt.core.transformations.base.Transformation

property key#

Returns the key for the DataFrame to transform.

Return type

Any

property new_key#

Returns the new key for the transformed DataFrame.

Return type

Any

stability_function(d_in)#

Returns the smallest d_out satisfied by the transformation.

See the privacy and stability tutorial for more information.

Parameters

d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.

Raises

NotImplementedError – If not overridden.

Return type

tmlt.core.utils.exact_number.ExactNumber

__call__(data)#

Returns a new dictionary augmented with the transformed DataFrame.

Parameters

data (Dict[Any, pyspark.sql.DataFrame]) –

Return type

Dict[Any, pyspark.sql.DataFrame]

property input_domain#

Return input domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property input_metric#

Distance metric on input domain.

Return type

tmlt.core.metrics.Metric

property output_domain#

Return output domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property output_metric#

Distance metric on output domain.

Return type

tmlt.core.metrics.Metric

stability_relation(d_in, d_out)#

Returns True only if close inputs produce close outputs.

See the privacy and stability tutorial for more information.

Parameters
  • d_in (Any) – Distance between inputs under input_metric.

  • d_out (Any) – Distance between outputs under output_metric.

Return type

bool
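One common way to relate the two stability methods is to derive the relation from the function: the relation holds whenever the smallest valid d_out is no larger than the candidate d_out. The sketch below assumes integer distances and uses a hypothetical class `ToyTransformation` with a made-up linear stability function; it is not the library's base-class code.

```python
class ToyTransformation:
    """Hypothetical stand-in with a linear stability function."""

    def __init__(self, factor):
        self.factor = factor

    def stability_function(self, d_in):
        # Smallest d_out guaranteed for inputs at distance d_in.
        return self.factor * d_in

    def stability_relation(self, d_in, d_out):
        # Close inputs produce close outputs if the tightest bound fits in d_out.
        return self.stability_function(d_in) <= d_out


toy = ToyTransformation(factor=2)
print(toy.stability_function(3), toy.stability_relation(3, 6))
```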

__or__(other: Transformation) → Transformation#
__or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement

Return this transformation chained with another component.

class LimitRowsPerGroupValue(input_domain, input_metric, key, new_key, threshold)#

Bases: TransformValue

Applies a LimitRowsPerGroup to the specified key.

See TransformValue and LimitRowsPerGroup for more information.
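As an illustration of the truncation being applied, the sketch below keeps at most `threshold` rows for each value of a grouping column. It is a pure-Python model on lists of row dicts; the function name is hypothetical, and keeping the first rows in input order is an assumption, since the library may select rows differently.

```python
from collections import defaultdict


def limit_rows_per_group(rows, grouping_column, threshold):
    """Keep at most `threshold` rows per grouping_column value."""
    seen = defaultdict(int)
    kept = []
    for row in rows:
        group = row[grouping_column]
        if seen[group] < threshold:
            seen[group] += 1
            kept.append(row)
    return kept


rows = [{"A": "a", "B": 1}, {"A": "a", "B": 2}, {"A": "b", "B": 3}]
truncated = limit_rows_per_group(rows, "A", threshold=1)
print(truncated)
```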

Methods#

transformation()

Returns the transformation that will be applied to create the new element.

key()

Returns the key for the DataFrame to transform.

new_key()

Returns the new key for the transformed DataFrame.

stability_function()

Returns the smallest d_out satisfied by the transformation.

__call__()

Returns a new dictionary augmented with the transformed DataFrame.

input_domain()

Return input domain for the transformation.

input_metric()

Distance metric on input domain.

output_domain()

Return output domain for the transformation.

output_metric()

Distance metric on output domain.

stability_relation()

Returns True only if close inputs produce close outputs.

__or__()

Return this transformation chained with another component.

__init__(input_domain, input_metric, key, new_key, threshold)#

Constructor.

Parameters
  • input_domain (DictDomain) – Domain of input dictionary of Spark DataFrames.

  • input_metric (AddRemoveKeys) – Input metric for the outer dictionary to dictionary transformation.

  • key (Any) – The key for the DataFrame to transform.

  • new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.

  • threshold (int) – The maximum number of rows per group after truncation.

property transformation#

Returns the transformation that will be applied to create the new element.

Return type

tmlt.core.transformations.base.Transformation

property key#

Returns the key for the DataFrame to transform.

Return type

Any

property new_key#

Returns the new key for the transformed DataFrame.

Return type

Any

stability_function(d_in)#

Returns the smallest d_out satisfied by the transformation.

See the privacy and stability tutorial for more information.

Parameters

d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.

Raises

NotImplementedError – If not overridden.

Return type

tmlt.core.utils.exact_number.ExactNumber

__call__(data)#

Returns a new dictionary augmented with the transformed DataFrame.

Parameters

data (Dict[Any, pyspark.sql.DataFrame]) –

Return type

Dict[Any, pyspark.sql.DataFrame]

property input_domain#

Return input domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property input_metric#

Distance metric on input domain.

Return type

tmlt.core.metrics.Metric

property output_domain#

Return output domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property output_metric#

Distance metric on output domain.

Return type

tmlt.core.metrics.Metric

stability_relation(d_in, d_out)#

Returns True only if close inputs produce close outputs.

See the privacy and stability tutorial for more information.

Parameters
  • d_in (Any) – Distance between inputs under input_metric.

  • d_out (Any) – Distance between outputs under output_metric.

Return type

bool

__or__(other: Transformation) → Transformation#
__or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement

Return this transformation chained with another component.

class LimitKeysPerGroupValue(input_domain, input_metric, key, new_key, key_column, threshold)#

Bases: TransformValue

Applies a LimitKeysPerGroup to the specified key.

See TransformValue and LimitKeysPerGroup for more information.
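A rough model of limiting keys per group: for each group, only rows belonging to the first `threshold` distinct values of the key column are retained. The function name and the first-seen selection rule are assumptions made for this sketch, which operates on lists of row dicts rather than Spark DataFrames.

```python
def limit_keys_per_group(rows, grouping_column, key_column, threshold):
    """Keep rows for at most `threshold` distinct key values per group."""
    kept_keys = {}  # group value -> key values already admitted
    kept = []
    for row in rows:
        keys = kept_keys.setdefault(row[grouping_column], [])
        if row[key_column] in keys:
            kept.append(row)
        elif len(keys) < threshold:
            keys.append(row[key_column])
            kept.append(row)
    return kept


rows = [
    {"A": "a", "K": "x"},
    {"A": "a", "K": "y"},
    {"A": "a", "K": "x"},
    {"A": "b", "K": "y"},
]
limited = limit_keys_per_group(rows, "A", "K", threshold=1)
print(limited)
```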

Methods#

transformation()

Returns the transformation that will be applied to create the new element.

key()

Returns the key for the DataFrame to transform.

new_key()

Returns the new key for the transformed DataFrame.

stability_function()

Returns the smallest d_out satisfied by the transformation.

__call__()

Returns a new dictionary augmented with the transformed DataFrame.

input_domain()

Return input domain for the transformation.

input_metric()

Distance metric on input domain.

output_domain()

Return output domain for the transformation.

output_metric()

Distance metric on output domain.

stability_relation()

Returns True only if close inputs produce close outputs.

__or__()

Return this transformation chained with another component.

__init__(input_domain, input_metric, key, new_key, key_column, threshold)#

Constructor.

Parameters
  • input_domain (DictDomain) – Domain of input dictionary of Spark DataFrames.

  • input_metric (AddRemoveKeys) – Input metric for the outer dictionary to dictionary transformation.

  • key (Any) – The key for the DataFrame to transform.

  • new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.

  • key_column (str) – Name of column defining the keys.

  • threshold (int) – The maximum number of keys per group after truncation.

property transformation#

Returns the transformation that will be applied to create the new element.

Return type

tmlt.core.transformations.base.Transformation

property key#

Returns the key for the DataFrame to transform.

Return type

Any

property new_key#

Returns the new key for the transformed DataFrame.

Return type

Any

stability_function(d_in)#

Returns the smallest d_out satisfied by the transformation.

See the privacy and stability tutorial for more information.

Parameters

d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.

Raises

NotImplementedError – If not overridden.

Return type

tmlt.core.utils.exact_number.ExactNumber

__call__(data)#

Returns a new dictionary augmented with the transformed DataFrame.

Parameters

data (Dict[Any, pyspark.sql.DataFrame]) –

Return type

Dict[Any, pyspark.sql.DataFrame]

property input_domain#

Return input domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property input_metric#

Distance metric on input domain.

Return type

tmlt.core.metrics.Metric

property output_domain#

Return output domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property output_metric#

Distance metric on output domain.

Return type

tmlt.core.metrics.Metric

stability_relation(d_in, d_out)#

Returns True only if close inputs produce close outputs.

See the privacy and stability tutorial for more information.

Parameters
  • d_in (Any) – Distance between inputs under input_metric.

  • d_out (Any) – Distance between outputs under output_metric.

Return type

bool

__or__(other: Transformation) → Transformation#
__or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement

Return this transformation chained with another component.

class LimitRowsPerKeyPerGroupValue(input_domain, input_metric, key, new_key, key_column, threshold)#

Bases: TransformValue

Applies a LimitRowsPerKeyPerGroup to the specified key.

See TransformValue and LimitRowsPerKeyPerGroup for more information.
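Here the cap applies to each (group, key) pair rather than to each group as a whole. The sketch below is a hypothetical pure-Python model on lists of row dicts; retaining rows in input order is an assumption and may not match how the library chooses rows.

```python
from collections import Counter


def limit_rows_per_key_per_group(rows, grouping_column, key_column, threshold):
    """Keep at most `threshold` rows for each (group, key) pair."""
    counts = Counter()
    kept = []
    for row in rows:
        pair = (row[grouping_column], row[key_column])
        if counts[pair] < threshold:
            counts[pair] += 1
            kept.append(row)
    return kept


rows = [
    {"A": "a", "K": "x", "V": 1},
    {"A": "a", "K": "x", "V": 2},
    {"A": "a", "K": "y", "V": 3},
]
capped = limit_rows_per_key_per_group(rows, "A", "K", threshold=1)
print(capped)
```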

Methods#

transformation()

Returns the transformation that will be applied to create the new element.

key()

Returns the key for the DataFrame to transform.

new_key()

Returns the new key for the transformed DataFrame.

stability_function()

Returns the smallest d_out satisfied by the transformation.

__call__()

Returns a new dictionary augmented with the transformed DataFrame.

input_domain()

Return input domain for the transformation.

input_metric()

Distance metric on input domain.

output_domain()

Return output domain for the transformation.

output_metric()

Distance metric on output domain.

stability_relation()

Returns True only if close inputs produce close outputs.

__or__()

Return this transformation chained with another component.

__init__(input_domain, input_metric, key, new_key, key_column, threshold)#

Constructor.

Parameters
  • input_domain (DictDomain) – Domain of input dictionary of Spark DataFrames.

  • input_metric (AddRemoveKeys) – Input metric for the outer dictionary to dictionary transformation.

  • key (Any) – The key for the DataFrame to transform.

  • new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.

  • key_column (str) – Name of column defining the keys.

  • threshold (int) – The maximum number of rows each unique (key, grouping column value) pair may appear in after truncation.

property transformation#

Returns the transformation that will be applied to create the new element.

Return type

tmlt.core.transformations.base.Transformation

property key#

Returns the key for the DataFrame to transform.

Return type

Any

property new_key#

Returns the new key for the transformed DataFrame.

Return type

Any

stability_function(d_in)#

Returns the smallest d_out satisfied by the transformation.

See the privacy and stability tutorial for more information.

Parameters

d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.

Raises

NotImplementedError – If not overridden.

Return type

tmlt.core.utils.exact_number.ExactNumber

__call__(data)#

Returns a new dictionary augmented with the transformed DataFrame.

Parameters

data (Dict[Any, pyspark.sql.DataFrame]) –

Return type

Dict[Any, pyspark.sql.DataFrame]

property input_domain#

Return input domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property input_metric#

Distance metric on input domain.

Return type

tmlt.core.metrics.Metric

property output_domain#

Return output domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property output_metric#

Distance metric on output domain.

Return type

tmlt.core.metrics.Metric

stability_relation(d_in, d_out)#

Returns True only if close inputs produce close outputs.

See the privacy and stability tutorial for more information.

Parameters
  • d_in (Any) – Distance between inputs under input_metric.

  • d_out (Any) – Distance between outputs under output_metric.

Return type

bool

__or__(other: Transformation) → Transformation#
__or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement

Return this transformation chained with another component.

class FilterValue(input_domain, input_metric, key, new_key, filter_expr)#

Bases: TransformValue

Applies a Filter to create a new element from specified value.

See TransformValue and Filter for more information.
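A filter only removes rows, so it cannot introduce key values that were absent from its input. The sketch below illustrates this with a hypothetical `filter_rows` helper on lists of row dicts, where a Python predicate stands in for the SQL `filter_expr` string.

```python
def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


rows = [{"A": "a", "B": 1}, {"A": "b", "B": 5}, {"A": "c", "B": 9}]
# The lambda plays the role of a filter expression such as "B > 3".
filtered = filter_rows(rows, lambda row: row["B"] > 3)

input_keys = {row["A"] for row in rows}
output_keys = {row["A"] for row in filtered}
print(filtered, output_keys <= input_keys)
```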

Methods#

transformation()

Returns the transformation that will be applied to create the new element.

key()

Returns the key for the DataFrame to transform.

new_key()

Returns the new key for the transformed DataFrame.

stability_function()

Returns the smallest d_out satisfied by the transformation.

__call__()

Returns a new dictionary augmented with the transformed DataFrame.

input_domain()

Return input domain for the transformation.

input_metric()

Distance metric on input domain.

output_domain()

Return output domain for the transformation.

output_metric()

Distance metric on output domain.

stability_relation()

Returns True only if close inputs produce close outputs.

__or__()

Return this transformation chained with another component.

__init__(input_domain, input_metric, key, new_key, filter_expr)#

Constructor.

Parameters
  • input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.

  • input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.

  • key (Any) – The key for the DataFrame to transform.

  • new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.

  • filter_expr (str) – A string of SQL expression specifying the filter to apply to the data. The language is the same as the one used by pyspark.sql.DataFrame.filter().

property transformation#

Returns the transformation that will be applied to create the new element.

Return type

tmlt.core.transformations.base.Transformation

property key#

Returns the key for the DataFrame to transform.

Return type

Any

property new_key#

Returns the new key for the transformed DataFrame.

Return type

Any

stability_function(d_in)#

Returns the smallest d_out satisfied by the transformation.

See the privacy and stability tutorial for more information.

Parameters

d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.

Raises

NotImplementedError – If not overridden.

Return type

tmlt.core.utils.exact_number.ExactNumber

__call__(data)#

Returns a new dictionary augmented with the transformed DataFrame.

Parameters

data (Dict[Any, pyspark.sql.DataFrame]) –

Return type

Dict[Any, pyspark.sql.DataFrame]

property input_domain#

Return input domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property input_metric#

Distance metric on input domain.

Return type

tmlt.core.metrics.Metric

property output_domain#

Return output domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property output_metric#

Distance metric on output domain.

Return type

tmlt.core.metrics.Metric

stability_relation(d_in, d_out)#

Returns True only if close inputs produce close outputs.

See the privacy and stability tutorial for more information.

Parameters
  • d_in (Any) – Distance between inputs under input_metric.

  • d_out (Any) – Distance between outputs under output_metric.

Return type

bool

__or__(other: Transformation) → Transformation#
__or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement

Return this transformation chained with another component.

class PublicJoinValue(input_domain, input_metric, key, new_key, public_df, public_df_domain=None, join_cols=None, join_on_nulls=False)#

Bases: TransformValue

Applies a PublicJoin to create a new element from specified value.

See TransformValue and PublicJoin for more information.
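The effect of `join_on_nulls` can be sketched with a small pure-Python join, using `None` for nulls. The helper `public_join` is hypothetical; it assumes an inner join and that the two tables share no column names other than the join columns.

```python
def public_join(private_rows, public_rows, join_cols, join_on_nulls=False):
    """Inner-join two lists of row dicts on join_cols."""
    joined = []
    for left in private_rows:
        for right in public_rows:
            matches = True
            for col in join_cols:
                lv, rv = left[col], right[col]
                if lv is None or rv is None:
                    # Nulls match only each other, and only when requested.
                    if not (join_on_nulls and lv is None and rv is None):
                        matches = False
                elif lv != rv:
                    matches = False
            if matches:
                joined.append({**right, **left})
    return joined


private = [{"A": "a", "B": 1}, {"A": None, "B": 2}]
public = [{"A": "a", "P": "x"}, {"A": None, "P": "y"}]

strict = public_join(private, public, ["A"])
with_nulls = public_join(private, public, ["A"], join_on_nulls=True)
print(strict, with_nulls)
```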

Methods#

transformation()

Returns the transformation that will be applied to create the new element.

key()

Returns the key for the DataFrame to transform.

new_key()

Returns the new key for the transformed DataFrame.

stability_function()

Returns the smallest d_out satisfied by the transformation.

__call__()

Returns a new dictionary augmented with the transformed DataFrame.

input_domain()

Return input domain for the transformation.

input_metric()

Distance metric on input domain.

output_domain()

Return output domain for the transformation.

output_metric()

Distance metric on output domain.

stability_relation()

Returns True only if close inputs produce close outputs.

__or__()

Return this transformation chained with another component.

__init__(input_domain, input_metric, key, new_key, public_df, public_df_domain=None, join_cols=None, join_on_nulls=False)#

Constructor.

Parameters
  • input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.

  • input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.

  • key (Any) – The key for the DataFrame to transform.

  • new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.

  • public_df (DataFrame) – A Spark DataFrame to join with.

  • public_df_domain (Optional[SparkDataFrameDomain] (default: None)) – Domain of public DataFrame to join with. If this domain indicates that a float column does not allow nans (or infs), all rows in public_df containing a nan (or an inf) in that column will be dropped. If None, domain is inferred from the schema of public_df and any float column will be marked as allowing inf and nan values.

  • join_cols (Optional[List[str]] (default: None)) – Names of columns to join on. If None, a natural join is performed.

  • join_on_nulls (bool (default: False)) – If True, null values on corresponding join columns of the public and private dataframes will be considered to be equal.

property transformation#

Returns the transformation that will be applied to create the new element.

Return type

tmlt.core.transformations.base.Transformation

property key#

Returns the key for the DataFrame to transform.

Return type

Any

property new_key#

Returns the new key for the transformed DataFrame.

Return type

Any

stability_function(d_in)#

Returns the smallest d_out satisfied by the transformation.

See the privacy and stability tutorial for more information.

Parameters

d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.

Raises

NotImplementedError – If not overridden.

Return type

tmlt.core.utils.exact_number.ExactNumber

__call__(data)#

Returns a new dictionary augmented with the transformed DataFrame.

Parameters

data (Dict[Any, pyspark.sql.DataFrame]) –

Return type

Dict[Any, pyspark.sql.DataFrame]

property input_domain#

Return input domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property input_metric#

Distance metric on input domain.

Return type

tmlt.core.metrics.Metric

property output_domain#

Return output domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property output_metric#

Distance metric on output domain.

Return type

tmlt.core.metrics.Metric

stability_relation(d_in, d_out)#

Returns True only if close inputs produce close outputs.

See the privacy and stability tutorial for more information.

Parameters
  • d_in (Any) – Distance between inputs under input_metric.

  • d_out (Any) – Distance between outputs under output_metric.

Return type

bool

__or__(other: Transformation) → Transformation#
__or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement

Return this transformation chained with another component.

class FlatMapValue(input_domain, input_metric, key, new_key, row_transformer, max_num_rows)#

Bases: TransformValue

Applies a FlatMap to create a new element from specified value.

See TransformValue and FlatMap for more information.
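The role of the `max_num_rows` constructor parameter can be modelled in a few lines: each input row may expand into several output rows, and any rows beyond the cap are suppressed. The helper `flat_map` is a hypothetical pure-Python stand-in, and a plain function plays the role of the row transformer.

```python
def flat_map(rows, row_transformer, max_num_rows):
    """Expand each row with row_transformer, keeping at most max_num_rows each."""
    out = []
    for row in rows:
        produced = list(row_transformer(row))
        if max_num_rows is not None:
            produced = produced[:max_num_rows]  # extra rows are suppressed
        out.extend(produced)
    return out


words = ["ab", "cde"]
capped = flat_map(words, list, max_num_rows=2)
uncapped = flat_map(words, list, max_num_rows=None)
print(capped, uncapped)
```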

Methods#

transformation()

Returns the transformation that will be applied to create the new element.

key()

Returns the key for the DataFrame to transform.

new_key()

Returns the new key for the transformed DataFrame.

stability_function()

Returns the smallest d_out satisfied by the transformation.

__call__()

Returns a new dictionary augmented with the transformed DataFrame.

input_domain()

Return input domain for the transformation.

input_metric()

Distance metric on input domain.

output_domain()

Return output domain for the transformation.

output_metric()

Distance metric on output domain.

stability_relation()

Returns True only if close inputs produce close outputs.

__or__()

Return this transformation chained with another component.

__init__(input_domain, input_metric, key, new_key, row_transformer, max_num_rows)#

Constructor.

Parameters
  • input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.

  • input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.

  • key (Any) – The key for the DataFrame to transform.

  • new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.

  • row_transformer (RowToRowsTransformation) – Transformation to apply to each row.

  • max_num_rows (Optional[int]) – The maximum number of rows to allow from row_transformer. If more rows are output, the additional rows are suppressed. If this value is None, the transformation will not impose a limit on the number of rows.

property transformation#

Returns the transformation that will be applied to create the new element.

Return type

tmlt.core.transformations.base.Transformation

property key#

Returns the key for the DataFrame to transform.

Return type

Any

property new_key#

Returns the new key for the transformed DataFrame.

Return type

Any

stability_function(d_in)#

Returns the smallest d_out satisfied by the transformation.

See the privacy and stability tutorial for more information.

Parameters

d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.

Raises

NotImplementedError – If not overridden.

Return type

tmlt.core.utils.exact_number.ExactNumber

__call__(data)#

Returns a new dictionary augmented with the transformed DataFrame.

Parameters

data (Dict[Any, pyspark.sql.DataFrame]) –

Return type

Dict[Any, pyspark.sql.DataFrame]

property input_domain#

Return input domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property input_metric#

Distance metric on input domain.

Return type

tmlt.core.metrics.Metric

property output_domain#

Return output domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property output_metric#

Distance metric on output domain.

Return type

tmlt.core.metrics.Metric

stability_relation(d_in, d_out)#

Returns True only if close inputs produce close outputs.

See the privacy and stability tutorial for more information.

Parameters
  • d_in (Any) – Distance between inputs under input_metric.

  • d_out (Any) – Distance between outputs under output_metric.

Return type

bool

__or__(other: Transformation) → Transformation#
__or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement

Return this transformation chained with another component.

class MapValue(input_domain, input_metric, key, new_key, row_transformer)#

Bases: TransformValue

Applies a Map to create a new element from specified value.

See TransformValue and Map for more information.
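A map produces exactly one output row per input row. In the sketch below, a hypothetical `map_rows` helper applies a row function that carries the key column “A” through unchanged, so the set of key values is the same before and after; the column names and the row function are made up for illustration.

```python
def map_rows(rows, row_transformer):
    """Apply row_transformer to every row, one output row per input row."""
    return [row_transformer(row) for row in rows]


rows = [{"A": "a", "B": 2}, {"A": "b", "B": 3}]
# Keep the key column "A" as-is and derive a new column from "B".
mapped = map_rows(rows, lambda row: {"A": row["A"], "B_squared": row["B"] ** 2})
print(mapped)
```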

Methods#

transformation()

Returns the transformation that will be applied to create the new element.

key()

Returns the key for the DataFrame to transform.

new_key()

Returns the new key for the transformed DataFrame.

stability_function()

Returns the smallest d_out satisfied by the transformation.

__call__()

Returns a new dictionary augmented with the transformed DataFrame.

input_domain()

Return input domain for the transformation.

input_metric()

Distance metric on input domain.

output_domain()

Return output domain for the transformation.

output_metric()

Distance metric on output domain.

stability_relation()

Returns True only if close inputs produce close outputs.

__or__()

Return this transformation chained with another component.

__init__(input_domain, input_metric, key, new_key, row_transformer)#

Constructor.

Parameters
property transformation#

Returns the transformation that will be applied to create the new element.

Return type

tmlt.core.transformations.base.Transformation

property key#

Returns the key for the DataFrame to transform.

Return type

Any

property new_key#

Returns the new key for the transformed DataFrame.

Return type

Any

stability_function(d_in)#

Returns the smallest d_out satisfied by the transformation.

See the privacy and stability tutorial for more information.

Parameters

d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.

Raises

NotImplementedError – If not overridden.

Return type

tmlt.core.utils.exact_number.ExactNumber

__call__(data)#

Returns a new dictionary augmented with the transformed DataFrame.

Parameters

data (Dict[Any, pyspark.sql.DataFrame]) –

Return type

Dict[Any, pyspark.sql.DataFrame]

property input_domain#

Return input domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property input_metric#

Distance metric on input domain.

Return type

tmlt.core.metrics.Metric

property output_domain#

Return output domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property output_metric#

Distance metric on output domain.

Return type

tmlt.core.metrics.Metric

stability_relation(d_in, d_out)#

Returns True only if close inputs produce close outputs.

See the privacy and stability tutorial for more information.

Parameters
  • d_in (Any) – Distance between inputs under input_metric.

  • d_out (Any) – Distance between outputs under output_metric.

Return type

bool
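One way to read the two stability methods together: `stability_function` gives the smallest `d_out` for a given `d_in`, and `stability_relation` accepts any `d_out` at least that large. The sketch below assumes exactly that relationship and uses `Fraction` in place of the library's exact-number type; the function names mirror the documented ones, but the bodies are hypothetical.

```python
from fractions import Fraction


def stability_function(d_in: int) -> Fraction:
    # Assumption for illustration: a 1-stable transformation, so d_out = d_in.
    return Fraction(d_in)


def stability_relation(d_in: int, d_out: int) -> bool:
    # True when the claimed d_out is no smaller than the tightest bound.
    return stability_function(d_in) <= Fraction(d_out)


tight = stability_relation(1, 1)
loose = stability_relation(1, 2)
too_small = stability_relation(2, 1)
```

Under these assumptions the first two checks pass and the third does not, since a `d_out` below the tightest bound is not guaranteed.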

__or__(other: Transformation) → Transformation#
__or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement

Return this transformation chained with another component.

class DropInfsValue(input_domain, input_metric, key, new_key, columns)#

Bases: TransformValue

Applies a DropInfs to create a new element from the specified value.

See TransformValue and DropInfs for more information.
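A minimal plain-Python sketch of the idea, assuming a table is a list of row dictionaries rather than a Spark DataFrame. `drop_infs_value` is a hypothetical helper, not the library's implementation.

```python
import math
from typing import Any, Dict, List

Table = List[Dict[str, float]]


def drop_infs_value(
    data: Dict[Any, Table], key: Any, new_key: Any, columns: List[str]
) -> Dict[Any, Table]:
    """Hypothetical sketch: drop rows with +inf or -inf in any listed column."""
    kept = [
        row
        for row in data[key]
        if not any(math.isinf(row[column]) for column in columns)
    ]
    # The original entry is kept; the filtered table is added under new_key.
    return {**data, new_key: kept}


tables = {
    "t": [
        {"x": 1.0, "y": math.inf},   # kept: "y" is not a listed column
        {"x": -math.inf, "y": 2.0},  # dropped: -inf in "x"
        {"x": 3.0, "y": 4.0},        # kept
    ]
}
out = drop_infs_value(tables, "t", "t_finite", ["x"])
```

Only the listed columns are inspected, so an infinite value elsewhere in a row does not cause that row to be dropped.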

Methods#

transformation()

Returns the transformation that will be applied to create the new element.

key()

Returns the key for the DataFrame to transform.

new_key()

Returns the new key for the transformed DataFrame.

stability_function()

Returns the smallest d_out satisfied by the transformation.

__call__()

Returns a new dictionary augmented with the transformed DataFrame.

input_domain()

Return input domain for the transformation.

input_metric()

Distance metric on input domain.

output_domain()

Return output domain for the transformation.

output_metric()

Distance metric on output domain.

stability_relation()

Returns True only if close inputs produce close outputs.

__or__()

Return this transformation chained with another component.

__init__(input_domain, input_metric, key, new_key, columns)#

Constructor.

Parameters
  • input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.

  • input_metric (AddRemoveKeys) – The input metric for the outer dictionary-to-dictionary transformation.

  • key (Any) – The key for the DataFrame to transform.

  • new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.

  • columns (List[str]) – Columns to drop +inf and -inf from.

property transformation#

Returns the transformation that will be applied to create the new element.

Return type

tmlt.core.transformations.base.Transformation

property key#

Returns the key for the DataFrame to transform.

Return type

Any

property new_key#

Returns the new key for the transformed DataFrame.

Return type

Any

stability_function(d_in)#

Returns the smallest d_out satisfied by the transformation.

See the privacy and stability tutorial for more information.

Parameters

d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.

Raises

NotImplementedError – If not overridden.

Return type

tmlt.core.utils.exact_number.ExactNumber

__call__(data)#

Returns a new dictionary augmented with the transformed DataFrame.

Parameters

data (Dict[Any, pyspark.sql.DataFrame]) –

Return type

Dict[Any, pyspark.sql.DataFrame]

property input_domain#

Return input domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property input_metric#

Distance metric on input domain.

Return type

tmlt.core.metrics.Metric

property output_domain#

Return output domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property output_metric#

Distance metric on output domain.

Return type

tmlt.core.metrics.Metric

stability_relation(d_in, d_out)#

Returns True only if close inputs produce close outputs.

See the privacy and stability tutorial for more information.

Parameters
  • d_in (Any) – Distance between inputs under input_metric.

  • d_out (Any) – Distance between outputs under output_metric.

Return type

bool

__or__(other: Transformation) → Transformation#
__or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement

Return this transformation chained with another component.

class DropNaNsValue(input_domain, input_metric, key, new_key, columns)#

Bases: TransformValue

Applies a DropNaNs to create a new element from the specified value.

See TransformValue and DropNaNs for more information.
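The sketch below illustrates the idea with plain Python rows instead of a Spark DataFrame. It also shows why NaN needs an explicit check: NaN never compares equal to itself, so an equality test cannot find it. `drop_nans_value` is a hypothetical helper, not the library's implementation.

```python
import math
from typing import Any, Dict, List

Table = List[Dict[str, float]]


def drop_nans_value(
    data: Dict[Any, Table], key: Any, new_key: Any, columns: List[str]
) -> Dict[Any, Table]:
    """Hypothetical sketch: drop rows with NaN in any listed column."""
    kept = [
        row
        for row in data[key]
        # math.isnan is required here: `value == float("nan")` is always False.
        if not any(math.isnan(row[column]) for column in columns)
    ]
    return {**data, new_key: kept}


tables = {"readings": [{"v": 1.0}, {"v": float("nan")}, {"v": math.inf}]}
out = drop_nans_value(tables, "readings", "clean", ["v"])
```

Infinite values are left in place in this sketch, since they are a separate case from NaN.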

Methods#

transformation()

Returns the transformation that will be applied to create the new element.

key()

Returns the key for the DataFrame to transform.

new_key()

Returns the new key for the transformed DataFrame.

stability_function()

Returns the smallest d_out satisfied by the transformation.

__call__()

Returns a new dictionary augmented with the transformed DataFrame.

input_domain()

Return input domain for the transformation.

input_metric()

Distance metric on input domain.

output_domain()

Return output domain for the transformation.

output_metric()

Distance metric on output domain.

stability_relation()

Returns True only if close inputs produce close outputs.

__or__()

Return this transformation chained with another component.

__init__(input_domain, input_metric, key, new_key, columns)#

Constructor.

Parameters
  • input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.

  • input_metric (AddRemoveKeys) – The input metric for the outer dictionary-to-dictionary transformation.

  • key (Any) – The key for the DataFrame to transform.

  • new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.

  • columns (List[str]) – Columns to drop NaNs from.

property transformation#

Returns the transformation that will be applied to create the new element.

Return type

tmlt.core.transformations.base.Transformation

property key#

Returns the key for the DataFrame to transform.

Return type

Any

property new_key#

Returns the new key for the transformed DataFrame.

Return type

Any

stability_function(d_in)#

Returns the smallest d_out satisfied by the transformation.

See the privacy and stability tutorial for more information.

Parameters

d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.

Raises

NotImplementedError – If not overridden.

Return type

tmlt.core.utils.exact_number.ExactNumber

__call__(data)#

Returns a new dictionary augmented with the transformed DataFrame.

Parameters

data (Dict[Any, pyspark.sql.DataFrame]) –

Return type

Dict[Any, pyspark.sql.DataFrame]

property input_domain#

Return input domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property input_metric#

Distance metric on input domain.

Return type

tmlt.core.metrics.Metric

property output_domain#

Return output domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property output_metric#

Distance metric on output domain.

Return type

tmlt.core.metrics.Metric

stability_relation(d_in, d_out)#

Returns True only if close inputs produce close outputs.

See the privacy and stability tutorial for more information.

Parameters
  • d_in (Any) – Distance between inputs under input_metric.

  • d_out (Any) – Distance between outputs under output_metric.

Return type

bool

__or__(other: Transformation) → Transformation#
__or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement

Return this transformation chained with another component.

class DropNullsValue(input_domain, input_metric, key, new_key, columns)#

Bases: TransformValue

Applies a DropNulls to create a new element from the specified value.

See TransformValue and DropNulls for more information.
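A small plain-Python sketch of the idea, with `None` standing in for a Spark null and a list of row dictionaries standing in for a DataFrame. `drop_nulls_value` is a hypothetical helper, not the library's implementation.

```python
from typing import Any, Dict, List

Table = List[Dict[str, Any]]


def drop_nulls_value(
    data: Dict[Any, Table], key: Any, new_key: Any, columns: List[str]
) -> Dict[Any, Table]:
    """Hypothetical sketch: drop rows with a null (None) in any listed column."""
    kept = [
        row
        for row in data[key]
        if all(row[column] is not None for column in columns)
    ]
    return {**data, new_key: kept}


tables = {
    "t": [
        {"a": None, "b": 1.0},          # dropped: null in "a"
        {"a": "x", "b": None},          # kept: "b" is not a listed column
        {"a": "y", "b": float("nan")},  # kept: NaN is not null
    ]
}
out = drop_nulls_value(tables, "t", "t_nonnull", ["a"])
```

The last row shows that null and NaN are distinct cases in this sketch; a NaN value does not count as null.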

Methods#

transformation()

Returns the transformation that will be applied to create the new element.

key()

Returns the key for the DataFrame to transform.

new_key()

Returns the new key for the transformed DataFrame.

stability_function()

Returns the smallest d_out satisfied by the transformation.

__call__()

Returns a new dictionary augmented with the transformed DataFrame.

input_domain()

Return input domain for the transformation.

input_metric()

Distance metric on input domain.

output_domain()

Return output domain for the transformation.

output_metric()

Distance metric on output domain.

stability_relation()

Returns True only if close inputs produce close outputs.

__or__()

Return this transformation chained with another component.

__init__(input_domain, input_metric, key, new_key, columns)#

Constructor.

Parameters
  • input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.

  • input_metric (AddRemoveKeys) – The input metric for the outer dictionary-to-dictionary transformation.

  • key (Any) – The key for the DataFrame to transform.

  • new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.

  • columns (List[str]) – Columns to drop nulls from.

property transformation#

Returns the transformation that will be applied to create the new element.

Return type

tmlt.core.transformations.base.Transformation

property key#

Returns the key for the DataFrame to transform.

Return type

Any

property new_key#

Returns the new key for the transformed DataFrame.

Return type

Any

stability_function(d_in)#

Returns the smallest d_out satisfied by the transformation.

See the privacy and stability tutorial for more information.

Parameters

d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.

Raises

NotImplementedError – If not overridden.

Return type

tmlt.core.utils.exact_number.ExactNumber

__call__(data)#

Returns a new dictionary augmented with the transformed DataFrame.

Parameters

data (Dict[Any, pyspark.sql.DataFrame]) –

Return type

Dict[Any, pyspark.sql.DataFrame]

property input_domain#

Return input domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property input_metric#

Distance metric on input domain.

Return type

tmlt.core.metrics.Metric

property output_domain#

Return output domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property output_metric#

Distance metric on output domain.

Return type

tmlt.core.metrics.Metric

stability_relation(d_in, d_out)#

Returns True only if close inputs produce close outputs.

See the privacy and stability tutorial for more information.

Parameters
  • d_in (Any) – Distance between inputs under input_metric.

  • d_out (Any) – Distance between outputs under output_metric.

Return type

bool

__or__(other: Transformation) → Transformation#
__or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement

Return this transformation chained with another component.

class ReplaceInfsValue(input_domain, input_metric, key, new_key, replace_map)#

Bases: TransformValue

Applies a ReplaceInfs to create a new element from the specified value.

See TransformValue and ReplaceInfs for more information.

Methods#

transformation()

Returns the transformation that will be applied to create the new element.

key()

Returns the key for the DataFrame to transform.

new_key()

Returns the new key for the transformed DataFrame.

stability_function()

Returns the smallest d_out satisfied by the transformation.

__call__()

Returns a new dictionary augmented with the transformed DataFrame.

input_domain()

Return input domain for the transformation.

input_metric()

Distance metric on input domain.

output_domain()

Return output domain for the transformation.

output_metric()

Distance metric on output domain.

stability_relation()

Returns True only if close inputs produce close outputs.

__or__()

Return this transformation chained with another component.

__init__(input_domain, input_metric, key, new_key, replace_map)#

Constructor.

Parameters
  • input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.

  • input_metric (AddRemoveKeys) – The input metric for the outer dictionary-to-dictionary transformation.

  • key (Any) – The key for the DataFrame to transform.

  • new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.

  • replace_map (Dict[str, Tuple[float, float]]) – Dictionary mapping column names to a tuple. The first value in the tuple replaces -inf in that column, and the second value replaces +inf in that column.
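The ordering of the tuple in replace_map is easy to get backwards, so here is a plain-Python sketch of the documented convention: the first element replaces -inf and the second replaces +inf. Tables are lists of row dictionaries for illustration only, and `replace_infs_value` is a hypothetical helper, not the library's implementation.

```python
import math
from typing import Any, Dict, List, Tuple

Table = List[Dict[str, float]]


def replace_infs_value(
    data: Dict[Any, Table],
    key: Any,
    new_key: Any,
    replace_map: Dict[str, Tuple[float, float]],
) -> Dict[Any, Table]:
    """Hypothetical sketch: replace infinite values column by column."""

    def fix(row: Dict[str, float]) -> Dict[str, float]:
        new_row = dict(row)
        for column, (neg_value, pos_value) in replace_map.items():
            if new_row[column] == -math.inf:
                new_row[column] = neg_value  # first tuple element: for -inf
            elif new_row[column] == math.inf:
                new_row[column] = pos_value  # second tuple element: for +inf
        return new_row

    return {**data, new_key: [fix(row) for row in data[key]]}


tables = {"t": [{"x": -math.inf}, {"x": math.inf}, {"x": 1.5}]}
out = replace_infs_value(tables, "t", "t_bounded", {"x": (-100.0, 100.0)})
```

Finite values pass through unchanged, and columns absent from replace_map are not touched.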

property transformation#

Returns the transformation that will be applied to create the new element.

Return type

tmlt.core.transformations.base.Transformation

property key#

Returns the key for the DataFrame to transform.

Return type

Any

property new_key#

Returns the new key for the transformed DataFrame.

Return type

Any

stability_function(d_in)#

Returns the smallest d_out satisfied by the transformation.

See the privacy and stability tutorial for more information.

Parameters

d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.

Raises

NotImplementedError – If not overridden.

Return type

tmlt.core.utils.exact_number.ExactNumber

__call__(data)#

Returns a new dictionary augmented with the transformed DataFrame.

Parameters

data (Dict[Any, pyspark.sql.DataFrame]) –

Return type

Dict[Any, pyspark.sql.DataFrame]

property input_domain#

Return input domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property input_metric#

Distance metric on input domain.

Return type

tmlt.core.metrics.Metric

property output_domain#

Return output domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property output_metric#

Distance metric on output domain.

Return type

tmlt.core.metrics.Metric

stability_relation(d_in, d_out)#

Returns True only if close inputs produce close outputs.

See the privacy and stability tutorial for more information.

Parameters
  • d_in (Any) – Distance between inputs under input_metric.

  • d_out (Any) – Distance between outputs under output_metric.

Return type

bool

__or__(other: Transformation) → Transformation#
__or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement

Return this transformation chained with another component.

class ReplaceNaNsValue(input_domain, input_metric, key, new_key, replace_map)#

Bases: TransformValue

Applies a ReplaceNaNs to create a new element from the specified value.

See TransformValue and ReplaceNaNs for more information.

Methods#

transformation()

Returns the transformation that will be applied to create the new element.

key()

Returns the key for the DataFrame to transform.

new_key()

Returns the new key for the transformed DataFrame.

stability_function()

Returns the smallest d_out satisfied by the transformation.

__call__()

Returns a new dictionary augmented with the transformed DataFrame.

input_domain()

Return input domain for the transformation.

input_metric()

Distance metric on input domain.

output_domain()

Return output domain for the transformation.

output_metric()

Distance metric on output domain.

stability_relation()

Returns True only if close inputs produce close outputs.

__or__()

Return this transformation chained with another component.

__init__(input_domain, input_metric, key, new_key, replace_map)#

Constructor.

Parameters
  • input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.

  • input_metric (AddRemoveKeys) – The input metric for the outer dictionary-to-dictionary transformation.

  • key (Any) – The key for the DataFrame to transform.

  • new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.

  • replace_map (Dict[str, Any]) – Dictionary mapping column names to the value used to replace NaNs in that column.

property transformation#

Returns the transformation that will be applied to create the new element.

Return type

tmlt.core.transformations.base.Transformation

property key#

Returns the key for the DataFrame to transform.

Return type

Any

property new_key#

Returns the new key for the transformed DataFrame.

Return type

Any

stability_function(d_in)#

Returns the smallest d_out satisfied by the transformation.

See the privacy and stability tutorial for more information.

Parameters

d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.

Raises

NotImplementedError – If not overridden.

Return type

tmlt.core.utils.exact_number.ExactNumber

__call__(data)#

Returns a new dictionary augmented with the transformed DataFrame.

Parameters

data (Dict[Any, pyspark.sql.DataFrame]) –

Return type

Dict[Any, pyspark.sql.DataFrame]

property input_domain#

Return input domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property input_metric#

Distance metric on input domain.

Return type

tmlt.core.metrics.Metric

property output_domain#

Return output domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property output_metric#

Distance metric on output domain.

Return type

tmlt.core.metrics.Metric

stability_relation(d_in, d_out)#

Returns True only if close inputs produce close outputs.

See the privacy and stability tutorial for more information.

Parameters
  • d_in (Any) – Distance between inputs under input_metric.

  • d_out (Any) – Distance between outputs under output_metric.

Return type

bool

__or__(other: Transformation) → Transformation#
__or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement

Return this transformation chained with another component.

class ReplaceNullsValue(input_domain, input_metric, key, new_key, replace_map)#

Bases: TransformValue

Applies a ReplaceNulls to create a new element from the specified value.

See TransformValue and ReplaceNulls for more information.

Methods#

transformation()

Returns the transformation that will be applied to create the new element.

key()

Returns the key for the DataFrame to transform.

new_key()

Returns the new key for the transformed DataFrame.

stability_function()

Returns the smallest d_out satisfied by the transformation.

__call__()

Returns a new dictionary augmented with the transformed DataFrame.

input_domain()

Return input domain for the transformation.

input_metric()

Distance metric on input domain.

output_domain()

Return output domain for the transformation.

output_metric()

Distance metric on output domain.

stability_relation()

Returns True only if close inputs produce close outputs.

__or__()

Return this transformation chained with another component.

__init__(input_domain, input_metric, key, new_key, replace_map)#

Constructor.

Parameters
  • input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.

  • input_metric (AddRemoveKeys) – The input metric for the outer dictionary-to-dictionary transformation.

  • key (Any) – The key for the DataFrame to transform.

  • new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.

  • replace_map (Dict[str, Any]) – Dictionary mapping column names to the value used to replace nulls in that column.
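To make the role of replace_map concrete, the sketch below applies a per-column replacement to plain Python rows, with `None` standing in for a Spark null. `replace_nulls_value` is a hypothetical helper, not the library's implementation.

```python
from typing import Any, Dict, List

Table = List[Dict[str, Any]]


def replace_nulls_value(
    data: Dict[Any, Table],
    key: Any,
    new_key: Any,
    replace_map: Dict[str, Any],
) -> Dict[Any, Table]:
    """Hypothetical sketch: fill nulls (None) using a per-column replacement."""
    replaced = [
        {
            column: (
                replace_map[column]
                if value is None and column in replace_map
                else value
            )
            for column, value in row.items()
        }
        for row in data[key]
    ]
    return {**data, new_key: replaced}


tables = {"t": [{"name": None, "n": 1}, {"name": "b", "n": None}]}
out = replace_nulls_value(tables, "t", "t_filled", {"name": "unknown"})
```

Only columns named in replace_map are filled; the null in column "n" is left as-is because "n" has no entry.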

property transformation#

Returns the transformation that will be applied to create the new element.

Return type

tmlt.core.transformations.base.Transformation

property key#

Returns the key for the DataFrame to transform.

Return type

Any

property new_key#

Returns the new key for the transformed DataFrame.

Return type

Any

stability_function(d_in)#

Returns the smallest d_out satisfied by the transformation.

See the privacy and stability tutorial for more information.

Parameters

d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.

Raises

NotImplementedError – If not overridden.

Return type

tmlt.core.utils.exact_number.ExactNumber

__call__(data)#

Returns a new dictionary augmented with the transformed DataFrame.

Parameters

data (Dict[Any, pyspark.sql.DataFrame]) –

Return type

Dict[Any, pyspark.sql.DataFrame]

property input_domain#

Return input domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property input_metric#

Distance metric on input domain.

Return type

tmlt.core.metrics.Metric

property output_domain#

Return output domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property output_metric#

Distance metric on output domain.

Return type

tmlt.core.metrics.Metric

stability_relation(d_in, d_out)#

Returns True only if close inputs produce close outputs.

See the privacy and stability tutorial for more information.

Parameters
  • d_in (Any) – Distance between inputs under input_metric.

  • d_out (Any) – Distance between outputs under output_metric.

Return type

bool

__or__(other: Transformation) → Transformation#
__or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement

Return this transformation chained with another component.

class PersistValue(input_domain, input_metric, key, new_key)#

Bases: TransformValue

Applies a Persist to create a new element from the specified value.

See TransformValue and Persist for more information.

Methods#

transformation()

Returns the transformation that will be applied to create the new element.

key()

Returns the key for the DataFrame to transform.

new_key()

Returns the new key for the transformed DataFrame.

stability_function()

Returns the smallest d_out satisfied by the transformation.

__call__()

Returns a new dictionary augmented with the transformed DataFrame.

input_domain()

Return input domain for the transformation.

input_metric()

Distance metric on input domain.

output_domain()

Return output domain for the transformation.

output_metric()

Distance metric on output domain.

stability_relation()

Returns True only if close inputs produce close outputs.

__or__()

Return this transformation chained with another component.

__init__(input_domain, input_metric, key, new_key)#

Constructor.

Parameters
  • input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.

  • input_metric (AddRemoveKeys) – The input metric for the outer dictionary-to-dictionary transformation.

  • key (Any) – The key for the DataFrame to transform.

  • new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.

property transformation#

Returns the transformation that will be applied to create the new element.

Return type

tmlt.core.transformations.base.Transformation

property key#

Returns the key for the DataFrame to transform.

Return type

Any

property new_key#

Returns the new key for the transformed DataFrame.

Return type

Any

stability_function(d_in)#

Returns the smallest d_out satisfied by the transformation.

See the privacy and stability tutorial for more information.

Parameters

d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.

Raises

NotImplementedError – If not overridden.

Return type

tmlt.core.utils.exact_number.ExactNumber

__call__(data)#

Returns a new dictionary augmented with the transformed DataFrame.

Parameters

data (Dict[Any, pyspark.sql.DataFrame]) –

Return type

Dict[Any, pyspark.sql.DataFrame]

property input_domain#

Return input domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property input_metric#

Distance metric on input domain.

Return type

tmlt.core.metrics.Metric

property output_domain#

Return output domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property output_metric#

Distance metric on output domain.

Return type

tmlt.core.metrics.Metric

stability_relation(d_in, d_out)#

Returns True only if close inputs produce close outputs.

See the privacy and stability tutorial for more information.

Parameters
  • d_in (Any) – Distance between inputs under input_metric.

  • d_out (Any) – Distance between outputs under output_metric.

Return type

bool

__or__(other: Transformation) → Transformation#
__or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement

Return this transformation chained with another component.

class UnpersistValue(input_domain, input_metric, key, new_key)#

Bases: TransformValue

Applies an Unpersist to create a new element from the specified value.

See TransformValue and Unpersist for more information.

Methods#

transformation()

Returns the transformation that will be applied to create the new element.

key()

Returns the key for the DataFrame to transform.

new_key()

Returns the new key for the transformed DataFrame.

stability_function()

Returns the smallest d_out satisfied by the transformation.

__call__()

Returns a new dictionary augmented with the transformed DataFrame.

input_domain()

Return input domain for the transformation.

input_metric()

Distance metric on input domain.

output_domain()

Return output domain for the transformation.

output_metric()

Distance metric on output domain.

stability_relation()

Returns True only if close inputs produce close outputs.

__or__()

Return this transformation chained with another component.

__init__(input_domain, input_metric, key, new_key)#

Constructor.

Parameters
  • input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.

  • input_metric (AddRemoveKeys) – The input metric for the outer dictionary-to-dictionary transformation.

  • key (Any) – The key for the DataFrame to transform.

  • new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.

property transformation#

Returns the transformation that will be applied to create the new element.

Return type

tmlt.core.transformations.base.Transformation

property key#

Returns the key for the DataFrame to transform.

Return type

Any

property new_key#

Returns the new key for the transformed DataFrame.

Return type

Any

stability_function(d_in)#

Returns the smallest d_out satisfied by the transformation.

See the privacy and stability tutorial for more information.

Parameters

d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.

Raises

NotImplementedError – If not overridden.

Return type

tmlt.core.utils.exact_number.ExactNumber

__call__(data)#

Returns a new dictionary augmented with the transformed DataFrame.

Parameters

data (Dict[Any, pyspark.sql.DataFrame]) –

Return type

Dict[Any, pyspark.sql.DataFrame]

property input_domain#

Return input domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property input_metric#

Distance metric on input domain.

Return type

tmlt.core.metrics.Metric

property output_domain#

Return output domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property output_metric#

Distance metric on output domain.

Return type

tmlt.core.metrics.Metric

stability_relation(d_in, d_out)#

Returns True only if close inputs produce close outputs.

See the privacy and stability tutorial for more information.

Parameters
  • d_in (Any) – Distance between inputs under input_metric.

  • d_out (Any) – Distance between outputs under output_metric.

Return type

bool

__or__(other: Transformation) → Transformation#
__or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement

Return this transformation chained with another component.

class SparkActionValue(input_domain, input_metric, key, new_key)#

Bases: TransformValue

Applies a SparkAction to create a new element from the specified value.

See TransformValue and SparkAction for more information.

Methods#

transformation()

Returns the transformation that will be applied to create the new element.

key()

Returns the key for the DataFrame to transform.

new_key()

Returns the new key for the transformed DataFrame.

stability_function()

Returns the smallest d_out satisfied by the transformation.

__call__()

Returns a new dictionary augmented with the transformed DataFrame.

input_domain()

Return input domain for the transformation.

input_metric()

Distance metric on input domain.

output_domain()

Return output domain for the transformation.

output_metric()

Distance metric on output domain.

stability_relation()

Returns True only if close inputs produce close outputs.

__or__()

Return this transformation chained with another component.

__init__(input_domain, input_metric, key, new_key)#

Constructor.

Parameters
  • input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.

  • input_metric (AddRemoveKeys) – The input metric for the outer dictionary-to-dictionary transformation.

  • key (Any) – The key for the DataFrame to transform.

  • new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.

property transformation#

Returns the transformation that will be applied to create the new element.

Return type

tmlt.core.transformations.base.Transformation

property key#

Returns the key for the DataFrame to transform.

Return type

Any

property new_key#

Returns the new key for the transformed DataFrame.

Return type

Any

stability_function(d_in)#

Returns the smallest d_out satisfied by the transformation.

See the privacy and stability tutorial for more information.

Parameters

d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.

Raises

NotImplementedError – If not overridden.

Return type

tmlt.core.utils.exact_number.ExactNumber

__call__(data)#

Returns a new dictionary augmented with the transformed DataFrame.

Parameters

data (Dict[Any, pyspark.sql.DataFrame]) –

Return type

Dict[Any, pyspark.sql.DataFrame]

property input_domain#

Return input domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property input_metric#

Distance metric on input domain.

Return type

tmlt.core.metrics.Metric

property output_domain#

Return output domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property output_metric#

Distance metric on output domain.

Return type

tmlt.core.metrics.Metric

stability_relation(d_in, d_out)#

Returns True only if close inputs produce close outputs.

See the privacy and stability tutorial for more information.

Parameters
  • d_in (Any) – Distance between inputs under input_metric.

  • d_out (Any) – Distance between outputs under output_metric.

Return type

bool

__or__(other: Transformation) → Transformation#
__or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement

Return this transformation chained with another component.

class RenameValue(input_domain, input_metric, key, new_key, rename_mapping)#

Bases: TransformValue

Applies a Rename to create a new element from the specified value.

See TransformValue and Rename for more information.

Methods#

transformation()

Returns the transformation that will be applied to create the new element.

key()

Returns the key for the DataFrame to transform.

new_key()

Returns the new key for the transformed DataFrame.

stability_function()

Returns the smallest d_out satisfied by the transformation.

__call__()

Returns a new dictionary augmented with the transformed DataFrame.

input_domain()

Return input domain for the transformation.

input_metric()

Distance metric on input domain.

output_domain()

Return output domain for the transformation.

output_metric()

Distance metric on output domain.

stability_relation()

Returns True only if close inputs produce close outputs.

__or__()

Return this transformation chained with another component.

__init__(input_domain, input_metric, key, new_key, rename_mapping)#

Constructor.

Parameters
  • input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.

  • input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.

  • key (Any) – The key for the DataFrame to transform.

  • new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.

  • rename_mapping (Dict[str, str]) – Dictionary from existing column names to target column names.
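The relationship between key, new_key, and rename_mapping can be sketched in plain Python. This is only an illustration: lists of row dictionaries stand in for Spark DataFrames, and the helper rename_value is hypothetical and not part of tmlt.core. It assumes the output dictionary keeps every input entry and adds the renamed copy under new_key.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
Table = List[Row]  # stand-in for a Spark DataFrame (assumption for illustration)


def rename_value(
    data: Dict[Any, Table],
    key: Any,
    new_key: Any,
    rename_mapping: Dict[str, str],
) -> Dict[Any, Table]:
    """Hypothetical stand-in: add a column-renamed copy of data[key] under new_key."""
    if new_key in data:
        # Mirrors the documented requirement that new_key is not already present.
        raise ValueError(f"new_key {new_key!r} already exists in the input")
    renamed = [
        # Columns missing from rename_mapping keep their original name.
        {rename_mapping.get(col, col): value for col, value in row.items()}
        for row in data[key]
    ]
    # The original entries are kept; the renamed table is added alongside them.
    return {**data, new_key: renamed}


tables = {"people": [{"A": "a", "B": "1"}, {"A": "b", "B": "2"}]}
result = rename_value(tables, "people", "people_renamed", {"A": "C", "B": "D"})
print(result["people_renamed"])
```

The real class operates on Spark DataFrames and also tracks domains and metrics, which this sketch leaves out.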

property transformation#

Returns the transformation that will be applied to create the new element.

Return type

tmlt.core.transformations.base.Transformation

property key#

Returns the key for the DataFrame to transform.

Return type

Any

property new_key#

Returns the new key for the transformed DataFrame.

Return type

Any

stability_function(d_in)#

Returns the smallest d_out satisfied by the transformation.

See the privacy and stability tutorial for more information.

Parameters

d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.

Raises

NotImplementedError – If not overridden.

Return type

tmlt.core.utils.exact_number.ExactNumber

__call__(data)#

Returns a new dictionary augmented with the transformed DataFrame.

Parameters

data (Dict[Any, pyspark.sql.DataFrame]) – The input dictionary of Spark DataFrames.

Return type

Dict[Any, pyspark.sql.DataFrame]

property input_domain#

Return input domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property input_metric#

Distance metric on input domain.

Return type

tmlt.core.metrics.Metric

property output_domain#

Return output domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property output_metric#

Distance metric on output domain.

Return type

tmlt.core.metrics.Metric

stability_relation(d_in, d_out)#

Returns True only if close inputs produce close outputs.

See the privacy and stability tutorial for more information.

Parameters
  • d_in (Any) – Distance between inputs under input_metric.

  • d_out (Any) – Distance between outputs under output_metric.

Return type

bool
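One way to read the two stability methods together is that stability_relation(d_in, d_out) holds whenever d_out is at least the smallest d_out reported by stability_function(d_in). The sketch below illustrates that reading in plain Python. The helper make_stability_relation is hypothetical, and the identity stability function is an assumption chosen for illustration, not a statement about what RenameValue returns.

```python
from fractions import Fraction
from typing import Callable


def make_stability_relation(
    stability_function: Callable[[Fraction], Fraction],
) -> Callable[[Fraction, Fraction], bool]:
    """Hypothetical helper: derive a stability relation from a stability function."""

    def stability_relation(d_in: Fraction, d_out: Fraction) -> bool:
        # The relation holds when d_out is no smaller than the tightest bound.
        return stability_function(d_in) <= d_out

    return stability_relation


def identity_stability(d_in: Fraction) -> Fraction:
    # Assumed stability function for illustration only.
    return d_in


relation = make_stability_relation(identity_stability)
tight = relation(Fraction(1), Fraction(1))
too_small = relation(Fraction(2), Fraction(1))
print(tight, too_small)
```

Exact rational arithmetic is used here in the spirit of ExactNumber, so comparisons are not affected by floating-point rounding.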

__or__(other: Transformation) Transformation#
__or__(other: tmlt.core.measurements.base.Measurement) tmlt.core.measurements.base.Measurement

Return this transformation chained with another component.

class SelectValue(input_domain, input_metric, key, new_key, columns)#

Bases: TransformValue

Applies a Select to create a new element from the specified value.

See TransformValue and Select for more information.

Methods#

transformation()

Returns the transformation that will be applied to create the new element.

key()

Returns the key for the DataFrame to transform.

new_key()

Returns the new key for the transformed DataFrame.

stability_function()

Returns the smallest d_out satisfied by the transformation.

__call__()

Returns a new dictionary augmented with the transformed DataFrame.

input_domain()

Return input domain for the transformation.

input_metric()

Distance metric on input domain.

output_domain()

Return output domain for the transformation.

output_metric()

Distance metric on output domain.

stability_relation()

Returns True only if close inputs produce close outputs.

__or__()

Return this transformation chained with another component.

__init__(input_domain, input_metric, key, new_key, columns)#

Constructor.

Parameters
  • input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.

  • input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.

  • key (Any) – The key for the DataFrame to transform.

  • new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.

  • columns (List[str]) – A list of existing column names to keep.
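As a rough illustration of how key, new_key, and columns fit together, the following plain-Python sketch uses lists of row dictionaries in place of Spark DataFrames. The helper select_value is hypothetical and not part of tmlt.core. It assumes every name in columns exists in the source table and that the input entries are carried over unchanged.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
Table = List[Row]  # stand-in for a Spark DataFrame (assumption for illustration)


def select_value(
    data: Dict[Any, Table],
    key: Any,
    new_key: Any,
    columns: List[str],
) -> Dict[Any, Table]:
    """Hypothetical stand-in: add a copy of data[key] restricted to the given columns."""
    if new_key in data:
        # Mirrors the documented requirement that new_key is not already present.
        raise ValueError(f"new_key {new_key!r} already exists in the input")
    # A KeyError is raised here if a requested column does not exist in a row.
    selected = [{col: row[col] for col in columns} for row in data[key]]
    return {**data, new_key: selected}


tables = {"people": [{"A": "a", "B": "1"}, {"A": "b", "B": "2"}]}
result = select_value(tables, "people", "people_a", ["A"])
print(result["people_a"])
```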

property transformation#

Returns the transformation that will be applied to create the new element.

Return type

tmlt.core.transformations.base.Transformation

property key#

Returns the key for the DataFrame to transform.

Return type

Any

property new_key#

Returns the new key for the transformed DataFrame.

Return type

Any

stability_function(d_in)#

Returns the smallest d_out satisfied by the transformation.

See the privacy and stability tutorial for more information.

Parameters

d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.

Raises

NotImplementedError – If not overridden.

Return type

tmlt.core.utils.exact_number.ExactNumber

__call__(data)#

Returns a new dictionary augmented with the transformed DataFrame.

Parameters

data (Dict[Any, pyspark.sql.DataFrame]) – The input dictionary of Spark DataFrames.

Return type

Dict[Any, pyspark.sql.DataFrame]

property input_domain#

Return input domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property input_metric#

Distance metric on input domain.

Return type

tmlt.core.metrics.Metric

property output_domain#

Return output domain for the transformation.

Return type

tmlt.core.domains.base.Domain

property output_metric#

Distance metric on output domain.

Return type

tmlt.core.metrics.Metric

stability_relation(d_in, d_out)#

Returns True only if close inputs produce close outputs.

See the privacy and stability tutorial for more information.

Parameters
  • d_in (Any) – Distance between inputs under input_metric.

  • d_out (Any) – Distance between outputs under output_metric.

Return type

bool

__or__(other: Transformation) Transformation#
__or__(other: tmlt.core.measurements.base.Measurement) tmlt.core.measurements.base.Measurement

Return this transformation chained with another component.
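Chaining with | applies the left-hand component first and feeds its output to the right-hand one, as in the module's introductory example (truncate | rename | create_unique_column). The sketch below mimics that ordering in plain Python. The Step class is a hypothetical stand-in, not the tmlt.core Transformation class, and it omits the domain and metric compatibility checks a real chain would need.

```python
from typing import Any, Callable, Dict, List

Table = List[Dict[str, Any]]  # stand-in for a Spark DataFrame
Tables = Dict[Any, Table]


class Step:
    """Hypothetical stand-in for a transformation that supports `|` chaining."""

    def __init__(self, func: Callable[[Tables], Tables]) -> None:
        self._func = func

    def __call__(self, data: Tables) -> Tables:
        return self._func(data)

    def __or__(self, other: "Step") -> "Step":
        # `a | b` applies a first, then passes its output to b.
        return Step(lambda data: other(self(data)))


# First step: add a renamed copy of "source" under "renamed".
add_renamed = Step(
    lambda d: {**d, "renamed": [{"C": row["A"], "D": row["B"]} for row in d["source"]]}
)
# Second step: add a column subset of "renamed" under "selected".
add_selected = Step(
    lambda d: {**d, "selected": [{"C": row["C"]} for row in d["renamed"]]}
)

pipeline = add_renamed | add_selected
out = pipeline({"source": [{"A": "a", "B": "1"}]})
print(sorted(out))
```

The second step reads the key written by the first, which is why the order of the operands matters.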