add_remove_keys#
Transformations that transform dictionaries using AddRemoveKeys.
Note that several of the transformations in dictionary also support AddRemoveKeys.
The transformations defined in this module are required because AugmentDictTransformation is not stable under AddRemoveKeys for all transformations.
For example, consider the following:
>>> # Create transformation
>>> input_domain = SparkDataFrameDomain(
...     {
...         "A": SparkStringColumnDescriptor(),
...         "B": SparkStringColumnDescriptor(),
...     }
... )
>>> input_metric = IfGroupedBy("A", SymmetricDifference())
>>> truncate = LimitRowsPerGroup(
...     input_domain=input_domain,
...     output_metric=SymmetricDifference(),
...     grouping_column="A",
...     threshold=1,
... )
>>> rename = Rename(
...     input_domain=input_domain,
...     metric=SymmetricDifference(),
...     rename_mapping={"A": "C", "B": "D"},
... )
>>> create_unique_column = AddUniqueColumn(
...     input_domain=rename.output_domain,
...     column="A",
... )
>>> transformation = truncate | rename | create_unique_column
>>> # Create data
>>> x1 = spark.createDataFrame(
...     [["a", "1"], ["b", "2"], ["c", "3"]], ["A", "B"]
... )
>>> x2 = spark.createDataFrame(
...     [["b", "2"], ["c", "3"]], ["A", "B"]
... )
>>> print_sdf(x1)
A B
0 a 1
1 b 2
2 c 3
>>> print_sdf(x2)
A B
0 b 2
1 c 3
>>> y1 = transformation(x1)
>>> y2 = transformation(x2)
>>> print_sdf(y1) # Note that the values below are in fact unique (after the 5B226)
C D A
0 a 1 5B2261222C2231222C2231225D
1 b 2 5B2262222C2232222C2231225D
2 c 3 5B2263222C2233222C2231225D
>>> print_sdf(y2)
C D A
0 b 2 5B2262222C2232222C2231225D
1 c 3 5B2263222C2233222C2231225D
>>> # Check stability
>>> input_metric.distance(x1, x2, input_domain)
1
>>> input_metric.distance(y1, y2, transformation.output_domain)
1
>>> # Check stability as if it was Augmented using AugmentDictTransformation
>>> dict_x1 = {"start": x1}
>>> dict_x2 = {"start": x2}
>>> dict_y1 = {"start": x1, "end": y1}
>>> dict_y2 = {"start": x2, "end": y2}
>>> dict_input_domain = DictDomain({"start": input_domain})
>>> dict_input_metric = AddRemoveKeys({"start": "A"})
>>> dict_output_domain = DictDomain(
...     {
...         "start": input_domain,
...         "end": transformation.output_domain
...     }
... )
>>> dict_output_metric = AddRemoveKeys({"start": "A", "end": "A"})
>>> # Naively you would expect the stability to be 1, but in this example it is 2
>>> dict_input_metric.distance(dict_x1, dict_x2, dict_input_domain)
1
>>> dict_output_metric.distance(dict_y1, dict_y2, dict_output_domain)
2
Conceptually, what is happening in the example above is that the transformation is changing the meaning of the key column. The column “A” that is in the input data is not the same as the column “A” that is in the output data, so removing one value, “a”, in the input dictionary results in both “a” and “a,1” being removed in the output dictionary.
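The counting behind the distances of 1 and 2 above can be reproduced without Spark. The following is a minimal sketch under stated assumptions, not the library's implementation: tables are plain lists of dicts, `add_remove_keys_distance` is a hypothetical helper that counts the distinct key values whose rows differ in at least one table, and `transform` is a stand-in for `truncate | rename | create_unique_column` that derives the new "A" column from the whole row.

```python
from collections import Counter, defaultdict


def rows_by_key(rows, key_column):
    """Group rows (dicts) into a multiset of rows per key value."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[key_column]][tuple(sorted(row.items()))] += 1
    return groups


def add_remove_keys_distance(tables1, tables2, key_columns):
    """Count key values whose rows differ in at least one table.

    Simplified model of the metric, not the library's implementation.
    """
    changed = set()
    for name, key_column in key_columns.items():
        g1 = rows_by_key(tables1[name], key_column)
        g2 = rows_by_key(tables2[name], key_column)
        for key in set(g1) | set(g2):
            if g1.get(key) != g2.get(key):
                changed.add(key)
    return len(changed)


def transform(rows):
    """Hypothetical stand-in: rename A->C, B->D, then add a per-row id as "A"."""
    return [
        {"C": r["A"], "D": r["B"], "A": f'id({r["A"]},{r["B"]})'} for r in rows
    ]


x1 = [{"A": "a", "B": "1"}, {"A": "b", "B": "2"}, {"A": "c", "B": "3"}]
x2 = x1[1:]
y1, y2 = transform(x1), transform(x2)

d_in = add_remove_keys_distance({"start": x1}, {"start": x2}, {"start": "A"})
d_out = add_remove_keys_distance(
    {"start": x1, "end": y1},
    {"start": x2, "end": y2},
    {"start": "A", "end": "A"},
)
print(d_in, d_out)  # 1 2
```

In this model the output distance is 2 because the changed key values are "a" (in "start") and the derived identifier for the row ("a", "1") (in "end"), which are two different keys.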
Classes#
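TransformValue | Base class transforming a specified key using an existing transformation.
LimitRowsPerGroupValue | Applies a LimitRowsPerGroup to the specified key.
LimitKeysPerGroupValue | Applies a LimitKeysPerGroup to the specified key.
LimitRowsPerKeyPerGroupValue | Applies a LimitRowsPerKeyPerGroup to the specified key.
FilterValue | Applies a Filter to create a new element from specified value.
PublicJoinValue | Applies a PublicJoin to create a new element from specified value.
FlatMapByKeyValue | Applies a FlatMapByKey to create a new element from specified value.
FlatMapValue | Applies a FlatMap to create a new element from specified value.
MapValue | Applies a Map to create a new element from specified value.
DropInfsValue | Applies a DropInfs to create a new element from specified value.
DropNaNsValue | Applies a DropNaNs to create a new element from specified value.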
- class TransformValue(input_domain, input_metric, transformation, key, new_key)#
Bases:
tmlt.core.transformations.base.Transformation
Base class transforming a specified key using an existing transformation.
This class can be subclassed to make the claim that a kind of Transformation (like Filter) can be applied to a DataFrame, with its output added to the input dictionary, without violating the closeness of neighboring dataframes under AddRemoveKeys.
NOTE: This class cannot be instantiated directly.
- Parameters:
input_domain (tmlt.core.domains.collections.DictDomain)
input_metric (tmlt.core.metrics.AddRemoveKeys)
transformation (tmlt.core.transformations.base.Transformation)
key (Any)
new_key (Any)
- property transformation: tmlt.core.transformations.base.Transformation#
Returns the transformation that will be applied to create the new element.
- Return type:
tmlt.core.transformations.base.Transformation
- property key: Any#
Returns the key for the DataFrame to transform.
- Return type:
Any
- property new_key: Any#
Returns the new key for the transformed DataFrame.
- Return type:
Any
- property input_domain: tmlt.core.domains.base.Domain#
Returns the input domain of the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property input_metric: tmlt.core.metrics.Metric#
Returns the distance metric on the input domain.
- Return type:
tmlt.core.metrics.Metric
- property output_domain: tmlt.core.domains.base.Domain#
Returns the output domain of the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property output_metric: tmlt.core.metrics.Metric#
Returns the distance metric on the output domain.
- Return type:
tmlt.core.metrics.Metric
- __init__(input_domain, input_metric, transformation, key, new_key)#
Constructor.
- Parameters:
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.
transformation (Transformation) – The DataFrame to DataFrame transformation to apply. Input and output metric must both be IfGroupedBy(column, SymmetricDifference()) using the same column.
key (Any) – The key for the DataFrame to transform.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises:
NotImplementedError – If not overridden.
- Return type:
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters:
data (Dict[Any, pyspark.sql.DataFrame])
- Return type:
Dict[Any, pyspark.sql.DataFrame]
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type:
bool
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
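The behaviour described for `__call__` and `stability_relation` can be pictured with plain Python containers. This is a minimal sketch under stated assumptions rather than the class itself: DataFrames are modelled as lists of dicts, `transform_value` and the free function `stability_relation` are hypothetical helpers, and the identity stability function used in the usage lines is chosen purely for illustration.

```python
def transform_value(data, key, new_key, transformation):
    """Return a copy of data with transformation(data[key]) stored under new_key."""
    if new_key in data:
        # Mirrors the documented requirement that new_key is not already present.
        raise ValueError(f"new_key {new_key!r} is already present")
    augmented = dict(data)
    augmented[new_key] = transformation(data[key])
    return augmented


def stability_relation(d_in, d_out, stability_function):
    """True if d_out is at least the smallest d_out the stability function gives."""
    return stability_function(d_in) <= d_out


data = {"start": [{"A": "a", "B": 1}, {"A": "b", "B": 2}]}
result = transform_value(
    data, "start", "filtered", lambda rows: [r for r in rows if r["B"] > 1]
)
print(sorted(result))  # ['filtered', 'start']
print(result["filtered"])  # [{'A': 'b', 'B': 2}]

# Illustrative identity stability function (an assumption, not the library's).
print(stability_relation(1, 1, lambda d: d))  # True
print(stability_relation(2, 1, lambda d: d))  # False
```

The original dictionary is left untouched and the output keeps every input key, which is what lets later transformations refer to both the original and the derived table.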
- class LimitRowsPerGroupValue(input_domain, input_metric, key, new_key, threshold)#
Bases:
TransformValue
Applies a LimitRowsPerGroup to the specified key.
See TransformValue and LimitRowsPerGroup for more information.
- Parameters:
input_domain (tmlt.core.domains.collections.DictDomain)
input_metric (tmlt.core.metrics.AddRemoveKeys)
key (Any)
new_key (Any)
threshold (int)
- property transformation: tmlt.core.transformations.base.Transformation#
Returns the transformation that will be applied to create the new element.
- Return type:
tmlt.core.transformations.base.Transformation
- property key: Any#
Returns the key for the DataFrame to transform.
- Return type:
Any
- property new_key: Any#
Returns the new key for the transformed DataFrame.
- Return type:
Any
- property input_domain: tmlt.core.domains.base.Domain#
Returns the input domain of the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property input_metric: tmlt.core.metrics.Metric#
Returns the distance metric on the input domain.
- Return type:
tmlt.core.metrics.Metric
- property output_domain: tmlt.core.domains.base.Domain#
Returns the output domain of the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property output_metric: tmlt.core.metrics.Metric#
Returns the distance metric on the output domain.
- Return type:
tmlt.core.metrics.Metric
- __init__(input_domain, input_metric, key, new_key, threshold)#
Constructor.
- Parameters:
input_domain (DictDomain) – Domain of input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – Input metric for the outer dictionary to dictionary transformation.
key (Any) – The key for the DataFrame to transform.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
threshold (int) – The maximum number of rows per group after truncation.
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises:
NotImplementedError – If not overridden.
- Return type:
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters:
data (Dict[Any, pyspark.sql.DataFrame])
- Return type:
Dict[Any, pyspark.sql.DataFrame]
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type:
bool
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
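To illustrate what "the maximum number of rows per group after truncation" means, here is a minimal sketch on lists of dicts. It is an assumption-laden model, not the library's algorithm: `limit_rows_per_group` is a hypothetical helper, the grouping column is passed explicitly, and the first `threshold` rows in input order are kept, whereas the real transformation may select which rows to keep differently.

```python
from collections import defaultdict


def limit_rows_per_group(rows, grouping_column, threshold):
    """Keep at most `threshold` rows for each value of grouping_column."""
    kept_per_group = defaultdict(int)
    kept = []
    for row in rows:
        group = row[grouping_column]
        if kept_per_group[group] < threshold:
            kept_per_group[group] += 1
            kept.append(row)
    return kept


rows = [{"A": "a", "B": 1}, {"A": "a", "B": 2}, {"A": "b", "B": 3}]
truncated = limit_rows_per_group(rows, "A", threshold=1)
print(truncated)  # [{'A': 'a', 'B': 1}, {'A': 'b', 'B': 3}]
```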
- class LimitKeysPerGroupValue(input_domain, input_metric, key, new_key, key_column, threshold)#
Bases:
TransformValue
Applies a LimitKeysPerGroup to the specified key.
See TransformValue and LimitKeysPerGroup for more information.
- Parameters:
input_domain (tmlt.core.domains.collections.DictDomain)
input_metric (tmlt.core.metrics.AddRemoveKeys)
key (Any)
new_key (Any)
key_column (str)
threshold (int)
- property transformation: tmlt.core.transformations.base.Transformation#
Returns the transformation that will be applied to create the new element.
- Return type:
tmlt.core.transformations.base.Transformation
- property key: Any#
Returns the key for the DataFrame to transform.
- Return type:
Any
- property new_key: Any#
Returns the new key for the transformed DataFrame.
- Return type:
Any
- property input_domain: tmlt.core.domains.base.Domain#
Returns the input domain of the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property input_metric: tmlt.core.metrics.Metric#
Returns the distance metric on the input domain.
- Return type:
tmlt.core.metrics.Metric
- property output_domain: tmlt.core.domains.base.Domain#
Returns the output domain of the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property output_metric: tmlt.core.metrics.Metric#
Returns the distance metric on the output domain.
- Return type:
tmlt.core.metrics.Metric
- __init__(input_domain, input_metric, key, new_key, key_column, threshold)#
Constructor.
- Parameters:
input_domain (DictDomain) – Domain of input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – Input metric for the outer dictionary to dictionary transformation.
key (Any) – The key for the DataFrame to transform.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
key_column (str) – Name of column defining the keys.
threshold (int) – The maximum number of keys per group after truncation.
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises:
NotImplementedError – If not overridden.
- Return type:
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters:
data (Dict[Any, pyspark.sql.DataFrame])
- Return type:
Dict[Any, pyspark.sql.DataFrame]
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type:
bool
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
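Limiting keys per group differs from limiting rows per group: the bound applies to the number of distinct `key_column` values in each group, and every row belonging to a kept key survives. The sketch below is a simplified model on lists of dicts, assuming a hypothetical helper `limit_keys_per_group` that keeps the first `threshold` distinct keys seen in each group; the real transformation may choose the kept keys differently.

```python
from collections import defaultdict


def limit_keys_per_group(rows, grouping_column, key_column, threshold):
    """Keep rows belonging to at most `threshold` distinct keys per group."""
    kept_keys = defaultdict(list)  # group value -> keys kept, in first-seen order
    kept = []
    for row in rows:
        keys = kept_keys[row[grouping_column]]
        key = row[key_column]
        if key not in keys and len(keys) < threshold:
            keys.append(key)
        if key in keys:
            kept.append(row)
    return kept


rows = [
    {"A": "a", "K": "k1"},
    {"A": "a", "K": "k2"},
    {"A": "a", "K": "k1"},
    {"A": "b", "K": "k2"},
]
limited = limit_keys_per_group(rows, "A", "K", threshold=1)
print(limited)
# [{'A': 'a', 'K': 'k1'}, {'A': 'a', 'K': 'k1'}, {'A': 'b', 'K': 'k2'}]
```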
- class LimitRowsPerKeyPerGroupValue(input_domain, input_metric, key, new_key, key_column, threshold)#
Bases:
TransformValue
Applies a LimitRowsPerKeyPerGroup to the specified key.
See TransformValue and LimitRowsPerKeyPerGroup for more information.
- Parameters:
input_domain (tmlt.core.domains.collections.DictDomain)
input_metric (tmlt.core.metrics.AddRemoveKeys)
key (Any)
new_key (Any)
key_column (str)
threshold (int)
- property transformation: tmlt.core.transformations.base.Transformation#
Returns the transformation that will be applied to create the new element.
- Return type:
tmlt.core.transformations.base.Transformation
- property key: Any#
Returns the key for the DataFrame to transform.
- Return type:
Any
- property new_key: Any#
Returns the new key for the transformed DataFrame.
- Return type:
Any
- property input_domain: tmlt.core.domains.base.Domain#
Returns the input domain of the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property input_metric: tmlt.core.metrics.Metric#
Returns the distance metric on the input domain.
- Return type:
tmlt.core.metrics.Metric
- property output_domain: tmlt.core.domains.base.Domain#
Returns the output domain of the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property output_metric: tmlt.core.metrics.Metric#
Returns the distance metric on the output domain.
- Return type:
tmlt.core.metrics.Metric
- __init__(input_domain, input_metric, key, new_key, key_column, threshold)#
Constructor.
- Parameters:
input_domain (DictDomain) – Domain of input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – Input metric for the outer dictionary to dictionary transformation.
key (Any) – The key for the DataFrame to transform.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
key_column (str) – Name of column defining the keys.
threshold (int) – The maximum number of rows in which each unique (key, grouping column value) pair may appear after truncation.
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises:
NotImplementedError – If not overridden.
- Return type:
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters:
data (Dict[Any, pyspark.sql.DataFrame])
- Return type:
Dict[Any, pyspark.sql.DataFrame]
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type:
bool
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
- class FilterValue(input_domain, input_metric, key, new_key, filter_expr)#
Bases:
TransformValue
Applies a Filter to create a new element from the specified value.
See TransformValue and Filter for more information.
- Parameters:
input_domain (tmlt.core.domains.collections.DictDomain)
input_metric (tmlt.core.metrics.AddRemoveKeys)
key (Any)
new_key (Any)
filter_expr (str)
- property transformation: tmlt.core.transformations.base.Transformation#
Returns the transformation that will be applied to create the new element.
- Return type:
tmlt.core.transformations.base.Transformation
- property key: Any#
Returns the key for the DataFrame to transform.
- Return type:
Any
- property new_key: Any#
Returns the new key for the transformed DataFrame.
- Return type:
Any
- property input_domain: tmlt.core.domains.base.Domain#
Returns the input domain of the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property input_metric: tmlt.core.metrics.Metric#
Returns the distance metric on the input domain.
- Return type:
tmlt.core.metrics.Metric
- property output_domain: tmlt.core.domains.base.Domain#
Returns the output domain of the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property output_metric: tmlt.core.metrics.Metric#
Returns the distance metric on the output domain.
- Return type:
tmlt.core.metrics.Metric
- __init__(input_domain, input_metric, key, new_key, filter_expr)#
Constructor.
- Parameters:
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.
key (Any) – The key for the DataFrame to transform.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
filter_expr (str) – A SQL expression string specifying the filter to apply to the data. The language is the same as the one used by pyspark.sql.DataFrame.filter().
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises:
NotImplementedError – If not overridden.
- Return type:
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters:
data (Dict[Any, pyspark.sql.DataFrame])
- Return type:
Dict[Any, pyspark.sql.DataFrame]
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type:
bool
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
- class PublicJoinValue(input_domain, input_metric, key, new_key, public_df, public_df_domain=None, join_cols=None, join_on_nulls=False)#
Bases:
TransformValue
Applies a PublicJoin to create a new element from the specified value.
See TransformValue and PublicJoin for more information.
- Parameters:
input_domain (tmlt.core.domains.collections.DictDomain)
input_metric (tmlt.core.metrics.AddRemoveKeys)
key (Any)
new_key (Any)
public_df (pyspark.sql.DataFrame)
public_df_domain (Optional[tmlt.core.domains.spark_domains.SparkDataFrameDomain])
join_cols (Optional[List[str]])
join_on_nulls (bool)
- property transformation: tmlt.core.transformations.base.Transformation#
Returns the transformation that will be applied to create the new element.
- Return type:
tmlt.core.transformations.base.Transformation
- property key: Any#
Returns the key for the DataFrame to transform.
- Return type:
Any
- property new_key: Any#
Returns the new key for the transformed DataFrame.
- Return type:
Any
- property input_domain: tmlt.core.domains.base.Domain#
Returns the input domain of the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property input_metric: tmlt.core.metrics.Metric#
Returns the distance metric on the input domain.
- Return type:
tmlt.core.metrics.Metric
- property output_domain: tmlt.core.domains.base.Domain#
Returns the output domain of the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property output_metric: tmlt.core.metrics.Metric#
Returns the distance metric on the output domain.
- Return type:
tmlt.core.metrics.Metric
- __init__(input_domain, input_metric, key, new_key, public_df, public_df_domain=None, join_cols=None, join_on_nulls=False)#
Constructor.
- Parameters:
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.
key (Any) – The key for the DataFrame to transform.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
public_df (DataFrame) – A Spark DataFrame to join with.
public_df_domain (Optional[SparkDataFrameDomain]) – Domain of public DataFrame to join with. If this domain indicates that a float column does not allow nans (or infs), all rows in public_df containing a nan (or an inf) in that column will be dropped. If None, domain is inferred from the schema of public_df and any float column will be marked as allowing inf and nan values.
join_cols (Optional[List[str]]) – Names of columns to join on. If None, a natural join is performed.
join_on_nulls (bool) – If True, null values on corresponding join columns of the public and private dataframes will be considered to be equal.
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises:
NotImplementedError – If not overridden.
- Return type:
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters:
data (Dict[Any, pyspark.sql.DataFrame])
- Return type:
Dict[Any, pyspark.sql.DataFrame]
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type:
bool
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
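The effect of `join_on_nulls` is easiest to see on a tiny example. The following is a minimal sketch, not the library's join: rows are plain dicts with `None` standing in for null, `inner_join` is a hypothetical helper, the join is modelled as an inner join (an assumption), and the join columns are always passed explicitly.

```python
def inner_join(private_rows, public_rows, join_cols, join_on_nulls=False):
    """Inner-join two lists of dict rows on join_cols."""
    joined = []
    for left in private_rows:
        for right in public_rows:
            matches = True
            for col in join_cols:
                lval, rval = left[col], right[col]
                if lval is None or rval is None:
                    # SQL-style: null matches nothing unless join_on_nulls is
                    # set and both sides are null.
                    if not (join_on_nulls and lval is None and rval is None):
                        matches = False
                elif lval != rval:
                    matches = False
            if matches:
                joined.append({**right, **left})
    return joined


private = [{"A": "a", "B": 1}, {"A": None, "B": 2}]
public = [{"A": "a", "P": "x"}, {"A": None, "P": "y"}]

print(inner_join(private, public, ["A"]))
# [{'A': 'a', 'P': 'x', 'B': 1}]
print(inner_join(private, public, ["A"], join_on_nulls=True))
# [{'A': 'a', 'P': 'x', 'B': 1}, {'A': None, 'P': 'y', 'B': 2}]
```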
- class FlatMapByKeyValue(input_domain, input_metric, key, new_key, row_transformer)#
Bases:
TransformValue
Applies a FlatMapByKey to create a new element from the specified value.
See TransformValue and FlatMapByKey for more information.
- Parameters:
input_domain (tmlt.core.domains.collections.DictDomain)
input_metric (tmlt.core.metrics.AddRemoveKeys)
key (Any)
new_key (Any)
row_transformer (tmlt.core.transformations.spark_transformations.map.RowsToRowsTransformation)
- property transformation: tmlt.core.transformations.base.Transformation#
Returns the transformation that will be applied to create the new element.
- Return type:
tmlt.core.transformations.base.Transformation
- property key: Any#
Returns the key for the DataFrame to transform.
- Return type:
Any
- property new_key: Any#
Returns the new key for the transformed DataFrame.
- Return type:
Any
- property input_domain: tmlt.core.domains.base.Domain#
Returns the input domain of the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property input_metric: tmlt.core.metrics.Metric#
Returns the distance metric on the input domain.
- Return type:
tmlt.core.metrics.Metric
- property output_domain: tmlt.core.domains.base.Domain#
Returns the output domain of the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property output_metric: tmlt.core.metrics.Metric#
Returns the distance metric on the output domain.
- Return type:
tmlt.core.metrics.Metric
- __init__(input_domain, input_metric, key, new_key, row_transformer)#
Constructor.
- Parameters:
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.
key (Any) – The key for the DataFrame to transform.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
row_transformer (RowsToRowsTransformation) – Transformation to apply to each group of rows.
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises:
NotImplementedError – If not overridden.
- Return type:
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters:
data (Dict[Any, pyspark.sql.DataFrame])
- Return type:
Dict[Any, pyspark.sql.DataFrame]
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type:
bool
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
- class FlatMapValue(input_domain, input_metric, key, new_key, row_transformer, max_num_rows)#
Bases:
TransformValue
Applies a FlatMap to create a new element from the specified value.
See TransformValue and FlatMap for more information.
- Parameters:
input_domain (tmlt.core.domains.collections.DictDomain)
input_metric (tmlt.core.metrics.AddRemoveKeys)
key (Any)
new_key (Any)
row_transformer (tmlt.core.transformations.spark_transformations.map.RowToRowsTransformation)
max_num_rows (Optional[int])
- property transformation: tmlt.core.transformations.base.Transformation#
Returns the transformation that will be applied to create the new element.
- Return type:
tmlt.core.transformations.base.Transformation
- property key: Any#
Returns the key for the DataFrame to transform.
- Return type:
Any
- property new_key: Any#
Returns the new key for the transformed DataFrame.
- Return type:
Any
- property input_domain: tmlt.core.domains.base.Domain#
Returns the input domain of the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property input_metric: tmlt.core.metrics.Metric#
Returns the distance metric on the input domain.
- Return type:
tmlt.core.metrics.Metric
- property output_domain: tmlt.core.domains.base.Domain#
Returns the output domain of the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property output_metric: tmlt.core.metrics.Metric#
Returns the distance metric on the output domain.
- Return type:
tmlt.core.metrics.Metric
- __init__(input_domain, input_metric, key, new_key, row_transformer, max_num_rows)#
Constructor.
- Parameters:
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.
key (Any) – The key for the DataFrame to transform.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
row_transformer (RowToRowsTransformation) – Transformation to apply to each row.
max_num_rows (Optional[int]) – The maximum number of rows to allow from row_transformer. If more rows are output, the additional rows are suppressed. If this value is None, the transformation will not impose a limit on the number of rows.
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises:
NotImplementedError – If not overridden.
- Return type:
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters:
data (Dict[Any, pyspark.sql.DataFrame])
- Return type:
Dict[Any, pyspark.sql.DataFrame]
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type:
bool
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
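The role of `max_num_rows` can be sketched on plain Python data. This is a simplified model and not the library's FlatMap: `flat_map` is a hypothetical helper, rows are dicts, the row transformer is an ordinary function returning a list of rows, and the suppressed rows are assumed to be those beyond the first `max_num_rows` produced for each input row.

```python
def flat_map(rows, row_transformer, max_num_rows=None):
    """Apply row_transformer to each row, truncating each row's output."""
    out = []
    for row in rows:
        produced = list(row_transformer(row))
        if max_num_rows is not None:
            # Rows beyond the limit are suppressed.
            produced = produced[:max_num_rows]
        out.extend(produced)
    return out


def split_words(row):
    """Illustrative transformer: one output row per word in row["text"]."""
    return [{"word": w} for w in row["text"].split()]


rows = [{"text": "a b c"}, {"text": "d"}]
print(flat_map(rows, split_words, max_num_rows=2))
# [{'word': 'a'}, {'word': 'b'}, {'word': 'd'}]
print(flat_map(rows, split_words))
# [{'word': 'a'}, {'word': 'b'}, {'word': 'c'}, {'word': 'd'}]
```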
- class MapValue(input_domain, input_metric, key, new_key, row_transformer)#
Bases:
TransformValue
Applies a Map to create a new element from the specified value.
See TransformValue and Map for more information.
- Parameters:
input_domain (tmlt.core.domains.collections.DictDomain)
input_metric (tmlt.core.metrics.AddRemoveKeys)
key (Any)
new_key (Any)
row_transformer (tmlt.core.transformations.spark_transformations.map.RowToRowTransformation)
- property transformation: tmlt.core.transformations.base.Transformation#
Returns the transformation that will be applied to create the new element.
- Return type:
tmlt.core.transformations.base.Transformation
- property key: Any#
Returns the key for the DataFrame to transform.
- Return type:
Any
- property new_key: Any#
Returns the new key for the transformed DataFrame.
- Return type:
Any
- property input_domain: tmlt.core.domains.base.Domain#
Returns the input domain of the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property input_metric: tmlt.core.metrics.Metric#
Returns the distance metric on the input domain.
- Return type:
tmlt.core.metrics.Metric
- property output_domain: tmlt.core.domains.base.Domain#
Returns the output domain of the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property output_metric: tmlt.core.metrics.Metric#
Returns the distance metric on the output domain.
- Return type:
tmlt.core.metrics.Metric
- __init__(input_domain, input_metric, key, new_key, row_transformer)#
Constructor.
- Parameters:
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.
key (Any) – The key for the DataFrame to transform.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
row_transformer (RowToRowTransformation) – Transformation to apply to each row.
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises:
NotImplementedError – If not overridden.
- Return type:
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters:
data (Dict[Any, pyspark.sql.DataFrame])
- Return type:
Dict[Any, pyspark.sql.DataFrame]
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type:
bool
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
- class DropInfsValue(input_domain, input_metric, key, new_key, columns)#
Bases:
TransformValue
Applies a DropInfs to create a new element from the specified value.
See TransformValue and DropInfs for more information.
- Parameters:
input_domain (tmlt.core.domains.collections.DictDomain)
input_metric (tmlt.core.metrics.AddRemoveKeys)
key (Any)
new_key (Any)
columns (List[str])
- property transformation: tmlt.core.transformations.base.Transformation#
Returns the transformation that will be applied to create the new element.
- Return type:
tmlt.core.transformations.base.Transformation
- property key: Any#
Returns the key for the DataFrame to transform.
- Return type:
Any
- property new_key: Any#
Returns the new key for the transformed DataFrame.
- Return type:
Any
- property input_domain: tmlt.core.domains.base.Domain#
Returns the input domain of the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property input_metric: tmlt.core.metrics.Metric#
Returns the distance metric on the input domain.
- Return type:
tmlt.core.metrics.Metric
- property output_domain: tmlt.core.domains.base.Domain#
Returns the output domain of the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property output_metric: tmlt.core.metrics.Metric#
Returns the distance metric on the output domain.
- Return type:
tmlt.core.metrics.Metric
- __init__(input_domain, input_metric, key, new_key, columns)#
Constructor.
- Parameters:
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.
key (Any) – The key for the DataFrame to transform.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises:
NotImplementedError – If not overridden.
- Return type:
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters:
data (Dict[Any, pyspark.sql.DataFrame])
- Return type:
Dict[Any, pyspark.sql.DataFrame]
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type:
bool
- __or__(other: Transformation) Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
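The `__or__` operator above chains components, so that `a | b` applies `a` first and then passes its output to `b`. A minimal, library-free sketch of that pattern follows. `Step` is a hypothetical stand-in class, not part of tmlt.core, and it skips the domain and metric compatibility checks a real chain would need.

```python
from typing import Any, Callable


class Step:
    """Hypothetical stand-in for a transformation: wraps a plain function."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def __call__(self, data: Any) -> Any:
        return self.fn(data)

    def __or__(self, other: "Step") -> "Step":
        # `self | other` runs self first, then feeds its output to other.
        return Step(lambda data: other(self(data)))


add_one = Step(lambda x: x + 1)
double = Step(lambda x: x * 2)
chained = add_one | double
result = chained(3)  # (3 + 1) * 2
```

Order matters here: `double | add_one` would compute `3 * 2 + 1` instead.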
- class DropNaNsValue(input_domain, input_metric, key, new_key, columns)#
Bases: TransformValue
Applies a DropNaNs to create a new element from the specified value.
See TransformValue and DropNaNs for more information.
- Parameters:
input_domain (tmlt.core.domains.collections.DictDomain)
input_metric (tmlt.core.metrics.AddRemoveKeys)
key (Any)
new_key (Any)
columns (List[str])
- property transformation: tmlt.core.transformations.base.Transformation#
Returns the transformation that will be applied to create the new element.
- Return type:
tmlt.core.transformations.base.Transformation
- property key: Any#
Returns the key for the DataFrame to transform.
- Return type:
Any
- property new_key: Any#
Returns the new key for the transformed DataFrame.
- Return type:
Any
- property input_domain: tmlt.core.domains.base.Domain#
Return input domain for the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property input_metric: tmlt.core.metrics.Metric#
Distance metric on input domain.
- Return type:
tmlt.core.metrics.Metric
- property output_domain: tmlt.core.domains.base.Domain#
Return output domain for the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property output_metric: tmlt.core.metrics.Metric#
Distance metric on output domain.
- Return type:
tmlt.core.metrics.Metric
- __init__(input_domain, input_metric, key, new_key, columns)#
Constructor.
- Parameters:
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary-to-dictionary transformation.
key (Any) – The key for the DataFrame to transform.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises:
NotImplementedError – If not overridden.
- Return type:
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters:
data (Dict[Any, pyspark.sql.DataFrame])
- Return type:
Dict[Any, pyspark.sql.DataFrame]
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type:
bool
- __or__(other: Transformation) Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
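To make the `__call__` behaviour described above concrete, here is a plain-Python sketch of augmenting a dictionary with a NaN-free copy of one of its values. It models each DataFrame as a list of row dictionaries. The helper `drop_nans_value` and the table names are hypothetical, and nothing here uses Spark or tmlt.core.

```python
import math
from typing import Any, Dict, List

Row = Dict[str, Any]
Table = List[Row]


def drop_nans_value(
    data: Dict[Any, Table], key: Any, new_key: Any, columns: List[str]
) -> Dict[Any, Table]:
    """Return a copy of `data` with `new_key` holding the rows of `data[key]`
    that have no NaN in any of `columns`."""
    kept = [
        row
        for row in data[key]
        if not any(
            isinstance(row[col], float) and math.isnan(row[col]) for col in columns
        )
    ]
    # The original entries are kept; only new_key is added.
    return {**data, new_key: kept}


tables = {"raw": [{"id": "a", "x": 1.0}, {"id": "b", "x": float("nan")}]}
augmented = drop_nans_value(tables, key="raw", new_key="clean", columns=["x"])
```

The input under `"raw"` is left as it was, and the filtered rows appear only under `"clean"`.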
- class DropNullsValue(input_domain, input_metric, key, new_key, columns)#
Bases: TransformValue
Applies a DropNulls to create a new element from the specified value.
See TransformValue and DropNulls for more information.
- Parameters:
input_domain (tmlt.core.domains.collections.DictDomain)
input_metric (tmlt.core.metrics.AddRemoveKeys)
key (Any)
new_key (Any)
columns (List[str])
- property transformation: tmlt.core.transformations.base.Transformation#
Returns the transformation that will be applied to create the new element.
- Return type:
tmlt.core.transformations.base.Transformation
- property key: Any#
Returns the key for the DataFrame to transform.
- Return type:
Any
- property new_key: Any#
Returns the new key for the transformed DataFrame.
- Return type:
Any
- property input_domain: tmlt.core.domains.base.Domain#
Return input domain for the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property input_metric: tmlt.core.metrics.Metric#
Distance metric on input domain.
- Return type:
tmlt.core.metrics.Metric
- property output_domain: tmlt.core.domains.base.Domain#
Return output domain for the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property output_metric: tmlt.core.metrics.Metric#
Distance metric on output domain.
- Return type:
tmlt.core.metrics.Metric
- __init__(input_domain, input_metric, key, new_key, columns)#
Constructor.
- Parameters:
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary-to-dictionary transformation.
key (Any) – The key for the DataFrame to transform.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises:
NotImplementedError – If not overridden.
- Return type:
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters:
data (Dict[Any, pyspark.sql.DataFrame])
- Return type:
Dict[Any, pyspark.sql.DataFrame]
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type:
bool
- __or__(other: Transformation) Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
- class ReplaceInfsValue(input_domain, input_metric, key, new_key, replace_map)#
Bases: TransformValue
Applies a ReplaceInfs to create a new element from the specified value.
See TransformValue and ReplaceInfs for more information.
- Parameters:
input_domain (tmlt.core.domains.collections.DictDomain)
input_metric (tmlt.core.metrics.AddRemoveKeys)
key (Any)
new_key (Any)
replace_map (Dict[str, Tuple[float, float]])
- property transformation: tmlt.core.transformations.base.Transformation#
Returns the transformation that will be applied to create the new element.
- Return type:
tmlt.core.transformations.base.Transformation
- property key: Any#
Returns the key for the DataFrame to transform.
- Return type:
Any
- property new_key: Any#
Returns the new key for the transformed DataFrame.
- Return type:
Any
- property input_domain: tmlt.core.domains.base.Domain#
Return input domain for the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property input_metric: tmlt.core.metrics.Metric#
Distance metric on input domain.
- Return type:
tmlt.core.metrics.Metric
- property output_domain: tmlt.core.domains.base.Domain#
Return output domain for the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property output_metric: tmlt.core.metrics.Metric#
Distance metric on output domain.
- Return type:
tmlt.core.metrics.Metric
- __init__(input_domain, input_metric, key, new_key, replace_map)#
Constructor.
- Parameters:
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary-to-dictionary transformation.
key (Any) – The key for the DataFrame to transform.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
replace_map (Dict[str, Tuple[float, float]]) – Dictionary mapping column names to a tuple. The first value in the tuple replaces -inf in that column, and the second value replaces +inf in that column.
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises:
NotImplementedError – If not overridden.
- Return type:
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters:
data (Dict[Any, pyspark.sql.DataFrame])
- Return type:
Dict[Any, pyspark.sql.DataFrame]
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type:
bool
- __or__(other: Transformation) Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
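The `replace_map` convention described above (the first tuple entry replaces -inf, the second replaces +inf) can be sketched in plain Python. `replace_infs` is a hypothetical helper operating on a list of row dictionaries; it is an illustration of the convention, not the library's Spark implementation.

```python
import math
from typing import Any, Dict, List, Tuple


def replace_infs(
    rows: List[Dict[str, Any]], replace_map: Dict[str, Tuple[float, float]]
) -> List[Dict[str, Any]]:
    """Replace -inf/+inf per column using (negative, positive) replacements."""
    out = []
    for row in rows:
        new_row = dict(row)
        for col, (neg_value, pos_value) in replace_map.items():
            value = new_row[col]
            if isinstance(value, float) and math.isinf(value):
                new_row[col] = neg_value if value < 0 else pos_value
        out.append(new_row)
    return out


rows = [{"x": float("-inf")}, {"x": 2.5}, {"x": float("inf")}]
replaced = replace_infs(rows, {"x": (-100.0, 100.0)})
```

Finite values such as `2.5` pass through unchanged.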
- class ReplaceNaNsValue(input_domain, input_metric, key, new_key, replace_map)#
Bases: TransformValue
Applies a ReplaceNaNs to create a new element from the specified value.
See TransformValue and ReplaceNaNs for more information.
- Parameters:
input_domain (tmlt.core.domains.collections.DictDomain)
input_metric (tmlt.core.metrics.AddRemoveKeys)
key (Any)
new_key (Any)
replace_map (Dict[str, Any])
- property transformation: tmlt.core.transformations.base.Transformation#
Returns the transformation that will be applied to create the new element.
- Return type:
tmlt.core.transformations.base.Transformation
- property key: Any#
Returns the key for the DataFrame to transform.
- Return type:
Any
- property new_key: Any#
Returns the new key for the transformed DataFrame.
- Return type:
Any
- property input_domain: tmlt.core.domains.base.Domain#
Return input domain for the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property input_metric: tmlt.core.metrics.Metric#
Distance metric on input domain.
- Return type:
tmlt.core.metrics.Metric
- property output_domain: tmlt.core.domains.base.Domain#
Return output domain for the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property output_metric: tmlt.core.metrics.Metric#
Distance metric on output domain.
- Return type:
tmlt.core.metrics.Metric
- __init__(input_domain, input_metric, key, new_key, replace_map)#
Constructor.
- Parameters:
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary-to-dictionary transformation.
key (Any) – The key for the DataFrame to transform.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
replace_map (Dict[str, Any]) – Dictionary mapping column names to the value used to replace NaNs in that column.
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises:
NotImplementedError – If not overridden.
- Return type:
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters:
data (Dict[Any, pyspark.sql.DataFrame])
- Return type:
Dict[Any, pyspark.sql.DataFrame]
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type:
bool
- __or__(other: Transformation) Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
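As an illustration of the per-column `replace_map` described above, the following plain-Python sketch replaces NaNs only in the columns that appear in the map. `replace_nans` is a hypothetical helper over lists of row dictionaries, and leaving unmapped columns untouched is an assumption of this sketch rather than a documented guarantee of the library.

```python
import math
from typing import Any, Dict, List


def replace_nans(
    rows: List[Dict[str, Any]], replace_map: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Replace NaN in each mapped column with that column's replacement value."""

    def fix(col: str, value: Any) -> Any:
        is_nan = isinstance(value, float) and math.isnan(value)
        return replace_map[col] if is_nan and col in replace_map else value

    return [{col: fix(col, value) for col, value in row.items()} for row in rows]


rows = [{"x": float("nan"), "y": float("nan")}, {"x": 1.5, "y": 2.0}]
filled = replace_nans(rows, {"x": 0.0})  # "y" is not in the map, so it is untouched
```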
- class ReplaceNullsValue(input_domain, input_metric, key, new_key, replace_map)#
Bases: TransformValue
Applies a ReplaceNulls to create a new element from the specified value.
See TransformValue and ReplaceNulls for more information.
- Parameters:
input_domain (tmlt.core.domains.collections.DictDomain)
input_metric (tmlt.core.metrics.AddRemoveKeys)
key (Any)
new_key (Any)
replace_map (Dict[str, Any])
- property transformation: tmlt.core.transformations.base.Transformation#
Returns the transformation that will be applied to create the new element.
- Return type:
tmlt.core.transformations.base.Transformation
- property key: Any#
Returns the key for the DataFrame to transform.
- Return type:
Any
- property new_key: Any#
Returns the new key for the transformed DataFrame.
- Return type:
Any
- property input_domain: tmlt.core.domains.base.Domain#
Return input domain for the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property input_metric: tmlt.core.metrics.Metric#
Distance metric on input domain.
- Return type:
tmlt.core.metrics.Metric
- property output_domain: tmlt.core.domains.base.Domain#
Return output domain for the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property output_metric: tmlt.core.metrics.Metric#
Distance metric on output domain.
- Return type:
tmlt.core.metrics.Metric
- __init__(input_domain, input_metric, key, new_key, replace_map)#
Constructor.
- Parameters:
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary-to-dictionary transformation.
key (Any) – The key for the DataFrame to transform.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
replace_map (Dict[str, Any]) – Dictionary mapping column names to the value used to replace nulls in that column.
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises:
NotImplementedError – If not overridden.
- Return type:
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters:
data (Dict[Any, pyspark.sql.DataFrame])
- Return type:
Dict[Any, pyspark.sql.DataFrame]
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type:
bool
- __or__(other: Transformation) Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
- class PersistValue(input_domain, input_metric, key, new_key)#
Bases: TransformValue
Applies a Persist to create a new element from the specified value.
See TransformValue and Persist for more information.
- Parameters:
input_domain (tmlt.core.domains.collections.DictDomain)
input_metric (tmlt.core.metrics.AddRemoveKeys)
key (Any)
new_key (Any)
- property transformation: tmlt.core.transformations.base.Transformation#
Returns the transformation that will be applied to create the new element.
- Return type:
tmlt.core.transformations.base.Transformation
- property key: Any#
Returns the key for the DataFrame to transform.
- Return type:
Any
- property new_key: Any#
Returns the new key for the transformed DataFrame.
- Return type:
Any
- property input_domain: tmlt.core.domains.base.Domain#
Return input domain for the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property input_metric: tmlt.core.metrics.Metric#
Distance metric on input domain.
- Return type:
tmlt.core.metrics.Metric
- property output_domain: tmlt.core.domains.base.Domain#
Return output domain for the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property output_metric: tmlt.core.metrics.Metric#
Distance metric on output domain.
- Return type:
tmlt.core.metrics.Metric
- __init__(input_domain, input_metric, key, new_key)#
Constructor.
- Parameters:
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary-to-dictionary transformation.
key (Any) – The key for the DataFrame to transform.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises:
NotImplementedError – If not overridden.
- Return type:
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters:
data (Dict[Any, pyspark.sql.DataFrame])
- Return type:
Dict[Any, pyspark.sql.DataFrame]
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type:
bool
- __or__(other: Transformation) Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
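The two stability methods above are related: `stability_function(d_in)` gives the smallest `d_out` the transformation satisfies, so one natural reading is that `stability_relation(d_in, d_out)` holds whenever `d_out` is at least that value. The sketch below illustrates this reading with a hypothetical `ToyTransformation` whose stability function is the identity. That choice, and the use of plain integers as distances, are assumptions made for illustration and are not taken from the library.

```python
class ToyTransformation:
    """Hypothetical component with a closed-form stability function."""

    def stability_function(self, d_in: int) -> int:
        # Assumed for illustration: outputs are no further apart than inputs.
        return d_in

    def stability_relation(self, d_in: int, d_out: int) -> bool:
        # True when d_out is at least the smallest distance guaranteed for d_in.
        return self.stability_function(d_in) <= d_out


toy = ToyTransformation()
tight = toy.stability_relation(1, 1)
loose = toy.stability_relation(1, 5)
too_small = toy.stability_relation(2, 1)
```

Here `tight` and `loose` are both `True`, since any `d_out` at or above the bound is acceptable, while `too_small` is `False`.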
- class UnpersistValue(input_domain, input_metric, key, new_key)#
Bases: TransformValue
Applies an Unpersist to create a new element from the specified value.
See TransformValue and Unpersist for more information.
- Parameters:
input_domain (tmlt.core.domains.collections.DictDomain)
input_metric (tmlt.core.metrics.AddRemoveKeys)
key (Any)
new_key (Any)
- property transformation: tmlt.core.transformations.base.Transformation#
Returns the transformation that will be applied to create the new element.
- Return type:
tmlt.core.transformations.base.Transformation
- property key: Any#
Returns the key for the DataFrame to transform.
- Return type:
Any
- property new_key: Any#
Returns the new key for the transformed DataFrame.
- Return type:
Any
- property input_domain: tmlt.core.domains.base.Domain#
Return input domain for the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property input_metric: tmlt.core.metrics.Metric#
Distance metric on input domain.
- Return type:
tmlt.core.metrics.Metric
- property output_domain: tmlt.core.domains.base.Domain#
Return output domain for the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property output_metric: tmlt.core.metrics.Metric#
Distance metric on output domain.
- Return type:
tmlt.core.metrics.Metric
- __init__(input_domain, input_metric, key, new_key)#
Constructor.
- Parameters:
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary-to-dictionary transformation.
key (Any) – The key for the DataFrame to transform.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises:
NotImplementedError – If not overridden.
- Return type:
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters:
data (Dict[Any, pyspark.sql.DataFrame])
- Return type:
Dict[Any, pyspark.sql.DataFrame]
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type:
bool
- __or__(other: Transformation) Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
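The constructor above requires that `new_key` is not already in the input domain. A minimal sketch of that kind of check follows, using a plain dictionary of hypothetical column lists in place of a `DictDomain`. The function name `check_new_key` and the choice of `ValueError` are assumptions, not the library's documented behaviour.

```python
from typing import Any, Dict


def check_new_key(key_to_domain: Dict[Any, Any], key: Any, new_key: Any) -> None:
    """Raise if `key` is missing or `new_key` would overwrite an existing entry."""
    if key not in key_to_domain:
        raise ValueError(f"{key!r} is not one of the input domain's keys")
    if new_key in key_to_domain:
        raise ValueError(f"{new_key!r} is already in the input domain")


schemas = {"start": ["A", "B"]}
check_new_key(schemas, key="start", new_key="cached")  # passes silently

try:
    check_new_key(schemas, key="start", new_key="start")
    collision_detected = False
except ValueError:
    collision_detected = True
```

Rejecting a colliding `new_key` up front means the augmented output never silently replaces an existing entry.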
- class SparkActionValue(input_domain, input_metric, key, new_key)#
Bases: TransformValue
Applies a SparkAction to create a new element from the specified value.
See TransformValue and SparkAction for more information.
- Parameters:
input_domain (tmlt.core.domains.collections.DictDomain)
input_metric (tmlt.core.metrics.AddRemoveKeys)
key (Any)
new_key (Any)
- property transformation: tmlt.core.transformations.base.Transformation#
Returns the transformation that will be applied to create the new element.
- Return type:
tmlt.core.transformations.base.Transformation
- property key: Any#
Returns the key for the DataFrame to transform.
- Return type:
Any
- property new_key: Any#
Returns the new key for the transformed DataFrame.
- Return type:
Any
- property input_domain: tmlt.core.domains.base.Domain#
Return input domain for the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property input_metric: tmlt.core.metrics.Metric#
Distance metric on input domain.
- Return type:
tmlt.core.metrics.Metric
- property output_domain: tmlt.core.domains.base.Domain#
Return output domain for the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property output_metric: tmlt.core.metrics.Metric#
Distance metric on output domain.
- Return type:
tmlt.core.metrics.Metric
- __init__(input_domain, input_metric, key, new_key)#
Constructor.
- Parameters:
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary-to-dictionary transformation.
key (Any) – The key for the DataFrame to transform.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises:
NotImplementedError – If not overridden.
- Return type:
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters:
data (Dict[Any, pyspark.sql.DataFrame])
- Return type:
Dict[Any, pyspark.sql.DataFrame]
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type:
bool
- __or__(other: Transformation) Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
- class RenameValue(input_domain, input_metric, key, new_key, rename_mapping)#
Bases: TransformValue
Applies a Rename to create a new element from the specified value.
See TransformValue and Rename for more information.
- Parameters:
input_domain (tmlt.core.domains.collections.DictDomain)
input_metric (tmlt.core.metrics.AddRemoveKeys)
key (Any)
new_key (Any)
rename_mapping (Dict[str, str])
- property transformation: tmlt.core.transformations.base.Transformation#
Returns the transformation that will be applied to create the new element.
- Return type:
tmlt.core.transformations.base.Transformation
- property key: Any#
Returns the key for the DataFrame to transform.
- Return type:
Any
- property new_key: Any#
Returns the new key for the transformed DataFrame.
- Return type:
Any
- property input_domain: tmlt.core.domains.base.Domain#
Return input domain for the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property input_metric: tmlt.core.metrics.Metric#
Distance metric on input domain.
- Return type:
tmlt.core.metrics.Metric
- property output_domain: tmlt.core.domains.base.Domain#
Return output domain for the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property output_metric: tmlt.core.metrics.Metric#
Distance metric on output domain.
- Return type:
tmlt.core.metrics.Metric
- __init__(input_domain, input_metric, key, new_key, rename_mapping)#
Constructor.
- Parameters:
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary-to-dictionary transformation.
key (Any) – The key for the DataFrame to transform.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
rename_mapping (Dict[str, str]) – Dictionary from existing column names to target column names.
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises:
NotImplementedError – If not overridden.
- Return type:
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters:
data (Dict[Any, pyspark.sql.DataFrame])
- Return type:
Dict[Any, pyspark.sql.DataFrame]
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type:
bool
- __or__(other: Transformation) Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
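The `rename_mapping` parameter above maps existing column names to target names. A plain-Python sketch of that mapping applied to a list of row dictionaries follows. `rename_columns` is a hypothetical helper, and keeping unmapped columns under their original names is an assumption of this sketch.

```python
from typing import Any, Dict, List


def rename_columns(
    rows: List[Dict[str, Any]], rename_mapping: Dict[str, str]
) -> List[Dict[str, Any]]:
    """Rename columns per `rename_mapping`; unmapped columns keep their names."""
    return [
        {rename_mapping.get(col, col): value for col, value in row.items()}
        for row in rows
    ]


rows = [{"A": "a", "B": "1"}, {"A": "b", "B": "2"}]
renamed = rename_columns(rows, {"A": "C", "B": "D"})
```

This uses the same `{"A": "C", "B": "D"}` mapping as the `Rename` example at the top of this page.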
- class SelectValue(input_domain, input_metric, key, new_key, columns)#
Bases: TransformValue
Applies a Select to create a new element from the specified value.
See TransformValue and Select for more information.
- Parameters:
input_domain (tmlt.core.domains.collections.DictDomain)
input_metric (tmlt.core.metrics.AddRemoveKeys)
key (Any)
new_key (Any)
columns (List[str])
- property transformation: tmlt.core.transformations.base.Transformation#
Returns the transformation that will be applied to create the new element.
- Return type:
tmlt.core.transformations.base.Transformation
- property key: Any#
Returns the key for the DataFrame to transform.
- Return type:
Any
- property new_key: Any#
Returns the new key for the transformed DataFrame.
- Return type:
Any
- property input_domain: tmlt.core.domains.base.Domain#
Return input domain for the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property input_metric: tmlt.core.metrics.Metric#
Distance metric on input domain.
- Return type:
tmlt.core.metrics.Metric
- property output_domain: tmlt.core.domains.base.Domain#
Return output domain for the transformation.
- Return type:
tmlt.core.domains.base.Domain
- property output_metric: tmlt.core.metrics.Metric#
Distance metric on output domain.
- Return type:
tmlt.core.metrics.Metric
- __init__(input_domain, input_metric, key, new_key, columns)#
Constructor.
- Parameters:
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary-to-dictionary transformation.
key (Any) – The key for the DataFrame to transform.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
columns (List[str]) – A list of existing column names to keep.
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises:
NotImplementedError – If not overridden.
- Return type:
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters:
data (Dict[Any, pyspark.sql.DataFrame])
- Return type:
Dict[Any, pyspark.sql.DataFrame]
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters:
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type:
bool
- __or__(other: Transformation) Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
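The `columns` parameter above lists the existing columns to keep. As a rough illustration, the following plain-Python sketch projects a list of row dictionaries onto the requested columns. `select_columns` is a hypothetical helper and does not use Spark or tmlt.core.

```python
from typing import Any, Dict, List


def select_columns(
    rows: List[Dict[str, Any]], columns: List[str]
) -> List[Dict[str, Any]]:
    """Keep only `columns`, in the order given."""
    return [{col: row[col] for col in columns} for row in rows]


rows = [{"A": "a", "B": "1", "C": True}]
selected = select_columns(rows, ["C", "A"])
```

Asking for a column that does not exist raises `KeyError` in this sketch, which mirrors the requirement that the listed columns already exist.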