add_remove_keys#
Transformations that transform dictionaries using AddRemoveKeys.
Note that several of the transformations in the dictionary module also support AddRemoveKeys.
The transformations defined in this module are required because AugmentDictTransformation is not stable under AddRemoveKeys for all transformations.
Consider the following example:
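>>> # Imports and Spark session assumed by this example
>>> # (module paths follow the tmlt.core package layout)
>>> from pyspark.sql import SparkSession
>>> from tmlt.core.domains.collections import DictDomain
>>> from tmlt.core.domains.spark_domains import (
... SparkDataFrameDomain,
... SparkStringColumnDescriptor,
... )
>>> from tmlt.core.metrics import AddRemoveKeys, IfGroupedBy, SymmetricDifference
>>> from tmlt.core.transformations.spark_transformations.id import AddUniqueColumn
>>> from tmlt.core.transformations.spark_transformations.rename import Rename
>>> from tmlt.core.transformations.spark_transformations.truncation import LimitRowsPerGroup
>>> from tmlt.core.utils.misc import print_sdf
>>> spark = SparkSession.builder.getOrCreate()
>>> # Create transformation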
>>> input_domain = SparkDataFrameDomain(
... {
... "A": SparkStringColumnDescriptor(),
... "B": SparkStringColumnDescriptor(),
... }
... )
>>> input_metric = IfGroupedBy("A", SymmetricDifference())
>>> truncate = LimitRowsPerGroup(
... input_domain=input_domain,
... output_metric=SymmetricDifference(),
... grouping_column="A",
... threshold=1,
... )
>>> rename = Rename(
... input_domain=input_domain,
... metric=SymmetricDifference(),
... rename_mapping={"A": "C", "B": "D"},
... )
>>> create_unique_column = AddUniqueColumn(
... input_domain=rename.output_domain,
... column="A",
... )
>>> transformation = truncate | rename | create_unique_column
>>> # Create data
>>> x1 = spark.createDataFrame(
... [["a", "1"], ["b", "2"], ["c", "3"]], ["A", "B"]
... )
>>> x2 = spark.createDataFrame(
... [["b", "2"], ["c", "3"]], ["A", "B"]
... )
>>> print_sdf(x1)
A B
0 a 1
1 b 2
2 c 3
>>> print_sdf(x2)
A B
0 b 2
1 c 3
>>> y1 = transformation(x1)
>>> y2 = transformation(x2)
>>> print_sdf(y1) # Note that the values below are in fact unique (after the 5B226)
C D A
0 a 1 5B2261222C2231222C2231225D
1 b 2 5B2262222C2232222C2231225D
2 c 3 5B2263222C2233222C2231225D
>>> print_sdf(y2)
C D A
0 b 2 5B2262222C2232222C2231225D
1 c 3 5B2263222C2233222C2231225D
>>> # Check stability
>>> input_metric.distance(x1, x2, input_domain)
1
>>> input_metric.distance(y1, y2, transformation.output_domain)
1
>>> # Check stability as if it was Augmented using AugmentDictTransformation
>>> dict_x1 = {"start": x1}
>>> dict_x2 = {"start": x2}
>>> dict_y1 = {"start": x1, "end": y1}
>>> dict_y2 = {"start": x2, "end": y2}
>>> dict_input_domain = DictDomain({"start": input_domain})
>>> dict_input_metric = AddRemoveKeys({"start": "A"})
>>> dict_output_domain = DictDomain(
... {
... "start": input_domain,
... "end": transformation.output_domain
... }
... )
>>> dict_output_metric = AddRemoveKeys({"start": "A", "end": "A"})
>>> # Naively you would expect the stability to be 1, but in this example it is 2
>>> dict_input_metric.distance(dict_x1, dict_x2, dict_input_domain)
1
>>> dict_output_metric.distance(dict_y1, dict_y2, dict_output_domain)
2
Conceptually, the transformation in the example above changes the meaning of the key column: the column “A” in the input data is not the same as the column “A” in the output data. Removing one key value, “a”, from the input dictionary therefore removes two key values from the output dictionary: “a” from the “start” DataFrame and the unique ID derived from “a,1” from the “end” DataFrame.
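The counting behind the two distances can be reproduced without Spark. The following is a minimal pure-Python sketch, not the library's implementation: tables are lists of dicts, `rows_by_key` and `add_remove_keys_distance` are hypothetical helper names, and `toy_transformation` is a stand-in for `truncate | rename | create_unique_column` that builds the new "A" value by joining the old columns with a comma.

```python
from collections import Counter, defaultdict


def rows_by_key(rows, key_column):
    """Group rows into a multiset of rows for each key value."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[key_column]][tuple(sorted(row.items()))] += 1
    return groups


def add_remove_keys_distance(tables1, tables2, key_columns):
    """Count the key values whose rows differ in at least one table."""
    differing = set()
    for name, key_column in key_columns.items():
        groups1 = rows_by_key(tables1[name], key_column)
        groups2 = rows_by_key(tables2[name], key_column)
        for key in set(groups1) | set(groups2):
            if groups1.get(key) != groups2.get(key):
                differing.add(key)
    return len(differing)


def toy_transformation(rows):
    """Stand-in (assumption) for truncate | rename | create_unique_column."""
    return [
        {"C": row["A"], "D": row["B"], "A": f"{row['A']},{row['B']}"}
        for row in rows
    ]


x1 = [{"A": "a", "B": "1"}, {"A": "b", "B": "2"}, {"A": "c", "B": "3"}]
x2 = x1[1:]
y1, y2 = toy_transformation(x1), toy_transformation(x2)

d_in = add_remove_keys_distance({"start": x1}, {"start": x2}, {"start": "A"})
d_out = add_remove_keys_distance(
    {"start": x1, "end": y1},
    {"start": x2, "end": y2},
    {"start": "A", "end": "A"},
)
print(d_in, d_out)  # 1 2
```

The output distance is 2 because the set of differing key values is taken across all tables: "a" differs in "start" and "a,1" differs in "end".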
Classes#
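TransformValue | Base class transforming a specified key using an existing transformation.
LimitRowsPerGroupValue | Applies a LimitRowsPerGroup to the specified key.
LimitKeysPerGroupValue | Applies a LimitKeysPerGroup to the specified key.
LimitRowsPerKeyPerGroupValue | Applies a LimitRowsPerKeyPerGroup to the specified key.
FilterValue | Applies a Filter to create a new element from specified value.
PublicJoinValue | Applies a PublicJoin to create a new element from specified value.
FlatMapValue | Applies a FlatMap to create a new element from specified value.
MapValue | Applies a Map to create a new element from specified value.
DropInfsValue | Applies a DropInfs to create a new element from specified value.
DropNaNsValue | Applies a DropNaNs to create a new element from specified value.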
- class TransformValue(input_domain, input_metric, transformation, key, new_key)#
Bases: tmlt.core.transformations.base.Transformation
Base class transforming a specified key using an existing transformation.
This class can be subclassed to claim that a kind of Transformation (like Filter) can be applied to a DataFrame, augmenting the input dictionary with the output, without violating the closeness of neighboring DataFrames under AddRemoveKeys.
NOTE: This class cannot be instantiated directly.
transformation | Returns the transformation that will be applied to create the new element.
key | Returns the key for the DataFrame to transform.
new_key | Returns the new key for the transformed DataFrame.
stability_function(d_in) | Returns the smallest d_out satisfied by the transformation.
__call__(data) | Returns a new dictionary augmented with the transformed DataFrame.
input_domain | Return input domain for the transformation.
input_metric | Distance metric on input domain.
output_domain | Return output domain for the transformation.
output_metric | Distance metric on output domain.
stability_relation(d_in, d_out) | Returns True only if close inputs produce close outputs.
__or__(other) | Return this transformation chained with another component.
- Parameters
input_domain (tmlt.core.domains.collections.DictDomain) –
input_metric (tmlt.core.metrics.AddRemoveKeys) –
transformation (tmlt.core.transformations.base.Transformation) –
key (Any) –
new_key (Any) –
- __init__(input_domain, input_metric, transformation, key, new_key)#
Constructor.
- Parameters
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.
transformation (Transformation) – The DataFrame to DataFrame transformation to apply. Input and output metric must both be IfGroupedBy(column, SymmetricDifference()) using the same column.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
- property transformation#
Returns the transformation that will be applied to create the new element.
- Return type
- property key#
Returns the key for the DataFrame to transform.
- Return type
Any
- property new_key#
Returns the new key for the transformed DataFrame.
- Return type
Any
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises
NotImplementedError – If not overridden.
- Return type
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters
data (Dict[Any, pyspark.sql.DataFrame]) –
- Return type
Dict[Any, pyspark.sql.DataFrame]
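The augmenting behavior described for `__call__` can be sketched in plain Python. This is a minimal illustration, assuming the input dictionary is left unchanged and the result is stored under `new_key`; `transform_value` is a hypothetical helper name, and plain lists stand in for Spark DataFrames.

```python
def transform_value(data, key, new_key, transformation):
    """Return a copy of `data` with transformation(data[key]) stored at new_key."""
    if new_key in data:
        # Mirrors the documented requirement that new_key is not already present.
        raise ValueError(f"new_key {new_key!r} is already present")
    augmented = dict(data)
    augmented[new_key] = transformation(data[key])
    return augmented


tables = {"start": [3, 1, 2]}
result = transform_value(tables, "start", "end", sorted)
print(result)  # {'start': [3, 1, 2], 'end': [1, 2, 3]}
print(tables)  # {'start': [3, 1, 2]}
```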
- property input_domain#
Return input domain for the transformation.
- Return type
- property input_metric#
Distance metric on input domain.
- Return type
- property output_domain#
Return output domain for the transformation.
- Return type
- property output_metric#
Distance metric on output domain.
- Return type
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
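How `stability_function`, `stability_relation`, and `__or__` fit together can be sketched with a toy class. This is an illustration under stated assumptions, not the library's code: `ToyTransformation` is a hypothetical name, the relation is assumed to be derived from the function by a simple comparison, and the domain and metric compatibility checks a real chaining operator would perform are omitted.

```python
class ToyTransformation:
    """Hypothetical stand-in for a Transformation with a stability function."""

    def __init__(self, function, stability_function):
        self._function = function
        self._stability_function = stability_function

    def __call__(self, data):
        return self._function(data)

    def stability_function(self, d_in):
        """Smallest d_out guaranteed for inputs at distance d_in."""
        return self._stability_function(d_in)

    def stability_relation(self, d_in, d_out):
        """True only if d_out is at least the smallest guaranteed distance."""
        return self.stability_function(d_in) <= d_out

    def __or__(self, other):
        """Chain: apply self first, then other; stability functions compose."""
        return ToyTransformation(
            lambda data: other(self(data)),
            lambda d_in: other.stability_function(self.stability_function(d_in)),
        )


duplicate = ToyTransformation(lambda rows: rows + rows, lambda d_in: 2 * d_in)
keep_even = ToyTransformation(
    lambda rows: [r for r in rows if r % 2 == 0], lambda d_in: d_in
)
chained = duplicate | keep_even

print(chained([1, 2, 3, 4]))             # [2, 4, 2, 4]
print(chained.stability_function(1))     # 2
print(chained.stability_relation(1, 1))  # False
```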
- class LimitRowsPerGroupValue(input_domain, input_metric, key, new_key, threshold)#
Bases: TransformValue
Applies a LimitRowsPerGroup to the specified key.
See TransformValue and LimitRowsPerGroup for more information.
transformation | Returns the transformation that will be applied to create the new element.
key | Returns the key for the DataFrame to transform.
new_key | Returns the new key for the transformed DataFrame.
stability_function(d_in) | Returns the smallest d_out satisfied by the transformation.
__call__(data) | Returns a new dictionary augmented with the transformed DataFrame.
input_domain | Return input domain for the transformation.
input_metric | Distance metric on input domain.
output_domain | Return output domain for the transformation.
output_metric | Distance metric on output domain.
stability_relation(d_in, d_out) | Returns True only if close inputs produce close outputs.
__or__(other) | Return this transformation chained with another component.
- Parameters
input_domain (tmlt.core.domains.collections.DictDomain) –
input_metric (tmlt.core.metrics.AddRemoveKeys) –
key (Any) –
new_key (Any) –
threshold (int) –
- __init__(input_domain, input_metric, key, new_key, threshold)#
Constructor.
- Parameters
input_domain (DictDomain) – Domain of input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – Input metric for the outer dictionary to dictionary transformation.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
threshold (int) – The maximum number of rows per group after truncation.
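The effect of `threshold` can be illustrated with a small pure-Python sketch. It assumes rows are dicts and keeps the first `threshold` rows of each group in input order; the rule the library uses to choose which rows survive may differ. `limit_rows_per_group` is a hypothetical helper, and the grouping column is passed explicitly here.

```python
from collections import defaultdict


def limit_rows_per_group(rows, grouping_column, threshold):
    """Keep at most `threshold` rows for each value of grouping_column."""
    kept_per_group = defaultdict(int)
    result = []
    for row in rows:
        group = row[grouping_column]
        if kept_per_group[group] < threshold:
            kept_per_group[group] += 1
            result.append(row)
    return result


rows = [{"A": "a", "B": 1}, {"A": "a", "B": 2}, {"A": "b", "B": 3}]
truncated = limit_rows_per_group(rows, "A", threshold=1)
print(truncated)  # [{'A': 'a', 'B': 1}, {'A': 'b', 'B': 3}]
```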
- property transformation#
Returns the transformation that will be applied to create the new element.
- Return type
- property key#
Returns the key for the DataFrame to transform.
- Return type
Any
- property new_key#
Returns the new key for the transformed DataFrame.
- Return type
Any
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises
NotImplementedError – If not overridden.
- Return type
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters
data (Dict[Any, pyspark.sql.DataFrame]) –
- Return type
Dict[Any, pyspark.sql.DataFrame]
- property input_domain#
Return input domain for the transformation.
- Return type
- property input_metric#
Distance metric on input domain.
- Return type
- property output_domain#
Return output domain for the transformation.
- Return type
- property output_metric#
Distance metric on output domain.
- Return type
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
- class LimitKeysPerGroupValue(input_domain, input_metric, key, new_key, key_column, threshold)#
Bases: TransformValue
Applies a LimitKeysPerGroup to the specified key.
See TransformValue and LimitKeysPerGroup for more information.
transformation | Returns the transformation that will be applied to create the new element.
key | Returns the key for the DataFrame to transform.
new_key | Returns the new key for the transformed DataFrame.
stability_function(d_in) | Returns the smallest d_out satisfied by the transformation.
__call__(data) | Returns a new dictionary augmented with the transformed DataFrame.
input_domain | Return input domain for the transformation.
input_metric | Distance metric on input domain.
output_domain | Return output domain for the transformation.
output_metric | Distance metric on output domain.
stability_relation(d_in, d_out) | Returns True only if close inputs produce close outputs.
__or__(other) | Return this transformation chained with another component.
- Parameters
input_domain (tmlt.core.domains.collections.DictDomain) –
input_metric (tmlt.core.metrics.AddRemoveKeys) –
key (Any) –
new_key (Any) –
key_column (str) –
threshold (int) –
- __init__(input_domain, input_metric, key, new_key, key_column, threshold)#
Constructor.
- Parameters
input_domain (DictDomain) – Domain of input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – Input metric for the outer dictionary to dictionary transformation.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
threshold (int) – The maximum number of keys per group after truncation.
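As a rough illustration of limiting keys per group, the sketch below keeps, for each group, only rows whose `key_column` value is among the first `threshold` distinct values seen for that group. The first-seen rule is an assumption made for the example, and `limit_keys_per_group` is a hypothetical helper rather than a library function.

```python
from collections import defaultdict


def limit_keys_per_group(rows, grouping_column, key_column, threshold):
    """Keep rows whose key is among the first `threshold` distinct keys of its group."""
    kept_keys = defaultdict(list)
    result = []
    for row in rows:
        keys = kept_keys[row[grouping_column]]
        if row[key_column] not in keys and len(keys) < threshold:
            keys.append(row[key_column])
        if row[key_column] in keys:
            result.append(row)
    return result


rows = [
    {"A": "a", "K": "x"},
    {"A": "a", "K": "y"},
    {"A": "a", "K": "x"},
    {"A": "b", "K": "y"},
]
limited = limit_keys_per_group(rows, "A", "K", threshold=1)
print(limited)
# [{'A': 'a', 'K': 'x'}, {'A': 'a', 'K': 'x'}, {'A': 'b', 'K': 'y'}]
```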
- property transformation#
Returns the transformation that will be applied to create the new element.
- Return type
- property key#
Returns the key for the DataFrame to transform.
- Return type
Any
- property new_key#
Returns the new key for the transformed DataFrame.
- Return type
Any
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises
NotImplementedError – If not overridden.
- Return type
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters
data (Dict[Any, pyspark.sql.DataFrame]) –
- Return type
Dict[Any, pyspark.sql.DataFrame]
- property input_domain#
Return input domain for the transformation.
- Return type
- property input_metric#
Distance metric on input domain.
- Return type
- property output_domain#
Return output domain for the transformation.
- Return type
- property output_metric#
Distance metric on output domain.
- Return type
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
- class LimitRowsPerKeyPerGroupValue(input_domain, input_metric, key, new_key, key_column, threshold)#
Bases: TransformValue
Applies a LimitRowsPerKeyPerGroup to the specified key.
See TransformValue and LimitRowsPerKeyPerGroup for more information.
transformation | Returns the transformation that will be applied to create the new element.
key | Returns the key for the DataFrame to transform.
new_key | Returns the new key for the transformed DataFrame.
stability_function(d_in) | Returns the smallest d_out satisfied by the transformation.
__call__(data) | Returns a new dictionary augmented with the transformed DataFrame.
input_domain | Return input domain for the transformation.
input_metric | Distance metric on input domain.
output_domain | Return output domain for the transformation.
output_metric | Distance metric on output domain.
stability_relation(d_in, d_out) | Returns True only if close inputs produce close outputs.
__or__(other) | Return this transformation chained with another component.
- Parameters
input_domain (tmlt.core.domains.collections.DictDomain) –
input_metric (tmlt.core.metrics.AddRemoveKeys) –
key (Any) –
new_key (Any) –
key_column (str) –
threshold (int) –
- __init__(input_domain, input_metric, key, new_key, key_column, threshold)#
Constructor.
- Parameters
input_domain (DictDomain) – Domain of input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – Input metric for the outer dictionary to dictionary transformation.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
threshold (int) – The maximum number of rows each unique (key, grouping column value) pair may appear in after truncation.
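The per-pair limit can be sketched as follows, assuming rows are dicts and that the first `threshold` rows of each (grouping value, key value) pair are kept in input order. Both the selection rule and the helper name `limit_rows_per_key_per_group` are assumptions made for illustration.

```python
from collections import Counter


def limit_rows_per_key_per_group(rows, grouping_column, key_column, threshold):
    """Keep at most `threshold` rows for each (group, key) pair."""
    seen = Counter()
    result = []
    for row in rows:
        pair = (row[grouping_column], row[key_column])
        if seen[pair] < threshold:
            seen[pair] += 1
            result.append(row)
    return result


rows = [
    {"A": "a", "K": "x", "B": 1},
    {"A": "a", "K": "x", "B": 2},
    {"A": "a", "K": "y", "B": 3},
    {"A": "b", "K": "x", "B": 4},
]
limited = limit_rows_per_key_per_group(rows, "A", "K", threshold=1)
print([row["B"] for row in limited])  # [1, 3, 4]
```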
- property transformation#
Returns the transformation that will be applied to create the new element.
- Return type
- property key#
Returns the key for the DataFrame to transform.
- Return type
Any
- property new_key#
Returns the new key for the transformed DataFrame.
- Return type
Any
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises
NotImplementedError – If not overridden.
- Return type
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters
data (Dict[Any, pyspark.sql.DataFrame]) –
- Return type
Dict[Any, pyspark.sql.DataFrame]
- property input_domain#
Return input domain for the transformation.
- Return type
- property input_metric#
Distance metric on input domain.
- Return type
- property output_domain#
Return output domain for the transformation.
- Return type
- property output_metric#
Distance metric on output domain.
- Return type
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
- class FilterValue(input_domain, input_metric, key, new_key, filter_expr)#
Bases: TransformValue
Applies a Filter to create a new element from specified value.
See TransformValue and Filter for more information.
transformation | Returns the transformation that will be applied to create the new element.
key | Returns the key for the DataFrame to transform.
new_key | Returns the new key for the transformed DataFrame.
stability_function(d_in) | Returns the smallest d_out satisfied by the transformation.
__call__(data) | Returns a new dictionary augmented with the transformed DataFrame.
input_domain | Return input domain for the transformation.
input_metric | Distance metric on input domain.
output_domain | Return output domain for the transformation.
output_metric | Distance metric on output domain.
stability_relation(d_in, d_out) | Returns True only if close inputs produce close outputs.
__or__(other) | Return this transformation chained with another component.
- Parameters
input_domain (tmlt.core.domains.collections.DictDomain) –
input_metric (tmlt.core.metrics.AddRemoveKeys) –
key (Any) –
new_key (Any) –
filter_expr (str) –
- __init__(input_domain, input_metric, key, new_key, filter_expr)#
Constructor.
- Parameters
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
filter_expr (str) – A SQL expression string specifying the filter to apply to the data. The language is the same as the one used by pyspark.sql.DataFrame.filter().
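To show what a string-valued `filter_expr` does, the sketch below applies a SQL boolean expression to rows using Python's built-in `sqlite3` as a stand-in engine. This is only an analogy: SQLite's dialect is not Spark SQL, and `filter_rows` is a hypothetical helper, not part of the library.

```python
import sqlite3


def filter_rows(rows, columns, filter_expr):
    """Keep the rows that satisfy the SQL boolean expression filter_expr."""
    connection = sqlite3.connect(":memory:")
    try:
        column_list = ", ".join(columns)
        connection.execute(f"CREATE TABLE data ({column_list})")
        placeholders = ", ".join("?" for _ in columns)
        connection.executemany(f"INSERT INTO data VALUES ({placeholders})", rows)
        cursor = connection.execute(
            f"SELECT {column_list} FROM data WHERE {filter_expr}"
        )
        return cursor.fetchall()
    finally:
        connection.close()


rows = [("a", 1), ("b", 2), ("c", 3)]
filtered = filter_rows(rows, ["A", "B"], "B > 1 AND A != 'c'")
print(filtered)  # [('b', 2)]
```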
- property transformation#
Returns the transformation that will be applied to create the new element.
- Return type
- property key#
Returns the key for the DataFrame to transform.
- Return type
Any
- property new_key#
Returns the new key for the transformed DataFrame.
- Return type
Any
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises
NotImplementedError – If not overridden.
- Return type
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters
data (Dict[Any, pyspark.sql.DataFrame]) –
- Return type
Dict[Any, pyspark.sql.DataFrame]
- property input_domain#
Return input domain for the transformation.
- Return type
- property input_metric#
Distance metric on input domain.
- Return type
- property output_domain#
Return output domain for the transformation.
- Return type
- property output_metric#
Distance metric on output domain.
- Return type
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
- class PublicJoinValue(input_domain, input_metric, key, new_key, public_df, public_df_domain=None, join_cols=None, join_on_nulls=False)#
Bases: TransformValue
Applies a PublicJoin to create a new element from specified value.
See TransformValue and PublicJoin for more information.
transformation | Returns the transformation that will be applied to create the new element.
key | Returns the key for the DataFrame to transform.
new_key | Returns the new key for the transformed DataFrame.
stability_function(d_in) | Returns the smallest d_out satisfied by the transformation.
__call__(data) | Returns a new dictionary augmented with the transformed DataFrame.
input_domain | Return input domain for the transformation.
input_metric | Distance metric on input domain.
output_domain | Return output domain for the transformation.
output_metric | Distance metric on output domain.
stability_relation(d_in, d_out) | Returns True only if close inputs produce close outputs.
__or__(other) | Return this transformation chained with another component.
- Parameters
input_domain (tmlt.core.domains.collections.DictDomain) –
input_metric (tmlt.core.metrics.AddRemoveKeys) –
key (Any) –
new_key (Any) –
public_df (pyspark.sql.DataFrame) –
public_df_domain (Optional[tmlt.core.domains.spark_domains.SparkDataFrameDomain]) –
join_cols (Optional[List[str]]) –
join_on_nulls (bool) –
- __init__(input_domain, input_metric, key, new_key, public_df, public_df_domain=None, join_cols=None, join_on_nulls=False)#
Constructor.
- Parameters
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
public_df (DataFrame) – A Spark DataFrame to join with.
public_df_domain (Optional[SparkDataFrameDomain], default: None) – Domain of public DataFrame to join with. If this domain indicates that a float column does not allow nans (or infs), all rows in public_df containing a nan (or an inf) in that column will be dropped. If None, domain is inferred from the schema of public_df and any float column will be marked as allowing inf and nan values.
join_cols (Optional[List[str]], default: None) – Names of columns to join on. If None, a natural join is performed.
join_on_nulls (bool, default: False) – If True, null values on corresponding join columns of the public and private dataframes will be considered to be equal.
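The interaction of `join_cols` and `join_on_nulls` can be sketched in plain Python. The example below assumes rows are dicts, nulls are `None`, every row of a table has the same columns, and the two tables share no columns other than the join columns. `rows_match` and `public_join` are hypothetical helpers, not the library's implementation.

```python
def rows_match(left, right, cols, join_on_nulls):
    """Decide whether two rows agree on every join column."""
    for col in cols:
        if left[col] is None or right[col] is None:
            # Nulls only match each other, and only when join_on_nulls is set.
            if not (join_on_nulls and left[col] is None and right[col] is None):
                return False
        elif left[col] != right[col]:
            return False
    return True


def public_join(private_rows, public_rows, join_cols=None, join_on_nulls=False):
    """Inner join of two lists of dict rows."""
    if not private_rows or not public_rows:
        return []
    if join_cols is None:
        # Natural join: use every column name the two tables share.
        join_cols = [c for c in private_rows[0] if c in public_rows[0]]
    return [
        {**private, **public}
        for private in private_rows
        for public in public_rows
        if rows_match(private, public, join_cols, join_on_nulls)
    ]


private = [{"A": "a", "B": 1}, {"A": None, "B": 2}]
public = [{"A": "a", "C": "x"}, {"A": None, "C": "y"}]
print(public_join(private, public))
# [{'A': 'a', 'B': 1, 'C': 'x'}]
print(public_join(private, public, join_on_nulls=True))
# [{'A': 'a', 'B': 1, 'C': 'x'}, {'A': None, 'B': 2, 'C': 'y'}]
```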
- property transformation#
Returns the transformation that will be applied to create the new element.
- Return type
- property key#
Returns the key for the DataFrame to transform.
- Return type
Any
- property new_key#
Returns the new key for the transformed DataFrame.
- Return type
Any
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises
NotImplementedError – If not overridden.
- Return type
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters
data (Dict[Any, pyspark.sql.DataFrame]) –
- Return type
Dict[Any, pyspark.sql.DataFrame]
- property input_domain#
Return input domain for the transformation.
- Return type
- property input_metric#
Distance metric on input domain.
- Return type
- property output_domain#
Return output domain for the transformation.
- Return type
- property output_metric#
Distance metric on output domain.
- Return type
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
- class FlatMapValue(input_domain, input_metric, key, new_key, row_transformer, max_num_rows)#
Bases: TransformValue
Applies a FlatMap to create a new element from specified value.
See TransformValue and FlatMap for more information.
transformation | Returns the transformation that will be applied to create the new element.
key | Returns the key for the DataFrame to transform.
new_key | Returns the new key for the transformed DataFrame.
stability_function(d_in) | Returns the smallest d_out satisfied by the transformation.
__call__(data) | Returns a new dictionary augmented with the transformed DataFrame.
input_domain | Return input domain for the transformation.
input_metric | Distance metric on input domain.
output_domain | Return output domain for the transformation.
output_metric | Distance metric on output domain.
stability_relation(d_in, d_out) | Returns True only if close inputs produce close outputs.
__or__(other) | Return this transformation chained with another component.
- Parameters
input_domain (tmlt.core.domains.collections.DictDomain) –
input_metric (tmlt.core.metrics.AddRemoveKeys) –
key (Any) –
new_key (Any) –
row_transformer (tmlt.core.transformations.spark_transformations.map.RowToRowsTransformation) –
max_num_rows (Optional[int]) –
- __init__(input_domain, input_metric, key, new_key, row_transformer, max_num_rows)#
Constructor.
- Parameters
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
row_transformer (RowToRowsTransformation) – Transformation to apply to each row.
max_num_rows (Optional[int]) – The maximum number of rows to allow from row_transformer. If more rows are output, the additional rows are suppressed. If this value is None, the transformation will not impose a limit on the number of rows.
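The role of `max_num_rows` can be illustrated with a plain-Python flat map. This sketch assumes rows are dicts and that the limit applies to the rows produced from each input row; `flat_map` and `split_words` are hypothetical names, and a plain function stands in for a RowToRowsTransformation.

```python
def flat_map(rows, row_transformer, max_num_rows):
    """Apply row_transformer to each row, keeping at most max_num_rows outputs per row."""
    result = []
    for row in rows:
        new_rows = list(row_transformer(row))
        if max_num_rows is not None:
            # Rows beyond the limit are suppressed.
            new_rows = new_rows[:max_num_rows]
        result.extend(new_rows)
    return result


def split_words(row):
    """Example row transformer: one output row per word."""
    return [{"word": word} for word in row["text"].split()]


rows = [{"text": "a b c"}, {"text": "d"}]
limited = flat_map(rows, split_words, max_num_rows=2)
unlimited = flat_map(rows, split_words, max_num_rows=None)
print(limited)         # [{'word': 'a'}, {'word': 'b'}, {'word': 'd'}]
print(len(unlimited))  # 4
```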
- property transformation#
Returns the transformation that will be applied to create the new element.
- Return type
- property key#
Returns the key for the DataFrame to transform.
- Return type
Any
- property new_key#
Returns the new key for the transformed DataFrame.
- Return type
Any
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises
NotImplementedError – If not overridden.
- Return type
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters
data (Dict[Any, pyspark.sql.DataFrame]) –
- Return type
Dict[Any, pyspark.sql.DataFrame]
- property input_domain#
Return input domain for the transformation.
- Return type
- property input_metric#
Distance metric on input domain.
- Return type
- property output_domain#
Return output domain for the transformation.
- Return type
- property output_metric#
Distance metric on output domain.
- Return type
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
- class MapValue(input_domain, input_metric, key, new_key, row_transformer)#
Bases: TransformValue
Applies a Map to create a new element from specified value.
See TransformValue and Map for more information.
transformation | Returns the transformation that will be applied to create the new element.
key | Returns the key for the DataFrame to transform.
new_key | Returns the new key for the transformed DataFrame.
stability_function(d_in) | Returns the smallest d_out satisfied by the transformation.
__call__(data) | Returns a new dictionary augmented with the transformed DataFrame.
input_domain | Return input domain for the transformation.
input_metric | Distance metric on input domain.
output_domain | Return output domain for the transformation.
output_metric | Distance metric on output domain.
stability_relation(d_in, d_out) | Returns True only if close inputs produce close outputs.
__or__(other) | Return this transformation chained with another component.
- Parameters
input_domain (tmlt.core.domains.collections.DictDomain) –
input_metric (tmlt.core.metrics.AddRemoveKeys) –
key (Any) –
new_key (Any) –
row_transformer (tmlt.core.transformations.spark_transformations.map.RowToRowTransformation) –
- __init__(input_domain, input_metric, key, new_key, row_transformer)#
Constructor.
- Parameters
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
row_transformer (RowToRowTransformation) – Transformation to apply to each row.
- property transformation#
Returns the transformation that will be applied to create the new element.
- Return type
- property key#
Returns the key for the DataFrame to transform.
- Return type
Any
- property new_key#
Returns the new key for the transformed DataFrame.
- Return type
Any
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises
NotImplementedError – If not overridden.
- Return type
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters
data (Dict[Any, pyspark.sql.DataFrame]) –
- Return type
Dict[Any, pyspark.sql.DataFrame]
- property input_domain#
Return input domain for the transformation.
- Return type
- property input_metric#
Distance metric on input domain.
- Return type
- property output_domain#
Return output domain for the transformation.
- Return type
- property output_metric#
Distance metric on output domain.
- Return type
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
- class DropInfsValue(input_domain, input_metric, key, new_key, columns)#
Bases: TransformValue
Applies a DropInfs to create a new element from specified value.
See TransformValue and DropInfs for more information.
transformation | Returns the transformation that will be applied to create the new element.
key | Returns the key for the DataFrame to transform.
new_key | Returns the new key for the transformed DataFrame.
stability_function(d_in) | Returns the smallest d_out satisfied by the transformation.
__call__(data) | Returns a new dictionary augmented with the transformed DataFrame.
input_domain | Return input domain for the transformation.
input_metric | Distance metric on input domain.
output_domain | Return output domain for the transformation.
output_metric | Distance metric on output domain.
stability_relation(d_in, d_out) | Returns True only if close inputs produce close outputs.
__or__(other) | Return this transformation chained with another component.
- Parameters
input_domain (tmlt.core.domains.collections.DictDomain) –
input_metric (tmlt.core.metrics.AddRemoveKeys) –
key (Any) –
new_key (Any) –
columns (List[str]) –
- __init__(input_domain, input_metric, key, new_key, columns)#
Constructor.
- Parameters
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
columns (List[str]) – Columns to drop +inf and -inf from.
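What dropping infinities from selected columns means can be sketched in a few lines of plain Python. The example assumes rows are dicts of non-null floats and that a row is removed when any listed column holds +inf or -inf; `drop_infs` is a hypothetical helper name.

```python
import math


def drop_infs(rows, columns):
    """Remove rows that contain +inf or -inf in any of the given columns."""
    return [
        row
        for row in rows
        if not any(math.isinf(row[column]) for column in columns)
    ]


rows = [
    {"X": 1.0, "Y": float("inf")},
    {"X": float("-inf"), "Y": 2.0},
    {"X": 3.0, "Y": float("nan")},
]
kept = drop_infs(rows, ["X"])
print([row["X"] for row in kept])  # [1.0, 3.0]
```

Only the listed columns are checked, so the infinite value in "Y" and the NaN are left in place here.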
- property transformation#
Returns the transformation that will be applied to create the new element.
- Return type
- property key#
Returns the key for the DataFrame to transform.
- Return type
Any
- property new_key#
Returns the new key for the transformed DataFrame.
- Return type
Any
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises
NotImplementedError – If not overridden.
- Return type
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters
data (Dict[Any, pyspark.sql.DataFrame]) –
- Return type
Dict[Any, pyspark.sql.DataFrame]
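The "augmented dictionary" behaviour can be pictured with ordinary Python dictionaries. This is only a sketch under assumptions: `augment` is a hypothetical helper, lists stand in for DataFrames, and the check on `new_key` mirrors the constructor requirement that the key must not already be in the input domain.

```python
def augment(data, key, new_key, transform):
    """Return a copy of `data` with `transform(data[key])` stored under `new_key`."""
    if new_key in data:
        raise ValueError(f"new_key {new_key!r} is already present")
    result = dict(data)  # shallow copy; the input dictionary is not mutated
    result[new_key] = transform(data[key])
    return result


data = {"start": [3, 1, 2]}
augmented = augment(data, "start", "sorted", sorted)
print(augmented)
```

The original entry stays in place under `key`, and the transformed value is added alongside it under `new_key`.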
- property input_domain#
Return input domain for the transformation.
- Return type
- property input_metric#
Distance metric on input domain.
- Return type
- property output_domain#
Return output domain for the transformation.
- Return type
- property output_metric#
Distance metric on output domain.
- Return type
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type
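One way to read the two stability methods together: if `stability_function` gives the smallest valid `d_out`, then any larger `d_out` should also satisfy the relation. The toy class below sketches that reading with plain numbers. It is an assumption made for illustration, not the library's implementation, and `ToyTransformation` and `factor` are hypothetical names.

```python
class ToyTransformation:
    """Hypothetical stand-in, not part of tmlt.core."""

    def __init__(self, factor):
        self.factor = factor

    def stability_function(self, d_in):
        # Smallest d_out guaranteed for inputs at distance d_in.
        return self.factor * d_in

    def stability_relation(self, d_in, d_out):
        # Any d_out at least as large as the smallest one is also valid.
        return self.stability_function(d_in) <= d_out


toy = ToyTransformation(factor=2)
print(toy.stability_function(1))
print(toy.stability_relation(1, 3))
```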
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
- class DropNaNsValue(input_domain, input_metric, key, new_key, columns)#
Bases:
TransformValue
Applies a DropNaNs to create a new element from the specified value. See TransformValue and DropNaNs for more information.
- Parameters
input_domain (tmlt.core.domains.collections.DictDomain) –
input_metric (tmlt.core.metrics.AddRemoveKeys) –
key (Any) –
new_key (Any) –
columns (List[str]) –
- __init__(input_domain, input_metric, key, new_key, columns)#
Constructor.
- Parameters
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
- property transformation#
Returns the transformation that will be applied to create the new element.
- Return type
- property key#
Returns the key for the DataFrame to transform.
- Return type
Any
- property new_key#
Returns the new key for the transformed DataFrame.
- Return type
Any
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises
NotImplementedError – If not overridden.
- Return type
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters
data (Dict[Any, pyspark.sql.DataFrame]) –
- Return type
Dict[Any, pyspark.sql.DataFrame]
- property input_domain#
Return input domain for the transformation.
- Return type
- property input_metric#
Distance metric on input domain.
- Return type
- property output_domain#
Return output domain for the transformation.
- Return type
- property output_metric#
Distance metric on output domain.
- Return type
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
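Chaining with `|` relies on Python's `__or__` hook. The sketch below shows the general pattern with a hypothetical `Step` class: the left operand runs first and its output is passed to the right operand. It is not the tmlt.core implementation, which also has to reconcile domains and metrics between components.

```python
class Step:
    """Hypothetical callable that supports chaining with `|`."""

    def __init__(self, func):
        self.func = func

    def __call__(self, data):
        return self.func(data)

    def __or__(self, other):
        # Apply this step first, then feed its output to `other`.
        return Step(lambda data: other(self(data)))


add_one = Step(lambda x: x + 1)
double = Step(lambda x: x * 2)
pipeline = add_one | double
print(pipeline(3))
```

Order matters: `add_one | double` and `double | add_one` produce different results for the same input.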
- class DropNullsValue(input_domain, input_metric, key, new_key, columns)#
Bases:
TransformValue
Applies a DropNulls to create a new element from the specified value. See TransformValue and DropNulls for more information.
- Parameters
input_domain (tmlt.core.domains.collections.DictDomain) –
input_metric (tmlt.core.metrics.AddRemoveKeys) –
key (Any) –
new_key (Any) –
columns (List[str]) –
- __init__(input_domain, input_metric, key, new_key, columns)#
Constructor.
- Parameters
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
- property transformation#
Returns the transformation that will be applied to create the new element.
- Return type
- property key#
Returns the key for the DataFrame to transform.
- Return type
Any
- property new_key#
Returns the new key for the transformed DataFrame.
- Return type
Any
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises
NotImplementedError – If not overridden.
- Return type
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters
data (Dict[Any, pyspark.sql.DataFrame]) –
- Return type
Dict[Any, pyspark.sql.DataFrame]
- property input_domain#
Return input domain for the transformation.
- Return type
- property input_metric#
Distance metric on input domain.
- Return type
- property output_domain#
Return output domain for the transformation.
- Return type
- property output_metric#
Distance metric on output domain.
- Return type
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
- class ReplaceInfsValue(input_domain, input_metric, key, new_key, replace_map)#
Bases:
TransformValue
Applies a ReplaceInfs to create a new element from the specified value. See TransformValue and ReplaceInfs for more information.
- Parameters
input_domain (tmlt.core.domains.collections.DictDomain) –
input_metric (tmlt.core.metrics.AddRemoveKeys) –
key (Any) –
new_key (Any) –
- __init__(input_domain, input_metric, key, new_key, replace_map)#
Constructor.
- Parameters
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
replace_map (Dict[str, Tuple[float, float]]) – Dictionary mapping column names to a tuple. The first value in the tuple will be used to replace -inf in that column, and the second value in the tuple will be used to replace +inf in that column.
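The tuple ordering in `replace_map` is easy to get backwards, so here is a small plain-Python sketch of the semantics described above: the first tuple element replaces -inf and the second replaces +inf. `replace_infs` and the sample rows are hypothetical and do not use tmlt.core or Spark.

```python
import math


def replace_infs(rows, replace_map):
    """Replace -inf/+inf per column using (neg_replacement, pos_replacement)."""
    result = []
    for row in rows:
        new_row = dict(row)
        for col, (neg, pos) in replace_map.items():
            value = new_row[col]
            if value == -math.inf:
                new_row[col] = neg  # first tuple element replaces -inf
            elif value == math.inf:
                new_row[col] = pos  # second tuple element replaces +inf
        result.append(new_row)
    return result


rows = [{"X": float("-inf")}, {"X": 5.0}, {"X": float("inf")}]
replaced = replace_infs(rows, {"X": (-100.0, 100.0)})
print(replaced)
```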
- property transformation#
Returns the transformation that will be applied to create the new element.
- Return type
- property key#
Returns the key for the DataFrame to transform.
- Return type
Any
- property new_key#
Returns the new key for the transformed DataFrame.
- Return type
Any
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises
NotImplementedError – If not overridden.
- Return type
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters
data (Dict[Any, pyspark.sql.DataFrame]) –
- Return type
Dict[Any, pyspark.sql.DataFrame]
- property input_domain#
Return input domain for the transformation.
- Return type
- property input_metric#
Distance metric on input domain.
- Return type
- property output_domain#
Return output domain for the transformation.
- Return type
- property output_metric#
Distance metric on output domain.
- Return type
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
- class ReplaceNaNsValue(input_domain, input_metric, key, new_key, replace_map)#
Bases:
TransformValue
Applies a ReplaceNaNs to create a new element from the specified value. See TransformValue and ReplaceNaNs for more information.
- Parameters
input_domain (tmlt.core.domains.collections.DictDomain) –
input_metric (tmlt.core.metrics.AddRemoveKeys) –
key (Any) –
new_key (Any) –
replace_map (Dict[str, Any]) –
- __init__(input_domain, input_metric, key, new_key, replace_map)#
Constructor.
- Parameters
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
replace_map (Dict[str, Any]) – Dictionary mapping column names to the value used to replace NaNs in that column.
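A plain-Python sketch of per-column NaN replacement may help when choosing a `replace_map`. `replace_nans` is a hypothetical helper over row dictionaries, not the Spark-backed transformation; it assumes columns absent from the map are left unchanged.

```python
import math


def replace_nans(rows, replace_map):
    """Replace NaN values in the mapped columns with the given replacements."""
    result = []
    for row in rows:
        new_row = dict(row)
        for col, replacement in replace_map.items():
            value = new_row[col]
            # NaN never compares equal to itself, so use math.isnan.
            if isinstance(value, float) and math.isnan(value):
                new_row[col] = replacement
        result.append(new_row)
    return result


rows = [{"X": float("nan"), "Y": 1.0}, {"X": 2.0, "Y": float("nan")}]
replaced = replace_nans(rows, {"X": 0.0})
print(replaced)
```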
- property transformation#
Returns the transformation that will be applied to create the new element.
- Return type
- property key#
Returns the key for the DataFrame to transform.
- Return type
Any
- property new_key#
Returns the new key for the transformed DataFrame.
- Return type
Any
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises
NotImplementedError – If not overridden.
- Return type
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters
data (Dict[Any, pyspark.sql.DataFrame]) –
- Return type
Dict[Any, pyspark.sql.DataFrame]
- property input_domain#
Return input domain for the transformation.
- Return type
- property input_metric#
Distance metric on input domain.
- Return type
- property output_domain#
Return output domain for the transformation.
- Return type
- property output_metric#
Distance metric on output domain.
- Return type
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
- class ReplaceNullsValue(input_domain, input_metric, key, new_key, replace_map)#
Bases:
TransformValue
Applies a ReplaceNulls to create a new element from the specified value. See TransformValue and ReplaceNulls for more information.
- Parameters
input_domain (tmlt.core.domains.collections.DictDomain) –
input_metric (tmlt.core.metrics.AddRemoveKeys) –
key (Any) –
new_key (Any) –
replace_map (Dict[str, Any]) –
- __init__(input_domain, input_metric, key, new_key, replace_map)#
Constructor.
- Parameters
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
replace_map (Dict[str, Any]) – Dictionary mapping column names to the value used to replace nulls in that column.
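For nulls the idea is the same, with `None` standing in for a Spark null. The helper below is a hypothetical, tmlt.core-free sketch; it assumes that only columns present in `replace_map` are affected.

```python
def replace_nulls(rows, replace_map):
    """Replace None values in the mapped columns with the given replacements."""
    return [
        {
            col: replace_map[col] if value is None and col in replace_map else value
            for col, value in row.items()
        }
        for row in rows
    ]


rows = [{"name": None, "age": 30}, {"name": "b", "age": None}]
# "age" is not in the map, so its null is left as-is.
replaced = replace_nulls(rows, {"name": "unknown"})
print(replaced)
```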
- property transformation#
Returns the transformation that will be applied to create the new element.
- Return type
- property key#
Returns the key for the DataFrame to transform.
- Return type
Any
- property new_key#
Returns the new key for the transformed DataFrame.
- Return type
Any
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises
NotImplementedError – If not overridden.
- Return type
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters
data (Dict[Any, pyspark.sql.DataFrame]) –
- Return type
Dict[Any, pyspark.sql.DataFrame]
- property input_domain#
Return input domain for the transformation.
- Return type
- property input_metric#
Distance metric on input domain.
- Return type
- property output_domain#
Return output domain for the transformation.
- Return type
- property output_metric#
Distance metric on output domain.
- Return type
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
- class PersistValue(input_domain, input_metric, key, new_key)#
Bases:
TransformValue
Applies a Persist to create a new element from the specified value. See TransformValue and Persist for more information.
- Parameters
input_domain (tmlt.core.domains.collections.DictDomain) –
input_metric (tmlt.core.metrics.AddRemoveKeys) –
key (Any) –
new_key (Any) –
- __init__(input_domain, input_metric, key, new_key)#
Constructor.
- Parameters
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
- property transformation#
Returns the transformation that will be applied to create the new element.
- Return type
- property key#
Returns the key for the DataFrame to transform.
- Return type
Any
- property new_key#
Returns the new key for the transformed DataFrame.
- Return type
Any
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises
NotImplementedError – If not overridden.
- Return type
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters
data (Dict[Any, pyspark.sql.DataFrame]) –
- Return type
Dict[Any, pyspark.sql.DataFrame]
- property input_domain#
Return input domain for the transformation.
- Return type
- property input_metric#
Distance metric on input domain.
- Return type
- property output_domain#
Return output domain for the transformation.
- Return type
- property output_metric#
Distance metric on output domain.
- Return type
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
- class UnpersistValue(input_domain, input_metric, key, new_key)#
Bases:
TransformValue
Applies an Unpersist to create a new element from the specified value. See TransformValue and Unpersist for more information.
- Parameters
input_domain (tmlt.core.domains.collections.DictDomain) –
input_metric (tmlt.core.metrics.AddRemoveKeys) –
key (Any) –
new_key (Any) –
- __init__(input_domain, input_metric, key, new_key)#
Constructor.
- Parameters
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
- property transformation#
Returns the transformation that will be applied to create the new element.
- Return type
- property key#
Returns the key for the DataFrame to transform.
- Return type
Any
- property new_key#
Returns the new key for the transformed DataFrame.
- Return type
Any
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises
NotImplementedError – If not overridden.
- Return type
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters
data (Dict[Any, pyspark.sql.DataFrame]) –
- Return type
Dict[Any, pyspark.sql.DataFrame]
- property input_domain#
Return input domain for the transformation.
- Return type
- property input_metric#
Distance metric on input domain.
- Return type
- property output_domain#
Return output domain for the transformation.
- Return type
- property output_metric#
Distance metric on output domain.
- Return type
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
- class SparkActionValue(input_domain, input_metric, key, new_key)#
Bases:
TransformValue
Applies a SparkAction to create a new element from the specified value. See TransformValue and SparkAction for more information.
- Parameters
input_domain (tmlt.core.domains.collections.DictDomain) –
input_metric (tmlt.core.metrics.AddRemoveKeys) –
key (Any) –
new_key (Any) –
- __init__(input_domain, input_metric, key, new_key)#
Constructor.
- Parameters
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
- property transformation#
Returns the transformation that will be applied to create the new element.
- Return type
- property key#
Returns the key for the DataFrame to transform.
- Return type
Any
- property new_key#
Returns the new key for the transformed DataFrame.
- Return type
Any
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises
NotImplementedError – If not overridden.
- Return type
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters
data (Dict[Any, pyspark.sql.DataFrame]) –
- Return type
Dict[Any, pyspark.sql.DataFrame]
- property input_domain#
Return input domain for the transformation.
- Return type
- property input_metric#
Distance metric on input domain.
- Return type
- property output_domain#
Return output domain for the transformation.
- Return type
- property output_metric#
Distance metric on output domain.
- Return type
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
- class RenameValue(input_domain, input_metric, key, new_key, rename_mapping)#
Bases:
TransformValue
Applies a Rename to create a new element from the specified value. See TransformValue and Rename for more information.
- Parameters
input_domain (tmlt.core.domains.collections.DictDomain) –
input_metric (tmlt.core.metrics.AddRemoveKeys) –
key (Any) –
new_key (Any) –
- __init__(input_domain, input_metric, key, new_key, rename_mapping)#
Constructor.
- Parameters
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
rename_mapping (Dict[str, str]) – Dictionary from existing column names to target column names.
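The shape of `rename_mapping` can be seen in a plain-Python sketch that renames keys of row dictionaries. `rename_columns` is a hypothetical helper, and the mapping reuses the `{"A": "C", "B": "D"}` example from the module introduction; columns not in the mapping are assumed to keep their names.

```python
def rename_columns(rows, rename_mapping):
    """Rename columns according to `rename_mapping`; unmapped columns are kept."""
    return [
        {rename_mapping.get(col, col): value for col, value in row.items()}
        for row in rows
    ]


rows = [{"A": "a", "B": "1"}, {"A": "b", "B": "2"}]
renamed = rename_columns(rows, {"A": "C", "B": "D"})
print(renamed)
```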
- property transformation#
Returns the transformation that will be applied to create the new element.
- Return type
- property key#
Returns the key for the DataFrame to transform.
- Return type
Any
- property new_key#
Returns the new key for the transformed DataFrame.
- Return type
Any
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises
NotImplementedError – If not overridden.
- Return type
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters
data (Dict[Any, pyspark.sql.DataFrame]) –
- Return type
Dict[Any, pyspark.sql.DataFrame]
- property input_domain#
Return input domain for the transformation.
- Return type
- property input_metric#
Distance metric on input domain.
- Return type
- property output_domain#
Return output domain for the transformation.
- Return type
- property output_metric#
Distance metric on output domain.
- Return type
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.
- class SelectValue(input_domain, input_metric, key, new_key, columns)#
Bases:
TransformValue
Applies a Select to create a new element from the specified value. See TransformValue and Select for more information.
- Parameters
input_domain (tmlt.core.domains.collections.DictDomain) –
input_metric (tmlt.core.metrics.AddRemoveKeys) –
key (Any) –
new_key (Any) –
columns (List[str]) –
- __init__(input_domain, input_metric, key, new_key, columns)#
Constructor.
- Parameters
input_domain (DictDomain) – The Domain of the input dictionary of Spark DataFrames.
input_metric (AddRemoveKeys) – The input metric for the outer dictionary to dictionary transformation.
new_key (Any) – The key to put the transformed output in. The key must not already be in the input domain.
columns (List[str]) – A list of existing column names to keep.
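Keeping only the listed columns can be sketched on plain row dictionaries as follows. `select_columns` is a hypothetical helper and not the Spark-backed transformation; like the parameter description, it assumes every requested column already exists.

```python
def select_columns(rows, columns):
    """Keep only the listed columns, in the order given."""
    return [{col: row[col] for col in columns} for row in rows]


rows = [{"A": "a", "B": "1", "C": 10}]
selected = select_columns(rows, ["A", "C"])
print(selected)
```

In this sketch, asking for a column that does not exist raises a `KeyError`.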
- property transformation#
Returns the transformation that will be applied to create the new element.
- Return type
- property key#
Returns the key for the DataFrame to transform.
- Return type
Any
- property new_key#
Returns the new key for the transformed DataFrame.
- Return type
Any
- stability_function(d_in)#
Returns the smallest d_out satisfied by the transformation.
See the privacy and stability tutorial for more information.
- Parameters
d_in (tmlt.core.utils.exact_number.ExactNumberInput) – Distance between inputs under input_metric.
- Raises
NotImplementedError – If not overridden.
- Return type
- __call__(data)#
Returns a new dictionary augmented with the transformed DataFrame.
- Parameters
data (Dict[Any, pyspark.sql.DataFrame]) –
- Return type
Dict[Any, pyspark.sql.DataFrame]
- property input_domain#
Return input domain for the transformation.
- Return type
- property input_metric#
Distance metric on input domain.
- Return type
- property output_domain#
Return output domain for the transformation.
- Return type
- property output_metric#
Distance metric on output domain.
- Return type
- stability_relation(d_in, d_out)#
Returns True only if close inputs produce close outputs.
See the privacy and stability tutorial for more information.
- Parameters
d_in (Any) – Distance between inputs under input_metric.
d_out (Any) – Distance between outputs under output_metric.
- Return type
- __or__(other: Transformation) → Transformation#
- __or__(other: tmlt.core.measurements.base.Measurement) → tmlt.core.measurements.base.Measurement
Return this transformation chained with another component.