Changelog#

0.18.0 - 2025-01-14#

This release drops support for older versions of Python and Spark, improves the performance of bounds-finding, and makes additional minor miscellaneous changes.

Added#

  • join() now supports left_anti joins. Note that the Core join transformations still do not support left_anti joins.
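
A left anti join keeps the rows of the left table whose join key has no match in the right table. The sketch below illustrates only these semantics in plain Python. `left_anti_join` is a hypothetical helper, not part of Tumult Core, and it does not reproduce the signature of join().

```python
def left_anti_join(left, right, on):
    """Return the rows of `left` whose `on` values never appear in `right`."""
    # Collect every key combination present in the right table.
    right_keys = {tuple(row[col] for col in on) for row in right}
    # Keep only the left rows whose key combination is absent from that set.
    return [row for row in left if tuple(row[col] for col in on) not in right_keys]


left = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}, {"id": 3, "x": "c"}]
right = [{"id": 2, "y": 10}]
result = left_anti_join(left, right, on=["id"])
```

Only the left table's columns survive, because the right table contributes no matching rows to draw values from.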

Changed#

Removed#

  • Python 3.8 and PySpark versions earlier than 3.3.1 are no longer supported.

Fixed#

  • Fixed a bug in NoisyBounds (now SparseVectorPrefixSums) where it could try to select an upper bound larger than the maximum 64-bit integer, causing an overflow.
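
As a rough illustration of this kind of fix, the sketch below generates candidate upper bounds by doubling and caps them at the largest signed 64-bit integer. The doubling scheme and the names `INT64_MAX` and `candidate_upper_bounds` are assumptions made for the example and are not taken from the Core implementation.

```python
INT64_MAX = 2**63 - 1  # largest value a signed 64-bit integer can hold


def candidate_upper_bounds(start=1):
    """Yield doubling candidate bounds that never exceed INT64_MAX."""
    bound = start
    while bound < INT64_MAX:
        yield bound
        # Cap the next candidate instead of letting it pass the 64-bit limit.
        bound = min(bound * 2, INT64_MAX)
    yield INT64_MAX


candidates = list(candidate_upper_bounds())
```

Python integers have arbitrary precision, so the cap matters once a value is stored in a 64-bit integer column, such as a Spark `LongType` column.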

Changed#

  • Improved performance of noise addition mechanisms under infinite budgets.

0.17.0 - 2024-10-02#

This release changes the behavior of RowToRowTransformation, RowToRowsTransformation, and RowsToRowsTransformation (and thus Map, FlatMap, and FlatMapByKey) so that they catch many function outputs that would be invalid under their output domains.

Note

Tumult Core 0.17 will be the last minor version to support Python 3.8 and PySpark versions below 3.3.1. If you are using Python 3.8 or one of these versions of PySpark, you will need to upgrade them in order to use Tumult Core 0.18.0.

Fixed#

0.16.5 - 2024-08-29#

This release fixes a bug in 0.16.3. CI problems meant 0.16.4 was unavailable.

Fixed#

  • Fixed an incorrect type declaration that caused typeguard errors.

0.16.3 - 2024-08-22#

0.16.3 was yanked. The changes have been incorporated into 0.16.5.

This is a maintenance release that does not include user-visible changes.

0.16.2 - 2024-08-14#

Fixed#

  • The FlatMapByKey transformation was incorrectly turning some NaNs into nulls and vice versa when converting the input dataframe into the input for the user-defined transformer function and when converting the output of that function back into a dataframe. This should no longer occur.
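
NaN and null are distinct values: NaN is a floating-point value, while null marks a missing entry. The sketch below shows the distinction in plain Python, with `None` standing in for null. The helper names are hypothetical and are not part of Core.

```python
import math


def is_null(value):
    """True only for a missing value (None stands in for null here)."""
    return value is None


def is_nan(value):
    """True only for a floating-point NaN; None is not NaN."""
    return isinstance(value, float) and math.isnan(value)


values = [1.0, float("nan"), None]
kinds = ["null" if is_null(v) else "nan" if is_nan(v) else "value" for v in values]
```

A conversion that maps both cases to the same representation loses this distinction, so code that round-trips data needs to track the two separately.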

0.16.1 - 2024-08-01#

Fixed#

  • Fixed a bug in the ordering of the lower and upper bound tuple values in create_bounds_measurement(). The lower bound is now the first element and the upper bound is the second element.

0.16.0 - 2024-07-29#

Added#

  • Added a way to construct a bounds measurement per-group using create_bounds_measurement().

  • Added FlatMapByKey, a transformation for combining all records sharing a key under the IfGroupedBy("key", SymmetricDifference()) metric into an arbitrary collection of other records with the same key using a user-defined function. In addition, added the FlatMapByKeyValue transformation, which performs this same operation on a table under an AddRemoveKeys metric.

  • Added RowsToRowsTransformation, a transformation mapping a set of records to another set of records using a user-defined function.
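
The FlatMapByKey behavior described above can be pictured with a plain-Python sketch: records are grouped by key, a user-defined function maps each group to any number of new records, and every output record carries its group's key. `flat_map_by_key` is a hypothetical helper for illustration only. It is not the Core API and ignores domains, metrics, and Spark.

```python
from collections import defaultdict


def flat_map_by_key(records, key, func):
    """Apply `func` to each group of records sharing `key`; outputs keep that key."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)

    output = []
    for key_value, group in groups.items():
        for new_record in func(group):
            # Force the key column so output records stay in the same group.
            output.append({**new_record, key: key_value})
    return output


records = [{"k": "a", "v": 1}, {"k": "a", "v": 2}, {"k": "b", "v": 5}]
totals = flat_map_by_key(
    records, "k", lambda group: [{"total": sum(r["v"] for r in group)}]
)
```

The function may return zero, one, or many records per group. Here it returns exactly one summary record for each key.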

Changed#

  • Refactored the bounds measurement to use a Pandas UDF. The BoundSelection measurement was removed and the equivalent NoisyBounds measurement was added.

  • Renamed create_bound_selection_measurement to create_bounds_measurement(). The bound_column parameter was renamed to measure_column.

Removed#

  • Removed support for Pandas 1.2 and 1.3 due to a known bug in Pandas versions below 1.4.

0.15.2 - 2024-07-15#

Fixed#

0.15.1 - 2024-07-05#

This release replaces Tumult Core 0.15.0, which was yanked. Support for Pandas 2.0 has been reverted due to conflicts with PySpark. Python 3.12 support should be considered experimental; a version with official support will be released once PySpark 4.0 becomes available.

0.15.0 - 2024-06-26#

Note

Tumult Core 0.15.0 was yanked due to conflicts between PySpark and Pandas 2.0.

Added#

  • Added support for Python 3.12.

Removed#

  • Removed support for Python 3.7.

0.14.2 - 2024-06-17#

Added#

  • Added support for left public joins to PublicJoin. Previously, only inner joins were supported.

0.14.1 - 2024-06-04#

Added#

  • Tumult Core now runs natively on Apple silicon, supporting Python 3.9 and above.

Removed#

  • Provided binary wheels for macOS now support only macOS 12 (Monterey) and above.

0.14.0 - 2024-05-16#

Added#

Fixed#

  • Stopped trying to set extra options for Java 11 and removed the error raised when those options are not set. Removed both the check_java11() function and the SparkConfigError exception.

  • Updated minimum supported Spark version to 3.1.1 to prevent Java 11 error.

0.13.0 - 2024-04-03#

Changed#

Fixed#

  • SumGrouped now correctly handles the case with both empty input dataframes and empty group keys.

  • SumGrouped, CountDistinct, and CountDistinctGrouped now always return the correct output datatypes.

  • tmlt.core.domains.collections.DictDomain.validate() will no longer raise a TypeError when its dictionary keys cannot be sorted.

0.12.0 - 2024-02-26#

Added#

  • Added a non-truncating truncation strategy with infinite stability.

  • Added functions implementing various mechanisms to support slow scaling PRDP.

Changed#

Fixed#

0.11.6 - 2024-02-21#

0.11.6 was yanked. Those changes will be released in 0.12.0.

0.11.5 - 2023-11-29#

Fixed#

  • Addressed a serious security vulnerability in PyArrow: CVE-2023-47248.

    • Python 3.8+ now requires PyArrow 14.0.1 or higher, which is the recommended fix and addresses the vulnerability.

    • Python 3.7 uses the hotfix, as PyArrow 14.0.1 is not compatible with Python 3.7. Note that if you are using 3.7 the hotfix must be imported before your Spark code. Core imports the hotfix, so importing Core before Spark will also work.

    • It is strongly recommended to upgrade if you are using an older version of Core.

    • Also see the GitHub Advisory entry for more information.

  • Fixed a reference to an uninitialized variable that could cause arb_union() to crash the Python interpreter.

0.11.4 - 2023-11-01#

Fixed a typo that prevented PyArrow from being installed on Python 3.8.

0.11.3 - 2023-10-31#

Fixed a typo that prevented PySpark from being installed on Python 3.8.

0.11.2 - 2023-10-27#

Added#

  • Added support for Python 3.11.

0.11.1 - 2023-09-25#

Added#

  • Added documentation for known vulnerabilities related to Parallel Composition and the use of SymPy.

0.11.0 - 2023-08-15#

Changed#

0.10.2 - 2023-07-18#

Changed#

  • Build wheels for macOS 11 instead of macOS 13.

  • Updated dependency version for typing_extensions to 4.1.0.

0.10.1 - 2023-06-08#

Added#

Changed#

  • Restructured the repository to keep code under the src/ directory.

0.10.0 - 2023-05-17#

Added#

  • Added the BoundSelection spark measurement.

Changed#

  • Replaced many existing exceptions in Core with new classes that contain metadata about the inputs causing the exception.

Fixed#

  • Fixed a bug in limit_keys_per_group().

  • Fixed a bug in gaussian().

  • cleanup() now emits a warning rather than an exception if it fails to get a Spark session. This should prevent unexpected exceptions in the atexit cleanup handler.
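
The warn-instead-of-raise pattern in the cleanup() entry can be sketched as follows. The names `get_active_session` and `cleanup_sketch` are hypothetical stand-ins, and the session lookup is hard-coded to fail so that the warning path is visible. This is not the Core implementation.

```python
import warnings


def get_active_session():
    """Hypothetical session lookup; it always fails in this sketch."""
    raise RuntimeError("no active session")


def cleanup_sketch():
    """Best-effort cleanup that is safe to call from an atexit handler."""
    try:
        get_active_session()
    except RuntimeError as exc:
        # Warn rather than raise so interpreter shutdown is not interrupted.
        warnings.warn(f"Skipping cleanup: {exc}", RuntimeWarning)
        return False
    # Real cleanup work would go here once a session is available.
    return True
```

A handler written this way can be registered with `atexit.register` without risking an exception traceback at interpreter exit.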

0.9.2 - 2023-05-16#

0.9.2 was yanked, as it contained breaking changes. Those changes will be released in 0.10.0.

0.9.1 - 2023-04-20#

Added#

  • Subclasses of Measure now have equations defining the distance they represent.

0.9.0 - 2023-04-14#

Added#

  • join, which contains utilities for validating join parameters, propagating domains through joins, and joining dataframes.

Changed#

Fixed#

  • groupby no longer outputs NaN values when both tables are views on the same original table.

  • Private join no longer drops nulls on non-join columns when join_on_nulls=False.

  • groupby average and variance no longer drop groups containing null values.

0.8.3 - 2023-03-08#

Changed#

0.8.2 - 2023-03-02#

Added#

Changed#

  • Updated LimitKeysPerGroup to require an output metric, and to support the IfGroupedBy(grouping_column, SymmetricDifference()) output metric. Dropped the use_l2 parameter.

0.8.1 - 2023-02-24#

Added#

Changed#

0.8.0 - 2023-02-14#

Added#

Changed#

  • Updated LimitRowsPerGroup to require an output metric, and to support the IfGroupedBy(column, SymmetricDifference()) output metric.

  • Added a check so that TransformValue can no longer be instantiated without subclassing.
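
One common way to enforce "subclass only" instantiation is a type check in the constructor, sketched below. The class names are hypothetical, and this is not necessarily how TransformValue implements its check.

```python
class WrapperBase:
    """Base class meant to be subclassed, never instantiated directly."""

    def __init__(self):
        # Reject direct construction of the base class itself.
        if type(self) is WrapperBase:
            raise TypeError("WrapperBase cannot be instantiated directly; subclass it.")


class ConcreteWrapper(WrapperBase):
    """A subclass, which can be instantiated normally."""


instance = ConcreteWrapper()  # allowed

try:
    WrapperBase()
    direct_instantiation_failed = False
except TypeError:
    direct_instantiation_failed = True
```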

0.7.0 - 2023-02-02#

Added#

  • Added measurement for adding Gaussian noise.

0.6.3 - 2022-12-20#

Changed#

  • On Linux, Core previously used MPIR as a multi-precision arithmetic library to support FLINT and Arb. MPIR is no longer maintained, so Core now uses GMP instead. This change does not affect macOS builds, which have always used GMP, and does not change Core’s Python API.

Fixed#

  • Fixed a bug where PrivateJoin’s privacy relation would only accept string keys in the d_in. It now accepts any type of key.

0.6.2 - 2022-12-07#

This is a maintenance release which introduces a number of documentation improvements, but has no publicly-visible API changes.

Fixed#

  • tmlt.core.utils.configuration.check_java11() now has the correct behavior when Java is not installed.

0.6.1 - 2022-12-05#

Added#

  • Added approximate DP support to interactive mechanisms.

  • Added support for Spark 3.1 through 3.3, in addition to existing support for Spark 3.0.

Fixed#

  • Validation for SparkGroupedDataFrameDomains used to fail with a Spark AnalysisException in some environments. That should no longer happen.

0.6.0 - 2022-11-14#

Added#

  • Added new PrivateJoinOnKey transformation that works with AddRemoveKeys.

  • Added inverse CDF methods to noise mechanisms.
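
For context, an inverse CDF maps a probability p to the value x at which the distribution's CDF equals p. The sketch below gives the standard closed form for the continuous Laplace distribution. The function names and signatures are assumptions for illustration and do not mirror the methods added to Core.

```python
import math


def laplace_cdf(x, scale, loc=0.0):
    """CDF of the Laplace distribution with the given location and scale."""
    if x < loc:
        return 0.5 * math.exp((x - loc) / scale)
    return 1.0 - 0.5 * math.exp(-(x - loc) / scale)


def laplace_inverse_cdf(p, scale, loc=0.0):
    """Return x such that laplace_cdf(x, scale, loc) == p, for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if p < 0.5:
        return loc + scale * math.log(2.0 * p)
    return loc - scale * math.log(2.0 * (1.0 - p))
```

Inverse CDFs are typically used to compute quantiles of the noise, for example when deriving confidence intervals for a noisy answer.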

0.5.1 - 2022-11-03#

Fixed#

  • Domains and metrics make copies of mutable constructor arguments and return copies of mutable properties.
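
The defensive-copy pattern in this entry can be sketched as below: copy mutable arguments on the way in and mutable properties on the way out, so outside code cannot alter an object's internal state. `DomainSketch` is a hypothetical class and is not part of Core.

```python
import copy


class DomainSketch:
    """Holds a mutable mapping without sharing it with callers."""

    def __init__(self, schema):
        # Copy on the way in: later changes to `schema` do not affect us.
        self._schema = copy.deepcopy(schema)

    @property
    def schema(self):
        # Copy on the way out: callers cannot mutate internal state.
        return copy.deepcopy(self._schema)


original = {"columns": ["a", "b"]}
domain = DomainSketch(original)
original["columns"].append("c")       # mutate the constructor argument
domain.schema["columns"].append("d")  # mutate the returned property
```

After both mutations, `domain.schema` still reports the two original columns.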

0.5.0 - 2022-10-14#

Changed#

  • Core no longer depends on the python-flint package, and instead packages libflint and libarb itself. Binary wheels are available, and the source distribution includes scripting to build these dependencies from source.

Fixed#

  • Equality checks on SparkGroupedDataFrameDomains used to occasionally fail with a Spark AnalysisException in some environments. That should no longer happen.

  • AddRemoveKeys now allows different names for the key column in each dataframe.

0.4.3 - 2022-09-01#

  • Core now checks to see if the user is running Java 11 or higher. If they are, Core either sets the appropriate Spark options (if Spark is not yet running) or raises an informative exception (if Spark is running and configured incorrectly).

0.4.2 - 2022-08-24#

Changed#

0.4.1 - 2022-07-25#

Added#

  • Added an alternate PRNG for non-Intel architectures that don’t support RDRAND.

  • Added new metric AddRemoveKeys for multiple tables using IfGroupedBy(X, SymmetricDifference()).

  • Added new TransformValue base class for wrapping transformations to support AddRemoveKeys.

  • Added many new transformations using TransformValue: FilterValue, PublicJoinValue, FlatMapValue, MapValue, DropInfsValue, DropNaNsValue, DropNullsValue, ReplaceInfsValue, ReplaceNaNsValue, ReplaceNullsValue, PersistValue, UnpersistValue, SparkActionValue, RenameValue, SelectValue.

Changed#

  • Fixed a bug in ReplaceNulls so that it no longer allows replacing values in the grouping column of an IfGroupedBy metric.

  • Changed ReplaceNulls, ReplaceNaNs, and ReplaceInfs to only support specific IfGroupedBy metrics.

0.3.2 - 2022-06-23#

Changed#

  • Moved IMMUTABLE_TYPES from utils/testing.py to utils/type_utils.py to avoid importing nose when accessing IMMUTABLE_TYPES.

0.3.1 - 2022-06-23#

Changed#

  • Fixed copy_if_mutable so that it works with containers that can’t be deep-copied.

  • Reverted change from 0.3.0 “Add checks in ParallelComposition constructor to only permit L1/L2 over SymmetricDifference or AbsoluteDifference.”

  • Temporarily disabled flaky statistical tests.

0.3.0 - 2022-06-22#

Added#

  • Added new transformations DropInfs and ReplaceInfs for handling infinities in data.

  • Added IfGroupedBy(X, SymmetricDifference()) input metric.

    • Added support for this metric to Filter, Map, FlatMap, PublicJoin, Select, Rename, DropNaNs, DropNulls, DropInfs, ReplaceNulls, ReplaceNaNs, and ReplaceInfs.

  • Added new truncation transformations for IfGroupedBy(X, SymmetricDifference()): LimitRowsPerGroup and LimitKeysPerGroup.

  • Added AddUniqueColumn for switching from SymmetricDifference to IfGroupedBy(X, SymmetricDifference()).

  • Added a topic guide around NaNs, nulls and infinities.

Changed#

  • Moved truncation transformations used by PrivateJoin to be functions (now in utils/truncation.py).

  • Changed GroupBy and PartitionByKeys to have a use_l2 argument instead of output_metric.

  • Fixed bug in AddUniqueColumn.

  • Operations that group on null values are now supported.

  • Modified CountDistinctGrouped and CountDistinct so they work as expected with null values.

  • Changed ReplaceNulls, ReplaceNaNs, and ReplaceInfs to only support specific IfGroupedBy metrics.

  • Fixed a bug in ReplaceNulls so that it no longer allows replacing values in the grouping column of an IfGroupedBy metric.

  • PrivateJoin has a new parameter for __init__: join_on_nulls. When join_on_nulls is True, the PrivateJoin can join null values between both dataframes.

  • Changed transformations and measurements to make a copy of mutable constructor arguments.

  • Add checks in ParallelComposition constructor to only permit L1/L2 over SymmetricDifference or AbsoluteDifference.

Removed#

  • Removed old examples from examples/. Future examples will be added directly to the documentation.

0.2.0 - 2022-04-12 (internal release)#

Added#

  • Added SparkDateColumnDescriptor and SparkTimestampColumnDescriptor, enabling support for Spark dates and timestamps.

  • Added two exception types, InsufficientBudgetError and InactiveAccountantError, to PrivacyAccountants.

  • Future documentation will include any exceptions defined in this library.

  • Added cleanup.remove_all_temp_tables() function, which will remove all temporary tables created by Core.

  • Added new components DropNaNs, DropNulls, ReplaceNulls, and ReplaceNaNs.

0.1.1 - 2022-02-24 (internal release)#

Added#

  • Added new implementations for SequentialComposition and ParallelComposition.

  • Added new spark transformations: Persist, Unpersist and SparkAction.

  • Added PrivacyAccountant.

  • Installation on Python 3.7.1 through 3.7.3 is now allowed.

  • Added DecorateQueryable, DecoratedQueryable and create_adaptive_composition components.

Changed#

  • Fixed a bug where create_quantile_measurement would always be created with PureDP as the output measure.

  • PySparkTest now runs tmlt.core.utils.cleanup.cleanup() during tearDownClass.

  • Refactored noise distribution tests.

  • Removed sorting from GroupedDataFrame.apply_in_pandas and GroupedDataFrame.agg.

  • Repartitioned DataFrames output by SparkMeasurement to prevent privacy violations.

  • Updated repartitioning in SparkMeasurement to use a random column.

  • Changed quantile implementation to use arblib.

  • Changed Laplace implementation to use arblib.

Removed#

  • Removed ExponentialMechanism and PermuteAndFlip components.

  • Removed AddNoise, AddLaplaceNoise, AddGeometricNoise, and AddDiscreteGaussianNoise from tmlt.core.measurements.pandas.series.

  • Removed SequentialComposition, ParallelComposition and corresponding Queryables from tmlt.core.measurements.composition.

  • Removed tmlt.core.transformations.cache.

0.1.0 - 2022-02-14 (internal release)#

Added#

  • Initial release.