0.4.2 - 2022-09-06
Switched to Core version 0.4.3 to avoid warnings when evaluating some queries.
0.4.1 - 2022-08-25
Added the QueryBuilder.histogram function, which provides a shorthand for generating binned data counts.
Analytics now checks to see if the user is running Java 11 or higher. If they are, Analytics either sets the appropriate Spark options (if Spark is not yet running) or raises an informative exception (if Spark is running and configured incorrectly).
Improved documentation for
Switched to Core version 0.4.2, which contains a fix for an issue that sometimes caused queries to fail to be compiled.
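The Java 11 check described in this release can be pictured with a small sketch. This is not the Analytics implementation: the function name `check_java11`, the specific JVM flag, and the option keys are assumptions chosen for illustration (the flag shown is one commonly needed for Arrow-based Spark features on Java 11 and higher). The sketch only shows the two branches named above: return options when Spark is not yet running, or raise when a running Spark lacks them.

```python
from typing import Dict, Optional

# Assumption: the options Analytics actually sets may differ from these.
JAVA11_FLAG = "-Dio.netty.tryReflectionSetAccessible=true"
JAVA_OPTION_KEYS = (
    "spark.driver.extraJavaOptions",
    "spark.executor.extraJavaOptions",
)


def check_java11(
    java_major_version: int,
    running_conf: Optional[Dict[str, str]],
) -> Dict[str, str]:
    """Return Spark options to apply before startup, or raise if misconfigured.

    running_conf is None when Spark has not started yet; otherwise it holds
    the configuration of the already-running Spark session.
    """
    if java_major_version < 11:
        return {}  # Older Java versions need no extra options.
    if running_conf is None:
        # Spark is not running yet, so the options can still be applied.
        return {key: JAVA11_FLAG for key in JAVA_OPTION_KEYS}
    missing = [
        key for key in JAVA_OPTION_KEYS
        if JAVA11_FLAG not in running_conf.get(key, "")
    ]
    if missing:
        # Spark is already running, so its JVM options cannot be changed.
        raise RuntimeError(
            "Spark is running on Java 11+ without the required options: "
            + ", ".join(missing)
        )
    return {}
```

Calling `check_java11(11, None)` returns the options to set, while passing the configuration of a running session either returns an empty dictionary or raises with the names of the missing options.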
0.4.0 - 2022-07-22
Session.Builder.with_private_dataframe now has a grouping_column option and supports non-integer stabilities. This allows setting up grouping columns like those that result from grouping flat maps when loading data. This is an advanced feature and should be used carefully.
0.3.0 - 2022-06-23
QueryBuilder.bin_column and an associated
Dates may now be used in
Added support for DataFrames containing NaN and null values. Columns created by Map and FlatMap are now marked as potentially containing NaN and null values.
Added the QueryBuilder.replace_null_and_nan function, which replaces null and NaN values with specified defaults.
Added the QueryBuilder.replace_infinite function, which replaces positive and negative infinity values with specified defaults.
Added the QueryBuilder.drop_null_and_nan function, which drops null and NaN values for specified columns.
Added the QueryBuilder.drop_infinite function, which drops infinite values for specified columns.
Aggregations (sum, quantile, average, variance, and standard deviation) now silently drop null and NaN values before being performed.
Aggregations (sum, quantile, average, variance, and standard deviation) now silently clamp infinite values (+infinity and -infinity) to the query’s lower and upper bounds.
Added a cleanup module with two functions: a cleanup function that removes the current temporary table (and should be called before spark.stop()), and a remove_all_temp_tables function that removes all temporary tables ever created by Analytics.
Added a topic guide in the documentation for Tumult Analytics’ treatment of null, NaN, and infinite values.
Backwards-incompatible: Sessions no longer allow DataFrames to contain a column named "" (the empty string).
Backwards-incompatible: You can no longer call Session.Builder.with_privacy_budget multiple times on the same builder.
Backwards-incompatible: You can no longer call Session.add_private_data multiple times with the same source id.
Backwards-incompatible: Sessions now use the DataFrame’s schema to determine which columns are nullable.
Session.from_csv and CSV-related methods on Session.Builder have been removed. Instead, use Session.from_dataframe and other dataframe-based methods.
KeySets now explicitly check for and disallow the use of floats and timestamps as keys. This has always been the intended behavior, but it was previously not checked for and could work or cause non-obvious errors depending on the situation.
KeySet.dataframe() now always returns a dataframe where all rows are distinct.
Under certain circumstances, evaluating a GroupByCountDistinct query expression used to modify the input QueryExpr. This no longer occurs.
It is now possible to partition on a column created by a grouping flat map, which used to raise an exception from Core.
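The handling of null, NaN, and infinite values before aggregations in the 0.3.0 entries above can be illustrated with a short, library-independent sketch. The function `prepare_for_aggregation` is hypothetical and does not use or reproduce the Analytics API; it simply drops null and NaN values and clamps everything else, including positive and negative infinity, to the query's lower and upper bounds. It adds no noise and is not differentially private.

```python
import math
from typing import Iterable, List, Optional


def prepare_for_aggregation(
    values: Iterable[Optional[float]],
    lower: float,
    upper: float,
) -> List[float]:
    """Drop null/NaN values, then clamp the rest to [lower, upper]."""
    cleaned = []
    for value in values:
        # Null (None) and NaN values are silently dropped.
        if value is None or math.isnan(value):
            continue
        # Clamping maps +infinity to upper and -infinity to lower;
        # finite values outside the bounds are clamped the same way.
        cleaned.append(min(max(value, lower), upper))
    return cleaned


data = [1.0, None, float("nan"), float("inf"), float("-inf"), 5.0]
print(prepare_for_aggregation(data, lower=0.0, upper=10.0))
```

In this sketch the two missing values disappear and the infinities become the bounds, so a subsequent sum or average is computed only over finite, in-range values.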
0.2.1 - 2022-04-14 (internal release)
Added support for basic operations (filter, map, etc.) on Spark date and timestamp columns.
ColumnType has two new variants, including TIMESTAMP, to support these.
Future documentation will now include any exceptions defined in Analytics.
Switched Session to use Persist/Unpersist instead of Cache.
0.2.0 - 2022-03-28 (internal release)
Multi-query evaluate support is entirely removed.
Columns that are neither floats nor doubles will no longer be checked for NaN values.
The BIT variant of the ColumnType enum was removed, as it was not supported elsewhere in Analytics.
The JoinPublic query expression can now accept public tables specified as Spark dataframes. The existing behavior using public source IDs is still supported, but the public_id parameter/property is now called
Installation on Python 3.7.1 through 3.7.3 is now allowed.
KeySets now do type coercion on creation, matching the type coercion that Sessions do for private sources.
Sessions created by partition_and_create must be used in the order they were created, and using the parent session will forcibly close all child sessions. Sessions can be manually closed with
Joining with a public table that contains no NaNs, but has a column where NaNs are allowed, previously caused an error when compiling queries. This is now handled correctly.
0.1.1 - 2022-02-28 (internal release)
Added the KeySet class, which will eventually be used for all GroupBy queries.
QueryBuilder.groupby(), a new group-by based on
The Analytics library now uses QueryBuilder.groupby() for all GroupBy queries.
Session methods for loading in data from CSV no longer support loading the data’s schema from a file.
Made Session return a more user-friendly error message when the user provides a privacy budget of 0.
Removed all instances of the old name of this library and replaced them with “Analytics”.
QueryBuilder.groupby_public_source() is now deprecated in favor of using KeySets. It will be removed in a future version.
0.1.0 - 2022-02-15 (internal release)