AddMaxRowsInMaxGroups
from tmlt.analytics import AddMaxRowsInMaxGroups
class tmlt.analytics.AddMaxRowsInMaxGroups(grouping_column, max_groups, max_rows_per_group)
Bases: ProtectedChange
Protects the addition or removal of rows across a finite number of groups.
AddMaxRowsInMaxGroups provides a guarantee similar to AddMaxRows, but it uses additional information about the data to add less noise in some cases. That information concerns groups: collections of rows that share the same value in a particular column (the grouping_column). This column is typically categorical, for example the state where a person lives or has lived. Instead of specifying a maximum total number of rows that may be added or removed, AddMaxRowsInMaxGroups limits the number of rows that may be added or removed in any single group (max_rows_per_group), as well as the maximum number of groups that may be affected (max_groups). If these limits are meant to correspond to the maximum contribution of a specific entity to the dataset, they must be enforced before the data is passed to Tumult Analytics.

AddMaxRowsInMaxGroups is intended for advanced use cases, and its use should be considered carefully. It only provides improved accuracy when used with zCDP; with pure DP, it is equivalent to using AddMaxRows with the same total number of rows to be added or removed (max_groups × max_rows_per_group).

The most common case where AddMaxRowsInMaxGroups is useful is a dataset that has already undergone some preprocessing before being handed to an analyst. Where possible, it is preferable to do such processing inside Tumult Analytics instead: this allows specifying a simpler protected change (e.g. AddRowsWithID) and relying on Analytics' privacy tracking to handle the complex parts of the analysis.
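To make "less noise in some cases" and the zCDP versus pure DP distinction concrete, the arithmetic below compares the two protected changes for one specific query, assuming a count grouped by the same grouping_column. It also assumes that pure DP noise is calibrated to L1 sensitivity and zCDP noise is Gaussian calibrated to L2 sensitivity (sigma = L2 / sqrt(2 * rho)). The helper functions are hypothetical illustrations, not part of the Tumult Analytics API, and the library's internal accounting may differ for other queries.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CountSensitivity:
    """L1 and L2 sensitivity of a count grouped by the grouping column."""
    l1: float
    l2: float


def add_max_rows(total_rows: int) -> CountSensitivity:
    # Hypothetical helper. All added/removed rows may land in one group,
    # so a single count changes by total_rows and no other count changes.
    return CountSensitivity(l1=total_rows, l2=total_rows)


def add_max_rows_in_max_groups(max_groups: int, max_rows_per_group: int) -> CountSensitivity:
    # Hypothetical helper. At most max_groups counts change, each by at
    # most max_rows_per_group, so L1 adds the changes and L2 takes the
    # Euclidean norm of the change vector.
    l1 = max_groups * max_rows_per_group
    l2 = math.sqrt(max_groups) * max_rows_per_group
    return CountSensitivity(l1=l1, l2=l2)


def gaussian_sigma(l2_sensitivity: float, rho: float) -> float:
    # Gaussian noise with standard deviation sigma satisfies
    # (l2^2 / (2 * sigma^2))-zCDP, hence sigma = l2 / sqrt(2 * rho).
    return l2_sensitivity / math.sqrt(2 * rho)


if __name__ == "__main__":
    g, r, rho = 4, 5, 0.5
    grouped = add_max_rows_in_max_groups(g, r)
    flat = add_max_rows(g * r)  # same total number of rows
    print(grouped, flat)        # equal L1 (20), L2 of 10.0 versus 20
    print(gaussian_sigma(grouped.l2, rho), gaussian_sigma(flat.l2, rho))  # 10.0 20.0
```

The L1 values are identical, which matches the statement that the two protected changes are equivalent under pure DP. The L2 value of the grouped version is smaller by a factor of sqrt(max_groups), which is where the zCDP accuracy gain comes from in this example.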
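When the limits are meant to bound each entity's contribution, they must be enforced before the data reaches Tumult Analytics. The following is a minimal pure-Python sketch of such preprocessing. It assumes that rows are dicts and that each row carries an entity identifier in a hypothetical id_column (for example person_id). The function truncate_contributions is hypothetical and not part of Tumult Analytics. It keeps, for each entity, the first max_groups distinct groups it encounters and the first max_rows_per_group rows in each of those groups, and it assumes both limits are positive integers.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def truncate_contributions(
    rows: List[Dict[str, Any]],
    id_column: str,
    grouping_column: str,
    max_groups: int,
    max_rows_per_group: int,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: cap each entity's groups and rows per group."""
    groups_seen: Dict[Any, List[Any]] = defaultdict(list)  # entity -> kept groups
    rows_kept: Dict[Tuple[Any, Any], int] = defaultdict(int)  # (entity, group) -> count
    result = []
    for row in rows:
        entity, group = row[id_column], row[grouping_column]
        seen = groups_seen[entity]
        if group not in seen:
            if len(seen) >= max_groups:
                continue  # entity has used its group budget; drop the row
            seen.append(group)
        if rows_kept[(entity, group)] >= max_rows_per_group:
            continue  # entity has used its row budget in this group
        rows_kept[(entity, group)] += 1
        result.append(row)
    return result


if __name__ == "__main__":
    data = [
        {"person_id": 1, "state": "CA"},
        {"person_id": 1, "state": "CA"},
        {"person_id": 1, "state": "CA"},
        {"person_id": 1, "state": "NY"},
        {"person_id": 1, "state": "TX"},
        {"person_id": 2, "state": "TX"},
    ]
    kept = truncate_contributions(data, "person_id", "state", max_groups=2, max_rows_per_group=2)
    print([(r["person_id"], r["state"]) for r in kept])
    # [(1, 'CA'), (1, 'CA'), (1, 'NY'), (2, 'TX')]
```

Keeping the first rows encountered is just one simple choice. A real pipeline might instead select rows at random. In either case, the truncated data would then be paired with AddMaxRowsInMaxGroups using the same grouping column and limits.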
- max_rows_per_group: int
The maximum number of rows that may be added to or removed from each group.
- __post_init__()
Validate attributes.