AddMaxRowsInMaxGroups
from tmlt.analytics import AddMaxRowsInMaxGroups
- class tmlt.analytics.AddMaxRowsInMaxGroups(grouping_column, max_groups, max_rows_per_group)
Bases: ProtectedChange

Protects the addition or removal of rows across a finite number of groups.

AddMaxRowsInMaxGroups provides a similar guarantee to AddMaxRows, but it uses some additional information to apply less noise in some cases. That information is about groups: collections of rows which share the same value in a particular column. That column would typically be some kind of categorical value, for example a state where a person lives or has lived. Instead of specifying a maximum total number of rows that may be added or removed, AddMaxRowsInMaxGroups limits the number of rows that may be added or removed in any particular group, as well as the maximum total number of groups that may be affected. If these limits are meant to correspond to the maximum contribution of a specific entity to the dataset, that must be enforced before the data is passed to Tumult Analytics.

AddMaxRowsInMaxGroups is intended for advanced use cases, and its use should be considered carefully. Note that it only provides improved accuracy when used with zCDP; with pure DP, it is equivalent to using AddMaxRows with the same total number of rows to be added/removed.

The most common case where AddMaxRowsInMaxGroups is useful is for dealing with datasets that have already undergone some type of preprocessing before being turned over to an analyst. Where possible, it is preferred to do such processing inside of Tumult Analytics instead, as it allows specifying a simpler protected change (e.g. AddRowsWithID) and relying on Analytics' privacy tracking to handle the complex parts of the analysis.

- max_rows_per_group: int
The maximum number of rows which may be added to or removed from each group.
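To see why the accuracy gain shows up only under zCDP, it can help to work out how far a group-by count over the grouping column can move. The sketch below is an illustrative calculation, not Tumult Analytics' internal accounting, and both helper functions are hypothetical names. It assumes the usual pairing of noise with sensitivity: L1 sensitivity for pure DP (Laplace noise) and L2 sensitivity for zCDP (Gaussian noise).

```python
import math


def l1_sensitivity(max_groups: int, max_rows_per_group: int) -> int:
    """Worst-case total change summed over all per-group counts.

    Hypothetical helper: at most max_groups counts change, each by at most
    max_rows_per_group.
    """
    return max_groups * max_rows_per_group


def l2_sensitivity(max_groups: int, max_rows_per_group: int) -> float:
    """Worst-case Euclidean change of the vector of per-group counts.

    Hypothetical helper, same assumption as above.
    """
    return math.sqrt(max_groups) * max_rows_per_group


max_groups, max_rows_per_group = 4, 5

# Total number of rows that may change; this is the bound an
# AddMaxRows-style guarantee would have to use.
total_rows = l1_sensitivity(max_groups, max_rows_per_group)

# With only a total-row bound, all rows could fall into a single group,
# so the L2 sensitivity is as large as the L1 sensitivity.
l2_total_only = float(total_rows)

# With the per-group structure, the change is spread over several groups.
l2_grouped = l2_sensitivity(max_groups, max_rows_per_group)

print(total_rows, l2_total_only, l2_grouped)
```

In this example the L1 bound is 20 either way, which matches the statement that pure DP sees no benefit, while the L2 bound drops from 20 to 10 once the per-group limits are taken into account.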
- __post_init__()
Validate attributes.
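Attribute validation only checks the parameters themselves; as noted in the class description, making sure each entity's rows actually respect the limits is left to whoever prepares the data. One possible way to do that in plain Python is sketched below. The helper `enforce_bounds` and the column names `person_id` and `state` are assumptions made for illustration, not part of Tumult Analytics, and the sketch assumes both limits are positive integers. It keeps the first groups and rows seen for each entity; other truncation strategies would work equally well as long as they depend only on that entity's own rows.

```python
from collections import defaultdict


def enforce_bounds(rows, entity_col, grouping_col, max_groups, max_rows_per_group):
    """Drop rows so that every entity contributes at most max_rows_per_group
    rows to each of at most max_groups groups (hypothetical helper)."""
    kept = []
    # entity -> {group value -> number of rows kept so far}
    counts = defaultdict(dict)
    for row in rows:
        groups = counts[row[entity_col]]
        group = row[grouping_col]
        if group not in groups:
            if len(groups) >= max_groups:
                continue  # entity already appears in max_groups groups
            groups[group] = 0
        if groups[group] >= max_rows_per_group:
            continue  # entity already has max_rows_per_group rows here
        groups[group] += 1
        kept.append(row)
    return kept


rows = [
    {"person_id": 1, "state": "CA"},
    {"person_id": 1, "state": "CA"},
    {"person_id": 1, "state": "CA"},  # dropped: third row in the same group
    {"person_id": 1, "state": "NY"},
    {"person_id": 1, "state": "TX"},  # dropped: third distinct group
    {"person_id": 2, "state": "TX"},
]

bounded = enforce_bounds(
    rows, "person_id", "state", max_groups=2, max_rows_per_group=2
)
print(len(bounded))
```

After this step, adding or removing one person changes at most 2 rows in each of at most 2 groups, which is the shape of guarantee that AddMaxRowsInMaxGroups("state", 2, 2) describes.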