Session.partition_and_create
from tmlt.analytics import Session
- Session.partition_and_create(source_id, privacy_budget, column, splits)
Returns new sessions created from a partition of a private source, keyed by split name. In each new session, the split name is also the source_id of that session's private source.

The type of privacy budget that you use must match the type your Session was initialized with (i.e., you cannot use a RhoZCDPBudget to partition your Session if the Session was created using a PureDPBudget, and vice versa).

The returned sessions must be used in the order in which they were created. Using this (parent) session again, or calling its stop() method, stops all of the partition sessions.
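The budget-type rule can be pictured as a small stand-alone check. This is a hypothetical sketch, not the tmlt.analytics implementation: ToyPureDPBudget, ToyRhoZCDPBudget and check_budget_type are illustrative names, and raising TypeError is only an assumption about how a mismatch might be reported.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the two budget types. These are NOT the
# tmlt.analytics classes, just plain containers for illustration.
@dataclass(frozen=True)
class ToyPureDPBudget:
    epsilon: float


@dataclass(frozen=True)
class ToyRhoZCDPBudget:
    rho: float


def check_budget_type(session_budget, requested_budget):
    """Raise if the requested budget's type differs from the session's."""
    if type(requested_budget) is not type(session_budget):
        raise TypeError(
            f"Session uses {type(session_budget).__name__}, "
            f"but {type(requested_budget).__name__} was requested"
        )


check_budget_type(ToyPureDPBudget(1.0), ToyPureDPBudget(0.75))  # same type: OK
try:
    check_budget_type(ToyPureDPBudget(1.0), ToyRhoZCDPBudget(0.1))
except TypeError as err:
    print(err)  # Session uses ToyPureDPBudget, but ToyRhoZCDPBudget was requested
```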
Example
This example partitions the session into two sessions, one with A = “0” and one with A = “1”. Due to parallel composition, each of these sessions is given the same budget, while that budget is deducted only once from the parent session.
>>> sess.private_sources
['my_private_data']
>>> sess.get_column_types("my_private_data")
{'A': ColumnType.VARCHAR, 'B': ColumnType.INTEGER, 'X': ColumnType.INTEGER}
>>> sess.remaining_privacy_budget
PureDPBudget(epsilon=1)
>>> # Partition the Session
>>> new_sessions = sess.partition_and_create(
...     "my_private_data",
...     privacy_budget=PureDPBudget(0.75),
...     column="A",
...     splits={"part0":"0", "part1":"1"}
... )
>>> sess.remaining_privacy_budget
PureDPBudget(epsilon=0.25)
>>> new_sessions["part0"].private_sources
['part0']
>>> new_sessions["part0"].get_column_types("part0")
{'A': ColumnType.VARCHAR, 'B': ColumnType.INTEGER, 'X': ColumnType.INTEGER}
>>> new_sessions["part0"].remaining_privacy_budget
PureDPBudget(epsilon=0.75)
>>> new_sessions["part1"].private_sources
['part1']
>>> new_sessions["part1"].get_column_types("part1")
{'A': ColumnType.VARCHAR, 'B': ColumnType.INTEGER, 'X': ColumnType.INTEGER}
>>> new_sessions["part1"].remaining_privacy_budget
PureDPBudget(epsilon=0.75)
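To make the partitioning and the budget arithmetic of this example concrete, here is a toy model in plain Python. The function toy_partition_and_create and the sample rows are made up for illustration; the real method operates on a Session's private data rather than a list of dicts, and dropping rows whose value matches no split is an assumption of this sketch.

```python
def toy_partition_and_create(rows, remaining, budget, column, splits):
    """Toy model: split rows by `column` and hand out budget in parallel.

    Returns (parent's remaining budget, {split name: (rows, child budget)}).
    Rows whose value matches no split are left out (an assumption).
    """
    if budget > remaining:
        raise ValueError("requested budget exceeds the remaining budget")
    children = {
        # Each split name keys one child (in the real API it is also the source_id).
        name: ([row for row in rows if row[column] == value], budget)
        for name, value in splits.items()
    }
    # Parallel composition: the parent pays `budget` once, not once per split.
    return remaining - budget, children


rows = [
    {"A": "0", "B": 1, "X": 10},
    {"A": "1", "B": 2, "X": 20},
    {"A": "0", "B": 3, "X": 30},
    {"A": "2", "B": 4, "X": 40},  # matches no split, so it is dropped
]
left, children = toy_partition_and_create(
    rows, remaining=1.0, budget=0.75, column="A",
    splits={"part0": "0", "part1": "1"},
)
print(left)  # 0.25
for name, (part_rows, child_budget) in children.items():
    print(name, len(part_rows), child_budget)
# part0 2 0.75
# part1 1 0.75
```

Under sequential composition the two children would instead cost 0.75 + 0.75 = 1.5, more than the parent's epsilon of 1, which is why the disjointness of the partitions matters.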
When you are done with a new session, you can use its stop() method to allow the next one to become active:

>>> new_sessions["part0"].stop()
>>> new_sessions["part1"].private_sources
['part1']
>>> count_query = QueryBuilder("part1").count()
>>> count_answer = new_sessions["part1"].evaluate(
...     count_query,
...     PureDPBudget(0.75),
... )
>>> count_answer.toPandas()
   count
0    ...
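The one-active-session-at-a-time rule can be modeled as a small state machine. ToyPartitionGroup and its methods are hypothetical; in particular, raising RuntimeError when a session is used out of order is an assumption, not documented tmlt.analytics behavior.

```python
class ToyPartitionGroup:
    """Hypothetical model of the rule that partition sessions are used in order.

    Only one child is active at a time; stopping it activates the next one,
    and using the parent again stops every child.
    """

    def __init__(self, names):
        self._order = list(names)  # children in creation order
        self._active = 0           # index of the child currently allowed
        self._stopped = False      # set once the parent is used again

    def _check_active(self, name):
        if self._stopped:
            raise RuntimeError("all partition sessions have been stopped")
        if self._active >= len(self._order) or self._order[self._active] != name:
            raise RuntimeError(f"{name!r} is not the active session")

    def evaluate(self, name):
        self._check_active(name)
        return f"query evaluated on {name}"

    def stop(self, name):
        self._check_active(name)
        self._active += 1  # the next child in creation order becomes active

    def use_parent(self):
        self._stopped = True  # using the parent session stops all children


group = ToyPartitionGroup(["part0", "part1"])
print(group.evaluate("part0"))  # query evaluated on part0
group.stop("part0")
print(group.evaluate("part1"))  # query evaluated on part1
```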
- Parameters:
  - source_id (str) – The private source to partition.
  - privacy_budget (PrivacyBudget) – Privacy budget to pass to each new session.
  - column (str) – The name of the column to partition on.
  - splits (Union[Dict[str, str], Dict[str, int]]) – Mapping of split name to the value of the partition. Each split name becomes the source_id of the private source in the corresponding new session.
- Return type: