Session.partition_and_create

from tmlt.analytics import Session
Session.partition_and_create(source_id, privacy_budget, column, splits)

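Partitions a private source on the values of one column and returns a new session for each split, keyed by split name. Each split name is also the source_id of the partitioned data in its new session.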

The type of privacy budget that you use must match the type your Session was initialized with (i.e., you cannot use a RhoZCDPBudget to partition your Session if the Session was created using a PureDPBudget, and vice versa).

The returned sessions must be used in the order in which they were created. Using the original session again, or calling its stop() method, stops all of the partition sessions.

Example

This example partitions the session into two sessions, one with A = "0" and one with A = "1". Because the two partitions are disjoint, parallel composition applies: each new session is given the same budget, and that budget is deducted from sess only once.

>>> sess.private_sources
['my_private_data']
>>> sess.get_column_types("my_private_data") 
{'A': ColumnType.VARCHAR, 'B': ColumnType.INTEGER, 'X': ColumnType.INTEGER}
>>> sess.remaining_privacy_budget
PureDPBudget(epsilon=1)
>>> # Partition the Session
>>> new_sessions = sess.partition_and_create(
...     "my_private_data",
...     privacy_budget=PureDPBudget(0.75),
...     column="A",
...     splits={"part0":"0", "part1":"1"}
... )
>>> sess.remaining_privacy_budget
PureDPBudget(epsilon=0.25)
>>> new_sessions["part0"].private_sources
['part0']
>>> new_sessions["part0"].get_column_types("part0") 
{'A': ColumnType.VARCHAR, 'B': ColumnType.INTEGER, 'X': ColumnType.INTEGER}
>>> new_sessions["part0"].remaining_privacy_budget
PureDPBudget(epsilon=0.75)
>>> new_sessions["part1"].private_sources
['part1']
>>> new_sessions["part1"].get_column_types("part1") 
{'A': ColumnType.VARCHAR, 'B': ColumnType.INTEGER, 'X': ColumnType.INTEGER}
>>> new_sessions["part1"].remaining_privacy_budget
PureDPBudget(epsilon=0.75)
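
The budget arithmetic in the example above can be reproduced with a small toy model. This is only a sketch of the accounting under parallel composition, written in plain Python; it does not call tmlt.analytics, and the function name partition_cost is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def partition_cost(
    parent_epsilon: float,
    partition_epsilon: float,
    split_names: Iterable[str],
) -> Tuple[float, Dict[str, float]]:
    """Toy model: return (parent budget left, budget given to each split)."""
    if partition_epsilon > parent_epsilon:
        raise ValueError("partition budget exceeds the remaining budget")
    # The splits are disjoint, so the parent is charged once,
    # not once per split.
    remaining = parent_epsilon - partition_epsilon
    # Every split receives the full partition budget.
    per_split = {name: partition_epsilon for name in split_names}
    return remaining, per_split


remaining, per_split = partition_cost(1.0, 0.75, ["part0", "part1"])
print(remaining)  # 0.25
print(per_split)  # {'part0': 0.75, 'part1': 0.75}
```

The values match the doctest output above: 0.25 remains in the original session, and each partition session starts with 0.75.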

When you are done with a new session, you can use the stop() method to allow the next one to become active:

>>> new_sessions["part0"].stop()
>>> new_sessions["part1"].private_sources
['part1']
>>> count_query = QueryBuilder("part1").count()
>>> count_answer = new_sessions["part1"].evaluate(
...     count_query,
...     PureDPBudget(0.75),
... )
>>> count_answer.toPandas() 
   count
0    ...
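
The stop-then-continue pattern above can be wrapped in a small helper. The sketch below is not part of tmlt.analytics: run_partitions_in_order and StandInSession are hypothetical names, and the only session method the helper relies on is stop(). It assumes that the creation order of the sessions is the order you pass in explicitly.

```python
from typing import Any, Callable, Dict, Mapping, Sequence


def run_partitions_in_order(
    sessions: Mapping[str, Any],
    order: Sequence[str],
    work: Callable[[str, Any], Any],
) -> Dict[str, Any]:
    """Run work(name, session) on each session in order, then stop it."""
    results: Dict[str, Any] = {}
    for name in order:
        session = sessions[name]
        try:
            results[name] = work(name, session)
        finally:
            # Stop the session even if work() raised, so that the
            # next session in the sequence can become active.
            session.stop()
    return results


class StandInSession:
    """Hypothetical stand-in so the sketch runs without Spark."""

    def __init__(self) -> None:
        self.stopped = False

    def stop(self) -> None:
        self.stopped = True


demo_sessions = {"part0": StandInSession(), "part1": StandInSession()}
demo_results = run_partitions_in_order(
    demo_sessions,
    order=["part0", "part1"],
    work=lambda name, session: f"processed {name}",
)
print(demo_results)  # {'part0': 'processed part0', 'part1': 'processed part1'}
```

With real partition sessions, work would typically evaluate a query such as the count query shown above against the session it receives.
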

Parameters:
  • source_id (str) – The private source to partition.

  • privacy_budget (PrivacyBudget) – Privacy budget to pass to each new session.

  • column (str) – The name of the column to partition on.

  • splits (Union[Dict[str, str], Dict[str, int]]) – Mapping of split name to the value of column that defines that partition. Each split name is used as the source_id in the corresponding new session.

Return type:

Dict[str, Session]
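
When a column has many values, the splits mapping can be built programmatically rather than written by hand. The helper below is a hypothetical convenience written in plain Python, not a tmlt.analytics function; the "part" prefix simply mirrors the split names used in the example. It enforces that the values are all str or all int, matching the type documented for splits.

```python
from typing import Dict, Sequence, Union


def build_splits(
    values: Sequence[Union[str, int]],
    prefix: str = "part",
) -> Union[Dict[str, str], Dict[str, int]]:
    """Map a generated split name to each partition value."""
    value_types = {type(value) for value in values}
    if value_types not in ({str}, {int}):
        raise TypeError("values must be a non-empty sequence of all str or all int")
    splits = {}
    for value in values:
        name = f"{prefix}{value}"
        if name in splits:
            raise ValueError(f"duplicate split name: {name}")
        splits[name] = value
    return splits


print(build_splits(["0", "1"]))  # {'part0': '0', 'part1': '1'}
print(build_splits([10, 20], prefix="b_"))  # {'b_10': 10, 'b_20': 20}
```

The first result is the same mapping passed as splits in the example above.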