Session.from_dataframe

from tmlt.analytics import Session
classmethod Session.from_dataframe(privacy_budget, source_id, dataframe, protected_change)

Initializes a differentially private (DP) session from a Spark DataFrame.

This initialization method supports only one private data source; if you need multiple private data sources, use Session.Builder instead.

Not all Spark column types are supported in private sources; see ColumnType for information about which types are supported.

Example

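>>> from pyspark.sql import SparkSession
>>> from tmlt.analytics import AddOneRow, PureDPBudget
>>> spark = SparkSession.builder.getOrCreate()
>>> # Column A holds strings; columns B and X hold integers.
>>> spark_data = spark.createDataFrame(
...     [("0", 1, 0), ("1", 0, 1), ("1", 2, 1)], schema=["A", "B", "X"]
... )
>>> spark_data.toPandas()
   A  B  X
0  0  1  0
1  1  0  1
2  1  2  1
>>> # Declare budget for the session.
>>> session_budget = PureDPBudget(1)
>>> # Set up Session
>>> sess = Session.from_dataframe(
...     privacy_budget=session_budget,
...     source_id="my_private_data",
...     dataframe=spark_data,
...     protected_change=AddOneRow(),
... )
>>> sess.private_sources
['my_private_data']
>>> sess.get_column_types("my_private_data")
{'A': ColumnType.VARCHAR, 'B': ColumnType.INTEGER, 'X': ColumnType.INTEGER}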

Parameters:
  • privacy_budget (PrivacyBudget) – The total privacy budget allocated to this session.

  • source_id (str) – The source id for the private source dataframe.

  • dataframe (DataFrame) – The private source dataframe to perform queries on, corresponding to the source_id.

  • protected_change (ProtectedChange) – A ProtectedChange specifying what changes to the input data the resulting Session should protect.

Return type:

Session