Session.from_dataframe
from tmlt.analytics import Session
- classmethod Session.from_dataframe(privacy_budget, source_id, dataframe, protected_change)
Initializes a DP session from a Spark dataframe.
Only one private data source is supported with this initialization method; if you need multiple data sources, use Builder.
Not all Spark column types are supported in private sources; see ColumnType for information about which types are supported.
Example
>>> spark_data.toPandas()
   A  B  X
0  0  1  0
1  1  0  1
2  1  2  1
>>> # Declare budget for the session.
>>> session_budget = PureDPBudget(1)
>>> # Set up Session
>>> sess = Session.from_dataframe(
...     privacy_budget=session_budget,
...     source_id="my_private_data",
...     dataframe=spark_data,
...     protected_change=AddOneRow(),
... )
>>> sess.private_sources
['my_private_data']
>>> sess.get_column_types("my_private_data")
{'A': ColumnType.VARCHAR, 'B': ColumnType.INTEGER, 'X': ColumnType.INTEGER}
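The example above starts from an existing `spark_data` dataframe. As a minimal sketch of how such a dataframe might be built and passed to `Session.from_dataframe`, the snippet below defines a hypothetical helper, `build_example_session`; the helper name, the `EXAMPLE_ROWS` and `EXAMPLE_COLUMNS` constants, and the `epsilon` parameter are illustrative and not part of the library. It also assumes that `AddOneRow` and `PureDPBudget` can be imported from the top-level `tmlt.analytics` package in the same way as `Session`, and that a local Spark installation is available.

```python
# Rows mirroring the table in the example above: column A holds strings,
# columns B and X hold integers.
EXAMPLE_ROWS = [("0", 1, 0), ("1", 0, 1), ("1", 2, 1)]
EXAMPLE_COLUMNS = ["A", "B", "X"]


def build_example_session(source_id="my_private_data", epsilon=1):
    """Hypothetical helper: build a small dataframe and wrap it in a Session."""
    # Imported inside the function so this sketch can be loaded even where
    # Spark is not installed. The top-level import locations for AddOneRow
    # and PureDPBudget are an assumption.
    from pyspark.sql import SparkSession
    from tmlt.analytics import AddOneRow, PureDPBudget, Session

    spark = SparkSession.builder.getOrCreate()
    # Spark infers a string column for A and integer columns for B and X.
    spark_data = spark.createDataFrame(EXAMPLE_ROWS, schema=EXAMPLE_COLUMNS)

    # The same call as in the example above, with a single private source.
    return Session.from_dataframe(
        privacy_budget=PureDPBudget(epsilon),
        source_id=source_id,
        dataframe=spark_data,
        protected_change=AddOneRow(),
    )
```

Calling `build_example_session()` should return a session equivalent to `sess` in the example above, with `my_private_data` as its only private source.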
- Parameters:
privacy_budget (PrivacyBudget) – The total privacy budget allocated to this session.
source_id (str) – The source id for the private source dataframe.
dataframe (DataFrame) – The private source dataframe to perform queries on, corresponding to the source_id.
protected_change (ProtectedChange) – A ProtectedChange specifying what changes to the input data the resulting Session should protect.
- Return type: