joined_output_metric

from tmlt.tune import joined_output_metric
tmlt.tune.joined_output_metric(name, join_columns, description=None, baseline=None, output=None, grouping_columns=None, measure_column=None, empty_value=None, join_how='inner', dropna_columns=None, indicator_column_name=None)

Decorator to define a custom JoinedOutputMetric.

The decorated function must have the following parameters:

  • joined_output: A DataFrame created by joining the selected DP and baseline outputs.

It may also have the following optional parameters:

  • result_column_name: If the function returns a DataFrame, the metric results must be in a column with this name.

  • unprotected_inputs: A dictionary containing the program’s unprotected inputs.

  • parameters: A dictionary containing the program’s parameters.

If the metric has no grouping columns, the function must return a single numeric, boolean, or string value. If the metric has grouping columns, the function must return a DataFrame containing the grouping columns and exactly one additional column that holds the metric value for each group. That column must be numeric, boolean, or string, and must be named result_column_name.
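The two return shapes can be illustrated without Spark. The sketch below is only a model: it represents the joined output as a list of dicts rather than a Spark DataFrame, and the column names `count_dp` and `count_baseline`, the `region` grouping column, and both function names are hypothetical. It computes a median absolute error once as a single value and once per group.

```python
from collections import defaultdict
from statistics import median

# Hypothetical joined rows; the real decorated function receives a DataFrame.
# The column names "count_dp" and "count_baseline" are assumptions.
JOINED = [
    {"region": "east", "count_dp": 12.0, "count_baseline": 10.0},
    {"region": "east", "count_dp": 7.0, "count_baseline": 8.0},
    {"region": "west", "count_dp": 5.0, "count_baseline": 5.0},
    {"region": "west", "count_dp": 9.0, "count_baseline": 3.0},
    {"region": "west", "count_dp": 4.0, "count_baseline": 6.0},
]


def abs_errors(rows):
    """Absolute difference between the DP and baseline values of each row."""
    return [abs(r["count_dp"] - r["count_baseline"]) for r in rows]


def ungrouped_metric(rows):
    """No grouping columns: return one scalar value."""
    return median(abs_errors(rows))


def grouped_metric(rows, grouping_columns, result_column_name):
    """Grouping columns: return one row per group.

    Each row holds the grouping columns plus exactly one result column.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in grouping_columns)].append(row)
    result = []
    for key, members in groups.items():
        out = dict(zip(grouping_columns, key))
        out[result_column_name] = median(abs_errors(members))
        result.append(out)
    return result


print(ungrouped_metric(JOINED))
print(grouped_metric(JOINED, ["region"], "custom_metric"))
```

In the grouped case the output has one entry per distinct `region`, which mirrors the requirement that the returned DataFrame contain the grouping columns and a single result column.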

To use built-in metrics alongside this custom metric, specify them separately in the tuner's metrics class variable.

Parameters:
  • name (str) – A name for the metric.

  • join_columns (List[str]) – The columns to join on.

  • description (Optional[str]) – A description of the metric.

  • baseline (Optional[str]) – The name of the baseline program used for the error report. If None, the tuner must have a single baseline (which will be used).

  • output (Optional[str]) – The name of the program output to be used for the metric. If None, the program must have only one output (which will be used).

  • grouping_columns (Optional[List[str]]) – If specified, the metric should group the outputs by the given columns, and calculate the metric for each group.

  • measure_column (Optional[str]) – If specified, the column in the outputs to measure.

  • empty_value (Optional[Any]) – If all DP and baseline outputs are empty, the metric will return this value.

  • join_how (str) – The type of join to perform. Must be one of “left”, “right”, “inner”, “outer”. Defaults to “inner”.

  • dropna_columns (Optional[List[str]]) – If specified, rows with nulls in these columns will be dropped.

  • indicator_column_name (Optional[str]) – If specified, a column with this name is added to the joined data. It contains “dp”, “baseline”, or “both” to indicate where the values in each row came from.
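The interaction of join_how, dropna_columns, and indicator_column_name may be easier to see in a small model. The following plain-Python sketch is not the library's implementation: the `join_outputs` helper is hypothetical, and it assumes that the DP output is the left side of the join, that non-join columns receive `_dp` and `_baseline` suffixes, that join keys are unique within each output, and that null-dropping is applied to the joined columns. None of these details are confirmed by the reference above.

```python
def join_outputs(dp_rows, baseline_rows, join_columns, join_how="inner",
                 dropna_columns=None, indicator_column_name=None):
    """Model of joining DP and baseline outputs (illustrative only)."""
    if join_how not in ("left", "right", "inner", "outer"):
        raise ValueError(f"unsupported join_how: {join_how!r}")

    def key(row):
        return tuple(row[c] for c in join_columns)

    # Assumption: join keys are unique within each output.
    dp_by_key = {key(r): r for r in dp_rows}
    base_by_key = {key(r): r for r in baseline_rows}
    sample = (list(dp_rows) + list(baseline_rows))[:1]
    value_columns = (
        [c for c in sample[0] if c not in join_columns] if sample else []
    )

    joined = []
    all_keys = list(dp_by_key) + [k for k in base_by_key if k not in dp_by_key]
    for k in all_keys:
        in_dp, in_base = k in dp_by_key, k in base_by_key
        # Assumption: the DP output is the left side of the join.
        if not in_dp and join_how in ("inner", "left"):
            continue
        if not in_base and join_how in ("inner", "right"):
            continue
        row = dict(zip(join_columns, k))
        for col in value_columns:
            # Assumption: "_dp" / "_baseline" suffixes distinguish the sources.
            row[f"{col}_dp"] = dp_by_key[k][col] if in_dp else None
            row[f"{col}_baseline"] = base_by_key[k][col] if in_base else None
        if indicator_column_name is not None:
            row[indicator_column_name] = (
                "both" if in_dp and in_base else "dp" if in_dp else "baseline"
            )
        joined.append(row)

    # Assumption: null-dropping applies to the joined columns.
    for col in dropna_columns or []:
        joined = [r for r in joined if r[col] is not None]
    return joined


DP_ROWS = [{"id": 1, "count": 10}, {"id": 2, "count": 20}]
BASELINE_ROWS = [{"id": 2, "count": 21}, {"id": 3, "count": 30}]

for row in join_outputs(DP_ROWS, BASELINE_ROWS, ["id"], join_how="outer",
                        indicator_column_name="source"):
    print(row)
```

With the default inner join only `id` 2 survives, whereas an outer join keeps all three ids and the indicator column records which side contributed each row.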

>>> from tmlt.analytics import Session
>>> from tmlt.tune import (
...     MedianAbsoluteError,
...     SessionProgram,
...     SessionProgramTuner,
...     joined_output_metric,
... )
>>> from pyspark.sql import DataFrame
>>> class Program(SessionProgram):
...     class ProtectedInputs:
...         protected_df: DataFrame
...     class UnprotectedInputs:
...         unprotected_df: DataFrame
...     class Outputs:
...         output_df: DataFrame
...     def session_interaction(self, session: Session):
...         # dp_output is a placeholder for a DataFrame computed from the session.
...         return {"output_df": dp_output}
>>> class Tuner(SessionProgramTuner, program=Program):
...     @joined_output_metric(name="custom_metric", join_columns=["join_column"])
...     @staticmethod
...     def custom_metric(
...         joined_output: DataFrame,
...     ):
...         # If the program has unprotected inputs and/or parameters, the custom
...         # metric method can also accept them through the optional
...         # unprotected_inputs and parameters arguments.
...         ...
...     metrics = [
...         MedianAbsoluteError(
...             output="output_df",
...             join_columns=["join_column"],
...             measure_column="Y"
...         ),
...     ]  # You can mix custom and built-in metrics.