equipy.graphs package

Module contents

The module enables the visualization of fairness achievement through several kinds of graphs (a combined usage sketch follows this list):

  • Representation of the probability distribution of predictions as a function of the value of the sensitive attribute.

  • Arrow plot of the fairness-performance relationship.

  • Representation of sequential gain in fairness using waterfall plots.
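A minimal end-to-end sketch of how these helpers fit together, reusing the illustrative data from the per-function examples below:

>>> import numpy as np
>>> import pandas as pd
>>> from equipy.graphs import fair_density_plot, fair_waterfall_plot
>>> sensitive_features_calib = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue'], 'nb_child': [1, 2, 0, 2]})
>>> sensitive_features_test = pd.DataFrame({'color': ['blue', 'blue', 'blue', 'green'], 'nb_child': [3, 2, 1, 2]})
>>> y_calib = np.array([0.6, 0.43, 0.32, 0.8])
>>> y_test = np.array([0.8, 0.35, 0.23, 0.2])
>>> # Distribution of predictions per sensitive attribute, before and after fairness.
>>> fair_density_plot(sensitive_features_calib, sensitive_features_test, y_calib, y_test)
>>> # Sequential gain in fairness, one step per sensitive attribute.
>>> fair_waterfall_plot(sensitive_features_calib, sensitive_features_test, y_calib, y_test)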

equipy.graphs.fair_arrow_plot(sensitive_features_calib: pandas.DataFrame, sensitive_features_test: pandas.DataFrame, y_calib: numpy.ndarray, y_test: numpy.ndarray, y_true_test: numpy.ndarray, epsilon: float | None = None, metric: Callable = mean_squared_error, threshold: float | None = None, positive_class: int | str = 1) → Axes

Generates an arrow plot representing the fairness-performance combinations step by step (by sensitive attribute) to reach fairness.

Parameters:
  • sensitive_features_calib (pd.DataFrame) – Sensitive features for calibration.

  • sensitive_features_test (pd.DataFrame) – Sensitive features for testing.

  • y_calib (numpy.ndarray) – Predictions for calibration.

  • y_test (numpy.ndarray) – Predictions for testing.

  • y_true_test (numpy.ndarray) – True labels for testing.

  • epsilon (float, optional, default = None) – Epsilon value for calculating the Wasserstein distance.

  • metric (Callable, default = sklearn.metrics.mean_squared_error) – The metric used to evaluate performance.

  • threshold (float, default = None) – The threshold used to convert binary-classification scores into class labels when evaluating performance (see the sketch after this parameter list).

  • positive_class (int or str, optional, default=1) – The positive class label assigned when applying the threshold in binary classification. Can be either an integer or a string.
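How these two parameters combine is internal to equipy; one plausible reading, shown purely as an illustration (the negative label 'no' is hypothetical), is that scores at or above the threshold are mapped to the positive class:

>>> import numpy as np
>>> y_test = np.array([0.8, 0.35, 0.23, 0.2])
>>> np.where(y_test >= 0.5, 'yes', 'no')
array(['yes', 'no', 'no', 'no'], dtype='<U3')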

Returns:

Arrows representing the fairness-performance combinations step by step (by sensitive attribute) to reach fairness.

Return type:

matplotlib.axes.Axes

Note

This function uses a global variable ax for plotting, ensuring compatibility with external code.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.metrics import f1_score
>>> from equipy.graphs import fair_arrow_plot
>>> sensitive_features_calib = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue'], 'nb_child': [1, 2, 0, 2]})
>>> sensitive_features_test = pd.DataFrame({'color': ['blue', 'blue', 'blue', 'green'], 'nb_child': [3, 2, 1, 2]})
>>> y_calib = np.array([0.6, 0.43, 0.32, 0.8])
>>> y_test = np.array([0.8, 0.35, 0.23, 0.2])
>>> y_true_test = np.array(['no', 'no', 'yes', 'no'])
>>> fair_arrow_plot(sensitive_features_calib, sensitive_features_test, y_calib, y_test, y_true_test, metric=f1_score, threshold=0.5, positive_class='yes')

equipy.graphs.fair_density_plot(sensitive_features_calib: numpy.ndarray, sensitive_features_test: numpy.ndarray, y_calib: numpy.ndarray, y_test: numpy.ndarray, epsilon: float | None = None) → Axes

Visualizes the distribution of predictions based on different sensitive features using kernel density estimates (KDE).
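The function renders these densities itself; for intuition only, a per-group KDE of the kind it displays can be sketched with scipy (this is not equipy's internal code, and the data are made up):

>>> import numpy as np
>>> from scipy.stats import gaussian_kde
>>> scores = np.array([0.8, 0.35, 0.23, 0.2, 0.6, 0.45])
>>> groups = np.array(['blue', 'blue', 'green', 'green', 'blue', 'green'])
>>> grid = np.linspace(0, 1, 101)
>>> # One density estimate per value of the sensitive attribute.
>>> densities = {g: gaussian_kde(scores[groups == g])(grid) for g in np.unique(groups)}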

Parameters:
  • sensitive_features_calib (numpy.ndarray) – Sensitive features for calibration.

  • sensitive_features_test (numpy.ndarray) – Sensitive features for testing.

  • y_calib (numpy.ndarray) – Predictions for calibration.

  • y_test (numpy.ndarray) – Predictions for testing.

  • epsilon (float, optional, default = None) – Epsilon value for calculating the Wasserstein distance.

Returns:

The axes showing the density of predictions for each sensitive feature, before and after fairness is applied.

Return type:

matplotlib.axes.Axes

Raises:

ValueError – If the input data is not in the expected format.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from equipy.graphs import fair_density_plot
>>> sensitive_features_calib = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue'], 'nb_child': [1, 2, 0, 2]})
>>> sensitive_features_test = pd.DataFrame({'color': ['blue', 'blue', 'blue', 'green'], 'nb_child': [3, 2, 1, 2]})
>>> y_calib = np.array([0.6, 0.43, 0.32, 0.8])
>>> y_test = np.array([0.8, 0.35, 0.23, 0.2])
>>> epsilon = [0, 0.5]
>>> fair_density_plot(sensitive_features_calib, sensitive_features_test, y_calib, y_test, epsilon)

equipy.graphs.fair_multiple_arrow_plot(sensitive_features_calib: pandas.DataFrame, sensitive_features_test: pandas.DataFrame, y_calib: numpy.ndarray, y_test: numpy.ndarray, y_true_test: numpy.ndarray, epsilon: float | None = None, metric: Callable = mean_squared_error, threshold: float | None = None, positive_class: int | str = 1) → Axes

Plot arrows representing the fairness-performance combinations step by step (by sensitive attribute) to reach fairness for different permutations.
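"Permutations" here refers to the possible orders in which the sensitive attributes are processed. For two attributes such as color and nb_child there are two orders, each yielding its own sequence of fairness-performance steps; a sketch of enumerating them (illustrative only):

>>> from itertools import permutations
>>> list(permutations(['color', 'nb_child']))
[('color', 'nb_child'), ('nb_child', 'color')]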

Parameters:
  • sensitive_features_calib (pd.DataFrame) – Sensitive features for calibration.

  • sensitive_features_test (pd.DataFrame) – Sensitive features for testing.

  • y_calib (numpy.ndarray) – Predictions for calibration.

  • y_test (numpy.ndarray) – Predictions for testing.

  • y_true_test (numpy.ndarray) – True labels for testing.

  • epsilon (float, optional, default = None) – Epsilon value for calculating the Wasserstein distance.

  • metric (Callable, default = sklearn.metrics.mean_squared_error) – The metric used to evaluate performance.

  • threshold (float, default = None) – The threshold used to convert binary-classification scores into class labels when evaluating performance.

  • positive_class (int or str, optional, default=1) – The positive class label assigned when applying the threshold in binary classification. Can be either an integer or a string.

Returns:

Arrows representing the fairness-performance combinations step by step (by sensitive attribute) to reach fairness for different permutations.

Return type:

matplotlib.axes.Axes

Note

This function uses a global variable ax for plotting, ensuring compatibility with external code.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.metrics import f1_score
>>> from equipy.graphs import fair_multiple_arrow_plot
>>> sensitive_features_calib = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue'], 'nb_child': [1, 2, 0, 2]})
>>> sensitive_features_test = pd.DataFrame({'color': ['blue', 'blue', 'blue', 'green'], 'nb_child': [3, 2, 1, 2]})
>>> y_calib = np.array([0.6, 0.43, 0.32, 0.8])
>>> y_test = np.array([0.8, 0.35, 0.23, 0.2])
>>> y_true_test = np.array(['no', 'no', 'yes', 'no'])
>>> fair_multiple_arrow_plot(sensitive_features_calib, sensitive_features_test, y_calib, y_test, y_true_test, metric=f1_score, threshold=0.5, positive_class='yes')

equipy.graphs.fair_waterfall_plot(sensitive_features_calib: numpy.ndarray, sensitive_features_test: numpy.ndarray, y_calib: numpy.ndarray, y_test: numpy.ndarray, epsilon: float | None = None) → Axes

Generate a waterfall plot illustrating the sequential fairness in a model.

Parameters:
  • sensitive_features_calib (numpy.ndarray) – Sensitive features for calibration.

  • sensitive_features_test (numpy.ndarray) – Sensitive features for testing.

  • y_calib (numpy.ndarray) – Predictions for calibration.

  • y_test (numpy.ndarray) – Predictions for testing.

  • epsilon (float, optional, default = None) – Epsilon value for calculating the Wasserstein distance.

Returns:

The Axes object representing the waterfall plot.

Return type:

matplotlib.axes.Axes

Notes

The function creates a waterfall plot with bars representing the fairness values at each step. If both exact and approximate fairness values are provided, the bars are color-coded and labeled accordingly, and a legend is added to distinguish the different bars in the plot.
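For intuition, a generic waterfall of cumulative per-step gains can be sketched with plain matplotlib; the step names and fairness values below are made up, and this is not equipy's internal plotting code:

>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> steps = ['Base model', 'Fair in color', 'Fair in color, nb_child']
>>> gains = np.array([0.0, 0.4, 0.25])  # hypothetical per-step fairness gains
>>> bottoms = np.cumsum(gains) - gains  # each bar starts where the previous one ended
>>> fig, ax = plt.subplots()
>>> bars = ax.bar(steps, gains, bottom=bottoms)
>>> label = ax.set_ylabel('Sequential gain in fairness')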

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from equipy.graphs import fair_waterfall_plot
>>> sensitive_features_calib = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue'], 'nb_child': [1, 2, 0, 2]})
>>> sensitive_features_test = pd.DataFrame({'color': ['blue', 'blue', 'blue', 'green'], 'nb_child': [3, 2, 1, 2]})
>>> y_calib = np.array([0.6, 0.43, 0.32, 0.8])
>>> y_test = np.array([0.8, 0.35, 0.23, 0.2])
>>> fair_waterfall_plot(sensitive_features_calib, sensitive_features_test, y_calib, y_test, epsilon=[0, 0.5])