equipy.graphs package
Module contents
The module enables the visualization of fairness achievement through several kinds of graphs (a shared example setup is sketched after this list):
- The probability distribution of predictions as a function of the value of the sensitive attribute.
- Arrow plots of the fairness-performance relationship.
- Waterfall plots of the sequential gain in fairness.
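The per-function examples below call the plotting functions by bare name and rely on pandas and numpy being imported, but the equipy imports themselves are not shown. A minimal setup sketch, assuming the package layout shown on this page:

>>> import matplotlib.pyplot as plt
>>> from equipy.graphs import (fair_arrow_plot, fair_density_plot,
...                            fair_multiple_arrow_plot, fair_waterfall_plot)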
- equipy.graphs.fair_arrow_plot(sensitive_features_calib: pd.DataFrame, sensitive_features_test: pd.DataFrame, y_calib: np.ndarray, y_test: np.ndarray, y_true_test: np.ndarray, epsilon: float | None = None, metric: Callable = mean_squared_error, threshold: float | None = None, positive_class: int | str = 1) → Axes
Generates an arrow plot of the fairness-performance combinations obtained step by step (one sensitive attribute at a time) on the way to fairness.
- Parameters:
sensitive_features_calib (pd.DataFrame) – Sensitive features for calibration.
sensitive_features_test (pd.DataFrame) – Sensitive features for testing.
y_calib (numpy.ndarray) – Predictions for calibration.
y_test (numpy.ndarray) – Predictions for testing.
y_true_test (numpy.ndarray) – True labels for testing.
epsilon (float, optional, default = None) – Epsilon value for calculating the Wasserstein distance.
metric (Callable, default = sklearn.metrics.mean_squared_error) – The metric used to evaluate performance.
threshold (float, optional, default = None) – The threshold used to turn binary-classification scores into labels when evaluating performance.
positive_class (int or str, optional, default = 1) – The positive class label used when applying the threshold in binary classification. Can be either an integer or a string.
- Returns:
Arrows showing the fairness-performance combinations obtained step by step (one sensitive attribute at a time) on the way to fairness.
- Return type:
matplotlib.axes.Axes
Note
This function uses a global variable ax for plotting, ensuring compatibility with external code.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.metrics import f1_score
>>> sensitive_features_calib = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue'], 'nb_child': [1, 2, 0, 2]})
>>> sensitive_features_test = pd.DataFrame({'color': ['blue', 'blue', 'blue', 'green'], 'nb_child': [3, 2, 1, 2]})
>>> y_calib = np.array([0.6, 0.43, 0.32, 0.8])
>>> y_test = np.array([0.8, 0.35, 0.23, 0.2])
>>> y_true_test = np.array(['no', 'no', 'yes', 'no'])
>>> fair_arrow_plot(sensitive_features_calib, sensitive_features_test, y_calib, y_test, y_true_test, metric=f1_score, threshold=0.5, positive_class='yes')
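Because fair_arrow_plot returns a matplotlib Axes (drawn on the module-level ax mentioned in the note above), rendering or saving the result follows the usual matplotlib pattern. A minimal sketch reusing the variables from the example; the file name is illustrative only:

>>> ax = fair_arrow_plot(sensitive_features_calib, sensitive_features_test,
...                      y_calib, y_test, y_true_test,
...                      metric=f1_score, threshold=0.5, positive_class='yes')
>>> ax.get_figure().savefig('fair_arrow_plot.png', dpi=150)  # illustrative file name
>>> plt.show()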
- equipy.graphs.fair_density_plot(sensitive_features_calib: pd.DataFrame, sensitive_features_test: pd.DataFrame, y_calib: np.ndarray, y_test: np.ndarray, epsilon: float | None = None) → Axes
Visualizes the distribution of predictions based on different sensitive features using kernel density estimates (KDE).
- Parameters:
sensitive_features_calib (pd.DataFrame) – Sensitive features for calibration.
sensitive_features_test (pd.DataFrame) – Sensitive features for testing.
y_calib (numpy.ndarray) – Predictions for calibration.
y_test (numpy.ndarray) – Predictions for testing.
epsilon (float, optional, default = None) – Epsilon value for calculating the Wasserstein distance.
- Returns:
Density plots of the predictions for each sensitive feature value, before and after the fairness adjustment.
- Return type:
matplotlib.axes.Axes
- Raises:
ValueError – If the input data is not in the expected format.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> sensitive_features_calib = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue'], 'nb_child': [1, 2, 0, 2]})
>>> sensitive_features_test = pd.DataFrame({'color': ['blue', 'blue', 'blue', 'green'], 'nb_child': [3, 2, 1, 2]})
>>> y_calib = np.array([0.6, 0.43, 0.32, 0.8])
>>> y_test = np.array([0.8, 0.35, 0.23, 0.2])
>>> epsilon = [0, 0.5]
>>> fair_density_plot(sensitive_features_calib, sensitive_features_test, y_calib, y_test, epsilon)
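Note that the example passes a list for epsilon (one value per sensitive column) even though the printed signature reads float | None; the list form presumably sets a per-attribute relaxation, with 0 meaning exact fairness. A hedged sketch contrasting the two calls, reusing the variables above (plt as in the module-level setup):

>>> ax = fair_density_plot(sensitive_features_calib, sensitive_features_test,
...                        y_calib, y_test)            # exact fairness (epsilon=None)
>>> ax = fair_density_plot(sensitive_features_calib, sensitive_features_test,
...                        y_calib, y_test, [0, 0.5])  # per-attribute relaxation (assumed semantics)
>>> plt.show()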
- equipy.graphs.fair_multiple_arrow_plot(sensitive_features_calib: pd.DataFrame, sensitive_features_test: pd.DataFrame, y_calib: np.ndarray, y_test: np.ndarray, y_true_test: np.ndarray, epsilon: float | None = None, metric: Callable = mean_squared_error, threshold: float | None = None, positive_class: int | str = 1) → Axes
Plots arrows showing the fairness-performance combinations obtained step by step (one sensitive attribute at a time) to reach fairness, for different permutations of the sensitive attributes.
- Parameters:
sensitive_features_calib (pd.DataFrame) – Sensitive features for calibration.
sensitive_features_test (pd.DataFrame) – Sensitive features for testing.
y_calib (numpy.ndarray) – Predictions for calibration.
y_test (numpy.ndarray) – Predictions for testing.
y_true_test (numpy.ndarray) – True labels for testing.
epsilon (float, optional, default = None) – Epsilon value for calculating the Wasserstein distance.
metric (Callable, default = sklearn.metrics.mean_squared_error) – The metric used to evaluate performance.
threshold (float, optional, default = None) – The threshold used to turn binary-classification scores into labels when evaluating performance.
positive_class (int or str, optional, default = 1) – The positive class label used when applying the threshold in binary classification. Can be either an integer or a string.
- Returns:
Arrows showing the fairness-performance combinations obtained step by step (one sensitive attribute at a time) for different permutations of the sensitive attributes.
- Return type:
matplotlib.axes.Axes
Note
This function uses a global variable ax for plotting, ensuring compatibility with external code.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.metrics import f1_score
>>> sensitive_features_calib = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue'], 'nb_child': [1, 2, 0, 2]})
>>> sensitive_features_test = pd.DataFrame({'color': ['blue', 'blue', 'blue', 'green'], 'nb_child': [3, 2, 1, 2]})
>>> y_calib = np.array([0.6, 0.43, 0.32, 0.8])
>>> y_test = np.array([0.8, 0.35, 0.23, 0.2])
>>> y_true_test = np.array(['no', 'no', 'yes', 'no'])
>>> fair_multiple_arrow_plot(sensitive_features_calib, sensitive_features_test, y_calib, y_test, y_true_test, metric=f1_score, threshold=0.5, positive_class='yes')
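Since this plot overlays one arrow path per permutation of the sensitive columns, it is natural to compare it with the single-path output of fair_arrow_plot on the same data. A minimal display sketch reusing the variables above (plt as in the module-level setup):

>>> ax = fair_multiple_arrow_plot(sensitive_features_calib, sensitive_features_test,
...                               y_calib, y_test, y_true_test,
...                               metric=f1_score, threshold=0.5, positive_class='yes')
>>> plt.show()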
- equipy.graphs.fair_waterfall_plot(sensitive_features_calib: pd.DataFrame, sensitive_features_test: pd.DataFrame, y_calib: np.ndarray, y_test: np.ndarray, epsilon: float | None = None) → Axes
Generates a waterfall plot illustrating the sequential gain in fairness in a model.
- Parameters:
sensitive_features_calib (pd.DataFrame) – Sensitive features for calibration.
sensitive_features_test (pd.DataFrame) – Sensitive features for testing.
y_calib (numpy.ndarray) – Predictions for calibration.
y_test (numpy.ndarray) – Predictions for testing.
epsilon (float, optional, default = None) – Epsilon value for calculating the Wasserstein distance.
- Returns:
The Axes object representing the waterfall plot.
- Return type:
matplotlib.axes.Axes
Notes
The function creates a waterfall plot with bars representing the fairness values at each step. If both exact and approximate fairness values are provided, bars are color-coded and labeled accordingly. The legend is added to distinguish between different bars in the plot.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> sensitive_features_calib = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue'], 'nb_child': [1, 2, 0, 2]})
>>> sensitive_features_test = pd.DataFrame({'color': ['blue', 'blue', 'blue', 'green'], 'nb_child': [3, 2, 1, 2]})
>>> y_calib = np.array([0.6, 0.43, 0.32, 0.8])
>>> y_test = np.array([0.8, 0.35, 0.23, 0.2])
>>> fair_waterfall_plot(sensitive_features_calib, sensitive_features_test, y_calib, y_test, epsilon=[0, 0.5])
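Per the notes above, the bars are color-coded when both exact and approximate fairness values are present, as with the epsilon=[0, 0.5] call in the example. A minimal sketch for saving the resulting figure, reusing the variables above; the file name is illustrative only:

>>> ax = fair_waterfall_plot(sensitive_features_calib, sensitive_features_test,
...                          y_calib, y_test, epsilon=[0, 0.5])
>>> ax.get_figure().savefig('fair_waterfall_plot.png', dpi=150)  # illustrative file name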