equipy.graphs package
Module contents
The module enables the visualization of fairness achievement through several kinds of graphs (a shared example setup is sketched after this list):
- The probability distribution of predictions as a function of the value of the sensitive attribute.
- Arrow plots of the fairness-performance relationship.
- Waterfall plots of the sequential gain in fairness.
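The per-function examples below call the plotting functions by bare name and rely on pandas and numpy being imported, but the equipy imports themselves are not shown. A minimal setup sketch, assuming the package layout shown on this page:

>>> import matplotlib.pyplot as plt
>>> from equipy.graphs import (fair_arrow_plot, fair_density_plot,
...                            fair_multiple_arrow_plot, fair_waterfall_plot)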
- equipy.graphs.fair_arrow_plot(sensitive_features_calib: pd.DataFrame, sensitive_features_test: pd.DataFrame, y_calib: np.ndarray, y_test: np.ndarray, y_true_test: np.ndarray, epsilon: float | None = None, metric: Callable = mean_squared_error, threshold: float | None = None, positive_class: int | str = 1) → Axes
Generates an arrow plot of the fairness-performance combinations obtained step by step (one sensitive attribute at a time) on the way to fairness.
- Parameters:
sensitive_features_calib (pd.DataFrame) – Sensitive features for calibration.
sensitive_features_test (pd.DataFrame) – Sensitive features for testing.
y_calib (numpy.ndarray) – Predictions for calibration.
y_test (numpy.ndarray) – Predictions for testing.
y_true_test (numpy.ndarray) – True labels for testing.
epsilon (float, optional, default = None) – Epsilon value for calculating the Wasserstein distance.
metric (Callable, default = sklearn.metrics.mean_squared_error) – The metric used to evaluate performance.
threshold (float, optional, default = None) – The threshold used to turn binary-classification scores into labels when evaluating performance.
positive_class (int or str, optional, default = 1) – The positive class label used when applying the threshold in binary classification. Can be either an integer or a string.
- Returns:
Arrows showing the fairness-performance combinations obtained step by step (one sensitive attribute at a time) on the way to fairness.
- Return type:
matplotlib.axes.Axes
Note
This function uses a global variable ax for plotting, ensuring compatibility with external code.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.metrics import f1_score
>>> sensitive_features_calib = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue'], 'nb_child': [1, 2, 0, 2]})
>>> sensitive_features_test = pd.DataFrame({'color': ['blue', 'blue', 'blue', 'green'], 'nb_child': [3, 2, 1, 2]})
>>> y_calib = np.array([0.6, 0.43, 0.32, 0.8])
>>> y_test = np.array([0.8, 0.35, 0.23, 0.2])
>>> y_true_test = np.array(['no', 'no', 'yes', 'no'])
>>> fair_arrow_plot(sensitive_features_calib, sensitive_features_test, y_calib, y_test, y_true_test, metric=f1_score, threshold=0.5, positive_class='yes')
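Because fair_arrow_plot returns a matplotlib Axes (drawn on the module-level ax mentioned in the note above), rendering or saving the result follows the usual matplotlib pattern. A minimal sketch reusing the variables from the example; the file name is illustrative only:

>>> ax = fair_arrow_plot(sensitive_features_calib, sensitive_features_test,
...                      y_calib, y_test, y_true_test,
...                      metric=f1_score, threshold=0.5, positive_class='yes')
>>> ax.get_figure().savefig('fair_arrow_plot.png', dpi=150)  # illustrative file name
>>> plt.show()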
- equipy.graphs.fair_density_plot(sensitive_features_calib: pd.DataFrame, sensitive_features_test: pd.DataFrame, y_calib: np.ndarray, y_test: np.ndarray, epsilon: float | None = None) → Axes
Visualizes the distribution of predictions based on different sensitive features using kernel density estimates (KDE).
- Parameters:
sensitive_features_calib (pd.DataFrame) – Sensitive features for calibration.
sensitive_features_test (pd.DataFrame) – Sensitive features for testing.
y_calib (numpy.ndarray) – Predictions for calibration.
y_test (numpy.ndarray) – Predictions for testing.
epsilon (float, optional, default = None) – Epsilon value for calculating the Wasserstein distance.
- Returns:
Density plots of the predictions for each sensitive feature value, before and after the fairness adjustment.
- Return type:
matplotlib.axes.Axes
- Raises:
ValueError – If the input data is not in the expected format.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> sensitive_features_calib = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue'], 'nb_child': [1, 2, 0, 2]})
>>> sensitive_features_test = pd.DataFrame({'color': ['blue', 'blue', 'blue', 'green'], 'nb_child': [3, 2, 1, 2]})
>>> y_calib = np.array([0.6, 0.43, 0.32, 0.8])
>>> y_test = np.array([0.8, 0.35, 0.23, 0.2])
>>> epsilon = [0, 0.5]
>>> fair_density_plot(sensitive_features_calib, sensitive_features_test, y_calib, y_test, epsilon)
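Note that the example passes a list for epsilon (one value per sensitive column) even though the printed signature reads float | None; the list form presumably sets a per-attribute relaxation, with 0 meaning exact fairness. A hedged sketch contrasting the two calls, reusing the variables above (plt as in the module-level setup):

>>> ax = fair_density_plot(sensitive_features_calib, sensitive_features_test,
...                        y_calib, y_test)            # exact fairness (epsilon=None)
>>> ax = fair_density_plot(sensitive_features_calib, sensitive_features_test,
...                        y_calib, y_test, [0, 0.5])  # per-attribute relaxation (assumed semantics)
>>> plt.show()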
- equipy.graphs.fair_multiple_arrow_plot(sensitive_features_calib: pd.DataFrame, sensitive_features_test: pd.DataFrame, y_calib: np.ndarray, y_test: np.ndarray, y_true_test: np.ndarray, epsilon: float | None = None, metric: Callable = mean_squared_error, threshold: float | None = None, positive_class: int | str = 1) → Axes
Plots arrows showing the fairness-performance combinations obtained step by step (one sensitive attribute at a time) to reach fairness, for different permutations of the sensitive attributes.
- Parameters:
sensitive_features_calib (pd.DataFrame) – Sensitive features for calibration.
sensitive_features_test (pd.DataFrame) – Sensitive features for testing.
y_calib (numpy.ndarray) – Predictions for calibration.
y_test (numpy.ndarray) – Predictions for testing.
y_true_test (numpy.ndarray) – True labels for testing.
epsilon (float, optional, default = None) – Epsilon value for calculating the Wasserstein distance.
metric (Callable, default = sklearn.metrics.mean_squared_error) – The metric used to evaluate performance.
threshold (float, optional, default = None) – The threshold used to turn binary-classification scores into labels when evaluating performance.
positive_class (int or str, optional, default = 1) – The positive class label used when applying the threshold in binary classification. Can be either an integer or a string.
- Returns:
Arrows showing the fairness-performance combinations obtained step by step (one sensitive attribute at a time) for different permutations of the sensitive attributes.
- Return type:
matplotlib.axes.Axes
Note
This function uses a global variable ax for plotting, ensuring compatibility with external code.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.metrics import f1_score
>>> sensitive_features_calib = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue'], 'nb_child': [1, 2, 0, 2]})
>>> sensitive_features_test = pd.DataFrame({'color': ['blue', 'blue', 'blue', 'green'], 'nb_child': [3, 2, 1, 2]})
>>> y_calib = np.array([0.6, 0.43, 0.32, 0.8])
>>> y_test = np.array([0.8, 0.35, 0.23, 0.2])
>>> y_true_test = np.array(['no', 'no', 'yes', 'no'])
>>> fair_multiple_arrow_plot(sensitive_features_calib, sensitive_features_test, y_calib, y_test, y_true_test, metric=f1_score, threshold=0.5, positive_class='yes')
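Since this plot overlays one arrow path per permutation of the sensitive columns, it is natural to compare it with the single-path output of fair_arrow_plot on the same data. A minimal display sketch reusing the variables above (plt as in the module-level setup):

>>> ax = fair_multiple_arrow_plot(sensitive_features_calib, sensitive_features_test,
...                               y_calib, y_test, y_true_test,
...                               metric=f1_score, threshold=0.5, positive_class='yes')
>>> plt.show()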
- equipy.graphs.fair_waterfall_plot(sensitive_features_calib: pd.DataFrame, sensitive_features_test: pd.DataFrame, y_calib: np.ndarray, y_test: np.ndarray, epsilon: float | None = None) → Axes
Generates a waterfall plot illustrating the sequential gain in fairness in a model.
- Parameters:
sensitive_features_calib (pd.DataFrame) – Sensitive features for calibration.
sensitive_features_test (pd.DataFrame) – Sensitive features for testing.
y_calib (numpy.ndarray) – Predictions for calibration.
y_test (numpy.ndarray) – Predictions for testing.
epsilon (float, optional, default = None) – Epsilon value for calculating the Wasserstein distance.
- Returns:
The Axes object representing the waterfall plot.
- Return type:
matplotlib.axes.Axes
Notes
The function creates a waterfall plot with bars representing the fairness values at each step. If both exact and approximate fairness values are provided, bars are color-coded and labeled accordingly. The legend is added to distinguish between different bars in the plot.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> sensitive_features_calib = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue'], 'nb_child': [1, 2, 0, 2]})
>>> sensitive_features_test = pd.DataFrame({'color': ['blue', 'blue', 'blue', 'green'], 'nb_child': [3, 2, 1, 2]})
>>> y_calib = np.array([0.6, 0.43, 0.32, 0.8])
>>> y_test = np.array([0.8, 0.35, 0.23, 0.2])
>>> fair_waterfall_plot(sensitive_features_calib, sensitive_features_test, y_calib, y_test, epsilon=[0, 0.5])
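Per the notes above, the bars are color-coded when both exact and approximate fairness values are present, as with the epsilon=[0, 0.5] call in the example. A minimal sketch for saving the resulting figure, reusing the variables above; the file name is illustrative only:

>>> ax = fair_waterfall_plot(sensitive_features_calib, sensitive_features_test,
...                          y_calib, y_test, epsilon=[0, 0.5])
>>> ax.get_figure().savefig('fair_waterfall_plot.png', dpi=150)  # illustrative file name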