equipy.fairness package#

Module contents#

Main classes to make predictions fair.

The module structure is as follows:

  • The FairWasserstein base class implements fairness adjustment with respect to a single sensitive attribute, using the Wasserstein distance, for both binary classification and regression tasks. In the case of binary classification, this class supports scores instead of classes. For more details, see E. Chzhen, C. Denis, M. Hebiri, L. Oneto and M. Pontil, “Fair Regression with Wasserstein Barycenters” (NeurIPS20).

  • The MultiWasserstein class extends FairWasserstein to fairness adjustment for multiple sensitive attributes, handled in a sequential framework. For more details, see F. Hu, P. Ratz, A. Charpentier, “A Sequentially Fair Mechanism for Multiple Sensitive Attributes” (AAAI24).

class equipy.fairness.FairWasserstein(sigma: float = 0.0001)[source]#

Bases: BaseHelper

Class implementing Wasserstein distance-based fairness adjustment for binary classification and regression tasks regarding a single sensitive attribute.

Parameters:

sigma (float, optional (default=0.0001)) – Standard deviation of the random noise added during fairness adjustment.

sigma#

Standard deviation of the random noise added during fairness adjustment.

Type:

float

modalities_calib#

Dictionary storing modality values obtained from calibration data.

Type:

dict

weights#

Dictionary storing weights (probabilities) for each modality based on their occurrences in calibration data.

Type:

dict

ecdf#

Dictionary storing ECDF (Empirical Cumulative Distribution Function) objects for each sensitive modality.

Type:

dict

eqf#

Dictionary storing EQF (Empirical Quantile Function) objects for each sensitive modality.

Type:

dict

fit(y: ndarray, sensitive_feature: DataFrame) None[source]#

Perform fit on the calibration data and save the ECDF, EQF, and weights of the sensitive variable.

Parameters:
  • y (np.ndarray, shape (n_samples,)) – The calibration labels.

  • sensitive_feature (pd.DataFrame, shape (n_samples, 1)) – The calibration samples representing one single sensitive attribute.

Return type:

None

Notes

This method computes the ECDF (Empirical Cumulative Distribution Function), EQF (Empirical Quantile Function), and weights for the sensitive variable based on the provided calibration data. These computed values are used during the transformation process to ensure fairness in predictions.
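As a rough illustration of what fit() stores, the per-modality weights, ECDF, and EQF can be computed with NumPy and pandas as below. This is a simplified sketch, not equipy's internals; the grouping code and closure-based helpers are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

y = np.array([0.05, 0.08, 0.9, 0.9, 0.01, 0.88])
sensitive_feature = pd.DataFrame({'nb_child': [1, 3, 2, 3, 1, 2]})

weights, ecdf, eqf = {}, {}, {}
col = sensitive_feature.columns[0]
for modality, idx in sensitive_feature.groupby(col).groups.items():
    values = np.sort(y[np.asarray(idx)])
    # weight = share of calibration samples with this modality
    weights[modality] = len(values) / len(y)
    # ECDF: fraction of this modality's calibration scores <= t
    ecdf[modality] = lambda t, v=values: np.searchsorted(v, t, side='right') / len(v)
    # EQF: empirical quantile function, the (generalized) inverse of the ECDF
    eqf[modality] = lambda q, v=values: np.quantile(v, q)

print(weights)
```

During transform, each prediction is mapped through its own group's ECDF and then through the weighted mix of all groups' EQFs, which is what makes the stored objects above sufficient.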

Examples

>>> wasserstein = FairWasserstein(sigma=0.001)
>>> y = np.array([0.0, 1.0, 1.0, 0.0])
>>> sensitive_feature = pd.DataFrame({'nb_child': [1, 2, 0, 2]})
>>> wasserstein.fit(y, sensitive_feature)
transform(y: ndarray, sensitive_feature: DataFrame, epsilon: float = 0) ndarray[source]#

Transform the test data to enforce fairness using Wasserstein distance.

Parameters:
  • y (np.ndarray, shape (n_samples,)) – The target values of the test data.

  • sensitive_feature (pd.DataFrame, shape (n_samples, 1)) – The test samples representing a single sensitive attribute.

  • epsilon (float, optional (default=0)) – The fairness parameter controlling the trade-off between fairness and accuracy. It represents the fraction of the original predictions retained after fairness adjustment. Epsilon should be a value between 0 and 1, where 0 means full fairness and 1 means no fairness constraint.

Returns:

y_fair – Fair predictions for the test data after enforcing fairness constraints.

Return type:

np.ndarray, shape (n_samples,)

Notes

This method applies Wasserstein distance-based fairness adjustment to the test data using the precomputed ECDF (Empirical Cumulative Distribution Function), EQF (Empirical Quantile Function), and weights obtained from the calibration data. Random noise within the range of [-sigma, sigma] is added to the test data to ensure fairness. The parameter epsilon controls the trade-off between fairness and accuracy, with 0 enforcing full fairness and 1 retaining the original predictions.
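The adjustment described above can be sketched as follows. This is a simplified stand-in, not equipy's implementation: the calibration values, helper functions, and the uniform noise draw are assumptions for illustration. The fair value is the weighted mix of all groups' quantiles at the prediction's own-group rank, and epsilon linearly interpolates back toward the original prediction.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, epsilon = 0.001, 0.2

# Hypothetical calibration scores per sensitive modality
calib = {'A': np.array([0.1, 0.2, 0.4]), 'B': np.array([0.5, 0.7, 0.9])}
weights = {k: len(v) / 6 for k, v in calib.items()}

def ecdf(v, t):
    return np.searchsorted(np.sort(v), t, side='right') / len(v)

def eqf(v, q):
    return np.quantile(v, q)

def transform_one(y_i, modality):
    noisy = y_i + rng.uniform(-sigma, sigma)       # small noise breaks ties
    q = ecdf(calib[modality], noisy)               # rank within own group
    y_fair = sum(w * eqf(calib[k], q) for k, w in weights.items())
    return epsilon * y_i + (1 - epsilon) * y_fair  # fairness/accuracy trade-off

out = transform_one(0.7, 'B')
```

With epsilon = 0 the output is the barycenter value alone (full fairness); with epsilon = 1 the original prediction is returned unchanged.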

References

Evgenii Chzhen, Christophe Denis, Mohamed Hebiri, Luca Oneto and Massimiliano Pontil, “Fair Regression with Wasserstein Barycenters” (NeurIPS20)

Examples

>>> y = np.array([0.05, 0.08, 0.9, 0.9, 0.01, 0.88])
>>> sensitive_feature = pd.DataFrame({'nb_child': [1, 3, 2, 3, 1, 2]})
>>> wasserstein = FairWasserstein(sigma=0.001)
>>> wasserstein.fit(y, sensitive_feature)
>>> y = np.array([0.01, 0.99, 0.98, 0.04])
>>> sensitive_feature = pd.DataFrame({'nb_child': [3, 1, 2, 3]})
>>> print(wasserstein.transform(y, sensitive_feature, epsilon=0.2))
[0.26063673 0.69140959 0.68940959 0.26663673]
class equipy.fairness.MultiWasserstein(sigma: float = 0.0001)[source]#

Bases: object

Class extending FairWasserstein for multi-sensitive attribute fairness adjustment.

Parameters:

sigma (float, optional (default=0.0001)) – Standard deviation of the random noise added during fairness adjustment.

sigma#

Standard deviation of the random noise added during fairness adjustment.

Type:

float

y_fair#

Dictionary storing fair predictions for each sensitive feature.

Type:

dict

modalities_calib_all#

Dictionary storing modality values obtained from calibration data for all sensitive features.

Type:

dict

weights_all#

Dictionary storing weights (probabilities) for each modality based on their occurrences in calibration data for all sensitive features.

Type:

dict

ecdf_all#

Dictionary storing ECDF (Empirical Cumulative Distribution Function) objects for each sensitive modality for all sensitive features.

Type:

dict

eqf_all#

Dictionary storing EQF (Empirical Quantile Function) objects for each sensitive modality for all sensitive features.

Type:

dict

fit(y: ndarray, sensitive_features: DataFrame) None[source]#

Perform fit on the calibration data and save the ECDF, EQF, and weights for each sensitive variable.

Parameters:
  • y (np.ndarray, shape (n_samples,)) – The calibration labels.

  • sensitive_features (pd.DataFrame, shape (n_samples, n_sensitive_features)) – The calibration samples representing multiple sensitive attributes.

Return type:

None

Notes

This method computes the ECDF (Empirical Cumulative Distribution Function), EQF (Empirical Quantile Function), and weights for each sensitive variable based on the provided calibration data. These computed values are used during the transformation process to ensure fairness in predictions.

transform(y: ndarray, sensitive_features: DataFrame, epsilon: list[float] | None = None) ndarray[source]#

Transform the test data to enforce fairness using Wasserstein distance, sequentially across the sensitive attributes.

Parameters:
  • y (np.ndarray, shape (n_samples,)) – The target values of the test data.

  • sensitive_features (pd.DataFrame, shape (n_samples, n_sensitive_features)) – The test samples representing multiple sensitive attributes.

  • epsilon (list, shape (n_sensitive_features,), optional (default=None)) – The fairness parameters controlling the trade-off between fairness and accuracy for each sensitive feature. If None, no fairness constraints are applied.

Returns:

y_fair – Fair predictions for the test data after enforcing fairness constraints.

Return type:

np.ndarray, shape (n_samples,)

Notes

This method applies Wasserstein distance-based fairness adjustment to the test data using the precomputed ECDF (Empirical Cumulative Distribution Function), EQF (Empirical Quantile Function), and weights obtained from the calibration data. Random noise within the range of [-sigma, sigma] is added to the test data to ensure fairness. The parameter epsilon is a list, where each element controls the trade-off between fairness and accuracy for the corresponding sensitive feature.
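The sequential scheme described above can be sketched with a toy single-attribute adjustment chained over the columns. This is an in-sample illustration of the idea, not equipy's implementation: `fair_adjust` and `sequential_fair` are invented names, and the adjustment here uses the test data itself as calibration data for simplicity.

```python
import numpy as np
import pandas as pd

def fair_adjust(y, groups, eps=0.0):
    """One-attribute Wasserstein barycenter adjustment (illustrative sketch)."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    labels = pd.Series(groups)
    weights = labels.value_counts(normalize=True)
    for g in weights.index:
        mask = (labels == g).to_numpy()
        # rank of each prediction within its own group
        ranks = np.searchsorted(np.sort(y[mask]), y[mask], side='right') / mask.sum()
        # barycenter: weighted mix of every group's quantile at the same rank
        out[mask] = sum(w * np.quantile(y[labels.to_numpy() == k], ranks)
                        for k, w in weights.items())
    return eps * y + (1 - eps) * out

def sequential_fair(y, sensitive_features, epsilon):
    """Chain the single-attribute adjustment over the columns, in order."""
    y_step = np.asarray(y, dtype=float)
    for col, eps in zip(sensitive_features.columns, epsilon):
        y_step = fair_adjust(y_step, sensitive_features[col], eps)
    return y_step

y = np.array([0.8, 0.35, 0.23, 0.2])
sf = pd.DataFrame({'color': ['blue', 'blue', 'blue', 'green'],
                   'nb_child': [2, 2, 1, 2]})
res = sequential_fair(y, sf, epsilon=[0.1, 0.2])
```

Each step operates on the output of the previous one, so fairness with respect to the first attribute is (approximately) preserved while the next attribute is adjusted.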

References

François Hu, Philipp Ratz, Arthur Charpentier, “A Sequentially Fair Mechanism for Multiple Sensitive Attributes” (AAAI24)

Examples

>>> wasserstein = MultiWasserstein(sigma=0.001)
>>> y = np.array([0.6, 0.43, 0.32, 0.8])
>>> sensitive_features = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue'], 'nb_child': [1, 2, 0, 2]})
>>> wasserstein.fit(y, sensitive_features)
>>> y = np.array([0.8, 0.35, 0.23, 0.2])
>>> sensitive_features = pd.DataFrame({'color': ['blue', 'blue', 'blue', 'green'], 'nb_child': [2, 2, 1, 2]})
>>> epsilon = [0.1, 0.2]
>>> fair_predictions = wasserstein.transform(y, sensitive_features, epsilon=epsilon)
>>> print(fair_predictions)
[0.42483123 0.36412012 0.36172012 0.36112012]