equipy.metrics package#

Module contents#

Computation of fairness (i.e. how similar the prediction distributions are across population groups defined by their sensitive attributes) and of performance (i.e. how close the predictions are to the actual values).
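A minimal usage sketch of the two metrics together (the data and the 'group' column are made up for illustration; fair predictions would normally come from equipy's fairness methods):

>>> import numpy as np
>>> import pandas as pd
>>> from equipy.metrics import performance, unfairness
>>> y_true = np.array([1.0, 2.0, 3.0, 4.0])
>>> y_pred = np.array([1.1, 1.9, 3.2, 3.8])
>>> sensitive_features = pd.DataFrame({'group': ['A', 'B', 'A', 'B']})
>>> perf = performance(y_true, y_pred)  # mean squared error by default
>>> unf = unfairness(y_pred, sensitive_features)  # 0.0 when the group distributions match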

equipy.metrics.performance(y_true: np.ndarray, y_pred: np.ndarray, metric: Callable = mean_squared_error) → float[source]#

Compute the performance of the predicted (fair or not) output against the true labels.

Parameters:
  • y_true (np.ndarray) – Actual values.

  • y_pred (np.ndarray) – Predicted (fair or not) output values.

  • metric (Callable, default=sklearn.metrics.mean_squared_error) – The metric used to compute the performance; any callable expecting y_true followed by y_pred works (see the sketch after the example below).

Returns:

The calculated performance value.

Return type:

float

Example

>>> import numpy as np
>>> from sklearn.metrics import f1_score
>>> from equipy.metrics import performance
>>> y_true = np.array([1, 0, 1, 1, 0])
>>> y_pred = np.array([0, 1, 1, 1, 0])
>>> classification_performance = performance(y_true, y_pred, f1_score)
>>> print(classification_performance)
0.6666666666666666
>>> y_true = np.array([1.2, 2.5, 3.8, 4.0, 5.2])
>>> y_pred = np.array([1.0, 2.7, 3.5, 4.2, 5.0])
>>> regression_performance = performance(y_true, y_pred)
>>> print(regression_performance)
0.05
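Any callable taking y_true followed by y_pred can be supplied as metric. A small hedged sketch with sklearn.metrics.mean_absolute_error, reusing the regression data above (illustrative; any sklearn-style metric works the same way):

>>> from sklearn.metrics import mean_absolute_error
>>> print(round(performance(y_true, y_pred, metric=mean_absolute_error), 2))
0.22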
equipy.metrics.unfairness(y: np.ndarray, sensitive_features: pd.DataFrame, n_min: float = 1000) → float[source]#

Compute the unfairness value for a given (fair or not) output y and sensitive attribute data (sensitive_features), where each sensitive feature can take several modalities. With a single sensitive feature, the unfairness is the maximum quantile difference between the prediction distributions of that feature's modalities. With several sensitive features, this maximum is computed for each feature and the per-feature maxima are summed, as in the sketch below.
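A simplified sketch of that aggregation with NumPy quantiles, assuming y and sensitive_features shaped as in the example further down and numpy imported as np (illustrative only, not equipy's actual implementation, which also applies the n_min / Wasserstein switch described below):

>>> from itertools import combinations
>>> def max_quantile_gap(y, groups):
...     # largest pointwise gap between the quantile functions of any two modalities
...     t = np.linspace(0, 1, 101)
...     q = {g: np.quantile(y[groups == g], t) for g in np.unique(groups)}
...     return max(np.max(np.abs(q[a] - q[b])) for a, b in combinations(q, 2))
>>> unf = sum(max_quantile_gap(y, sensitive_features[col].to_numpy())
...           for col in sensitive_features.columns)  # sum of per-feature maxima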

Parameters:
  • y (np.ndarray) – Predicted (fair or not) output data.

  • sensitive_features (pd.DataFrame) – Sensitive attribute data.

  • n_min (float, default=1000) – Threshold below which the unfairness is computed from the Wasserstein distance (see the sketch after the example below).

Returns:

Unfairness value in the dataset.

Return type:

float

Example

>>> import numpy as np
>>> import pandas as pd
>>> from equipy.metrics import unfairness
>>> y = np.array([5, 0, 6, 7])
>>> sensitive_features = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue'],
...                                    'nb_child': [1, 2, 0, 2]})
>>> unf = unfairness(y, sensitive_features, n_min=5)
>>> print(unf)
6.0
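When the groups are smaller than n_min, as in this example, the gaps are measured with the Wasserstein distance. A hedged sketch of one such pairwise term using scipy.stats.wasserstein_distance (illustrative, not equipy's internal code), reusing y and sensitive_features from above:

>>> from scipy.stats import wasserstein_distance
>>> colors = sensitive_features['color'].to_numpy()
>>> print(wasserstein_distance(y[colors == 'blue'], y[colors == 'red']))
3.5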