label_quality_utils

Helper functions for computing label quality scores

Functions:

get_normalized_entropy(pred_probs[, ...])

Returns the normalized entropy of pred_probs.

cleanlab.internal.label_quality_utils.get_normalized_entropy(pred_probs: numpy.array, min_allowed_prob=1e-06) → numpy.array

Returns the normalized entropy of pred_probs.

Normalized entropy is between 0 and 1. Higher values of entropy indicate higher uncertainty in the model’s prediction of the correct label.
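
For reference, normalized entropy is commonly defined as the Shannon entropy divided by its maximum possible value, log K. The formula below assumes that standard definition; the symbols p_k and H_norm are notation introduced here for illustration, not taken from the cleanlab source.

```latex
H_{\text{norm}}(x) = -\frac{1}{\log K} \sum_{k=0}^{K-1} p_k \log p_k,
\qquad p_k = P(\text{label} = k \mid x)
```

With the convention 0 log 0 = 0, this equals 0 when all probability mass is on a single class and 1 when all K classes are equally likely.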

Read more about normalized entropy on Wikipedia.

Normalized entropy is used in active learning for uncertainty sampling: https://towardsdatascience.com/uncertainty-sampling-cheatsheet-ec57bc067c0b

Unlike label-quality scores, entropy only depends on the model’s predictions, not the given label.

Parameters
  • pred_probs (np.array (shape (N, K))) – Matrix of model-predicted probabilities P(label=k|x) for N examples and K classes. Each row corresponds to an example x and contains the model-predicted probabilities that x belongs to each possible class. The columns must be ordered so that these probabilities correspond to class 0, 1, 2, and so on. pred_probs should have been computed using cross-validation with 3 or more folds.

  • min_allowed_prob (float, default 1e-6) – Minimum allowed probability value. Entries of pred_probs below this value are clipped to this value, which keeps the entropy finite even when pred_probs contains zeros (where the logarithm is undefined).

Returns

entropy – the normalized entropy of each row of pred_probs.

Return type

np.array (float)
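
The computation described above can be sketched in a few lines of NumPy. This is an illustrative re-implementation under the standard definition of normalized entropy (Shannon entropy divided by log K), not cleanlab's actual source; the function name normalized_entropy_sketch and the sample pred_probs values are hypothetical.

```python
import numpy as np


def normalized_entropy_sketch(pred_probs, min_allowed_prob=1e-6):
    """Hypothetical sketch of normalized entropy; not cleanlab's source code."""
    pred_probs = np.asarray(pred_probs, dtype=float)
    num_classes = pred_probs.shape[1]
    # Clip small probabilities from below so log() never receives zero.
    clipped = np.clip(pred_probs, min_allowed_prob, None)
    # Shannon entropy of each row, divided by log(K) to rescale it to [0, 1].
    return -np.sum(pred_probs * np.log(clipped), axis=1) / np.log(num_classes)


# Made-up predictions for three examples over K = 3 classes.
pred_probs = np.array(
    [
        [1.0, 0.0, 0.0],        # fully confident prediction
        [1 / 3, 1 / 3, 1 / 3],  # maximally uncertain prediction
        [0.7, 0.2, 0.1],        # somewhere in between
    ]
)
entropy = normalized_entropy_sketch(pred_probs)
print(entropy)
```

The result has one value per row of pred_probs. No given labels are passed in, which reflects the point that entropy depends only on the model's predictions.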