rank

Methods to rank/order data by cleanlab’s label quality score. Except for order_label_issues, which operates only on the subset of the data identified as potential label issues/errors, the methods in this module can be used on whichever subset of the dataset you choose (including the entire dataset) and provide a label quality score for every example. You can then do something like np.argsort(label_quality_score) to obtain the indices of examples ranked by label quality.

CAUTION: These label quality scores are computed based on pred_probs from your model that must be out-of-sample! You should never provide predictions on the same examples used to train the model, as these will be overfit and unsuitable for finding label errors. To obtain out-of-sample predicted probabilities for every datapoint in your dataset, you can use cross-validation. Alternatively, it is fine if your model was trained on a separate dataset and you are only evaluating labels in data that was previously held out.
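As a rough illustration of that workflow, here is a minimal sketch that obtains out-of-sample pred_probs via cross-validation on a small synthetic dataset; the dataset and the choice of classifier are hypothetical, purely for illustration:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict
    from cleanlab.rank import get_label_quality_scores

    # Hypothetical dataset for illustration only.
    X = np.random.rand(200, 4)
    labels = np.random.randint(0, 3, size=200)

    # Out-of-sample predicted probabilities for every datapoint via cross-validation.
    pred_probs = cross_val_predict(
        LogisticRegression(), X, labels, cv=5, method="predict_proba"
    )

    label_quality_score = get_label_quality_scores(labels, pred_probs)
    ranked_indices = np.argsort(label_quality_score)  # lowest-quality labels first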

Functions:

  • get_confidence_weighted_entropy_for_each_label(...) – Returns the "confidence weighted entropy" label-quality score for each datapoint.

  • get_knn_distance_ood_scores(features, nbrs) – Returns the KNN distance out-of-distribution (OOD) score for each datapoint.

  • get_label_quality_ensemble_scores(labels, ...) – Returns label quality scores based on predictions from an ensemble of models.

  • get_label_quality_scores(labels, pred_probs, *) – Returns label quality scores for each datapoint.

  • get_normalized_margin_for_each_label(labels, ...) – Returns the "normalized margin" label-quality score for each datapoint.

  • get_self_confidence_for_each_label(labels, ...) – Returns the self-confidence label-quality score for each datapoint.

  • order_label_issues(label_issues_mask, ...[, ...]) – Sorts label issues by label quality score.

cleanlab.rank.get_confidence_weighted_entropy_for_each_label(labels: numpy.array, pred_probs: numpy.array) → numpy.array

Returns the “confidence weighted entropy” label-quality score for each datapoint.

This is a function to compute label-quality scores for classification datasets, where lower scores indicate labels less likely to be correct.

“confidence weighted entropy” is the normalized entropy divided by “self-confidence”.

Parameters
  • labels (np.array) – Labels in the same format expected by the get_label_quality_scores function.

  • pred_probs (np.array) – Predicted-probabilities in the same format expected by the get_label_quality_scores function.

Returns

label_quality_scores – An array of scores (between 0 and 1), one per example, reflecting the likelihood that each example is correctly labeled.

Return type

np.array
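A minimal usage sketch with tiny, hand-made inputs (the values are purely illustrative):

    import numpy as np
    from cleanlab.rank import get_confidence_weighted_entropy_for_each_label

    # Tiny illustrative inputs: 3 examples, 3 classes.
    labels = np.array([0, 2, 1])
    pred_probs = np.array([
        [0.9, 0.05, 0.05],
        [0.2, 0.3, 0.5],
        [0.7, 0.2, 0.1],
    ])

    scores = get_confidence_weighted_entropy_for_each_label(labels, pred_probs)
    lowest_quality_first = np.argsort(scores)  # lower score = label less likely correct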

cleanlab.rank.get_knn_distance_ood_scores(features: numpy.array, nbrs: sklearn.neighbors._unsupervised.NearestNeighbors, k: Optional[int] = None) → numpy.array

Returns the KNN distance out-of-distribution (OOD) score for each datapoint.

This is a function to compute OOD scores where higher scores indicate the datapoint is more likely to be OOD.

Parameters
  • features (np.array) – Feature matrix of shape (N, M), where N is the number of datapoints and M is the number of features. This is the “query set” of features for each datapoint which are used for nearest neighbor search.

  • nbrs (sklearn.neighbors.NearestNeighbors) – Instantiated NearestNeighbors class object that’s been fitted on a dataset in the same feature space. Note that the distance metric and n_neighbors are specified when instantiating this class. See: https://scikit-learn.org/stable/modules/neighbors.html

  • k (int, default None) – Number of neighbors to use when calculating the average distance to neighbors. This value k must be less than or equal to max_k, the n_neighbors value used when fitting the instantiated NearestNeighbors object. If k=None, then k=min(10, max_k) is used by default, where max_k is extracted from the given nbrs.

Returns

avg_nbrs_distances – Average distance to the k nearest neighbors for each datapoint, used as a score for OOD detection.

Return type

np.array
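A minimal sketch of this function; the feature matrix below is synthetic and purely illustrative:

    import numpy as np
    from sklearn.neighbors import NearestNeighbors
    from cleanlab.rank import get_knn_distance_ood_scores

    # Hypothetical feature matrix for illustration only.
    features = np.random.rand(500, 8)

    # Fit NearestNeighbors in the same feature space; n_neighbors caps the usable k.
    nbrs = NearestNeighbors(n_neighbors=10).fit(features)

    ood_scores = get_knn_distance_ood_scores(features, nbrs)  # defaults to k=min(10, max_k)
    most_ood_first = np.argsort(ood_scores)[::-1]  # higher score = more likely OOD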

cleanlab.rank.get_label_quality_ensemble_scores(labels: numpy.array, pred_probs_list: List[numpy.array], *, method: str = 'self_confidence', adjust_pred_probs: bool = False, weight_ensemble_members_by: str = 'accuracy', custom_weights: Optional[numpy.array] = None, log_loss_search_T_values: List[float] = [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 200.0], verbose: bool = True) → numpy.array

Returns label quality scores based on predictions from an ensemble of models.

This is a function to compute label-quality scores for classification datasets, where lower scores indicate labels less likely to be correct.

Ensemble scoring requires a list of pred_probs from each model in the ensemble.

For each pred_probs in the list, a label quality score is computed, and these scores are then averaged using the weighting scheme specified by weight_ensemble_members_by.

Score is between 0 and 1:

  • 1 — clean label (given label is likely correct).

  • 0 — dirty label (given label is likely incorrect).

Parameters
  • labels (np.array) – Labels in the same format expected by the get_label_quality_scores function.

  • pred_probs_list (List[np.array]) – Each element in this list should be an array of pred_probs in the same format expected by the get_label_quality_scores function. Each element of pred_probs_list corresponds to the predictions from one model for all examples.

  • method ({"self_confidence", "normalized_margin", "confidence_weighted_entropy"}, default "self_confidence") – Label quality scoring method. See get_label_quality_scores for scenarios on when to use each method.

  • adjust_pred_probs (bool, optional) – Whether to adjust the predicted probabilities, with the same meaning as the adjust_pred_probs argument of the get_label_quality_scores function.

  • weight_ensemble_members_by ({"uniform", "accuracy", "log_loss_search", "custom"}, default "accuracy") –

    Weighting scheme used to aggregate scores from each model:

    • "uniform": Take the simple average of scores.

    • "accuracy": Take a weighted average of scores, weighted by model accuracy.

    • "log_loss_search": Take a weighted average of scores, weighted by exp(t * -log_loss), where t is selected from the log_loss_search_T_values parameter and log_loss is the log-loss between a model’s pred_probs and the given labels.

    • "custom": Take a weighted average of scores using custom weights that the user passes to the custom_weights parameter.

  • custom_weights (np.array, default None) – Weights used to aggregate scores from each model if weight_ensemble_members_by="custom". Length of this array must match the number of models: len(pred_probs_list).

  • log_loss_search_T_values (List, default [1e-4, 1e-3, 1e-2, 1e-1, 1e0, 1e1, 1e2, 2e2]) – List of t values considered if weight_ensemble_members_by="log_loss_search". We will choose the value of t that leads to weights which produce the best log-loss when used to form a weighted average of pred_probs from the models.

  • verbose (bool, default True) – Set to False to suppress all print statements.

Returns

label_quality_scores – An array of scores (between 0 and 1) for each example, where lower scores indicate labels less likely to be correct.

Return type

np.array
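A minimal sketch of ensemble scoring, assuming a hypothetical dataset and two scikit-learn classifiers chosen purely for illustration:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict
    from cleanlab.rank import get_label_quality_ensemble_scores

    # Hypothetical dataset for illustration only.
    X = np.random.rand(200, 4)
    labels = np.random.randint(0, 3, size=200)

    # One set of out-of-sample pred_probs per model in the ensemble.
    pred_probs_list = [
        cross_val_predict(model, X, labels, cv=5, method="predict_proba")
        for model in (LogisticRegression(), RandomForestClassifier())
    ]

    ensemble_scores = get_label_quality_ensemble_scores(
        labels,
        pred_probs_list,
        method="self_confidence",
        weight_ensemble_members_by="accuracy",
        verbose=False,
    )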

cleanlab.rank.get_label_quality_scores(labels: numpy.array, pred_probs: numpy.array, *, method: str = 'self_confidence', adjust_pred_probs: bool = False) → numpy.array

Returns label quality scores for each datapoint.

This is a function to compute label-quality scores for classification datasets, where lower scores indicate labels less likely to be correct.

Score is between 0 and 1:

  • 1 – clean label (given label is likely correct).

  • 0 – dirty label (given label is likely incorrect).

Parameters
  • labels (np.array) – A discrete vector of noisy labels, i.e. some labels may be erroneous. Format requirements: for a dataset with K classes, labels must be in 0, 1, …, K-1.

  • pred_probs (np.array) –

    An array of shape (N, K) of model-predicted probabilities, P(label=k|x). Each row of this matrix corresponds to an example x and contains the model-predicted probabilities that x belongs to each possible class, for each of the K classes. The columns must be ordered such that these probabilities correspond to class 0, 1, …, K-1.

    Caution: pred_probs from your model must be out-of-sample! You should never provide predictions on the same examples used to train the model, as these will be overfit and unsuitable for finding label-errors. To obtain out-of-sample predicted probabilities for every datapoint in your dataset, you can use cross-validation. Alternatively it is ok if your model was trained on a separate dataset and you are only evaluating data that was previously held-out.

  • method ({"self_confidence", "normalized_margin", "confidence_weighted_entropy"}, default "self_confidence") –

    Label quality scoring method.

    Letting k = labels[i] and P = pred_probs[i] denote the given label and predicted class-probabilities for datapoint i, its score can either be:

    • 'normalized_margin': P[k] - max_{k' != k}[ P[k'] ]

    • 'self_confidence': P[k]

    • 'confidence_weighted_entropy': entropy(P) / self_confidence

    Let C = {0, 1, ..., K-1} denote the classification task’s specified set of classes.

    The normalized_margin score works better for identifying class conditional label errors, i.e. examples for which another label in C is appropriate but the given label is not.

    The self_confidence score works better for identifying other kinds of label issues corresponding to bad examples that are: not from any of the classes in C, well-described by 2 or more labels in C, or generally just out-of-distribution (i.e. anomalous outliers).

  • adjust_pred_probs (bool, optional) – Account for class imbalance in the label-quality scoring by adjusting predicted probabilities via subtraction of class confident thresholds and renormalization. Set this to True if you prefer to account for class-imbalance. See Northcutt et al., 2021.

Returns

label_quality_scores – Scores are between 0 and 1 where lower scores indicate labels less likely to be correct.

Return type

np.array
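A minimal usage sketch with tiny, hand-made inputs (in practice pred_probs must be out-of-sample, e.g. obtained via cross-validation as cautioned above):

    import numpy as np
    from cleanlab.rank import get_label_quality_scores

    # Tiny illustrative inputs: 4 examples, 3 classes.
    labels = np.array([1, 0, 2, 1])
    pred_probs = np.array([
        [0.1, 0.8, 0.1],
        [0.4, 0.5, 0.1],
        [0.2, 0.2, 0.6],
        [0.2, 0.5, 0.3],
    ])

    scores = get_label_quality_scores(labels, pred_probs, method="self_confidence")
    scores_margin = get_label_quality_scores(
        labels, pred_probs, method="normalized_margin", adjust_pred_probs=True
    )
    worst_first = np.argsort(scores)  # the second example (given label 0, model favors class 1) ranks lowest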

cleanlab.rank.get_normalized_margin_for_each_label(labels: numpy.array, pred_probs: numpy.array) → numpy.array

Returns the “normalized margin” label-quality score for each datapoint.

This is a function to compute label-quality scores for classification datasets, where lower scores indicate labels less likely to be correct.

Letting k denote the given label for a datapoint, the normalized margin is (p(label = k) - max(p(label != k))), i.e. the probability of the given label minus the highest probability assigned to any other label. This reflects how confident the model is that an example BOTH belongs to its given label AND does not belong to another label, and therefore scores how likely the given label is to be correct rather than an error.

Normalized margin works better for finding class conditional label errors, i.e. examples where some other class fits the example better than the given label.

Parameters
  • labels (np.array) – Labels in the same format expected by the get_label_quality_scores function.

  • pred_probs (np.array) – Predicted-probabilities in the same format expected by the get_label_quality_scores function.

Returns

label_quality_scores – An array of scores (between 0 and 1), one per example, reflecting the likelihood that each example is correctly labeled, where normalized_margin = prob_label - max_prob_not_label.

Return type

np.array
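A minimal sketch with tiny, hand-made inputs, where the given label of the last example conflicts with a clearly better alternative class:

    import numpy as np
    from cleanlab.rank import get_normalized_margin_for_each_label

    # Tiny illustrative inputs: 3 examples, 3 classes.
    labels = np.array([0, 2, 1])
    pred_probs = np.array([
        [0.9, 0.05, 0.05],
        [0.2, 0.3, 0.5],
        [0.7, 0.2, 0.1],
    ])

    scores = get_normalized_margin_for_each_label(labels, pred_probs)
    # Raw margins (prob_label - max_prob_not_label) are 0.85, 0.2, and -0.5,
    # so the third example receives the lowest score (most likely label error).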

cleanlab.rank.get_self_confidence_for_each_label(labels: numpy.array, pred_probs: numpy.array) → numpy.array

Returns the self-confidence label-quality score for each datapoint.

This is a function to compute label-quality scores for classification datasets, where lower scores indicate labels less likely to be correct.

The self-confidence is the holdout probability that an example belongs to its given class label.

Self-confidence works better for finding label issues related to out-of-distribution (OOD) examples, strange or low-quality examples, examples well-described by more than one label, and other such problems.

Parameters
  • labels (np.array) – Labels in the same format expected by the get_label_quality_scores function.

  • pred_probs (np.array) – Predicted-probabilities in the same format expected by the get_label_quality_scores function.

Returns

label_quality_scores – An array of holdout probabilities that each example in pred_probs belongs to its label.

Return type

np.array
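A minimal sketch with tiny, hand-made inputs (the values are purely illustrative):

    import numpy as np
    from cleanlab.rank import get_self_confidence_for_each_label

    # Tiny illustrative inputs: 3 examples, 3 classes.
    labels = np.array([0, 2, 1])
    pred_probs = np.array([
        [0.9, 0.05, 0.05],
        [0.2, 0.3, 0.5],
        [0.7, 0.2, 0.1],
    ])

    scores = get_self_confidence_for_each_label(labels, pred_probs)
    # Each score is the predicted probability of the example's given label,
    # i.e. roughly pred_probs[np.arange(len(labels)), labels] -> [0.9, 0.5, 0.2] here.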

cleanlab.rank.order_label_issues(label_issues_mask: numpy.array, labels: numpy.array, pred_probs: numpy.array, *, rank_by: str = 'self_confidence', rank_by_kwargs: dict = {}) → numpy.array

Sorts label issues by label quality score.

Default label quality score is “self_confidence”.

Parameters
  • label_issues_mask (np.array) – A boolean mask for the entire dataset where True represents a label issue and False represents an example that is accurately labeled with high confidence.

  • labels (np.array) – Labels in the same format expected by the get_label_quality_scores function.

  • pred_probs (np.array (shape (N, K))) – Predicted-probabilities in the same format expected by the get_label_quality_scores function.

  • rank_by (str, optional) – Score by which to order label error indices (in increasing order). See the method argument of get_label_quality_scores.

  • rank_by_kwargs (dict, optional) – Optional keyword arguments to pass into get_label_quality_scores function. Accepted args include adjust_pred_probs.

Returns

label_issues_idx – An array of the indices of the label issues, sorted by the label-quality score computed via the method passed to rank_by, in increasing order (most severe issues first).

Return type

np.array
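A minimal sketch with a hypothetical label_issues_mask (in practice such a mask typically comes from cleanlab's issue-finding utilities, outside this module) and tiny, hand-made labels/pred_probs:

    import numpy as np
    from cleanlab.rank import order_label_issues

    # Hypothetical boolean mask flagging which examples were identified as label issues.
    label_issues_mask = np.array([False, True, False, True])
    labels = np.array([1, 0, 2, 1])
    pred_probs = np.array([
        [0.1, 0.8, 0.1],
        [0.4, 0.5, 0.1],
        [0.2, 0.2, 0.6],
        [0.2, 0.5, 0.3],
    ])

    issue_indices = order_label_issues(
        label_issues_mask, labels, pred_probs, rank_by="normalized_margin"
    )
    # issue_indices contains only the flagged indices (here a permutation of [1, 3]),
    # ordered so that the lowest-quality (most severe) label issues come first.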