noise_generation
Functions:
- generate_n_rand_probabilities_that_sum_to_m: When min_prob=0 and max_prob=1.0, this method is deprecated.
- generate_noise_matrix: DEPRECATED - Use generate_noise_matrix_from_trace()
- generate_noise_matrix_from_trace: Generates a K x K noise matrix P(label=k_s|true_label=k_y) whose trace is np.sum(np.diagonal(noise_matrix)).
- generate_noisy_labels: Generates noisy labels (shape (N, 1)) from perfect labels y, 'exactly' yielding the provided noise_matrix between labels and y.
- noise_matrix_is_valid: Given a prior py = p(true_label=k), returns true if the given noise_matrix is a learnable matrix.
Returns a uniformly random numpy integer array of length N that sums to K. |
- cleanlab.noise_generation.generate_n_rand_probabilities_that_sum_to_m(n, m, *, max_prob=1.0, min_prob=0.0)
When min_prob=0 and max_prob=1.0, this method is deprecated. Instead use np.random.dirichlet(np.ones(n))*m.
Generates n random probabilities that sum to m.
- Parameters
n (int) – Length of the np.array of random probabilities to be returned.
m (float) – Sum of the np.array of random probabilities that is returned.
max_prob (float (0.0, 1.0] | Default value is 1.0) – Maximum probability of any entry in the returned np.array.
min_prob (float [0.0, 1.0) | Default value is 0.0) – Minimum probability of any entry in the returned np.array.
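The deprecation note recommends np.random.dirichlet(np.ones(n))*m for the default bounds. A minimal sketch of that replacement follows; the values of n and m and the fixed seed are illustrative assumptions, not taken from the library.

```python
import numpy as np

n, m = 5, 2.0  # illustrative values, chosen only for this sketch

np.random.seed(0)  # fixed seed so the sketch is reproducible

# A Dirichlet(1, ..., 1) draw is non-negative and sums to 1;
# scaling by m makes the entries sum to m instead.
probs = np.random.dirichlet(np.ones(n)) * m

print(probs)
print(probs.sum())
```

After scaling, each entry is bounded by m rather than by 1, so this one-liner does not enforce the max_prob and min_prob limits described above.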
- cleanlab.noise_generation.generate_noise_matrix(K, *, max_noise_rate=1.0, frac_zero_noise_rates=0.0, verbose=False)
DEPRECATED - Use generate_noise_matrix_from_trace()
Generates a noise matrix by randomly assigning noise rates up to max_noise_rate, then setting noise rates to zero until P(label!=k|label=k) < 1 is satisfied. Additionally, a fraction frac_zero_noise_rates of the noise rates is set to zero.
- Parameters
K (int) – Creates a noise matrix of shape (K, K). Implies there are K classes for learning with noisy labels.
max_noise_rate (float) – Smaller -> easier learning problem (less noise).
frac_zero_noise_rates (float) – Make the problem more tractable by setting a fraction of the noise rates to zero. Larger -> easier learning problem.
verbose (bool) – Print debugging output if set to True.
- cleanlab.noise_generation.generate_noise_matrix_from_trace(K, trace, *, max_trace_prob=1.0, min_trace_prob=1e-05, max_noise_rate=0.99999, min_noise_rate=0.0, valid_noise_matrix=True, py=None, frac_zero_noise_rates=0.0, seed=0, max_iter=10000)
Generates a K x K noise matrix P(label=k_s|true_label=k_y) whose trace is np.sum(np.diagonal(noise_matrix)).
- Parameters
K (int) – Creates a noise matrix of shape (K, K). Implies there are K classes for learning with noisy labels.
trace (float (0.0, 1.0]) – Sum of diagonal entries of np.array of random probabilities returned.
max_trace_prob (float (0.0, 1.0]) – Maximum probability of any entry in the trace of the return matrix.
min_trace_prob (float [0.0, 1.0)) – Minimum probability of any entry in the trace of the return matrix.
max_noise_rate (float (0.0, 1.0]) – Maximum noise_rate (non-diagonal entry) in the returned np.array.
min_noise_rate (float [0.0, 1.0)) – Minimum noise_rate (non-diagonal entry) in the returned np.array.
valid_noise_matrix (bool) – If True, returns a matrix having all necessary conditions for learning with noisy labels. In particular, p(true_label=k)p(label=k) < p(true_label=k,label=k) is satisfied. This requires that Trace > 1.
py (np.array (shape (K, 1))) – Fraction (prior probability) of each true class label, P(true_label = k). REQUIRED when valid_noise_matrix == True.
frac_zero_noise_rates (float) – The fraction of the n*(n-1) noise rates that will be set to 0. Note that if you set a high trace, it may be impossible to also have a low fraction of zero noise rates without forcing all non-“1” diagonal values. Instead, when this happens we only guarantee to produce a noise matrix with frac_zero_noise_rates or higher. The opposite occurs with a small trace.
seed (int) – Seeds the random number generator for numpy.
max_iter (int (default: 10000)) – The max number of tries to produce a valid matrix before returning False.
- Returns
noise matrix P(label=k_s|true_label=k_y) with trace as the np.sum(np.diagonal(noise_matrix)). This is a conditional probability matrix and a left stochastic matrix (each column sums to 1).
- Return type
np.array (shape (K, K))
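To make the two properties of the return value concrete, the sketch below checks them with plain numpy on a hand-written 3 x 3 matrix. The matrix values are made up for illustration and were not produced by generate_noise_matrix_from_trace.

```python
import numpy as np

# Hypothetical noise matrix; entry [k_s][k_y] is P(label=k_s | true_label=k_y).
noise_matrix = np.array([
    [0.7, 0.1, 0.0],
    [0.2, 0.8, 0.1],
    [0.1, 0.1, 0.9],
])

# Left stochastic: every column is a probability distribution over noisy labels.
column_sums = noise_matrix.sum(axis=0)

# The trace is the sum of the diagonal, i.e. of the P(label=k | true_label=k) terms.
trace = np.sum(np.diagonal(noise_matrix))

print(column_sums)
print(trace)
```

Here the diagonal sums to more than 1, which is the regime the valid_noise_matrix description says is required.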
- cleanlab.noise_generation.generate_noisy_labels(true_labels, noise_matrix)
Generates noisy labels (shape (N, 1)) from perfect labels y, ‘exactly’ yielding the provided noise_matrix between labels and y.
Below we provide a for loop implementation of what this function does. We do not use this implementation as it is not a fast algorithm, but it explains as Python pseudocode what is happening in this function.
- Parameters
true_labels (np.array (shape (N, 1))) – Perfect labels, without any noise. Contains K distinct natural number classes, e.g. 0, 1, …, K-1.
noise_matrix (np.array of shape (K, K), K = number of classes) – A conditional probability matrix of the form P(label=k_s|true_label=k_y) containing the fraction of examples in every class, labeled as every other class. Assumes columns of noise_matrix sum to 1.
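Examples
    import numpy as np

    # Pseudocode: assumes y (true labels), noise_matrix, py and K are already defined.
    # Generate labels
    count_joint = (noise_matrix * py * len(y)).round().astype(int)
    labels = np.array(y)
    for k_s in range(K):
        for k_y in range(K):
            if k_s != k_y:
                # Examples of true class k_y that have not been flipped yet.
                idx_flip = np.where((labels == k_y) & (np.asarray(y) == k_y))[0]
                if len(idx_flip):  # pragma: no cover
                    labels[np.random.choice(
                        idx_flip,
                        count_joint[k_s][k_y],
                        replace=False,
                    )] = k_s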
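To see the 'exactly yielding' property on concrete numbers, the for loop can be wrapped in a small function and run on a toy dataset. This is a sketch only: flip_labels_exactly is a hypothetical helper, not part of cleanlab, and the choices to estimate py from y and to use numpy's default_rng are assumptions made for the example.

```python
import numpy as np

def flip_labels_exactly(y, noise_matrix, seed=0):
    """Hypothetical helper (not part of cleanlab): loop-based label flipping."""
    y = np.asarray(y)
    K = noise_matrix.shape[0]
    rng = np.random.default_rng(seed)
    # Prior of each true class, estimated from y (an assumption of this sketch).
    py = np.bincount(y, minlength=K) / len(y)
    # count_joint[k_s][k_y]: number of examples of true class k_y to label as k_s.
    count_joint = (noise_matrix * py * len(y)).round().astype(int)
    labels = y.copy()
    for k_s in range(K):
        for k_y in range(K):
            if k_s != k_y:
                # Candidates: true class k_y and still carrying the original label.
                idx_flip = np.where((labels == k_y) & (y == k_y))[0]
                if len(idx_flip):
                    chosen = rng.choice(idx_flip, count_joint[k_s][k_y], replace=False)
                    labels[chosen] = k_s
    return labels

y = np.array([0] * 10 + [1] * 10)
noise_matrix = np.array([[0.8, 0.1],
                         [0.2, 0.9]])  # columns sum to 1
labels = flip_labels_exactly(y, noise_matrix)

# Empirical P(label=k_s | true_label=k_y) recovered from the generated labels.
empirical = np.array([[np.mean(labels[y == k_y] == k_s) for k_y in range(2)]
                      for k_s in range(2)])
print(empirical)
```

Because the class counts multiply out to whole numbers here, the empirical matrix matches noise_matrix entry for entry; with counts that need rounding, the match would only be approximate.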
- cleanlab.noise_generation.noise_matrix_is_valid(noise_matrix, py, *, verbose=False)
Given a prior py = p(true_label=k), returns true if the given noise_matrix is a learnable matrix. Learnability means that it is possible to achieve better than random performance, on average, for the amount of noise in noise_matrix.
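One condition for learnability is quoted in the valid_noise_matrix description: p(true_label=k)p(label=k) < p(true_label=k,label=k). The sketch below checks only that inequality with numpy. satisfies_joint_condition is a hypothetical name, and this is not claimed to be how noise_matrix_is_valid is implemented.

```python
import numpy as np

def satisfies_joint_condition(noise_matrix, py):
    """Hypothetical check (not cleanlab's implementation) of
    p(true_label=k) * p(label=k) < p(true_label=k, label=k) for every class k."""
    noise_matrix = np.asarray(noise_matrix, dtype=float)
    py = np.asarray(py, dtype=float).ravel()
    # joint[k_s][k_y] = P(label=k_s, true_label=k_y)
    joint = noise_matrix * py
    # Marginal of the noisy labels: ps[k_s] = P(label=k_s)
    ps = joint.sum(axis=1)
    return bool(np.all(py * ps < np.diagonal(joint)))

py = np.array([0.5, 0.5])
informative = np.array([[0.8, 0.1],
                        [0.2, 0.9]])
uninformative = np.array([[0.5, 0.5],
                          [0.5, 0.5]])

result_informative = satisfies_joint_condition(informative, py)
result_uninformative = satisfies_joint_condition(uninformative, py)
print(result_informative, result_uninformative)
```

The second matrix assigns labels independently of the true class, so the joint probability equals the product of the marginals and the strict inequality fails.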