fasttext

Text classification with FastText models that are compatible with cleanlab. This module allows you to easily find label issues in your text datasets.

You must first install fasttext: pip install fasttext

Classes:

FastTextClassifier(train_data_fn[, ...])

Functions:

data_loader([fn, indices, label, batch_size])

Returns a generator, yielding two lists containing [labels], [text].

class cleanlab.experimental.fasttext.FastTextClassifier(train_data_fn, test_data_fn=None, labels=None, tmp_dir='', label='__label__', del_intermediate_data=True, kwargs_train_supervised={}, p_at_k=1, batch_size=1000)

Bases: sklearn.base.BaseEstimator
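The default label='__label__' matches the prefix used by fastText's supervised text format, where each line holds a label token followed by the example text. The following is a minimal sketch of writing such a file, assuming train_data_fn is the path to a file in that format. The helper write_fasttext_file is hypothetical and not part of cleanlab or fasttext.

```python
import os
import tempfile

LABEL_PREFIX = "__label__"  # same value as the default `label` argument


def write_fasttext_file(path, texts, labels, label_prefix=LABEL_PREFIX):
    """Hypothetical helper: write one example per line as '<prefix><label> <text>'."""
    with open(path, "w", encoding="utf-8") as f:
        for text, label in zip(texts, labels):
            # One example per line, so collapse newlines and repeated whitespace.
            clean = " ".join(text.split())
            f.write(f"{label_prefix}{label} {clean}\n")


texts = ["great movie, would watch again", "terrible plot\nand worse acting"]
labels = ["pos", "neg"]

tmp_dir = tempfile.mkdtemp()
train_data_fn = os.path.join(tmp_dir, "train.txt")
write_fasttext_file(train_data_fn, texts, labels)

# Read the file back to inspect the resulting lines.
with open(train_data_fn, encoding="utf-8") as f:
    lines = f.read().splitlines()
```

A path produced this way is the kind of value that would be passed as train_data_fn (and, for held-out data, test_data_fn).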

Methods:

`fit`([X, y, sample_weight])	Trains the fastText classifier.
`get_params`([deep])	Get parameters for this estimator.
`predict`([X, train_data, return_labels])	Predict labels of X.
`predict_proba`([X, train_data, return_labels])	Produces a probability matrix with examples on rows and classes on columns, where each row sums to 1 and captures the probability of the example belonging to each class.
`score`([X, y, sample_weight, k])	Compute the average precision @ k (single label) of the labels predicted from X and the true labels given by y.
`set_params`(**params)	Set the parameters of this estimator.

fit(X=None, y=None, sample_weight=None)

Trains the fastText classifier. Typical usage requires no arguments: just call clf.fit().

Parameters

X (iterable, e.g. list or numpy array; default None) – Indices of the examples to use. When in doubt, leave as None, which defaults to range(len(data)).
y (None) – Leave this as None. It is a placeholder kept for compatibility with the scikit-learn estimator API.
sample_weight (None) – Leave this as None. It is a placeholder kept for compatibility with the scikit-learn estimator API.

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

predict(X=None, train_data=True, return_labels=False)

Predict labels of X.

predict_proba(X=None, train_data=True, return_labels=False)

Produces a probability matrix with examples on rows and classes on columns, where each row sums to 1 and captures the probability of the example belonging to each class.
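To make the shape of that matrix concrete, here is a small sketch in plain Python. The values are invented for illustration and do not come from a trained model. It checks that each row sums to 1 and derives the most probable class per example.

```python
# Hypothetical probabilities for 3 examples (rows) and 3 classes (columns).
pred_probs = [
    [0.70, 0.20, 0.10],
    [0.05, 0.15, 0.80],
    [0.25, 0.45, 0.30],
]

# Each row is a distribution over classes, so it should sum to 1
# (compared with a tolerance because of floating-point rounding).
row_sums = [sum(row) for row in pred_probs]
rows_sum_to_one = all(abs(s - 1.0) < 1e-9 for s in row_sums)

# The most probable class for each example is the column index of the row maximum.
predicted = [max(range(len(row)), key=row.__getitem__) for row in pred_probs]
```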

score(X=None, y=None, sample_weight=None, k=None)

Compute the average precision @ k (single label) of the labels predicted from X and the true labels given by y. score expects a y argument; here, y holds the noisy (given) labels.
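One common reading of precision @ k for single-label data is that an example scores 1 if its given label is among the k most probable classes and 0 otherwise, with the scores averaged over all examples. The sketch below implements that reading in plain Python. It is an assumption about the metric, not a copy of the library's code, and precision_at_k is a hypothetical helper.

```python
def precision_at_k(pred_probs, labels, k=1):
    """Hypothetical helper: fraction of examples whose label is in the top-k classes."""
    hits = 0
    for row, label in zip(pred_probs, labels):
        # Class indices sorted from most to least probable, truncated to k.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


# Invented probabilities (rows: examples, columns: classes) and noisy labels.
pred_probs = [
    [0.70, 0.20, 0.10],
    [0.05, 0.15, 0.80],
    [0.25, 0.45, 0.30],
]
noisy_labels = [0, 1, 1]

p_at_1 = precision_at_k(pred_probs, noisy_labels, k=1)
p_at_2 = precision_at_k(pred_probs, noisy_labels, k=2)
```

With k=1 this reduces to plain accuracy against the given labels. Larger k is more forgiving, which can matter when the given labels are noisy.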

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

cleanlab.experimental.fasttext.data_loader(fn=None, indices=None, label='__label__', batch_size=1000)

Returns a generator, yielding two lists containing [labels], [text]. Items are always returned in the order in which they appear in the file, regardless of whether indices are provided.
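The batching and ordering behavior described here can be illustrated with a small stand-in generator. This is a sketch under stated assumptions, not the library's implementation: iter_batches is a hypothetical name, and it assumes each line starts with exactly one label token followed by the text.

```python
import os
import tempfile


def iter_batches(fn, indices=None, label="__label__", batch_size=1000):
    """Hypothetical stand-in: yield (labels, texts) batches in file order."""
    wanted = None if indices is None else set(indices)
    labels, texts = [], []
    with open(fn, encoding="utf-8") as f:
        for i, line in enumerate(f):
            # Lines are visited in file order, so the order of `indices` is irrelevant.
            if wanted is not None and i not in wanted:
                continue
            first, _, rest = line.strip().partition(" ")
            # Assumes the first token is the label; strip the prefix if present.
            labels.append(first[len(label):] if first.startswith(label) else first)
            texts.append(rest)
            if len(labels) == batch_size:
                yield labels, texts
                labels, texts = [], []
    if labels:  # final, possibly smaller, batch
        yield labels, texts


# Build a small fastText-format file to read from.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("__label__pos good\n__label__neg bad\n__label__pos fine\n")
    f.write("__label__neg awful\n__label__pos superb\n")

# Indices are given out of order, but items come back in file order (0, 3, 4).
batches = list(iter_batches(path, indices=[3, 0, 4], batch_size=2))
```

Converting indices to a set keeps the membership check cheap and makes the file-order guarantee explicit, since the only thing that drives iteration is the position of each line in the file.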