f1_score: an important metric for class-imbalance problems
The F1 score reaches its best value at 1 and its worst at 0. The relative contributions of precision and recall to the F1 score are equal: F1 = 2 * (precision * recall) / (precision + recall).
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html
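A quick sanity check of this formula against sklearn's own precision_score and recall_score (the toy labels below are made up for illustration):

from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 0, 1, 0, 0]   # toy binary labels, illustrative only
y_pred = [1, 0, 0, 1, 1, 0]

p = precision_score(y_true, y_pred)  # TP / (TP + FP)
r = recall_score(y_true, y_pred)     # TP / (TP + FN)

print(f1_score(y_true, y_pred))   # 0.666...
print(2 * p * r / (p + r))        # same value, up to floating point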
【1】Import and signature: from sklearn.metrics import f1_score
f1_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None)
【2】Key parameters
(1) average: string, one of [None, 'binary' (default), 'micro', 'macro', 'weighted', 'samples']. This parameter is required for multiclass/multilabel targets.
① None: return the f1_score of each class separately, as an array.
② 'binary': valid only for binary classification; returns the f1_score of the class specified by pos_label. Only report results for the class specified by pos_label. This is applicable only if the targets (y_{true,pred}) are binary. (A short sketch of pos_label follows this list.)
③ 'micro': with average='micro' in a single-label multiclass setting, Precision = Recall = F1_score = Accuracy. Calculate metrics globally by counting the total true positives, false negatives and false positives. Note that "micro"-averaging in a multiclass setting with all labels included will produce equal precision, recall and F_beta.
④ 'macro': take the simple arithmetic (unweighted) mean of the per-class f1_score values, under the assumption that all classes are equally important. Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
⑤ 'weighted': take a weighted average of the per-class f1_score values, with each class weighted by its share of y_true. Calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label). This alters 'macro' to account for label imbalance; it can result in an F-score that is not between precision and recall.
⑥ 'samples': Calculate metrics for each instance, and find their average (only meaningful for multilabel classification, where this differs from accuracy_score).
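As noted in item ②, here is a minimal sketch of how pos_label interacts with the default average='binary' (the labels are invented for illustration):

from sklearn.metrics import f1_score

y_true = [0, 1, 1, 0, 1]   # illustrative binary labels
y_pred = [0, 1, 0, 0, 1]

# With average='binary' (the default), only the class named by pos_label is scored.
print(f1_score(y_true, y_pred, pos_label=1))  # F1 of class 1
print(f1_score(y_true, y_pred, pos_label=0))  # F1 of class 0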
【3】Example
from sklearn.metrics import f1_score, accuracy_score

y_true = [0, 1, 2, 0, 1, 2, 2]
y_pred = [0, 2, 1, 0, 0, 1, 1]

macro = f1_score(y_true, y_pred, average='macro')        # unweighted mean over classes
micro = f1_score(y_true, y_pred, average='micro')        # from global TP/FP/FN counts
weighted = f1_score(y_true, y_pred, average='weighted')  # mean weighted by class support
per_class = f1_score(y_true, y_pred, average=None)       # one score per class, as an array
acc = accuracy_score(y_true, y_pred)

print(micro == acc)  # True: for single-label multiclass data, micro-F1 equals accuracy
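Continuing from the snippet above, a short check (assuming numpy is available) reproduces 'macro' and 'weighted' by hand from the per-class scores:

import numpy as np

# 'macro' is the plain, unweighted mean of the per-class scores
print(np.isclose(macro, per_class.mean()))  # True

# 'weighted' weights each class's score by its support (count) in y_true
support = np.bincount(y_true)  # occurrences of classes 0, 1, 2 in y_true
print(np.isclose(weighted, np.average(per_class, weights=support)))  # True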
【4】User Guide:
http://scikit-learn.org/stable/modules/model_evaluation.html#average
3.3.2.1. From binary to multiclass and multilabel
Some metrics are essentially defined for binary classification tasks (e.g. f1_score, roc_auc_score). In these cases, by default only the positive label is evaluated, assuming by default that the positive class is labelled 1 (though this may be configurable through the pos_label parameter).
In extending a binary metric to multiclass or multilabel problems, the data is treated as a collection of binary problems, one for each class. There are then a number of ways to average binary metric calculations across the set of classes, each of which may be useful in some scenario. Where available, you should select among these using the average parameter.
- "macro" simply calculates the mean of the binary metrics, giving equal weight to each class. In problems where infrequent classes are nonetheless important, macro-averaging may be a means of highlighting their performance. On the other hand, the assumption that all classes are equally important is often untrue, such that macro-averaging will over-emphasize the typically low performance on an infrequent class.
- "weighted" accounts for class imbalance by computing the average of binary metrics in which each class's score is weighted by its presence in the true data sample.
- "micro" gives each sample-class pair an equal contribution to the overall metric (except as a result of sample-weight). Rather than summing the metric per class, this sums the dividends and divisors that make up the per-class metrics to calculate an overall quotient. Micro-averaging may be preferred in multilabel settings, including multiclass classification where a majority class is to be ignored.
- "samples" applies only to multilabel problems. It does not calculate a per-class measure, instead calculating the metric over the true and predicted classes for each sample in the evaluation data, and returning their (sample_weight-weighted) average.
- Selecting average=None will return an array with the score for each class.
While multiclass data is provided to the metric, like binary targets, as an array of class labels, multilabel data is specified as an indicator matrix, in which cell [i, j] has value 1 if sample i has label j and value 0 otherwise.
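A minimal sketch of this indicator-matrix format together with the 'samples' average (the matrices below are invented for illustration):

import numpy as np
from sklearn.metrics import f1_score

# Multilabel indicator matrices: cell [i, j] is 1 if sample i has label j.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [1, 0, 0]])

# 'samples': compute F1 per sample (per row), then average over samples.
print(f1_score(y_true, y_pred, average='samples'))
# 'micro': pool TP/FP/FN over all sample-label pairs first.
print(f1_score(y_true, y_pred, average='micro'))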
See also in the User Guide: 3.3.2.8.1. Binary classification and 3.3.2.8.2. Multiclass and multilabel classification.