
Custom Loss Functions


LightGBM custom loss examples:
https://www.cnblogs.com/fujian-code/p/9804129.html
https://github.com/manifoldai/mf-eng-public/blob/master/notebooks/custom_loss_lightgbm.ipynb

In case the links go dead, the content is reproduced and lightly edited below.

%load_ext autoreload
%autoreload 2
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from lightgbm import LGBMRegressor
import lightgbm 
from sklearn.datasets import make_friedman2, make_friedman1, make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
import lightgbm
import seaborn as sns; sns.set()
from sklearn.metrics import mean_absolute_error, mean_squared_error
sns.set_style("whitegrid", {'axes.grid' : False})

Simulating the Friedman dataset
About the dataset

Inputs X are independent features uniformly distributed on the interval [0, 1]. The output y is created according to the formula:
y(X) = 10 * sin(pi * X[:, 0] * X[:, 1]) + 20 * (X[:, 2] - 0.5) ** 2 + 10 * X[:, 3] + 5 * X[:, 4] + noise * N(0, 1).

Out of the n_features features, only 5 are actually used to compute y. The remaining features are independent of y.
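As a quick sanity check of this formula (a minimal sketch; with noise=0 the reconstruction should match exactly):

# rebuild y by hand from the first 5 features and compare with make_friedman1's output
X_chk, y_chk = make_friedman1(n_samples=5, n_features=7, noise=0.0, random_state=11)
y_manual = (10 * np.sin(np.pi * X_chk[:, 0] * X_chk[:, 1])
            + 20 * (X_chk[:, 2] - 0.5) ** 2
            + 10 * X_chk[:, 3]
            + 5 * X_chk[:, 4])
np.allclose(y_chk, y_manual)  # True: only the first 5 features drive y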

# simulate 10,000 data points with 7 uniformly distributed features, of which only 5 are used to compute y

X, y = make_friedman1(n_samples=10000, n_features=7, noise=0.0, random_state=11)
min(y), max(y) 
(0.3545368892371061, 28.516918961287963)
# distribution of target variable
h = plt.hist(y)


# train-validation split
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.20, random_state=42)
# test set for generalization of scores
X_test, y_test = make_friedman1(n_samples=5000, n_features=7, noise=0.0, random_state=21)
Plotting helper functions

def plot_residual_distribution(model):
    """
    Density plot of residuals (y_true - y_pred) on the test set for a given model
    """
    ax = sns.distplot(y_test - model.predict(X_test), hist = False, kde = True,
                 kde_kws = {'shade': True, 'linewidth': 3}, axlabel="Residual")
    title = ax.set_title('Kernel density of residuals', size=15)

def plot_scatter_pred_actual(model):
    """
    Scatter plot of predictions from a given model vs. true target values on the test set
    """
    ax = sns.scatterplot(x=model.predict(X_test), y = y_test)
    ax.set_xlabel('Predictions')
    ax.set_ylabel('Actuals')
    title = ax.set_title('Actual vs Prediction scatter plot', size=15)   

Random Forest

# basic random forest regressor with mse as the criterion to measure the quality of a split

rf = RandomForestRegressor(n_estimators=50, oob_score=True, random_state=33)
rf.fit(X_train, y_train)
plot_residual_distribution(rf)


plot_scatter_pred_actual(rf)


print(f"MSE is {mean_squared_error(y_test, rf.predict(X_test))}")
MSE is 1.0925877452294468

Default LightGBM
LightGBM default: MSE

# fit a LightGBM regressor with default parameters
gbm = lightgbm.LGBMRegressor(random_state=33)
gbm.fit(X_train,y_train)
print(f"MSE is {mean_squared_error(y_test, gbm.predict(X_test))}")
MSE is 0.2362458093307746
We see that the GBM performs better than the random forest model on the test MSE.

LightGBM default: MSE + early stopping

# fit a new LightGBM model with early stopping
# 'regression' is actually also the default objective for LGBMRegressor

gbm2 = lightgbm.LGBMRegressor(objective='regression',
                              random_state=33,
                              early_stopping_rounds = 10,
                              n_estimators=10000
                             )

gbm2.fit(
    X_train,
    y_train,
    eval_set=[(X_valid, y_valid)],
    eval_metric='l2',  # also the default
    verbose=False,
)

This is basically the same as the default model fitted in the previous section, except that we did not pass eval_set there. Because we specify eval_set here, we can leverage early_stopping_rounds and run many more boosting iterations, which improves model performance: training stops once the given metric stops improving on the evaluation set, and the best score and iteration are kept.
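A quick way to inspect what early stopping selected (a small sketch using attributes from the sklearn wrapper):

# best boosting round and the corresponding validation score found by early stopping
print(gbm2.best_iteration_)                 # round with the best validation metric
print(gbm2.best_score_)                     # e.g. {'valid_0': {'l2': ...}}
print(gbm2.booster_.current_iteration())    # total boosting rounds in the fitted booster (used in the score table below)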

print(f"MSE is {mean_squared_error(y_test, gbm2.predict(X_test))}")
MSE is 0.13763903255048504
We see an improvement in the score because the model is able to run for more boosting iterations.

Asymmetric Custom Loss
There are two functions we might want to customize that define the training process in gradient-boosted tree models. In the context of LightGBM they appear in two places.

In the LightGBM training API:

fobj: customized objective function
feval: customized evaluation function; basically a way to use a custom metric for CV, used in addition to metric
metric: the function(s) to be monitored while doing cross-validation (hyperparameters are selected to minimize or maximize this); can be plural
In the sklearn wrapper around the LightGBM API:

objective: parameter of the model constructor
eval_metric: parameter of model.fit()
I am going to use the sklearn wrapper to set the objective and evaluation metric, but the two APIs are essentially the same (a sketch of the native training API is shown after the custom functions are defined below).

Let’s say that we don’t want our model to overpredict, but we are fine with underpredictions.

We can make a custom loss that applies 10 times more penalty when the true target is less than the prediction (overprediction) than when it is greater (underprediction).
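Written out, the loss used below is:

loss(y_true, y_pred) = 10 * (y_true - y_pred)**2   if y_true < y_pred (overprediction)
loss(y_true, y_pred) = (y_true - y_pred)**2        otherwise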

def custom_asymmetric_objective(y_true, y_pred):
    """Custom training loss: gradient and hessian of the asymmetric squared error."""
    residual = (y_true - y_pred).astype("float")
    # d/dy_pred of w * (y_true - y_pred)**2 is -2 * w * residual, with w = 10 when residual < 0 (overprediction)
    grad = np.where(residual < 0, -2 * 10.0 * residual, -2 * residual)
    # second derivative w.r.t. y_pred is 2 * w
    hess = np.where(residual < 0, 2 * 10.0, 2.0)
    return grad, hess

def custom_asymmetric_eval(y_true, y_pred):
    """Custom evaluation metric: mean asymmetric squared error (lower is better)."""
    residual = (y_true - y_pred).astype("float")
    loss = np.where(residual < 0, (residual ** 2) * 10.0, residual ** 2)
    # return (metric_name, value, is_higher_better)
    return "custom_asymmetric_eval", np.mean(loss), False
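For completeness, here is a rough sketch of how these same functions would plug into the native training API listed earlier. This is only a sketch: the native fobj/feval callbacks receive (preds, dataset) rather than (y_true, y_pred), so thin wrappers are needed, and the fobj keyword was removed in LightGBM >= 4.0 (custom objectives are passed through the params there), so the exact keyword names are version-dependent.

# adapt the (y_true, y_pred) functions above to the native (preds, dataset) signature
def native_fobj(preds, dataset):
    return custom_asymmetric_objective(dataset.get_label(), preds)

def native_feval(preds, dataset):
    return custom_asymmetric_eval(dataset.get_label(), preds)

train_set = lightgbm.Dataset(X_train, y_train)
valid_set = lightgbm.Dataset(X_valid, y_valid, reference=train_set)

# train with the custom objective and monitor both l2 and the custom metric
booster = lightgbm.train({'metric': 'l2'}, train_set,
                         valid_sets=[valid_set],
                         fobj=native_fobj, feval=native_feval)

The rest of this post sticks to the sklearn wrapper.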

Exploring our custom loss function with some plots

# let's see how our custom loss function looks with respect to different prediction values

y_true = np.repeat(0,1000)
y_pred = np.linspace(-100,100,1000)
residual = (y_true - y_pred).astype("float")

custom_loss = np.where(residual < 0, (residual**2)*10.0, residual**2) 

fig, ax = plt.subplots(1,1, figsize=(8,4))
sns.lineplot(x=y_pred, y=custom_loss, alpha=1, label="asymmetric mse")
sns.lineplot(x=y_pred, y=residual**2, alpha=0.5, label="symmetric mse", color="red")
ax.set_xlabel("Predictions")
ax.set_ylabel("Loss value")

fig.tight_layout()


grad, hess = custom_asymmetric_objective(y_true, y_pred)

fig, ax = plt.subplots(1,1, figsize=(8,4))

# ax.plot(y_hat, errors)
ax.plot(y_pred, grad)
ax.plot(y_pred, hess)
ax.legend(('gradient', 'hessian'))
ax.set_xlabel('Predictions')
ax.set_ylabel('first and second derivatives')

fig.tight_layout()

The gradient of the custom loss behaves as expected: its magnitude is larger (by a factor of 10) when the residual is negative than when it is positive.

LightGBM custom objective

# fit a new LightGBM model with the custom training objective
gbm3 = lightgbm.LGBMRegressor(random_state=33)
gbm3.set_params(**{'objective': custom_asymmetric_objective}, metrics = ["mse", 'mae'])

gbm3.fit(
    X_train,
    y_train,
    eval_set=[(X_valid, y_valid)],
    eval_metric='l2',
    verbose=False,
)

LightGBM with early stopping: custom eval_metric

# fit a new LightGBM model with early stopping and the custom eval_metric
gbm4 = lightgbm.LGBMRegressor(random_state=33,
                              early_stopping_rounds = 10,
                              n_estimators=10000
                             )

gbm4.set_params(**{'objective': "regression"}, metrics = ["mse", 'mae'])

gbm4.fit(
    X_train,
    y_train,
    eval_set=[(X_valid, y_valid)],
    eval_metric=custom_asymmetric_eval,
    verbose=False,
)

LightGBM with early stopping: custom objective

# fit a new LightGBM model with early stopping and the custom training objective
gbm5 = lightgbm.LGBMRegressor(random_state=33,
                              early_stopping_rounds = 10,
                              n_estimators=10000
                             )

gbm5.set_params(**{'objective': custom_asymmetric_objective}, metrics = ["mse", 'mae'])

gbm5.fit(
    X_train,
    y_train,
    eval_set=[(X_valid, y_valid)],
    eval_metric="l2",
    verbose=False,
)

LightGBM with early stopping: custom eval_metric + custom objective

# fit a new LightGBM model with early stopping, the custom objective and the custom eval_metric
gbm6 = lightgbm.LGBMRegressor(random_state=33,
                              early_stopping_rounds = 10,
                              n_estimators=10000
                             )

gbm6.set_params(**{'objective': custom_asymmetric_objective}, metrics = ["mse", 'mae'])

gbm6.fit(
    X_train,
    y_train,
    eval_set=[(X_valid, y_valid)],
    eval_metric=custom_asymmetric_eval,
    verbose=False,
)

Reporting scores for different models
Scores table

# asymmetric mse scores
_,loss_rf,_ = custom_asymmetric_eval(y_test, rf.predict(X_test))
_,loss_gbm,_ = custom_asymmetric_eval(y_test, gbm.predict(X_test))
_,loss_gbm2,_ = custom_asymmetric_eval(y_test, gbm2.predict(X_test))
_,loss_gbm3,_ = custom_asymmetric_eval(y_test, gbm3.predict(X_test))
_,loss_gbm4,_ = custom_asymmetric_eval(y_test, gbm4.predict(X_test))
_,loss_gbm5,_ = custom_asymmetric_eval(y_test, gbm5.predict(X_test))
_,loss_gbm6,_ = custom_asymmetric_eval(y_test, gbm6.predict(X_test))
score_dict = {'Random Forest default':
              {'asymmetric custom mse (test)': loss_rf,
               'asymmetric custom mse (train)': custom_asymmetric_eval(y_train, rf.predict(X_train))[1],
               'symmetric mse': mean_squared_error(y_test, rf.predict(X_test)),
               '# boosting rounds' : '-'},
              
              'LightGBM default' : 
              {'asymmetric custom mse (test)': loss_gbm,
              'asymmetric custom mse (train)': custom_asymmetric_eval(y_train, gbm.predict(X_train))[1],
               'symmetric mse': mean_squared_error(y_test, gbm.predict(X_test)), 
               '# boosting rounds' : gbm.booster_.current_iteration()},
              
              'LightGBM with custom training loss (no hyperparameter tuning)': 
              {'asymmetric custom mse (test)': loss_gbm3,
               'asymmetric custom mse (train)': custom_asymmetric_eval(y_train, gbm3.predict(X_train))[1],               
               'symmetric mse': mean_squared_error(y_test, gbm3.predict(X_test)),
               '# boosting rounds' : gbm3.booster_.current_iteration()},
              
              'LightGBM with early stopping' : 
              {'asymmetric custom mse (test)': loss_gbm2,
               'asymmetric custom mse (train)': custom_asymmetric_eval(y_train, gbm2.predict(X_train))[1],
               'symmetric mse': mean_squared_error(y_test, gbm2.predict(X_test)),
               '# boosting rounds' : gbm2.booster_.current_iteration()},

             'LightGBM with early_stopping and custom validation loss': 
              {'asymmetric custom mse (test)': loss_gbm4,
               'asymmetric custom mse (train)': custom_asymmetric_eval(y_train, gbm4.predict(X_train))[1],
               'symmetric mse': mean_squared_error(y_test, gbm4.predict(X_test)),
               '# boosting rounds' : gbm4.booster_.current_iteration()},
              
              'LightGBM with early_stopping and custom training loss': 
              {'asymmetric custom mse (test)': loss_gbm5,
               'asymmetric custom mse (train)': custom_asymmetric_eval(y_train, gbm5.predict(X_train))[1],
               'symmetric mse': mean_squared_error(y_test, gbm5.predict(X_test)),
               '# boosting rounds' : gbm5.booster_.current_iteration()}, 
              
              'LightGBM with early_stopping, custom training and custom validation loss': 
              {'asymmetric custom mse (test)': loss_gbm6,
               'asymmetric custom mse (train)': custom_asymmetric_eval(y_train, gbm6.predict(X_train))[1],
               'symmetric mse': mean_squared_error(y_test, gbm6.predict(X_test)),
               '# boosting rounds' : gbm6.booster_.current_iteration()}
             
             }
pd.DataFrame(score_dict).T

Plots
Density plot comparing LightGBM trained with the symmetric MSE vs. the asymmetric MSE

fig, ax = plt.subplots(figsize=(12,6))
ax = sns.distplot(y_test - gbm.predict(X_test), hist = False, kde = True,
             kde_kws = {'shade': True, 'linewidth': 3}, axlabel="Residual", label = "LightGBM with default mse")
ax = sns.distplot(y_test - gbm3.predict(X_test), hist = False, kde = True,
             kde_kws = {'shade': True, 'linewidth': 3}, axlabel="Residual", label = "LightGBM with asymmetric mse")

# control x and y limits
ax.set_xlim(-3, 3)

title = ax.set_title('Kernel density plot of residuals', size=15)


fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(12,5))
ax1, ax2, ax3 = ax.flatten()

ax1.plot(rf.predict(X_test), y_test, 'o', color='#1c9099')
ax1.set_xlabel('Predictions')
ax1.set_ylabel('Actuals')
ax1.set_title('Random Forest')  

ax2.plot(gbm.predict(X_test), y_test, 'o', color='#1c9099')
ax2.set_xlabel('Predictions')
ax2.set_ylabel('Actuals')
ax2.set_title('LightGBM default') 

ax3.plot(gbm6.predict(X_test), y_test, 'o', color='#1c9099')
ax3.set_xlabel('Predictions')
ax3.set_ylabel('Actuals')
ax3.set_title('LightGBM with early_stopping, \n custom objective and custom evaluation') 

fig.suptitle("Scatter plots of predictions vs. actual targets for different models", y = 1.05, fontsize=15)
fig.tight_layout()


fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(12,5))
ax1, ax2, ax3 = ax.flatten()

ax1.hist(y_test - rf.predict(X_test), bins=50, color='#1c9099')
ax1.axvline(x=0, ymin=0, ymax=500, color='black', lw=1.2)
ax1.set_xlabel('Residuals')
ax1.set_title('Random Forest')  
ax1.set_ylabel('# observations')

ax2.hist(y_test - gbm.predict(X_test), bins=50,  color='#1c9099')
ax2.axvline(x=0, ymin=0, ymax=500, color='black', lw=1.2)
ax2.set_xlabel('Residuals')
ax2.set_ylabel('# observations')
ax2.set_title('LightGBM default') 

ax3.hist(y_test - gbm6.predict(X_test), bins=50,  color='#1c9099')
ax3.axvline(x=0, ymin=0, ymax=500, color='black', lw=1.2)
ax3.set_xlabel('Residuals')
ax3.set_ylabel('# observations')
ax3.set_title('LightGBM with early_stopping, \n custom objective and custom evaluation') 

fig.suptitle("Error histograms of predictions from different models", y = 1.05, fontsize=15)
fig.tight_layout()
