
Feature Importance Evaluation and Selection

程序员文章站 2022-07-14 13:41:32
...
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Pair each feature name with the importance the trained random forest assigned to it,
# then sort descending so the most important features come first
feature_results = pd.DataFrame({'feature': list(train_features.columns),
                                'importance': model.feature_importances_})
feature_results = feature_results.sort_values('importance', ascending=False).reset_index(drop=True)
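The snippet above assumes a trained `model` and a `train_features` DataFrame from earlier in the article. A minimal, self-contained sketch of the same step, using made-up synthetic data and feature names:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
# Four made-up features; the target depends mostly on f1 and f3,
# so their importances should dominate
X_demo = pd.DataFrame(rng.rand(200, 4), columns=['f1', 'f2', 'f3', 'f4'])
y_demo = 3 * X_demo['f1'] + 2 * X_demo['f3'] + 0.1 * rng.rand(200)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_demo, y_demo)

# Same pattern as in the article: name + importance, sorted descending
demo_results = (pd.DataFrame({'feature': X_demo.columns,
                              'importance': rf.feature_importances_})
                .sort_values('importance', ascending=False)
                .reset_index(drop=True))
print(demo_results)
```

Note that `feature_importances_` always sums to 1, which is what makes the cumulative-importance cutoff below meaningful.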

from IPython.core.pylabtools import figsize
figsize(12, 10)
plt.style.use('ggplot')

# Plot the 10 most important features as a horizontal bar chart
feature_results.loc[:9, :].plot(x='feature', y='importance', edgecolor='k',
                                kind='barh', color='blue')
plt.xlabel('Relative Importance', fontsize=18)
plt.ylabel('')
plt.title('Feature Importances from Random Forest', size=26)

(Figure: horizontal bar chart of the top-10 feature importances from the random forest)

# Cumulative importance over the descending-sorted features
cumulative_importances = np.cumsum(feature_results['importance'])
plt.figure(figsize=(20, 6))
plt.plot(list(range(feature_results.shape[0])), cumulative_importances.values, 'b-')
# Dashed red line marks the 95% cumulative-importance threshold
plt.hlines(y=0.95, xmin=0, xmax=feature_results.shape[0], color='r', linestyles='dashed')
# plt.xticks(list(range(feature_results.shape[0])), feature_results.feature, rotation=60)
plt.xlabel('Feature', fontsize=18)
plt.ylabel('Cumulative importance', fontsize=18)
plt.title('Cumulative Importances', fontsize=26)

(Figure: cumulative importance curve with the 95% threshold line)

# First position where cumulative importance exceeds 95%; +1 converts the index to a count
most_num_importances = np.where(cumulative_importances > 0.95)[0][0] + 1
print('Number of features for 95% importance: ', most_num_importances)

Number of features for 95% importance: 13
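The cutoff logic above can be traced on a tiny, made-up importance vector (already sorted descending, as the article's `feature_results` is):

```python
import numpy as np

# Toy importances summing to 1, sorted descending
importances = np.array([0.5, 0.2, 0.15, 0.08, 0.04, 0.02, 0.01])
cum = np.cumsum(importances)              # 0.5, 0.7, 0.85, 0.93, 0.97, 0.99, 1.0
# First index where the running total passes 0.95, +1 to turn it into a count
n_keep = np.where(cum > 0.95)[0][0] + 1
print(n_keep)  # → 5
```

Here the running total first exceeds 0.95 at index 4 (value 0.97), so keeping the top 5 features preserves at least 95% of the total importance.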

  • Feature selection based on importance
# Keep only the features needed to reach 95% cumulative importance
most_important_features = feature_results['feature'][:most_num_importances]
# Map the selected feature names back to their column positions
indices = [list(train_features.columns).index(x) for x in most_important_features]
X_reduced = X[:, indices]
X_test_reduced = X_test[:, indices]
print('Most important training features shape: ', X_reduced.shape)
print('Most important testing features shape: ', X_test_reduced.shape)

Most important training features shape: (6622, 13)
Most important testing features shape: (2839, 13)
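The name-to-index reduction above can be sketched on its own with hypothetical feature names and a small NumPy matrix (all names and shapes here are made up for illustration):

```python
import numpy as np

# Hypothetical full feature list and a subset chosen by importance
all_features = ['area', 'age', 'rooms', 'floors', 'location']
selected = ['area', 'rooms']

# Map selected names back to column positions, then slice the matrix
indices = [all_features.index(f) for f in selected]

X_demo = np.arange(20, dtype=float).reshape(4, 5)  # 4 samples, 5 features
X_reduced_demo = X_demo[:, indices]
print(X_reduced_demo.shape)  # → (4, 2)
```

Slicing with a list of column indices keeps the rows intact and drops every column not in `indices`, which is exactly how `X_reduced` and `X_test_reduced` are built above.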