
Python Data Analysis Practice: FoodFacts.csv Analysis

程序员文章站 (Programmer Article Station), 2022-07-07 19:37:09


Imports

import numpy as np
import pandas as pd
import datetime
import matplotlib.pyplot as plt
import seaborn as sns

# Render Chinese labels correctly
plt.rcParams['font.sans-serif'] = ['SimHei']
# Adjust the layout automatically
plt.rcParams.update({'figure.autolayout': True})
# Render the minus sign correctly
plt.rcParams['axes.unicode_minus'] = False

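Load the data: import the dataset to analyze, dataset/FoodFacts.csv, and display the first 10 rows.

# A forward slash avoids the invalid '\F' escape sequence and also works on Windows
data = pd.read_csv('dataset/FoodFacts.csv')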
data.head(10)
code url creator created_t created_datetime last_modified_t last_modified_datetime product_name generic_name quantity ... caffeine_100g taurine_100g ph_100g fruits_vegetables_nuts_100g collagen_meat_protein_ratio_100g cocoa_100g chlorophyl_100g carbon_footprint_100g nutrition_score_fr_100g nutrition_score_uk_100g
0 000000000000012866 http://world-en.openfoodfacts.org/product/0000... date-limite-app 1447004364 2015-11-08T17:39:24Z 1447004364 2015-11-08T17:39:24Z Poêlée à la sarladaise NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 0000000024600 http://world-en.openfoodfacts.org/product/0000... date-limite-app 1434530704 2015-06-17T08:45:04Z 1434535914 2015-06-17T10:11:54Z Filet de bœuf NaN 2.46 kg ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 0000000036252 http://world-en.openfoodfacts.org/product/0000... tacinte 1422221701 2015-01-25T21:35:01Z 1422221855 2015-01-25T21:37:35Z Lion Peanut x2 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 0000000039259 http://world-en.openfoodfacts.org/product/0000... tacinte 1422221773 2015-01-25T21:36:13Z 1422221926 2015-01-25T21:38:46Z Twix x2 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 0000000039529 http://world-en.openfoodfacts.org/product/0000... teolemon 1420147051 2015-01-01T21:17:31Z 1439141740 2015-08-09T17:35:40Z Pack de 2 Twix NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5 0000001071894 http://world-en.openfoodfacts.org/product/0000... bcatelin 1409411252 2014-08-30T15:07:32Z 1439141739 2015-08-09T17:35:39Z Flute Flute NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6 0000005200016 http://world-en.openfoodfacts.org/product/0000... sigoise 1441186657 2015-09-02T09:37:37Z 1442570752 2015-09-18T10:05:52Z lentilles vertes NaN 1 kg ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
7 0000007020254 http://world-en.openfoodfacts.org/product/0000... teolemon 1420150193 2015-01-01T22:09:53Z 1420210373 2015-01-02T14:52:53Z NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
8 0000010090206 http://world-en.openfoodfacts.org/product/0000... sebleouf 1370977431 2013-06-11T19:03:51Z 1445083431 2015-10-17T12:03:51Z Thé de Noël aromatisé orange-cannelle NaN 75 g ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9 0000020364373 http://world-en.openfoodfacts.org/product/0000... openfoodfacts-contributors 1393970573 2014-03-04T22:02:53Z 1393970733 2014-03-04T22:05:33Z Zumo de Piña NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

10 rows × 159 columns

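A look at the data shows that it has far too many columns (159), while this analysis only needs 'countries_en' and 'additives_n'. Reduce the data to these two columns, then check for missing values and, if there are any, drop the affected rows.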

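# .copy() makes data2 independent of data, so later edits do not trigger SettingWithCopyWarning
data2 = data[['countries_en', 'additives_n']].copy()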
data2
countries_en additives_n
0 France NaN
1 France NaN
2 France NaN
3 France NaN
4 France NaN
... ... ...
65498 Poland NaN
65499 France 0.0
65500 France NaN
65501 France 0.0
65502 China NaN

65503 rows × 2 columns
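
Since only two of the columns are needed, the selection could also happen while reading the file. The following is a minimal sketch rather than the author's method: it uses the `usecols` parameter of `pd.read_csv`, and the in-memory `sample_csv` is a hypothetical stand-in for dataset/FoodFacts.csv so that the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file (assumption: made-up rows).
sample_csv = io.StringIO(
    "code,product_name,countries_en,additives_n\n"
    "1,Flute,United Kingdom,0\n"
    "2,Twix x2,France,\n"
    "3,Lion Peanut x2,France,5\n"
)

# usecols restricts parsing to the two columns the analysis needs.
subset = pd.read_csv(sample_csv, usecols=['countries_en', 'additives_n'])
print(subset)
```

With the real file, the same call would take the file path in place of `sample_csv`; skipping the unused columns should also reduce memory use.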

# Reassign instead of using inplace=True, which warns when data2 is a slice of data
data2 = data2.dropna(axis=0, how='any')
data2
countries_en additives_n
5 United Kingdom 0.0
6 France 0.0
8 France 0.0
10 United Kingdom 5.0
11 United Kingdom 5.0
... ... ...
65480 United States 4.0
65490 France 0.0
65494 France 0.0
65499 France 0.0
65501 France 0.0

43616 rows × 2 columns
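
The task also asks to check whether missing values exist before deleting them. A minimal sketch of that check, assuming a small hypothetical frame `sample` with the same two columns (the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the two columns kept above.
sample = pd.DataFrame({
    'countries_en': ['France', 'United Kingdom', None, 'France'],
    'additives_n': [np.nan, 5.0, 1.0, 0.0],
})

# Count missing values per column before deciding to drop anything.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Drop every row that has a missing value in either column.
cleaned = sample.dropna(axis=0, how='any')
print(len(sample), 'rows before,', len(cleaned), 'rows after')
```

Applied to the real data, `data2.isna().sum()` would show how many rows each column contributes to the drop from 65503 to 43616 rows.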

Convert all country names to lowercase.

data2['countries_en'] = data2['countries_en'].str.lower()
data2
countries_en additives_n
5 united kingdom 0.0
6 france 0.0
8 france 0.0
10 united kingdom 5.0
11 united kingdom 5.0
... ... ...
65480 united states 4.0
65490 france 0.0
65494 france 0.0
65499 france 0.0
65501 france 0.0

43616 rows × 2 columns

Group the additives_n column by country, compute the mean number of additives used, and sort the means in descending order.

group = data2['additives_n'].groupby(data2['countries_en']).mean().sort_values(ascending=False)
group.head(10).index
Index(['australia,indonesia,united states', 'france,saudi arabia',
       'denmark,france,portugal', 'france,greece,netherlands', 'togo', 'qatar',
       'denmark,france,switzerland', 'france,luxembourg',
       'egypt,united kingdom,united states', 'australia,new zealand'],
      dtype='object', name='countries_en')
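
Most of the top entries above are comma-separated combinations such as 'france,saudi arabia', because countries_en can list several countries for one product. If a per-country mean is wanted instead, one option is to split such entries and count the product once for each listed country. This is a sketch of that alternative, not part of the original exercise; the frame `sample` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical rows in the same shape as data2 after lowercasing.
sample = pd.DataFrame({
    'countries_en': ['france', 'france,saudi arabia',
                     'united kingdom', 'france,luxembourg'],
    'additives_n': [0.0, 6.0, 5.0, 4.0],
})

# Split the comma-separated string into a list, then give each country its own row.
per_country = (
    sample.assign(country=sample['countries_en'].str.split(','))
          .explode('country')
)
per_country['country'] = per_country['country'].str.strip()

# Mean number of additives per individual country, largest first.
mean_by_country = (
    per_country.groupby('country')['additives_n']
               .mean()
               .sort_values(ascending=False)
)
print(mean_by_country)
```

Whether this is preferable depends on the question being asked: a product sold in several countries then contributes to each country's mean.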

Create a simple visualization of the top 10 entries (for example, using any one of matplotlib's bar chart, line chart, or pie chart).

plt.figure(figsize=(10, 6))
plt.pie(group.head(10),labels=group.head(10).index, colors=sns.color_palette('hls',10),
       shadow=True,startangle=90,autopct='%.2f%%')
plt.axis('equal')
plt.legend(loc='upper right')
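
A pie chart presents the values as shares of a whole, which group means are not, so a bar chart may be easier to read here. The following is a minimal sketch of that alternative; the series `top` holds hypothetical values standing in for `group.head(10)`, and the `Agg` backend is chosen only so that the example runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for group.head(10), already sorted in descending order.
top = pd.Series(
    [6.0, 5.0, 4.0],
    index=['saudi arabia', 'united kingdom', 'luxembourg'],
    name='additives_n',
)

fig, ax = plt.subplots(figsize=(10, 6))
# Reverse the order so the largest mean ends up at the top of the chart.
bars = ax.barh(list(top.index[::-1]), list(top.values[::-1]))
ax.set_xlabel('Mean number of additives')
ax.set_ylabel('countries_en')
fig.canvas.draw()
```

In an interactive session, `plt.show()` would display the figure.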


Original article: https://blog.csdn.net/qq_41754907/article/details/107164607