Python Data Analysis Exercise: Analyzing FoodFacts.csv
2022-07-07 19:37:09
Imports
import numpy as np
import pandas as pd
import datetime
import matplotlib.pyplot as plt
import seaborn as sns
# Render Chinese labels correctly
plt.rcParams['font.sans-serif'] = ['SimHei']
# Fit the layout automatically
plt.rcParams.update({'figure.autolayout': True})
# Render minus signs correctly
plt.rcParams['axes.unicode_minus'] = False
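Get the data: load the dataset to be analyzed, dataset/FoodFacts.csv, and display the first 10 rows.
# A forward slash avoids the invalid '\F' escape sequence and also works on Windows
data = pd.read_csv('dataset/FoodFacts.csv')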
data.head(10)
| | code | url | creator | created_t | created_datetime | last_modified_t | last_modified_datetime | product_name | generic_name | quantity | ... | caffeine_100g | taurine_100g | ph_100g | fruits_vegetables_nuts_100g | collagen_meat_protein_ratio_100g | cocoa_100g | chlorophyl_100g | carbon_footprint_100g | nutrition_score_fr_100g | nutrition_score_uk_100g |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 000000000000012866 | http://world-en.openfoodfacts.org/product/0000... | date-limite-app | 1447004364 | 2015-11-08T17:39:24Z | 1447004364 | 2015-11-08T17:39:24Z | Poêlée à la sarladaise | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 0000000024600 | http://world-en.openfoodfacts.org/product/0000... | date-limite-app | 1434530704 | 2015-06-17T08:45:04Z | 1434535914 | 2015-06-17T10:11:54Z | Filet de bœuf | NaN | 2.46 kg | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 0000000036252 | http://world-en.openfoodfacts.org/product/0000... | tacinte | 1422221701 | 2015-01-25T21:35:01Z | 1422221855 | 2015-01-25T21:37:35Z | Lion Peanut x2 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 0000000039259 | http://world-en.openfoodfacts.org/product/0000... | tacinte | 1422221773 | 2015-01-25T21:36:13Z | 1422221926 | 2015-01-25T21:38:46Z | Twix x2 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 0000000039529 | http://world-en.openfoodfacts.org/product/0000... | teolemon | 1420147051 | 2015-01-01T21:17:31Z | 1439141740 | 2015-08-09T17:35:40Z | Pack de 2 Twix | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 | 0000001071894 | http://world-en.openfoodfacts.org/product/0000... | bcatelin | 1409411252 | 2014-08-30T15:07:32Z | 1439141739 | 2015-08-09T17:35:39Z | Flute | Flute | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
6 | 0000005200016 | http://world-en.openfoodfacts.org/product/0000... | sigoise | 1441186657 | 2015-09-02T09:37:37Z | 1442570752 | 2015-09-18T10:05:52Z | lentilles vertes | NaN | 1 kg | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
7 | 0000007020254 | http://world-en.openfoodfacts.org/product/0000... | teolemon | 1420150193 | 2015-01-01T22:09:53Z | 1420210373 | 2015-01-02T14:52:53Z | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
8 | 0000010090206 | http://world-en.openfoodfacts.org/product/0000... | sebleouf | 1370977431 | 2013-06-11T19:03:51Z | 1445083431 | 2015-10-17T12:03:51Z | Thé de Noël aromatisé orange-cannelle | NaN | 75 g | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
9 | 0000020364373 | http://world-en.openfoodfacts.org/product/0000... | openfoodfacts-contributors | 1393970573 | 2014-03-04T22:02:53Z | 1393970733 | 2014-03-04T22:05:33Z | Zumo de Piña | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
10 rows × 159 columns
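Checking the size of the data shows that it has far too many columns, while we only need to analyze two of them: 'countries_en' and 'additives_n'. So reduce the data to just these two columns, then check whether there are missing values and, if there are, drop them directly.
# .copy() gives an independent frame, so the later in-place edits do not raise SettingWithCopyWarning
data2 = data[['countries_en', 'additives_n']].copy()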
data2
| | countries_en | additives_n |
---|---|---|
0 | France | NaN |
1 | France | NaN |
2 | France | NaN |
3 | France | NaN |
4 | France | NaN |
... | ... | ... |
65498 | Poland | NaN |
65499 | France | 0.0 |
65500 | France | NaN |
65501 | France | 0.0 |
65502 | China | NaN |
65503 rows × 2 columns
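The task asks to check for missing values before dropping them, but the walkthrough goes straight to dropna. A minimal sketch of that check is shown here on a small, made-up frame (the `sample` rows are hypothetical, not taken from FoodFacts.csv); the same two calls should apply to `data2` unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the two selected columns of FoodFacts.csv.
sample = pd.DataFrame({
    'countries_en': ['France', 'France', None, 'United Kingdom'],
    'additives_n': [np.nan, 0.0, 2.0, 5.0],
})

# Number of missing values in each column.
missing_per_column = sample.isnull().sum()
print(missing_per_column)

# True if the frame contains at least one missing value anywhere.
has_missing = sample.isnull().values.any()
print(has_missing)
```

If `has_missing` is True for `data2`, dropping the affected rows is the next step.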
data2.dropna(axis=0, how='any', inplace=True)
data2
| | countries_en | additives_n |
---|---|---|
5 | United Kingdom | 0.0 |
6 | France | 0.0 |
8 | France | 0.0 |
10 | United Kingdom | 5.0 |
11 | United Kingdom | 5.0 |
... | ... | ... |
65480 | United States | 4.0 |
65490 | France | 0.0 |
65494 | France | 0.0 |
65499 | France | 0.0 |
65501 | France | 0.0 |
43616 rows × 2 columns
Convert all country names to lowercase
data2['countries_en'] = data2['countries_en'].str.lower()
data2
| | countries_en | additives_n |
---|---|---|
5 | united kingdom | 0.0 |
6 | france | 0.0 |
8 | france | 0.0 |
10 | united kingdom | 5.0 |
11 | united kingdom | 5.0 |
... | ... | ... |
65480 | united states | 4.0 |
65490 | france | 0.0 |
65494 | france | 0.0 |
65499 | france | 0.0 |
65501 | france | 0.0 |
43616 rows × 2 columns
Group the additives_n column by country, compute the mean number of additives used, and sort the results (the means) in descending order.
group = data2['additives_n'].groupby(data2['countries_en']).mean().sort_values(ascending=False)
group.head(10).index
Index(['australia,indonesia,united states', 'france,saudi arabia',
'denmark,france,portugal', 'france,greece,netherlands', 'togo', 'qatar',
'denmark,france,switzerland', 'france,luxembourg',
'egypt,united kingdom,united states', 'australia,new zealand'],
dtype='object', name='countries_en')
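Most of the top entries above are comma-separated combinations such as 'france,saudi arabia', because countries_en can list several countries for one product, and each distinct combination becomes its own group. One possible refinement, not part of the original exercise, is to split those strings and count the product once for each country it lists. The sketch below uses made-up rows (`sample` is hypothetical) and assumes pandas 0.25 or later for `DataFrame.explode`.

```python
import pandas as pd

# Hypothetical rows in the same shape as data2 after lowercasing.
sample = pd.DataFrame({
    'countries_en': ['france', 'france,belgium', 'united kingdom', 'belgium'],
    'additives_n': [0.0, 4.0, 5.0, 2.0],
})

# Split each comma-separated string into a list, then give every
# country in the list its own row (the additives_n value is repeated).
exploded = sample.assign(
    countries_en=sample['countries_en'].str.split(',')
).explode('countries_en')

# Per-country mean, sorted from largest to smallest.
per_country = (
    exploded.groupby('countries_en')['additives_n']
    .mean()
    .sort_values(ascending=False)
)
print(per_country)
```

Whether a multi-country product should count toward every listed country is a modeling choice; the grouping in the walkthrough keeps each combination as a separate label instead.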
Create a simple visualization of the top 10 entries (for example, any one of matplotlib's bar chart, line chart, or pie chart).
plt.figure(figsize=(10, 6))
plt.pie(group.head(10), labels=group.head(10).index, colors=sns.color_palette('hls', 10),
        shadow=True, startangle=90, autopct='%.2f%%')
plt.axis('equal')
plt.legend(loc='upper right')
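The task allows any chart type. Since a pie chart presents values as shares of a whole and these are group means, a bar chart may be easier to read. The sketch below is one such alternative under assumed data: `top` and its labels are hypothetical placeholders standing in for `group.head(10)`.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for group.head(10): mean additives per label.
top = pd.Series(
    [6.0, 4.5, 3.0],
    index=['country a', 'country b', 'country c'],
    name='additives_n',
)

fig, ax = plt.subplots(figsize=(10, 6))
# Horizontal bars keep long country labels readable.
ax.barh(list(top.index), top.values)
# Put the largest mean at the top of the chart.
ax.invert_yaxis()
ax.set_xlabel('Mean number of additives')
ax.set_ylabel('countries_en')
# In a script, call plt.show() here; a notebook renders the figure itself.
```

Passing `group.head(10)` in place of `top` should produce the corresponding chart for the real data.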
Original article: https://blog.csdn.net/qq_41754907/article/details/107164607