Data Analysis in Python Development

程序员文章站 (Programmer Article Station) 2022-04-12 20:14:58


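Why Use Python

Python uses whitespace and indentation to delimit code blocks rather than semicolons and curly braces. It also has a large collection of standard modules, add-on modules, and functions that make common data processing and analysis tasks very convenient.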

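Common Python data analysis modules

xlrd and xlwt
Parse, read, and write Microsoft Excel workbooks

mysqlclient/MySQL-python/MySQLdb
Connect to a MySQL database and run queries against its tables

pandas
Read many types of files; manage, filter, and transform data; aggregate data and compute basic statistics; create many types of statistical charts

statsmodels
Estimate statistical models, including linear regression models, generalized linear models, and classification models

scikit-learn
Estimate statistical machine learning models, including regression, classification, and clustering, and perform data processing, dimensionality reduction, and cross-validation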

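Python code editors

IPython Notebook
PyCharm
Notepad++
Sublime Text

Anaconda Python
Comes with several hundred of the most popular Python add-on modules preinstalled, provides the Spyder integrated development environment, and is cross-platform

Installing Anaconda Python

Download page: https://www.anaconda.com/download/
Choose the Windows 64-bit Python 3.5 Graphical Installer
Double-click the downloaded .exe file
Follow the installer's instructions

Running a Python script in Spyder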

#!/usr/bin/env python3
x = 4
y = 5
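z = x + y
# Prints: 9
print(z)
# Prints: 9
print(format(z))
# In "{0:d}".format(z), the braces {} are a placeholder for a value passed into the print statement, here the variable z.
# The 0 refers to the first argument of format(). Only one argument, z, is passed here, so 0 points to it; with several arguments, 0 always selects the first one.
# The colon (:) separates the value from its format specification; d formats the value as an integer.
# Output #2: Four plus five equals 9.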
print("Output #2: Four plus five equals {0:d}.".format(z))


a = [1, 2, 3, 4]
b = ["first", "second", "third", "fourth"]
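c = a + b

# "{0}, {1}, {2}".format(a, b, c) shows how to include several values in one print statement: a is passed to {0}, b to {1}, and c to {2}. All three values are lists rather than numbers, so no number format is specified.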
# Output #3: [1, 2, 3, 4], ['first', 'second', 'third', 'fourth'], [1, 2, 3, 4, 'first', 'second', 'third', 'fourth']
print("Output #3: {0}, {1}, {2}".format(a, b, c))

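Why use .format

Python does not require every print statement to use .format, but .format is powerful and can save you a lot of typing. In the example above, print("Output #3: {0}, {1}, {2}".format(a, b, c)) prints the three variables separated by commas. To get a similar result without .format you would write print("Output #3: ",a,", ",b,", ",c), which is very easy to mistype and, because print inserts a space between its arguments, also adds extra spaces. Other uses of .format are introduced later, but it is worth getting comfortable with it now so that you can use it whenever you need it.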
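
As a small sketch of the comparison above, the approaches can be put side by side. The f-string variant is an addition that assumes Python 3.6 or later; it does not appear in the original examples.

```python
a = [1, 2, 3, 4]
b = ["first", "second", "third", "fourth"]
c = a + b

# 1. str.format with positional indexes, as in the examples above
with_format = "Output #3: {0}, {1}, {2}".format(a, b, c)

# 2. Passing the pieces to print() separately: print() puts a space
#    between its arguments by default, so sep="" is needed to get
#    exactly the same text
print("Output #3: ", a, ", ", b, ", ", c, sep="")

# 3. An f-string (assumption: Python 3.6+)
with_fstring = f"Output #3: {a}, {b}, {c}"

print(with_format)
print(with_format == with_fstring)
```

All three produce the same line of text; the .format and f-string versions keep the punctuation in one place, which is what makes them harder to mistype.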

Basic building blocks of Python

Integers:

x = 9
print("Output #4: {0}".format(x))

# 3 raised to the 4th power
print("Output #5: {0}".format(3**4))

# Convert both values to integers, then divide
print("Output #6: {0}".format(int(8.3)/int(2.7)))

Output:
# =============================================================================
# Output #4: 9
# Output #5: 81
# Output #6: 4.0
# =============================================================================
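
Output #6 shows 4.0 rather than 4 because / always returns a float in Python 3. The following sketch, with variable names chosen only for illustration, contrasts the division operators and shows that int() truncates instead of rounding:

```python
x = 9

# / is true division and always returns a float in Python 3
true_quotient = x / 2

# // is floor division and returns an int when both operands are ints
floor_quotient = x // 2

# % returns the remainder of the division
remainder = x % 2

# int() drops the fractional part (truncates toward zero); it does not round
truncated = int(2.7)

print("{0} {1} {2} {3}".format(true_quotient, floor_quotient, remainder, truncated))
```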

Floating-point numbers

# {0:.3f} keeps three decimal places
# .format(x) supplies the value for the placeholder {0}
print("Output #7: {0:.3f}".format(8.3/2.7))
y = 2.5*4.8
print("Output #8: {0:.1f}".format(y))
r = 8/float(3)
print("Output #9: {0:.2f}".format(r))
print("Output #10: {0:.4f}".format(8.0/3))

Output:
Output #7: 3.074
Output #8: 12.0
Output #9: 2.67
Output #10: 2.6667
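
The part after the colon in a placeholder can do more than set the number of decimal places. A brief sketch of a few other standard format specifications follows; the sample values are made up for illustration.

```python
value = 1234567.891

# Two decimal places
two_places = "{0:.2f}".format(value)

# Two decimal places with thousands separators
with_commas = "{0:,.2f}".format(value)

# Right-aligned in a field 10 characters wide, one decimal place
padded = "{0:10.1f}".format(8.3)

# Multiply by 100 and append a percent sign, one decimal place
percent = "{0:.1%}".format(0.256)

print(two_places)
print(with_commas)
print(padded + "|")
print(percent)
```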

The type function: checking a value's data type

type(x)
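
In a script, unlike in an interactive console, a bare type(x) expression displays nothing, so the result has to be printed. A minimal sketch, with variable names chosen for illustration:

```python
x = 9
y = 8.3
s = "9"

# type() returns the class of the value
print(type(x))
print(type(y))
print(type(s))

# isinstance() checks whether a value belongs to a given type
x_is_int = isinstance(x, int)
s_is_int = isinstance(s, int)
print("{0} {1}".format(x_is_int, s_is_int))
```

Note that "9" is a string even though it looks like a number, which is why s_is_int is False.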

Using functions from the math module

# Add from math import [function name] below the shebang line at the top of the script
from math import exp,log,sqrt
print("Output #11: {0:.4f}".format(exp(3)))
print("Output #12: {0:.2f}".format(log(4)))
print("Output #13: {0:.1f}".format(sqrt(81)))

Output:
Output #11: 20.0855
Output #12: 1.39
Output #13: 9.0
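
Note that log in the math module is the natural logarithm, which is why log(4) prints 1.39. As a hedged sketch, a few related functions from the same module can be imported in the same way:

```python
from math import log, log2, log10, floor, ceil, pi

# log() with one argument is the natural logarithm (base e)
natural = log(4)

# log2() and log10() are the base-2 and base-10 logarithms
base_two = log2(8)
base_ten = log10(1000)

# floor() rounds down and ceil() rounds up to the nearest integer
rounded_down = floor(2.7)
rounded_up = ceil(2.1)

print("{0:.2f} {1:.1f} {2:.1f}".format(natural, base_two, base_ten))
print("{0} {1} {2:.4f}".format(rounded_down, rounded_up, pi))
```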

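Strings

A string can be enclosed in single quotes, double quotes, three single quotes, or three double quotes.

Kinds of strings
Human-readable text, such as names and addresses.
Values that look like numbers, such as postal codes, but on which no arithmetic is performed.

# A single quote inside a single-quoted string must be escaped with a backslash
print("{0:s}".format('I\'m enjoying learning Python.'))

# To continue a quoted string on the next line, end the line with a backslash;
# no newline is added, but the next line's leading spaces become part of the string
print("{0:s}".format("a\
        b\
        c\
        d"))

# A string in three single or three double quotes can span several lines without backslashes;
# the line breaks are kept in the string
print("{0:s}".format('''a
        b
        c
        d'''))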

Common string operators and functions:

string1 = "This is a "
string2 = "short string."
sentence = string1 + string2
# Output: This is a short string.
print("Output: {0:s}".format(sentence))

# * repeats a string a given number of times
# Output: She is very very very very  beautiful.
print("Output: {0:s} {1:s} {2:s}".format("She is", "very "*4, "beautiful."))

m = len(sentence)
# Output:23
print("Output:{0:d}".format(m))

# Using split()
string1 = "My deliverable is due in May"
string1_list1 = string1.split()
# With no arguments, split() splits the string on whitespace (the default)
# and returns a list of substrings
# Output: ['My', 'deliverable', 'is', 'due', 'in', 'May']
print("Output: {0}".format(string1_list1))

# Split on the first two spaces only
string1_list2 = string1.split(" ",2)
# Output: FIRST PIECE:My SECOND PIECE:deliverable THIRD PIECE:is due in May
print("Output: FIRST PIECE:{0} SECOND PIECE:{1} THIRD PIECE:{2}"\
.format(string1_list2[0], string1_list2[1], string1_list2[2]))

string2 = "Your,deliverable,is,due,in,June"
string2_list = string2.split(",")

# Output: ['Your', 'deliverable', 'is', 'due', 'in', 'June']
print("Output: {0}".format(string2_list))

# Output: deliverable June June
print("Output: {0} {1} {2}".format(string2_list[1], string2_list[5], string2_list[-1]))




The join function combines the substrings in a list into a single string.
The string that join is called on is placed between the substrings.
string1 = "My deliverable is due in May"
string1_list2 = string1.split(" ",2)

# Output:['My', 'deliverable', 'is due in May']
# Output:My,deliverable,is due in May
print("Output:{0}".format(string1_list2))
print("Output:{0}".format(",".join(string1_list2)))
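
One detail worth illustrating is that join only accepts strings, so other values have to be converted first. The sketch below uses made-up sample data and also shows split and join undoing each other:

```python
numbers = [1, 2, 3]

# join() raises TypeError on non-string items, so convert each one with str()
joined = ",".join(str(n) for n in numbers)

# Splitting on whitespace and joining with a single space restores
# a string whose words were separated by single spaces
original = "My deliverable is due in May"
round_trip = " ".join(original.split())

print(joined)
print(round_trip == original)
```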


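The strip, lstrip, and rstrip functions remove unwanted characters from the ends of a string
string4 = "$$The unwanted characters have been removed.__---++"
# Strip any of the characters $, _, - and + from both ends
string4_strip = string4.strip('$_-+')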

# Output #31: The unwanted characters have been removed.
print("Output #31: {0:s}".format(string4_strip))
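
The example above only uses strip. As a sketch with made-up sample strings, lstrip and rstrip work the same way but on one end only, and all three remove whitespace when called without arguments:

```python
padded = "  \tdata analysis \n"

# With no argument, whitespace (spaces, tabs, newlines) is removed
left_only = padded.lstrip()
right_only = padded.rstrip()
both_ends = padded.strip()

tagged = "$$price__"
# With an argument, any of the listed characters are removed from that end
no_dollars = tagged.lstrip("$")
no_underscores = tagged.rstrip("_")

# repr() makes the remaining whitespace visible
print(repr(left_only))
print(repr(right_only))
print(repr(both_ends))
print("{0} {1}".format(no_dollars, no_underscores))
```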


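The replace function replaces one character or group of characters in a string with another
string5 = "Let's replace the spaces in this sentence with other characters."
# Replace each space with a comma; replace() returns a new string, so keep the result
string5_replace = string5.replace(" ", ",")
print("{0:s}".format(string5_replace))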



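lower and upper convert the letters in a string to lowercase and uppercase
capitalize uppercases the first letter and lowercases the rest

string5 = "here's WHAT Happens WHEN you use Capitalize."
string5_list = string5.split()
print("Capitalize each word:")
for word in string5_list:
    print("{0:s}".format(word.capitalize()))

Output:
Capitalize each word: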
Here's
What
Happens
When
You
Use
Capitalize.
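
lower and upper are mentioned above but not demonstrated. A short sketch on the same sentence follows; the title() call is an addition for comparison and illustrates why the loop over capitalize() can be the safer way to capitalize each word.

```python
text = "here's WHAT Happens WHEN you use Capitalize."

lowered = text.lower()
uppered = text.upper()

# capitalize() on the whole string only uppercases the very first letter
capitalized = text.capitalize()

# title() uppercases the letter after every non-letter, including apostrophes
titled_word = "here's".title()

print(lowered)
print(uppered)
print(capitalized)
print(titled_word)
```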

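Regular expression matching

The re module

Import it before use: import re

#!/usr/bin/env python3
import re
string = "The quick brown fox jumps over the lazy dog."
# Split the string into words
string_list = string.split()
# Create the regular expression pattern. re.compile() precompiles the pattern so that reusing it is faster; the re.I flag makes the pattern case-insensitive; the r prefix makes the string raw, so escape sequences such as \, \t, or \n are not processed
pattern = re.compile(r"The", re.I)
count = 0
for word in string_list:
    # Compare each word in the list with the regular expression; pattern.search() returns a match object (truthy) when there is a match and None otherwise
    if pattern.search(word):
        count += 1
print("Output #38: {0:d}".format(count))


# Storing the pattern in a variable is convenient when the regular expression is long
string_to_find = r"The"
pattern = re.compile(string_to_find, re.I)
# The line above is equivalent to:
pattern = re.compile(r"The", re.I)


import re
string = "The quick brown fox jumps over the lazy dog."
string_to_find = r"The"
pattern = re.compile(string_to_find, re.I)
# Find every occurrence of "the" in string and replace it with "a"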
print("Output #40: {:s}".format(pattern.sub("a", string)))
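
One caveat with the pattern above is that it also matches "the" inside longer words. The sketch below uses a made-up sentence and the \b word-boundary anchor to match whole words only; findall and the count argument of sub are standard re features not used in the original examples.

```python
import re

# Hypothetical sample sentence containing "then" and "others"
string = "The quick brown fox jumps over the lazy dog, then rests with the others."

# Matches "the" anywhere, including inside "then" and "others"
loose = re.compile(r"the", re.I)
# \b anchors the match at word boundaries, so only the whole word matches
strict = re.compile(r"\bthe\b", re.I)

loose_matches = loose.findall(string)
strict_matches = strict.findall(string)

# count=1 limits sub() to the first match
first_replaced = strict.sub("a", string, count=1)

print(len(loose_matches))
print(len(strict_matches))
print(first_replaced)
```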

Dates

from datetime import date, time, datetime, timedelta
# A date holds only the year, month, and day
today = date.today()
# In {0!s}, !s converts the value to a string with str(), even when it is numeric
print("Output #41: today: {0!s}".format(today))
print("Output #42: {0!s}".format(today.year))
print("Output #43: {0!s}".format(today.month))
print("Output #44: {0!s}".format(today.day))

# A datetime also includes hours, minutes, and seconds
current_datetime = datetime.today()
print("Output #45: {0!s}".format(current_datetime))

# Output
Output #41: today: 2018-02-26
Output #42: 2018
Output #43: 2
Output #44: 26
Output #45: 2018-02-26 20:43:23.966000
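
The import above also brings in timedelta, which the example does not use. As a sketch, here is date arithmetic and formatting on a fixed date (taken from the sample output, so the results are reproducible); the variable names are illustrative.

```python
from datetime import date, datetime, timedelta

fixed_day = date(2018, 2, 26)

# A timedelta represents a duration that can be added to or subtracted from a date
yesterday = fixed_day - timedelta(days=1)
next_week = fixed_day + timedelta(weeks=1)

# strftime() formats a date as a string using format codes
formatted = fixed_day.strftime("%m/%d/%Y")

# strptime() parses a string into a datetime using the same codes
parsed = datetime.strptime("2018-02-26", "%Y-%m-%d").date()

print("{0!s} {1!s}".format(yesterday, next_week))
print(formatted)
print(parsed == fixed_day)
```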

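Lists

# Create a list with square brackets
# Use len() to count the elements in a list
# Use max() and min() to find the largest and smallest values
# Use count() to count how many times a value appears in a list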
a_list = [1, 2, 3]
print("Output #58: {}".format(a_list))
print("Output #59: a_list has {} elements.".format(len(a_list)))
print("Output #60: the maximum value in a_list is {}.".format(max(a_list)))
print("Output #61: the minimum value in a_list is {}.".format(min(a_list)))
another_list = ['printer', 5, ['star', 'circle', 9]]
print("Output #62: {}".format(another_list))
print("Output #63: another_list also has {} elements.".format(len(another_list)))
print("Output #64: 5 is in another_list {} time.".format(another_list.count(5)))

# Output
Output #58: [1, 2, 3]
Output #59: a_list has 3 elements.
Output #60: the maximum value in a_list is 3.
Output #61: the minimum value in a_list is 1.
Output #62: ['printer', 5, ['star', 'circle', 9]]
Output #63: another_list also has 3 elements.
Output #64: 5 is in another_list 1 time.



# Use index values to access specific elements of a list
# [0] is the first element and [-1] is the last element
a_list = [1, 2, 3]
another_list = ['printer', 5, ['star', 'circle', 9]]
print("Output #65: {}".format(a_list[0]))
print("Output #66: {}".format(a_list[1]))
print("Output #67: {}".format(a_list[2]))
print("Output #68: {}".format(a_list[-1]))
print("Output #69: {}".format(a_list[-2]))
print("Output #70: {}".format(a_list[-3]))
print("Output #71: {}".format(another_list[2]))
print("Output #72: {}".format(another_list[-1]))

# Output
Output #65: 1
Output #66: 2
Output #67: 3
Output #68: 3
Output #69: 2
Output #70: 1
Output #71: ['star', 'circle', 9]
Output #72: ['star', 'circle', 9]



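# Use list slices to access a subset of a list's elements
# When slicing from the start, the first index can be omitted
# When slicing through to the end, the second index can be omitted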
a_list = [1, 2, 3]
another_list = ['printer', 5, ['star', 'circle', 9]]
print("Output #73: {}".format(a_list[0:2]))
print("Output #74: {}".format(another_list[:2]))
print("Output #75: {}".format(a_list[1:3]))
print("Output #76: {}".format(another_list[1:]))

# Output
Output #73: [1, 2]
Output #74: ['printer', 5]
Output #75: [2, 3]
Output #76: [5, ['star', 'circle', 9]]


# Use [:] to copy a list
a_new_list = a_list[:]
# a_new_list is an independent (shallow) copy of a_list: you can add, remove, or sort elements in a_new_list without affecting a_list
print("Output #77: {}".format(a_new_list))

# Output
Output #77: [1, 2, 3]
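
A copy made with [:] duplicates only the outer list; nested lists inside it are still shared with the original. The following sketch uses copy.deepcopy from the standard library, which is not part of the original examples, to show the difference:

```python
import copy

another_list = ['printer', 5, ['star', 'circle', 9]]

shallow = another_list[:]             # copies the outer list only
deep = copy.deepcopy(another_list)    # copies nested lists as well

# Changing the outer copy does not affect the original
shallow.append('new')

# Changing the nested list through the original is visible in the
# shallow copy, because both refer to the same inner list
another_list[2].append('square')

print(another_list)
print(shallow)
print(deep)
```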



a_list = [1, 2, 3]
another_list = ['printer', 5, ['star', 'circle', 9]]
# Use + to concatenate two or more lists
a_longer_list = a_list + another_list
print("Output #78: {}".format(a_longer_list))

# Output
Output #78: [1, 2, 3, 'printer', 5, ['star', 'circle', 9]]



a_list = [1, 2, 3]
another_list = ['printer', 5, ['star', 'circle', 9]]
# Use in and not in to check whether a list contains a specific element
a = 2 in a_list
print("Output #79: {}".format(a))
if 2 in a_list:
    print("Output #80: 2 is in {}.".format(a_list))
b = 6 not in a_list
print("Output #81: {}".format(b))
if 6 not in a_list:
    print("Output #82: 6 is not in {}.".format(a_list))

# Output
Output #79: True
Output #80: 2 is in [1, 2, 3].
Output #81: True
Output #82: 6 is not in [1, 2, 3].



a_list = [1, 2, 3]
another_list = ['printer', 5, ['star', 'circle', 9]]
# Use append() to add a new element to the end of a list
# Use remove() to delete a specific element from a list
# Use pop() to remove an element from the end of a list
a_list.append(4)
a_list.append(5)
a_list.append(6)
print("Output #83: {}".format(a_list))
a_list.remove(5)
print("Output #84: {}".format(a_list))
a_list.pop()
a_list.pop()
print("Output #85: {}".format(a_list))

# Output
Output #83: [1, 2, 3, 4, 5, 6]
Output #84: [1, 2, 3, 4, 6]
Output #85: [1, 2, 3]
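
A few details of these methods are easy to miss: pop() returns the element it removes, pop() accepts an index, and remove() raises ValueError when the value is absent. A small sketch, with variable names chosen for illustration:

```python
a_list = [1, 2, 3, 4, 5, 6]

# pop() returns the removed element; with no argument it removes the last one
last = a_list.pop()

# pop(0) removes and returns the element at index 0
first = a_list.pop(0)

# remove() raises ValueError if the value is not in the list
try:
    a_list.remove(99)
    missing = False
except ValueError:
    missing = True

print("{0} {1} {2} {3}".format(last, first, missing, a_list))
```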



a_list = [1, 2, 3]
another_list = ['printer', 5, ['star', 'circle', 9]]
# reverse() reverses a list in place, which modifies the original list
# To reverse a list without modifying the original, copy the list first
a_list.reverse()
print("Output #86: {}".format(a_list))
a_list.reverse()
print("Output #87: {}".format(a_list))

# Output
Output #86: [3, 2, 1]
Output #87: [1, 2, 3]
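
Besides copying first and then calling reverse(), two other standard ways produce a reversed copy while leaving the original untouched. A minimal sketch:

```python
a_list = [1, 2, 3]

# A slice with a step of -1 builds a new, reversed list
reversed_by_slice = a_list[::-1]

# reversed() returns an iterator; list() turns it into a new list
reversed_by_function = list(reversed(a_list))

print("{0} {1} {2}".format(reversed_by_slice, reversed_by_function, a_list))
```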




# sort() sorts a list in place, which modifies the original list
# To sort a list without modifying the original, copy the list first
unordered_list = [3, 5, 1, 7, 2, 8, 4, 9, 0, 6]
print("Output #88: {}".format(unordered_list))
list_copy = unordered_list[:]
list_copy.sort()
print("Output #89: {}".format(list_copy))
print("Output #90: {}".format(unordered_list))

# Output
Output #88: [3, 5, 1, 7, 2, 8, 4, 9, 0, 6]
Output #89: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Output #90: [3, 5, 1, 7, 2, 8, 4, 9, 0, 6]
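
As an alternative to copying and then calling sort(), the built-in sorted() returns a new sorted list directly. The sketch below also shows the reverse and key arguments; the words list is made up for illustration.

```python
unordered_list = [3, 5, 1, 7, 2, 8, 4, 9, 0, 6]

# sorted() returns a new list and leaves the original unchanged
ascending = sorted(unordered_list)
descending = sorted(unordered_list, reverse=True)

# key= takes a function that computes the value to sort by
words = ["pandas", "re", "datetime", "math"]
by_length = sorted(words, key=len)

print(ascending)
print(descending)
print(by_length)
print(unordered_list)
```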
