NumPy Key Points
程序员文章站
2024-01-18 15:11:01
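I have recently been working through the book Python for Data Analysis (《利用Python进行数据分析》). Below are some key points I summarized from Chapter 4.
1. ndarray
An ndarray is an N-dimensional array object.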
Creating an ndarray:
In [5]: data = [[1,2,3],[4,5,6]]
In [6]: arr = numpy.array(data, dtype=numpy.int32)
In [7]: arr
Out[7]:
array([[1, 2, 3],
       [4, 5, 6]])
Checking the size of each dimension:
In [9]: arr.shape
Out[9]: (2, 3)
Checking the array's data type:
In [10]: arr.dtype
Out[10]: dtype('int32')
Other ways to create arrays:
In [11]: numpy.zeros((3,6))  # create an array of shape (3, 6) filled with zeros
Out[11]:
array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])
arange is similar to Python's built-in range, but returns an ndarray:
In [12]: numpy.arange(15)
Out[12]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
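Besides zeros and arange, NumPy has a few other constructors that follow the same pattern. The sketch below is only illustrative. The shapes and fill values are arbitrary choices, not taken from the book.

```python
import numpy

ones = numpy.ones((2, 3))                         # float64 array filled with 1.0
full = numpy.full((2, 2), 7)                      # every element set to 7
like = numpy.zeros_like(ones, dtype=numpy.int32)  # same shape as `ones`, int32 zeros
stepped = numpy.arange(2, 10, 3)                  # start, stop (exclusive), step

print(ones.shape, ones.dtype)   # (2, 3) float64
print(full.tolist())            # [[7, 7], [7, 7]]
print(like.dtype)               # int32
print(stepped.tolist())         # [2, 5, 8]
```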
Converting the dtype:
In [15]: farr = arr.astype(numpy.float64)
In [16]: farr.dtype
Out[16]: dtype('float64')
PS: when floating-point numbers are converted to integers, the fractional part is truncated (not rounded).
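To make the truncation concrete, here is a small sketch. The sample values are made up, and include negatives to show that the conversion drops the fractional part toward zero rather than rounding.

```python
import numpy

# Hypothetical sample values, including negatives.
floats = numpy.array([3.7, -1.2, 0.5, -2.9])

# astype discards the fractional part (truncation toward zero).
ints = floats.astype(numpy.int32)
print(ints)        # [ 3 -1  0 -2]
print(ints.dtype)  # int32
```

Note that astype always returns a new array, even when the target dtype equals the current one.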
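A slice of an array is a view on the original array, not a copy of the data, so changes made through the slice show up in the original array:
In [2]: arr = numpy.arange(10)
In [3]: arr_slice = arr[5:8]
In [4]: arr_slice[0] = 123456
In [5]: arr
Out[5]:
array([     0,      1,      2,      3,      4, 123456,      6,      7,
            8,      9])
PS: NumPy works this way because, when handling large amounts of data, frequent copying would degrade performance.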
To get a copy of a slice rather than a view, use copy:
In [7]: arr2 = arr[5:8].copy()
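One way to check whether two arrays share a buffer is numpy.shares_memory. The following sketch contrasts a slice view with an explicit copy. The variable names view and copied are mine, not the book's.

```python
import numpy

arr = numpy.arange(10)
view = arr[5:8]            # basic slice: a view on arr's buffer
copied = arr[5:8].copy()   # explicit copy: an independent buffer

print(numpy.shares_memory(arr, view))    # True
print(numpy.shares_memory(arr, copied))  # False

copied[0] = -1   # leaves arr untouched
view[:] = 64     # writes through to arr[5:8]
print(arr)       # [ 0  1  2  3  4 64 64 64  8  9]
```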
Both scalar values and arrays can be assigned to a selection of an ndarray:
In [13]: data = [[[1,2,3],[4,5,6]],[[4,5,6],[7,8,9]]]
In [14]: arr = numpy.array(data)
In [15]: arr2 = arr[0].copy()
In [16]: arr[0] = 123
In [17]: arr
Out[17]:
array([[[123, 123, 123],
        [123, 123, 123]],

       [[  4,   5,   6],
        [  7,   8,   9]]])
In [18]: arr[0] = arr2
In [19]: arr
Out[19]:
array([[[1, 2, 3],
        [4, 5, 6]],

       [[4, 5, 6],
        [7, 8, 9]]])
Boolean array indexing can be combined with slicing (here name is an array with one entry per row of arr):
In [1]: arr[name == "liu", :2]
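The line above relies on arrays that are not defined in these notes. A self-contained sketch is shown below. The contents of name and arr are hypothetical and chosen only so the result is easy to verify.

```python
import numpy

# Hypothetical data: one name per row of `arr`.
name = numpy.array(["liu", "wang", "liu", "zhao"])
arr = numpy.arange(12).reshape((4, 3))

mask = name == "liu"     # boolean array: True for rows 0 and 2
subset = arr[mask, :2]   # rows where mask is True, first two columns
print(subset.tolist())   # [[0, 1], [6, 7]]
```

The boolean array must have the same length as the axis it indexes, here the 4 rows of arr.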
To select a subset of rows in a particular order, simply index with a list or ndarray of integers:
In [9]: arr
Out[9]:
array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])
In [10]: arr[[4,3,0,6]]
Out[10]:
array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])
Reshaping a one-dimensional array into a two-dimensional array:
In [11]: arr = numpy.arange(32).reshape((8,4))
In [12]: arr
Out[12]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])
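Two details of reshape that the session above does not show: one dimension can be given as -1 so NumPy infers it, and reshaping a contiguous array returns a view rather than a copy. A minimal sketch, with variable names of my own choosing:

```python
import numpy

flat = numpy.arange(32)
grid = flat.reshape((8, -1))   # -1 lets NumPy infer the second dimension (4)
print(grid.shape)              # (8, 4)

# For a contiguous array such as `flat`, reshape returns a view,
# so writing through `grid` also changes `flat`.
grid[0, 0] = 99
print(flat[0])                 # 99
```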
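Fancy indexing (numpy.ix_ selects the rectangular region formed by the given rows and columns):
In [13]: arr[numpy.ix_([1,5,7,2],[0,3,1,2])]
Out[13]:
array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])
PS: fancy indexing always copies the data into a new array.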
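It may help to compare numpy.ix_ with passing two index lists directly, which pairs the indices element by element instead of forming a grid. The sketch below uses the same 8x4 array and also checks that the fancy-indexed result is a copy.

```python
import numpy

arr = numpy.arange(32).reshape((8, 4))

# Two equal-length index lists are paired: (1,0), (5,3), (7,1), (2,2).
paired = arr[[1, 5, 7, 2], [0, 3, 1, 2]]
print(paired)        # [ 4 23 29 10]

# numpy.ix_ instead selects every combination of the rows and columns.
block = arr[numpy.ix_([1, 5, 7, 2], [0, 3, 1, 2])]
print(block.shape)   # (4, 4)

# The result is a copy: modifying it leaves `arr` unchanged.
block[0, 0] = -1
print(arr[1, 0])     # 4
```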
Transposing an array (transpose):
In [14]: arr
Out[14]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])
In [15]: arr.T
Out[15]:
array([[ 0,  4,  8, 12, 16, 20, 24, 28],
       [ 1,  5,  9, 13, 17, 21, 25, 29],
       [ 2,  6, 10, 14, 18, 22, 26, 30],
       [ 3,  7, 11, 15, 19, 23, 27, 31]])
For higher-dimensional arrays, transpose takes a tuple of axis numbers that specifies the new order of the axes:
In [16]: arr = numpy.arange(16).reshape((2,2,4))
In [17]: arr
Out[17]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])
In [18]: arr.transpose((1,0,2))
Out[18]:
array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])
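The axis tuple can be read as follows: with (1, 0, 2), the element at position [i, j, k] of the result is the element at [j, i, k] of the original. A short sketch checking this on the same array (the name t is mine):

```python
import numpy

arr = numpy.arange(16).reshape((2, 2, 4))
t = arr.transpose((1, 0, 2))   # swap axes 0 and 1, keep axis 2

print(t[0, 1, 2], arr[1, 0, 2])                  # 10 10
print(numpy.array_equal(t, arr.swapaxes(0, 1)))  # True

# With no arguments, .T reverses the order of all axes.
print(arr.T.shape)                               # (4, 2, 2)
```

Swapping exactly two axes can also be written with swapaxes, as the second print shows.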
2. Data processing with arrays
In [2]: point = numpy.arange(-5,5,0.01)
In [3]: xs, ys = numpy.meshgrid(point, point)
In [4]: ys
Out[4]:
array([[-5.  , -5.  , -5.  , ..., -5.  , -5.  , -5.  ],
       [-4.99, -4.99, -4.99, ..., -4.99, -4.99, -4.99],
       [-4.98, -4.98, -4.98, ..., -4.98, -4.98, -4.98],
       ...,
       [ 4.97,  4.97,  4.97, ...,  4.97,  4.97,  4.97],
       [ 4.98,  4.98,  4.98, ...,  4.98,  4.98,  4.98],
       [ 4.99,  4.99,  4.99, ...,  4.99,  4.99,  4.99]])
In [6]: import matplotlib.pyplot as plt
In [7]: z = numpy.sqrt(xs**2 + ys**2)
In [8]: z
Out[8]:
array([[7.07106781, 7.06400028, 7.05693985, ..., 7.04988652, 7.05693985,
        7.06400028],
       [7.06400028, 7.05692568, 7.04985815, ..., 7.04279774, 7.04985815,
        7.05692568],
       [7.05693985, 7.04985815, 7.04278354, ..., 7.03571603, 7.04278354,
        7.04985815],
       ...,
       [7.04988652, 7.04279774, 7.03571603, ..., 7.0286414 , 7.03571603,
        7.04279774],
       [7.05693985, 7.04985815, 7.04278354, ..., 7.03571603, 7.04278354,
        7.04985815],
       [7.06400028, 7.05692568, 7.04985815, ..., 7.04279774, 7.04985815,
        7.05692568]])
In [9]: plt.imshow(z, cmap=plt.cm.gray); plt.colorbar()
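With 1000 points per axis it is hard to see what meshgrid returns. The sketch below repeats the same computation on a tiny, made-up grid of three points so the structure of xs and ys is visible.

```python
import numpy

point = numpy.array([0.0, 3.0, 4.0])   # hypothetical small grid
xs, ys = numpy.meshgrid(point, point)

# xs holds the x coordinate of every grid cell: each row is `point`.
print(xs.tolist())   # [[0.0, 3.0, 4.0], [0.0, 3.0, 4.0], [0.0, 3.0, 4.0]]
# ys holds the y coordinate: each column is `point`.
print(ys.tolist())   # [[0.0, 0.0, 0.0], [3.0, 3.0, 3.0], [4.0, 4.0, 4.0]]

# The distance from the origin is evaluated for the whole grid at once, with no loop.
z = numpy.sqrt(xs**2 + ys**2)
print(z[1, 2])       # 5.0  (x = 4, y = 3)
```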
3. Expressing conditional logic as array operations
In [9]: xarr = numpy.array([1.1,1.2,1.3,1.4,1.5])
In [10]: yarr = numpy.array([2.1,2.2,2.3,2.4,2.5])
In [11]: cond = numpy.array([True,False,True,True,False])
In [12]: numpy.where(cond,xarr,yarr)
Out[12]: array([1.1, 2.2, 1.3, 1.4, 2.5])
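For comparison, numpy.where(cond, x, y) is the vectorized form of the expression x if cond else y applied element by element. The sketch below writes out that pure-Python equivalent and confirms both give the same values.

```python
import numpy

xarr = numpy.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = numpy.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = numpy.array([True, False, True, True, False])

# Pure-Python equivalent: slow for large arrays and limited to one dimension.
looped = [x if c else y for x, y, c in zip(xarr, yarr, cond)]

# Vectorized version: one call, works for arrays of any dimension.
vectorized = numpy.where(cond, xarr, yarr)

print(vectorized.tolist() == looped)   # True
```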
The second and third arguments do not have to be arrays. Either one can be a scalar:
In [9]: numpy.where(arr>0,2,-2)
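Since arr is not defined at this point in the notes, here is a self-contained sketch with made-up values. It also shows that a scalar and an array can be mixed, for example to replace only the positive entries.

```python
import numpy

# Hypothetical input with both positive and negative values.
arr = numpy.array([[1.5, -0.3], [-2.0, 0.7]])

signs = numpy.where(arr > 0, 2, -2)    # 2 where positive, -2 elsewhere
print(signs.tolist())                  # [[2, -2], [-2, 2]]

mixed = numpy.where(arr > 0, 2, arr)   # replace positives with 2, keep the rest
print(mixed.tolist())                  # [[2.0, -0.3], [-2.0, 2.0]]
```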