KNN Handwritten Digit Training and Recognition
I. Preliminaries
1. Generate a 4*3 array
from numpy import *
a=random.rand(4,3)
print(a.shape[0])
The output is 4, the number of rows in the array of 4 rows and 3 columns.
2. Convert the array to a matrix
b=mat(a)
3. Invert matrix b (b is 4*3, not square, so b.I returns the Moore-Penrose pseudo-inverse, a 3*4 matrix)
c=b.I
4. Matrix multiplication
print(c*b)
[[ 1.00000000e+00 -3.83564254e-16 1.18823551e-16]
[-4.86974025e-16 1.00000000e+00 -5.37713486e-16]
[ 5.60557203e-17 1.28799732e-16 1.00000000e+00]]
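The result should look similar to the above: the main diagonal is 1, and the off-diagonal entries are not exactly zero because of floating-point rounding error.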
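Because b has 4 rows and 3 columns, b.I cannot be an ordinary inverse. The sketch below checks this under a few assumptions of my own: NumPy is imported as np, the random data comes from a seeded generator, and asmatrix stands in for mat (the mat alias was removed in NumPy 2.0). It compares b.I with numpy.linalg.pinv and shows that only c*b, not b*c, is close to an identity matrix.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only so the run is repeatable
a = rng.random((4, 3))           # 4 rows, 3 columns
b = np.asmatrix(a)               # plays the role of mat(a) in the text
c = b.I                          # pseudo-inverse, because b is not square

pinv_matches = np.allclose(c, np.linalg.pinv(a))
left_is_identity = np.allclose(c * b, np.eye(3))    # holds when a has full column rank
right_is_identity = np.allclose(b * c, np.eye(4))   # b*c is only a rank-3 projection

print(c.shape)             # (3, 4)
print(pinv_matches)        # True
print(left_is_identity)    # True
print(right_is_identity)   # False
```

For a square, non-singular matrix, b.I is the ordinary inverse and both products are close to the identity.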
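5. Create a 3*3 identity matrix
eye(3)
6. Going back to step 2, convert the other way
d=array(b)
This converts the matrix back to an array.
7. Repeating an array
tile(A,reps) repeats array A the number of times given by reps. The docstring reads: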
tile(A, reps)
Construct an array by repeating A the number of times given by reps.
If `reps` has length ``d``, the result will have dimension of
``max(d, A.ndim)``.
If ``A.ndim < d``, `A` is promoted to be d-dimensional by prepending new
axes. So a shape (3,) array is promoted to (1, 3) for 2-D replication,
or shape (1, 1, 3) for 3-D replication. If this is not the desired
behavior, promote `A` to d-dimensions manually before calling this
function.
If ``A.ndim > d``, `reps` is promoted to `A`.ndim by pre-pending 1's to it.
Thus for an `A` of shape (2, 3, 4, 5), a `reps` of (2, 2) is treated as
(1, 1, 2, 2).
A simple example:
a=random.rand(3,2)
b=tile(a,(4,5,3,5))
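Then b.shape is (4,5,9,10): a is first promoted to shape (1,1,3,2), and each dimension is then multiplied by the matching entry of reps, so the trailing dimensions are the products 3*3 and 2*5. This product rule makes it easy to picture how tile copies the array.
Note that tile can also be applied to matrices.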
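The shape rule can be checked directly. The sketch below assumes NumPy is imported as np and uses arange data instead of random values; it also shows the two-dimensional form tile(x, (n, 1)), which stacks one row vector n times and is the form the kNN code in section II relies on.

```python
import numpy as np

a = np.arange(6).reshape(3, 2)        # shape (3, 2)
b = np.tile(a, (4, 5, 3, 5))          # a is treated as shape (1, 1, 3, 2)
print(b.shape)                        # (4, 5, 9, 10)

# Stack one row vector once per row of a 4-row data set.
x = np.array([1, 2, 3])
stacked = np.tile(x, (4, 1))
print(stacked.shape)                  # (4, 3)
print(stacked.tolist())               # [[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]]
```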
8. Squaring an array
Every element is squared individually.
from numpy import *
a=array([[1,2],[3,4],[5,6],[7,8]])
print(a)
b=a**2
print(b)
Result:
[[1 2]
[3 4]
[5 6]
[7 8]]
[[ 1 4]
[ 9 16]
[25 36]
[49 64]]
9. Summing an array
from numpy import *
a=array([[1,2,3],[3,4,5],[5,6,7],[7,8,9]])
print(a.sum(axis=0))
print(a.sum(axis=1))
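print(a.sum(axis=2))#raises an error: axis 2 is out of bounds for array of dimension 2
axis=0 sums down each column and axis=1 sums across each row. The call can also be written as sum(a,axis=1).
Result: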
[16 20 24]
[ 6 12 18 24]
error
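Taken together, tile, element-wise squaring, and sum(axis=1) are the ingredients of a Euclidean distance computation. The following sketch uses made-up points chosen so the distances come out as whole numbers; the variable names are illustrative only.

```python
import numpy as np

point = np.array([0, 0])
data = np.array([[3, 4], [6, 8], [0, 1]])

diff = np.tile(point, (data.shape[0], 1)) - data   # one copy of point per data row
sq_dist = (diff ** 2).sum(axis=1)                  # add the squares across each row
dist = sq_dist ** 0.5                              # square root gives the distance

print(sq_dist.tolist())   # [25, 100, 1]
print(dist.tolist())      # [5.0, 10.0, 1.0]
```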
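10. Sorting an array
b.argsort() returns the indices that would sort array b in ascending order; it does not return the sorted values themselves.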
b=array([18, 24 , 6, 12])
c=b.argsort()
print(c)
Result:
[2 3 0 1]
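The index array is useful because it can reorder any other sequence that is aligned with b. A small sketch, with hypothetical labels attached to the same four values, picks the labels of the two smallest entries:

```python
import numpy as np

distances = np.array([18, 24, 6, 12])
labels = ['w', 'x', 'y', 'z']          # hypothetical labels, one per distance

order = distances.argsort()            # indices, smallest distance first
nearest_two = [labels[i] for i in order[:2]]

print(order.tolist())                  # [2, 3, 0, 1]
print(distances[order].tolist())       # [6, 12, 18, 24]
print(nearest_two)                     # ['y', 'z']
```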
11. dict.get
dict.get(key, default=None)
Parameters
key – the key to look up in the dictionary.
default – the value returned when the key is not in the dictionary.
c={}
p='A'
c[p]=c.get(p,4)
print(c)
c[p]=c.get(p,4)+2
print(c)
c[p]=c.get(p,4)+1
print(c)
Output:
{'A': 4}
{'A': 6}
{'A': 7}
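The most common use of this pattern is counting: get(key, 0) supplies a starting value of 0 the first time a key is seen, so no membership test is needed. A minimal sketch with a made-up list of labels:

```python
votes = ['A', 'B', 'A', 'C', 'A', 'B']   # hypothetical labels to count

counts = {}
for label in votes:
    # 0 is used only when label has not been counted yet
    counts[label] = counts.get(label, 0) + 1

print(counts)   # {'A': 3, 'B': 2, 'C': 1}
```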
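12. Sorting a dictionary
sort is a method of list, whereas sorted works on any iterable. A dictionary is not a list, so we use sorted.
sorted(iterable, key=None, reverse=False)
Parameters:
iterable – the iterable to sort.
key – a function of one argument that is applied to each element of the iterable to extract the value used for comparison.
reverse – the sort order: reverse = True sorts in descending order, reverse = False in ascending order (the default).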
import operator
a = {'1':6,'2':3,'3':1,'4':5}
b = sorted(a.items(),key = operator.itemgetter(0), reverse = True)
c = sorted(a.items(),key = operator.itemgetter(1), reverse = True)
print(b)
print(c)
Output:
[('4', 5), ('3', 1), ('2', 3), ('1', 6)]
[('1', 6), ('4', 5), ('2', 3), ('3', 1)]
a.items() yields (key, value) pairs, so operator.itemgetter(0) sorts by key and operator.itemgetter(1) sorts by value.
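operator.itemgetter(1) is equivalent to a lambda that returns the second element of each pair. The sketch below, on a made-up dictionary of counts, shows both spellings and two ways of reading off the key with the largest value.

```python
import operator

counts = {'B': 2, 'C': 1, 'A': 3}   # hypothetical counts

by_value = sorted(counts.items(), key=operator.itemgetter(1), reverse=True)
by_value_lambda = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
winner = by_value[0][0]              # key of the first (largest) pair

print(by_value)                      # [('A', 3), ('B', 2), ('C', 1)]
print(by_value == by_value_lambda)   # True
print(winner)                        # A
print(max(counts, key=counts.get))   # A
```

When only the top entry is needed, max avoids sorting the whole dictionary.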
13. Extracting elements from an array
See the link for details.
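14. Stripping characters
str.strip([chars]): when called as str.strip(), with nothing in the parentheses, it strips whitespace (spaces, newlines, tabs, and so on) by default.
The method removes characters only from the beginning and end of the string, never from the middle.
For example:
s='aabcd a efgaa'
print(s.strip('a'))
Output: bcd a efg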
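A few more cases make the behaviour concrete. The sketch below uses a made-up string with surrounding whitespace and also shows lstrip and rstrip, the one-sided variants of strip.

```python
raw = '  \n aabcd a efgaa\t '

trimmed = raw.strip()                 # no argument: strip leading/trailing whitespace
print(repr(trimmed))                  # 'aabcd a efgaa'
print(trimmed.strip('a'))             # bcd a efg
print(trimmed.lstrip('a'))            # bcd a efgaa
print(trimmed.rstrip('a'))            # aabcd a efg

# chars is treated as a set of characters, not as a prefix or suffix
print('xyhixy'.strip('yx'))           # hi
```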
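15. Splitting a string
str.split(sep=None, maxsplit=-1)
Parameters
sep – the separator; by default, any run of whitespace, including spaces, newlines (\n), and tabs (\t).
maxsplit – the maximum number of splits. The default is -1, which means no limit.
For example:
l = "111&2222&333&444"
result = l.split("&", 2)
print(result)
Result: ['111', '2222', '333&444']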
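The code in section II uses split twice to recover the digit label from a file name. The sketch below assumes a name of the form digit_index.txt, which is what that code's split('.') and split('_') calls imply; the specific name 0_13.txt is only an illustration.

```python
file_name = '0_13.txt'                 # hypothetical name: <digit>_<index>.txt

stem = file_name.split('.')[0]         # drop the extension
digit = int(stem.split('_')[0])        # the part before '_' is the label

print(stem, digit)                     # 0_13 0
print('a b\tc\nd'.split())             # ['a', 'b', 'c', 'd']
print('111&2222&333&444'.split('&', 1))  # ['111', '2222&333&444']
```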
II. KNN handwritten digit training and recognition
The code comes from Machine Learning in Action (《机器学习实战》).
Data: https://cloud.189.cn/t/VfquiqNzAZVn
from numpy import *
import operator
from os import listdir
# kNN classification
def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    # Euclidean distance from inX to every row of dataSet
    diffMat = tile(inX, (dataSetSize,1)) - dataSet
    sqDiffMat = diffMat**2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances**0.5
    # indices of the training rows, nearest first
    sortedDistIndicies = distances.argsort()
    classCount={}
    # count the labels of the k nearest neighbours
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel,0) + 1
    # return the label with the most votes
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
# Convert a 32×32 binary image, stored as 32 lines of text, into a 1×1024 vector
def img2vector(filename):
    returnVect = zeros((1,1024))
    with open(filename) as fr:  # the file is closed when the block ends
        for i in range(32):
            lineStr = fr.readline()
            for j in range(32):
                returnVect[0,32*i+j] = int(lineStr[j])
    return returnVect
# Load the handwritten digit training set, then test on every file in the test set
def handwritingClassTest():
    hwLabels = []
    trainingFileList = listdir('trainingDigits') #load the training set
    m = len(trainingFileList)
    trainingMat = zeros((m,1024))
    # first for loop: load training vectors and labels (kNN has no separate fitting step)
    for i in range(m):
        fileNameStr = trainingFileList[i]
        fileStr = fileNameStr.split('.')[0] #take off .txt
        classNumStr = int(fileStr.split('_')[0])
        hwLabels.append(classNumStr)
        trainingMat[i,:] = img2vector('trainingDigits/%s' % fileNameStr)
    testFileList = listdir('testDigits') #iterate through the test set
    errorCount = 0.0
    mTest = len(testFileList)
    # second for loop: classify each test file and count the errors
    for i in range(mTest):
        fileNameStr = testFileList[i]
        fileStr = fileNameStr.split('.')[0] #take off .txt
        classNumStr = int(fileStr.split('_')[0])
        vectorUnderTest = img2vector('testDigits/%s' % fileNameStr)
        classifierResult = classify0(vectorUnderTest, trainingMat, hwLabels, 3)
        print("the classifier came back with: %d, the real answer is: %d, the file is: %s"
              % (classifierResult, classNumStr, fileNameStr))
        if classifierResult != classNumStr:
            errorCount += 1.0
    print("\nthe total number of errors is: %d" % errorCount)
    print("\nthe total error rate is: %f" % (errorCount/float(mTest)))

if __name__ == '__main__':
    handwritingClassTest()
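The full script needs the downloaded trainingDigits and testDigits folders. To sanity-check the two core ideas without any data files, here is a minimal sketch that is my own rewrite rather than the book's code: knn_classify uses NumPy broadcasting and collections.Counter in place of tile and the dict/sorted vote, and lines_to_vector takes a list of strings instead of a file name. The function names, the toy points, and the synthetic image are all assumptions made for illustration.

```python
import numpy as np
from collections import Counter


def knn_classify(x, data, labels, k):
    """Return the majority label among the k rows of data nearest to x."""
    dist = np.sqrt(((data - x) ** 2).sum(axis=1))   # broadcasting replaces tile
    nearest = dist.argsort()[:k]                    # indices of the k closest rows
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


def lines_to_vector(lines):
    """Flatten 32 strings of 32 '0'/'1' characters into a (1, 1024) array."""
    vec = np.zeros((1, 1024))
    for i in range(32):
        for j in range(32):
            vec[0, 32 * i + j] = int(lines[i][j])
    return vec


# Small toy data set: two points near (1, 1) and two near (0, 0)
group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
names = ['A', 'A', 'B', 'B']
near_origin = knn_classify(np.array([0.0, 0.0]), group, names, 3)
near_one = knn_classify(np.array([1.0, 1.2]), group, names, 3)
print(near_origin, near_one)                        # B A

# Synthetic "image": all zeros except a single 1 in row 2, column 5
rows = ['0' * 32 for _ in range(32)]
rows[2] = '0' * 5 + '1' + '0' * 26
v = lines_to_vector(rows)
print(v.shape, int(v.sum()), int(v[0, 32 * 2 + 5]))  # (1, 1024) 1 1
```

Pixel (i, j) of the image lands at index 32*i+j of the vector, the same layout img2vector produces, so vectors built either way can be compared row by row.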
References: https://www.runoob.com/python3/python3-tutorial.html
Machine Learning in Action (《机器学习实战》)
Article URL: https://blog.csdn.net/wxkhturfun/article/details/107483964