
KNN Handwritten Digit Training and Recognition

程序员文章站 2022-03-11 21:15:31

I. Preliminaries

1. Generate a 4×3 array

from numpy import *
a=random.rand(4,3)
print(a.shape[0])

The output is 4, the number of rows of the 4×3 array (a.shape[1] would give the 3 columns).
2. Convert the array to a matrix

b=mat(a)

3. Invert matrix b (b is 4×3 and not square, so b.I returns the Moore-Penrose pseudo-inverse, a 3×4 matrix)

c=b.I

4. Matrix multiplication

print(c*b)

[[ 1.00000000e+00 -3.83564254e-16 1.18823551e-16]
[-4.86974025e-16 1.00000000e+00 -5.37713486e-16]
[ 5.60557203e-17 1.28799732e-16 1.00000000e+00]]
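The result will look similar to the above: the main diagonal is 1, and the off-diagonal entries are not exactly zero because of floating-point rounding.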
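As a rough check of the steps above, the sketch below uses plain arrays instead of mat. It assumes np.linalg.pinv as the array-side counterpart of b.I for a non-square matrix, and uses np.allclose to confirm that the product equals the identity up to rounding. The seed is arbitrary and only there for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)       # arbitrary seed, for reproducibility only
a = rng.random((4, 3))               # 4x3 array, same shape as in the steps above
pinv = np.linalg.pinv(a)             # Moore-Penrose pseudo-inverse, shape (3, 4)
product = pinv @ a                   # 3x3 matrix, identity up to rounding error

print(pinv.shape)
print(np.allclose(product, np.eye(3)))   # compare with a tolerance, never with ==
```

Comparing with a tolerance is the usual way to deal with the tiny off-diagonal values such as -3.8e-16 shown in the printed matrix.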

5. Create a 3×3 identity matrix

eye(3)

6. Now go back to step 2 and do the conversion in the other direction

d=array(b)

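This converts the matrix back into an array.
7. Repeating an array
tile(A, reps): repeats array A the number of times given by reps. The docstring reads: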

tile(A, reps)
    Construct an array by repeating A the number of times given by reps.
    
    If `reps` has length ``d``, the result will have dimension of
    ``max(d, A.ndim)``.
    
    If ``A.ndim < d``, `A` is promoted to be d-dimensional by prepending new
    axes. So a shape (3,) array is promoted to (1, 3) for 2-D replication,
    or shape (1, 1, 3) for 3-D replication. If this is not the desired
    behavior, promote `A` to d-dimensions manually before calling this
    function.
    
    If ``A.ndim > d``, `reps` is promoted to `A`.ndim by pre-pending 1's to it.
    Thus for an `A` of shape (2, 3, 4, 5), a `reps` of (2, 2) is treated as
    (1, 1, 2, 2).

A simple example:

a=random.rand(3,2)
b=tile(a,(4,5,3,5))

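Here b.shape is (4, 5, 9, 10). The array a is first promoted to shape (1, 1, 3, 2), and each dimension is then multiplied by the matching entry of reps, so the last two dimensions are the products 3×3 = 9 and 2×5 = 10. This product rule makes it easy to picture how tile copies an array.
Note that tile also works on matrices.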
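The shape rule can be checked directly. The following is a small sketch with illustrative arrays of my own choosing; the last call uses the same (n, 1) pattern that classify0 relies on later to stack one sample into n identical rows.

```python
import numpy as np

a = np.arange(6).reshape(3, 2)           # shape (3, 2)

print(np.tile(a, 2).shape)               # reps=2 is treated as (1, 2): (3, 4)
print(np.tile(a, (4, 1)).shape)          # repeat 4 times along rows: (12, 2)
print(np.tile(a, (4, 5, 3, 5)).shape)    # a promoted to (1, 1, 3, 2): (4, 5, 9, 10)

row = np.array([1, 2, 3])                # shape (3,)
stacked = np.tile(row, (4, 1))           # promoted to (1, 3), then stacked: (4, 3)
print(stacked)
```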
8. Squaring an array
The ** operator works element-wise, so every element is squared.

from numpy import *
a=array([[1,2],[3,4],[5,6],[7,8]])
print(a)
b=a**2
print(b)

Result:

[[1 2]
 [3 4]
 [5 6]
 [7 8]]
[[ 1  4]
 [ 9 16]
 [25 36]
 [49 64]]

8. Summing an array

from numpy import *
a=array([[1,2,3],[3,4,5],[5,6,7],[7,8,9]])
print(a.sum(axis=0))
print(a.sum(axis=1))
print(a.sum(axis=2))  # raises an error: axis 2 is out of bounds for array of dimension 2

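This can also be written as sum(a, axis=1); after from numpy import *, sum refers to NumPy's sum. axis=0 adds down each column and axis=1 adds across each row.
Result: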
[16 20 24]
[ 6 12 18 24]
error
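Element-wise squaring and a row-wise sum are the two pieces that classify0 later combines into a Euclidean distance. Here is a minimal sketch on made-up points (the data and query values are illustrative, not from the digit data set):

```python
import numpy as np

data = np.array([[0.0, 0.0],
                 [3.0, 4.0],
                 [6.0, 8.0]])            # three sample points, one per row
query = np.array([0.0, 0.0])             # the point to measure distances from

diff = np.tile(query, (data.shape[0], 1)) - data   # one difference row per sample
sq_dist = (diff ** 2).sum(axis=1)                  # sum of squares along each row
dist = sq_dist ** 0.5                              # distances 0, 5 and 10

print(dist)
```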
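9. Sorting an array
b.argsort() returns the indices that would sort array b in ascending order. It returns positions, not the sorted values themselves.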

b=array([18, 24 , 6, 12])
c=b.argsort()
print(c)

Result:
[2 3 0 1]
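Because argsort returns positions, it can be used to pick out the k smallest entries, which is how the KNN code finds the nearest neighbours. A short sketch reusing the same values (the name distances and k = 2 are chosen for illustration):

```python
import numpy as np

distances = np.array([18, 24, 6, 12])
order = distances.argsort()          # indices in ascending order of value: [2 3 0 1]

k = 2
nearest = order[:k]                  # indices of the k smallest values: [2 3]
print(distances[nearest])            # the values themselves: 6 and 12

descending = distances[order[::-1]]  # reverse the index order for a descending sort
print(descending)                    # 24, 18, 12, 6
```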
10. The dictionary get method
dict.get(key, default=None)
Parameters
key: the key to look up in the dictionary.
default: the value returned when the key is not present.

c={}
p='A'
c[p]=c.get(p,4)
print(c)
c[p]=c.get(p,4)+2
print(c)
c[p]=c.get(p,4)+1
print(c)

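Output:
{'A': 4}
{'A': 6}
{'A': 7}
The default 4 is only used in the first call, when 'A' is not yet in the dictionary; afterwards get returns the stored value.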
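The most common use of get with a default is counting, which is exactly what the voting loop in classify0 does. A minimal sketch with a made-up list of labels:

```python
labels = ['A', 'B', 'A', 'C', 'A', 'B']   # illustrative labels

counts = {}
for label in labels:
    # a missing key counts as 0, so no "if label in counts" check is needed
    counts[label] = counts.get(label, 0) + 1

print(counts)   # {'A': 3, 'B': 2, 'C': 1}
```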
11. Sorting a dictionary
sort is a method of list, while sorted works on any iterable. A dictionary is not a list, so we use sorted.

sorted(iterable, key=None, reverse=False)
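Parameters:
iterable: the iterable to sort.
key: a function of one argument that is called on each element of the iterable and returns the value used for comparison.
reverse: the sort order; reverse = True sorts in descending order, reverse = False (the default) in ascending order.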

import operator
a = {'1':6,'2':3,'3':1,'4':5}
b = sorted(a.items(),key = operator.itemgetter(0), reverse = True)
c = sorted(a.items(),key = operator.itemgetter(1), reverse = True)
print(b)
print(c)

Output:
[('4', 5), ('3', 1), ('2', 3), ('1', 6)]
[('1', 6), ('4', 5), ('2', 3), ('3', 1)]
a.items() yields (key, value) pairs, so operator.itemgetter(0) sorts them by key and operator.itemgetter(1) sorts them by value.
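The same ordering can be written without operator. The sketch below uses a lambda as the key function and, when only the top entry is needed, max with the dictionary's own get method. The votes dictionary is illustrative.

```python
votes = {'A': 3, 'B': 5, 'C': 1}     # illustrative label counts

# equivalent to key=operator.itemgetter(1): sort the (key, value) pairs by value
by_count = sorted(votes.items(), key=lambda item: item[1], reverse=True)
print(by_count)                      # [('B', 5), ('A', 3), ('C', 1)]

# if only the most frequent label is needed, a full sort is unnecessary
winner = max(votes, key=votes.get)
print(winner)                        # B
```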
12. Extracting elements from an array
See the link for details.
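13. Removing characters
str.strip([chars]): when called as str.strip() with nothing in the parentheses, it removes whitespace (spaces, newlines, tabs and so on) by default.
This method only removes characters at the beginning or end of the string; it cannot remove characters in the middle.
For example: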

s = 'aabcd a efgaa'      # named s so the built-in str type is not shadowed
print(s.strip('a'))

Output: bcd a efg
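Two details are easy to miss: lstrip and rstrip strip only one side, and chars is treated as a set of characters rather than as a prefix or suffix. A short sketch with illustrative strings:

```python
s = 'aabcd a efgaa'

left = s.lstrip('a')                 # strips the left side only
right = s.rstrip('a')                # strips the right side only
print(left)                          # bcd a efgaa
print(right)                         # aabcd a efg

clean = '  hello\n'.strip()          # no argument: strips surrounding whitespace
print(clean)                         # hello

both = 'xyhelloyx'.strip('xy')       # any mix of 'x' and 'y' is removed from both ends
print(both)                          # hello
```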
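14. Splitting strings
str.split(sep=None, maxsplit=-1)
Parameters
sep: the separator. By default the string is split on any run of whitespace, including spaces, newlines (\n) and tabs (\t).
maxsplit: the maximum number of splits. The default is -1, meaning no limit.
For example: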

l = "111&2222&333&444"
result = l.split("&", 2) 
print(result)

Result: ['111', '2222', '333&444']
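In the KNN code, split is used twice to recover the digit label from a file name. The sketch below walks through that on a hypothetical file name of the form label_index.txt, which is the format the parsing code assumes.

```python
file_name = '0_13.txt'                   # hypothetical name: <label>_<index>.txt

stem = file_name.split('.')[0]           # drop the extension: '0_13'
label = int(stem.split('_')[0])          # text before the underscore: 0
print(stem, label)

words = 'a  b\tc\n'.split()              # no separator: split on runs of whitespace
print(words)                             # ['a', 'b', 'c']

head_tail = '1&2&3'.split('&', 1)        # at most one split
print(head_tail)                         # ['1', '2&3']
```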

II. KNN Handwritten Digit Training and Recognition

The code comes from Machine Learning in Action (《机器学习实战》).
Data: https://cloud.189.cn/t/VfquiqNzAZVn

from numpy import *
import operator
from os import listdir
# kNN classifier: return the majority label among the k training samples nearest to inX
def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    diffMat = tile(inX, (dataSetSize,1)) - dataSet
    sqDiffMat = diffMat**2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances**0.5
    sortedDistIndicies = distances.argsort()     
    classCount={}          
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel,0) + 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
# Convert a 32×32 binary image (a text file of 0/1 characters) into a 1×1024 vector
def img2vector(filename):
    returnVect = zeros((1,1024))
    with open(filename) as fr:      # the with block closes the file automatically
        for i in range(32):
            lineStr = fr.readline()
            for j in range(32):
                returnVect[0,32*i+j] = int(lineStr[j])
    return returnVect
# Load the handwritten digit training set and evaluate the classifier on the test set
def handwritingClassTest():
    hwLabels = []
    trainingFileList = listdir('trainingDigits')           #load the training set
    m = len(trainingFileList)
    trainingMat = zeros((m,1024))
    # First for loop: "training", which for kNN just means loading every training sample and its label
    for i in range(m):
        fileNameStr = trainingFileList[i]
        fileStr = fileNameStr.split('.')[0]     #take off .txt
        classNumStr = int(fileStr.split('_')[0])
        hwLabels.append(classNumStr)
        trainingMat[i,:] = img2vector('trainingDigits/%s' % fileNameStr)
    testFileList = listdir('testDigits')        #iterate through the test set
    errorCount = 0.0
    mTest = len(testFileList)
    # Second for loop: classify each test sample and count the errors
    for i in range(mTest):
        fileNameStr = testFileList[i]
        fileStr = fileNameStr.split('.')[0]     #take off .txt
        classNumStr = int(fileStr.split('_')[0])
        vectorUnderTest = img2vector('testDigits/%s' % fileNameStr)
        classifierResult = classify0(vectorUnderTest, trainingMat, hwLabels, 3)
        print ("the classifier came back with: %d, the real answer is: %d,the file is: %s"\
             % (classifierResult, classNumStr,fileNameStr))
        if (classifierResult != classNumStr): errorCount += 1.0
    print ("\nthe total number of errors is: %d" % errorCount)
    print ("\nthe total error rate is: %f" % (errorCount/float(mTest)))
if __name__ == '__main__':
    handwritingClassTest()
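To try the classifier without downloading the digit files, the logic of classify0 can be exercised on a tiny hand-made data set. The sketch below is a variant, not the book's code: classify_knn is a hypothetical name, it relies on NumPy broadcasting instead of tile, and it counts votes with collections.Counter instead of a dictionary plus sorted. The four sample points are made up for illustration.

```python
import numpy as np
from collections import Counter

def classify_knn(query, data, labels, k):
    """Hypothetical variant of classify0: majority label of the k nearest rows."""
    # broadcasting subtracts query from every row, so tile is not needed
    distances = np.sqrt(((data - query) ** 2).sum(axis=1))
    nearest = distances.argsort()[:k]            # indices of the k closest samples
    votes = Counter(labels[i] for i in nearest)  # label -> number of votes
    return votes.most_common(1)[0][0]

# toy training set: two points near (1, 1) labelled 'A', two near (0, 0) labelled 'B'
group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']

prediction = classify_knn(np.array([0.1, 0.2]), group, labels, 3)
print(prediction)   # B: two of the three nearest points are labelled 'B'
```

The same function should also accept the 1×1024 vectors produced by img2vector, since a (1, 1024) query broadcasts against an (m, 1024) training matrix.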

References: https://www.runoob.com/python3/python3-tutorial.html
Machine Learning in Action (《机器学习实战》)

Original article: https://blog.csdn.net/wxkhturfun/article/details/107483964
