KNN Handwritten Digit Training and Recognition

程序员文章站 2022-06-22 07:58:48


I. Preliminaries

1. Generate a 4×3 array

from numpy import *
a=random.rand(4,3)

print(a.shape[0])

The output is 4, the number of rows in the 4×3 array.
2. Convert the array to a matrix

b=asmatrix(a)  # the mat() alias was removed in NumPy 2.0; asmatrix() works in old and new versions

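3. Invert matrix b
b is 4×3 and therefore not square, so b.I returns the Moore-Penrose pseudo-inverse (a 3×4 matrix) rather than a true inverse.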

c=b.I

4. Matrix multiplication

print(c*b)

[[ 1.00000000e+00 -3.83564254e-16 1.18823551e-16]
[-4.86974025e-16 1.00000000e+00 -5.37713486e-16]
[ 5.60557203e-17 1.28799732e-16 1.00000000e+00]]
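The result should look like the above: ones on the main diagonal, with off-diagonal entries that are tiny but not exactly zero because of floating-point rounding.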
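Because the matrix here is 4×3, it may help to see what the "inverse" actually is. The following is a minimal sketch, assuming a fixed random seed chosen only for reproducibility. It compares .I with numpy.linalg.pinv and shows that the product is close to the identity only when the pseudo-inverse is on the left.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed, chosen only for reproducibility
a = rng.random((4, 3))              # 4x3, so no true inverse exists

pinv = np.linalg.pinv(a)            # Moore-Penrose pseudo-inverse, shape (3, 4)
via_matrix = np.asarray(np.asmatrix(a).I)   # what .I returns for a non-square matrix

left = pinv @ a                     # (3, 4) @ (4, 3) -> 3x3, close to the identity
right = a @ pinv                    # (4, 3) @ (3, 4) -> 4x4 projection, not the identity

print(np.allclose(via_matrix, pinv))    # True
print(np.allclose(left, np.eye(3)))     # True
print(np.allclose(right, np.eye(4)))    # False
```

Comparing with np.allclose rather than == sidesteps the floating-point residue visible in the printed result.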

5. Create a 3×3 identity matrix

eye(3)

6. Going back to step 2, now in reverse

d=array(b)

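This converts the matrix back into an array.
7. Repeating an array
tile(A, reps) repeats array A the number of times given by reps along each axis. From the NumPy docstring: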

tile(A, reps)
    Construct an array by repeating A the number of times given by reps.
    
    If `reps` has length ``d``, the result will have dimension of
    ``max(d, A.ndim)``.
    
    If ``A.ndim < d``, `A` is promoted to be d-dimensional by prepending new
    axes. So a shape (3,) array is promoted to (1, 3) for 2-D replication,
    or shape (1, 1, 3) for 3-D replication. If this is not the desired
    behavior, promote `A` to d-dimensions manually before calling this
    function.
    
    If ``A.ndim > d``, `reps` is promoted to `A`.ndim by pre-pending 1's to it.
    Thus for an `A` of shape (2, 3, 4, 5), a `reps` of (2, 2) is treated as
    (1, 1, 2, 2).

A simple example:

a=random.rand(3,2)
b=tile(a,(4,5,3,5))

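Here b.shape is (4, 5, 9, 10). a is first promoted to shape (1, 1, 3, 2), and each dimension is then multiplied by the matching entry of reps, so the trailing dimensions are the products 3×3 = 9 and 2×5 = 10. This product rule is an easy way to picture how tile replicates an array.
Note that tile can also replicate matrices.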
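The shape rule can be checked directly. This is a small sketch with made-up arrays: the first call repeats the example above, the second shows the pattern the kNN code relies on (repeating one sample so it lines up with every training row), and the third shows a reps shorter than the array's number of dimensions.

```python
import numpy as np

a = np.arange(6).reshape(3, 2)          # shape (3, 2)

b = np.tile(a, (4, 5, 3, 5))            # a is treated as shape (1, 1, 3, 2)
print(b.shape)                          # (4, 5, 9, 10)

# The kNN use case: repeat one sample row so it lines up with every training row.
x = np.array([1.0, 2.0, 3.0])           # shape (3,)
stacked = np.tile(x, (4, 1))            # promoted to (1, 3), then repeated -> (4, 3)
print(stacked.shape)                    # (4, 3)

# reps shorter than a.ndim: 2 is treated as (1, 2), so only the columns repeat.
c = np.tile(a, 2)
print(c.shape)                          # (3, 4)
```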
8. Squaring an array
a**2 squares every element individually.

from numpy import *
a=array([[1,2],[3,4],[5,6],[7,8]])
print(a)
b=a**2
print(b)

Result:

[[1 2]
 [3 4]
 [5 6]
 [7 8]]
[[ 1  4]
 [ 9 16]
 [25 36]
 [49 64]]
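One caveat worth illustrating: the elementwise behavior applies to arrays. For a NumPy matrix object, ** means matrix power instead. A short sketch with made-up values:

```python
import numpy as np

values = [[1, 2], [3, 4]]

arr = np.array(values)
mat = np.asmatrix(values)

print(arr ** 2)      # elementwise: [[1, 4], [9, 16]]
print(mat ** 2)      # matrix product mat * mat: [[7, 10], [15, 22]]
print(arr @ arr)     # the same matrix product written for plain arrays
```

This is why the distance computation in the kNN code works on arrays rather than matrices.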

8. Summing an array

from numpy import *
a=array([[1,2,3],[3,4,5],[5,6,7],[7,8,9]])
print(a.sum(axis=0))
print(a.sum(axis=1))
print(a.sum(axis=2))  # raises an error: axis 2 is out of bounds for array of dimension 2

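This can also be written as sum(a, axis=1). Because of from numpy import *, sum here is NumPy's sum, not the Python built-in.
Result: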
[16 20 24]
[ 6 12 18 24]
error
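To make the axis argument concrete, here is a minimal sketch using the same array. The axis you pass is the one that gets collapsed. The out-of-range case is caught rather than left to crash; NumPy's AxisError derives from both ValueError and IndexError, so catching those works across versions.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9]])   # shape (4, 3)

col_sums = a.sum(axis=0)    # collapses the 4 rows    -> shape (3,)
row_sums = a.sum(axis=1)    # collapses the 3 columns -> shape (4,)
last_axis = a.sum(axis=-1)  # -1 means the last axis, same as axis=1 here

try:
    a.sum(axis=2)           # a 2-D array only has axes 0 and 1
    failed = False
except (ValueError, IndexError) as exc:   # AxisError derives from both
    failed = True
    print("error:", exc)

print(col_sums, row_sums, failed)
```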
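9. Sorting an array
b.argsort() sorts array b in ascending order and returns the indices of the elements in sorted order, not the sorted values themselves.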

b=array([18, 24 , 6, 12])
c=b.argsort()
print(c)

Result:
[2 3 0 1]
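The returned indices are mainly useful for reordering other arrays in step with the sorted one. A small sketch, where the labels array is made up for illustration:

```python
import numpy as np

distances = np.array([18, 24, 6, 12])
labels = np.array(["A", "B", "A", "B"])    # made-up labels for illustration

order = distances.argsort()                # [2 3 0 1]
sorted_values = distances[order]           # [ 6 12 18 24]
nearest_labels = labels[order][:3]         # labels of the 3 smallest distances
descending = distances.argsort()[::-1]     # [1 0 3 2], largest first

print(order, sorted_values, nearest_labels, descending)
```

This is the same pattern the kNN classifier uses to find the labels of the k nearest training samples.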
10. The dictionary get method
dict.get(key, default=None)
Parameters
key – the key to look up in the dictionary.
default – the value returned if the key is not present.

c={}
p='A'
c[p]=c.get(p,4)
print(c)
c[p]=c.get(p,4)+2
print(c)
c[p]=c.get(p,4)+1
print(c)

Output:
{'A': 4}
{'A': 6}
{'A': 7}
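In the kNN code, get is used with a default of 0 to count votes. A minimal sketch of that counting pattern, with made-up votes, checked against collections.Counter from the standard library:

```python
from collections import Counter

votes = ["3", "5", "3", "3", "5"]      # made-up neighbor labels

class_count = {}
for label in votes:
    # a missing label starts from the default 0 instead of raising KeyError
    class_count[label] = class_count.get(label, 0) + 1

print(class_count)                     # {'3': 3, '5': 2}
print(class_count == Counter(votes))   # True
```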
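11. Sorting a dictionary
sort is a method that only lists have, whereas sorted works on any iterable, so we use sorted here.

sorted(iterable, key=None, reverse=False)
Parameters:
iterable – the iterable to sort.
key – a one-argument function applied to each element of the iterable; its return value is used as the sort key.
reverse – sort order: reverse=True for descending, reverse=False for ascending (the default).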

import operator
a = {'1':6,'2':3,'3':1,'4':5}
b = sorted(a.items(),key = operator.itemgetter(0), reverse = True)
c = sorted(a.items(),key = operator.itemgetter(1), reverse = True)
print(b)
print(c)

Output:
[('4', 5), ('3', 1), ('2', 3), ('1', 6)]
[('1', 6), ('4', 5), ('2', 3), ('3', 1)]
operator.itemgetter(0) and operator.itemgetter(1) sort the (key, value) pairs by key and by value, respectively.
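For comparison, here is a short sketch showing that a lambda gives the same ordering as itemgetter, and that max with a key function is enough when only the top entry is needed. The vote counts are made up.

```python
import operator

class_count = {"3": 3, "5": 2, "8": 1}   # made-up vote counts

by_votes = sorted(class_count.items(), key=operator.itemgetter(1), reverse=True)
same = sorted(class_count.items(), key=lambda pair: pair[1], reverse=True)
winner = max(class_count, key=class_count.get)   # only the top key, no full sort

print(by_votes[0][0], by_votes == same, winner)  # 3 True 3
```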
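12. Array indexing and extraction
See the link for details.
13. Stripping characters
str.strip([chars]): when called with no argument, as str.strip(), it removes leading and trailing whitespace (spaces, newlines, tabs, and so on).
The method only removes characters at the start or end of the string, never in the middle.
For example: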

s='aabcd a efgaa'  # avoid naming a variable str, which shadows the built-in type
print(s.strip('a'))

Output: bcd a efg
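Two details are easy to miss: the chars argument is treated as a set of characters rather than a prefix or suffix, and lstrip and rstrip strip one side only. A brief sketch with made-up strings:

```python
s = "aabcd a efgaa"

both = s.strip("a")                   # 'bcd a efg'
left_only = s.lstrip("a")             # 'bcd a efgaa'
right_only = s.rstrip("a")            # 'aabcd a efg'

# chars is a set: any mix of 'x' and 'y' is removed from both ends
as_set = "xyhelloyx".strip("xy")      # 'hello'

# no argument: leading and trailing whitespace, including the newline
cleaned = "  0_13.txt\n".strip()      # '0_13.txt'

print(both, left_only, right_only, as_set, cleaned, sep="|")
```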
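14. Splitting a string
str.split(sep=None, maxsplit=-1)
Parameters
sep – the delimiter; by default any run of whitespace, including spaces, newlines (\n), and tabs (\t).
maxsplit – the maximum number of splits; the default of -1 splits at every occurrence.
For example: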

l = "111&2222&333&444"
result = l.split("&", 2) 
print(result)

Result: ['111', '2222', '333&amp;444']
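The kNN code below uses split to pull the digit label out of each file name. As a sketch of that idea, the helper below (parse_label is a hypothetical name, and the '9_45.txt' naming pattern is an assumption based on how the code splits on '.' and '_') does the same with os.path.splitext. The last two lines contrast the default whitespace splitting with an explicit separator.

```python
import os

def parse_label(file_name):
    """Return (digit, sample_index) for a name shaped like '9_45.txt'."""
    stem, _ext = os.path.splitext(file_name)   # '9_45.txt' -> ('9_45', '.txt')
    digit, index = stem.split("_", 1)          # split once, at the first underscore
    return int(digit), int(index)

parsed = parse_label("9_45.txt")
print(parsed)                                  # (9, 45)

whitespace_split = "a  b\tc\n".split()         # runs of whitespace collapse
explicit_split = "a,,b".split(",")             # empty fields are kept
print(whitespace_split, explicit_split)        # ['a', 'b', 'c'] ['a', '', 'b']
```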

II. KNN handwritten digit training and recognition

The code is from Machine Learning in Action (《机器学习实战》).
Data: https://cloud.189.cn/t/VfquiqNzAZVn

from numpy import *
import operator
from os import listdir
# kNN classifier: majority vote among the k nearest training samples
def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    diffMat = tile(inX, (dataSetSize,1)) - dataSet
    sqDiffMat = diffMat**2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances**0.5
    sortedDistIndicies = distances.argsort()     
    classCount={}          
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel,0) + 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
# Convert a 32×32 binary image (a text file of 0/1 characters) into a 1×1024 vector
def img2vector(filename):
    returnVect = zeros((1,1024))
    with open(filename) as fr:      # the file is closed automatically when done
        for i in range(32):
            lineStr = fr.readline()
            for j in range(32):
                returnVect[0,32*i+j] = int(lineStr[j])
    return returnVect
# Load the handwritten-digit training set, then evaluate on the test set
def handwritingClassTest():
    hwLabels = []
    trainingFileList = listdir('trainingDigits')           #load the training set
    m = len(trainingFileList)
    trainingMat = zeros((m,1024))
    # First loop: load the training samples and labels (kNN has no separate training step)
    for i in range(m):
        fileNameStr = trainingFileList[i]
        fileStr = fileNameStr.split('.')[0]     #take off .txt
        classNumStr = int(fileStr.split('_')[0])
        hwLabels.append(classNumStr)
        trainingMat[i,:] = img2vector('trainingDigits/%s' % fileNameStr)
    testFileList = listdir('testDigits')        #iterate through the test set
    errorCount = 0.0
    mTest = len(testFileList)
    # Second loop: classify each test sample and count the errors
    for i in range(mTest):
        fileNameStr = testFileList[i]
        fileStr = fileNameStr.split('.')[0]     #take off .txt
        classNumStr = int(fileStr.split('_')[0])
        vectorUnderTest = img2vector('testDigits/%s' % fileNameStr)
        classifierResult = classify0(vectorUnderTest, trainingMat, hwLabels, 3)
        print ("the classifier came back with: %d, the real answer is: %d,the file is: %s"\
             % (classifierResult, classNumStr,fileNameStr))
        if (classifierResult != classNumStr): errorCount += 1.0
    print ("\nthe total number of errors is: %d" % errorCount)
    print ("\nthe total error rate is: %f" % (errorCount/float(mTest)))
if __name__ == '__main__':
    handwritingClassTest()
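The classifier above combines every piece from section I: tile, squaring, sum(axis=1), argsort, dict.get, and sorted. As an alternative sketch, not the book's code, the same steps can be written with NumPy broadcasting in place of tile and collections.Counter in place of the hand-rolled vote dictionary. The function name knn_classify and the toy data are made up for illustration.

```python
from collections import Counter
import numpy as np

def knn_classify(x, data, labels, k):
    """Majority vote among the k rows of data closest to x (Euclidean distance)."""
    diff = data - x                              # broadcasting replaces tile()
    distances = np.sqrt((diff ** 2).sum(axis=1))
    nearest = distances.argsort()[:k]            # indices of the k smallest distances
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data, for illustration only.
data = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ["A", "A", "B", "B"]

near_origin = knn_classify(np.array([0.1, 0.2]), data, labels, k=3)
near_ones = knn_classify(np.array([0.9, 1.0]), data, labels, k=3)
print(near_origin, near_ones)                    # B A
```

Broadcasting also accepts the 1×1024 row vector that img2vector returns, since a (1, 1024) array subtracts cleanly from an (m, 1024) training matrix.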

Reference: https://www.runoob.com/python3/python3-tutorial.html
Machine Learning in Action (《机器学习实战》)

Original article: https://blog.csdn.net/wxkhturfun/article/details/107483964
