Recognizing ID Card Numbers with Python and Checking Their Validity (pytesseract, dlib, OpenCV)

程序员文章站 (Programmer Article Station), 2022-06-12 20:00:06


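Below I describe how to use pytesseract, dlib, and OpenCV to recognize the number on a Chinese ID card and check whether it is valid, including how to install the required libraries.

I drew on the following two blog posts; this article is essentially a cleaned-up merge of the two: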

https://blog.csdn.net/qq_37674858/article/details/80497563

https://blog.csdn.net/nzjdsds/article/details/81775981

1. Installing the required libraries

Recognizing the ID number uses the following libraries:

import pytesseract
import cv2
import matplotlib.pyplot as plt
import dlib
import matplotlib.patches as mpatches
from skimage import io,draw,transform,color
import numpy as np
import pandas as pd
import re

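I recommend installing the packages through Anaconda, which is the easier route. However, Anaconda could only install some of them for me (numpy, pandas, matplotlib, cv2); the others raised errors and had to be installed another way. (The re module is part of the standard library.)

Note that cv2 is OpenCV, so to use cv2 just install OpenCV. The full name of skimage is scikit-image; searching for skimage finds nothing, so search for scikit-image instead.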
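
Because the import names and the package names differ (cv2 vs. OpenCV, skimage vs. scikit-image), a quick check of what is actually importable can save some confusion. The following is only a sketch: the `REQUIRED` table and the `missing_packages` helper are names I made up for illustration, and `opencv-python` is the usual pip package name for cv2.

```python
from importlib.util import find_spec

# Import name -> pip package name, for the libraries used in this article.
REQUIRED = {
    "pytesseract": "pytesseract",
    "cv2": "opencv-python",
    "matplotlib": "matplotlib",
    "dlib": "dlib",
    "skimage": "scikit-image",
    "numpy": "numpy",
    "pandas": "pandas",
}


def missing_packages(required):
    """Return the pip names of top-level modules that cannot be found."""
    return [pip_name for module, pip_name in required.items()
            if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing, try: pip install " + " ".join(missing))
    else:
        print("All required libraries are importable.")
```

`find_spec` only locates a module without importing it, so the check is fast and does not fail if a library is installed but broken.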

Installing pytesseract, dlib, and scikit-image through Anaconda may fail with the following error:

TypeError: sequence item 0: expected str instance, bytes found

I tried many fixes found online and none of them worked. In the end, installing with pip from the Windows command line succeeded.

The pip commands:

pip install pytesseract
pip install dlib
pip install scikit-image

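Of these, pytesseract cannot be used right after installation. Running recognition on Windows raises the following error: pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path

In other words, Tesseract cannot be found.

There are two main causes: either the underlying program (Tesseract-OCR) is not installed, or the path pytesseract uses to call tesseract is wrong.

If you have not installed Tesseract-OCR, here is a download: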

Link: https://pan.baidu.com/s/1C1DTjcunff6zRrExdInjWw
Extraction code: tv9y

If the call path is wrong, the following blog post explains how to fix it:

https://blog.csdn.net/qq_36853469/article/details/91572797
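
As an alternative to editing files by hand, pytesseract lets you set the executable path from your own script through `pytesseract.pytesseract.tesseract_cmd`. The sketch below assumes that approach; the two Windows paths in `CANDIDATES` are hypothetical install locations, and `find_tesseract` is a helper name of my own, so adjust both to your machine.

```python
import os
import shutil

# Hypothetical install locations; change these to where Tesseract-OCR lives on your machine.
CANDIDATES = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
]


def find_tesseract(candidates=CANDIDATES):
    """Return a path to the tesseract executable, or None if it is not found."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    exe = find_tesseract()
    if exe is None:
        print("Tesseract not found; install Tesseract-OCR first.")
    else:
        try:
            import pytesseract
            # Tell pytesseract which executable to call.
            pytesseract.pytesseract.tesseract_cmd = exe
            print("Using", exe)
        except ImportError:
            print("pytesseract is not installed.")
```

Setting the path in the script keeps the fix with your code, so it survives a reinstall of pytesseract.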

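At this point all the libraries are installed and the preparation is done.

For a detailed walkthrough of the code, see the blog post below; here I only post the complete code (it can be run directly):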

https://blog.csdn.net/qq_37674858/article/details/80497563

If you are missing the model files (shape_predictor_5_face_landmarks.dat and shape_predictor_68_face_landmarks.dat), you can download them from another post of mine (Baidu Netdisk links):

https://blog.csdn.net/Viadimir/article/details/105035660

The complete code for recognizing the ID number is as follows:

# -*- coding: UTF-8 -*- 

import pytesseract
import cv2
import matplotlib.pyplot as plt
import dlib
import matplotlib.patches as mpatches
from skimage import io,draw,transform,color
import numpy as np
import pandas as pd
import re

# Compute the tilt angle of the ID card in the image
def IDcorner(landmarks):
    """
    landmarks: the five facial landmarks that were detected
    In testing, landmarks 0 and 2 gave a suitable angle
    """
    corner20 =  twopointcor(landmarks[2,:],landmarks[0,:])
    corner = np.mean([corner20])
    return corner

# Compute the tilt angle of the eyes (counterclockwise, in degrees)
def twopointcor(point1,point2):
    """point1 = (x1,y1),point2 = (x2,y2)"""
    deltxy = point2 - point1
    corner = np.arctan(deltxy[1] / deltxy[0]) * 180 / np.pi
    return corner

def rotateIdcard(image):
    "image: the image to process"
    ## Detect the face with dlib.get_frontal_face_detector
    detector = dlib.get_frontal_face_detector()
    # Run face detection; dets holds the results
    dets = detector(image, 2)
    # Fail with a clear message instead of an IndexError on dets[0]
    if len(dets) == 0:
        raise ValueError("no face detected in the image")
    ## Locate the eyes with the 5-point landmark model
    predictor = dlib.shape_predictor("shape_predictor_5_face_landmarks.dat")
    detected_landmarks = predictor(image, dets[0]).parts()
    landmarks = np.array([[p.x, p.y] for p in detected_landmarks])
    corner = IDcorner(landmarks)
    ## Rotate the image
    image2 = transform.rotate(image,corner,clip=False)
    image2 = np.uint8(image2*255)
    ## Face position after rotation
    det = detector(image2, 2)
    return image2,det

detector = dlib.get_frontal_face_detector()
image = io.imread("idcard.jpg")
# Run face detection; dets holds the detected faces
dets = detector(image, 2)
for i, face in enumerate(dets):
    # Build a rectangle patch around each face (add it to a matplotlib axes to display it)
    left = face.left()
    top = face.top()
    right = face.right()
    bottom = face.bottom()
    rect = mpatches.Rectangle((left,bottom), right - left, top - bottom,
                                  fill=False, edgecolor='red', linewidth=1)

image = io.imread("idcard.jpg")
image2,dets = rotateIdcard(image)

# Build a rectangle patch around the face in the rotated image
left = dets[0].left()
top = dets[0].top()
right = dets[0].right()
bottom = dets[0].bottom()
rect = mpatches.Rectangle((left,bottom), (right - left), (top - bottom),
                          fill=False, edgecolor='red', linewidth=1)


## Position of the photo (not very precise)
width = right - left
high = top - bottom   # negative, because y grows downward in image coordinates
# Clamp at 0 so a face near the image edge cannot produce a negative index
left2 = max(0, int(left - 0.3*width))
bottom2 = max(0, int(bottom + 0.4*width))
rect = mpatches.Rectangle((left2,bottom2), 1.6*width, 1.8*high,
                          fill=False, edgecolor='blue', linewidth=1)

## The portrait photo on the ID card
top2 = max(0, int(bottom2+1.8*high))
right2 = int(left2+1.6*width)
image3 = image2[top2:bottom2,left2:right2,:]

## Convert the image to grayscale, then to a binary image
imagegray = cv2.cvtColor(image2,cv2.COLOR_RGB2GRAY)
retval, imagebin = cv2.threshold(imagegray, 120, 255, cv2.THRESH_OTSU + cv2.THRESH_BINARY)
## Blank out the photo region
imagebin[0:bottom2,left2:-1] = 255

# Use pytesseract to read the text on the card (lang='chi_sim' needs the chi_sim language data)
text = pytesseract.image_to_string(imagebin,lang='chi_sim')
textlist = text.split("\n")
textdf = pd.DataFrame({"text":textlist})
textdf["textlen"] = textdf.text.apply(len)
textdf = textdf[textdf.textlen > 1].reset_index(drop = True)
# Print the ID number (assumed to be the last non-trivial line of the OCR output)
print(textdf.text.values[-1])
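
The script takes the last non-trivial OCR line as the ID number, which breaks if the OCR adds stray text or puts spaces inside the digits. A regular expression is one possible way to make this step more tolerant. This is a sketch only: `extract_id_number` and `ID_PATTERN` are names of my own, not part of the original code, and the sample text uses the widely quoted example number 11010519491231002X.

```python
import re

# 17 digits followed by a digit or X; the lookarounds reject longer digit runs.
ID_PATTERN = re.compile(r"(?<!\d)\d{17}[\dXx](?![\dXx])")


def extract_id_number(ocr_text):
    """Return the first 18-character ID candidate in OCR output, or None."""
    for line in ocr_text.splitlines():
        # OCR often inserts spaces inside long digit runs, so drop whitespace first.
        compact = re.sub(r"\s+", "", line)
        match = ID_PATTERN.search(compact)
        if match:
            return match.group(0).upper()
    return None


if __name__ == "__main__":
    sample = "Name Zhang San\nID 110105 19491231 002X\n"
    print(extract_id_number(sample))
```

In the script, this would be called on `text` in place of picking `textdf.text.values[-1]`.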

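The blog post above does include complete code, but the author's version does not follow Python coding conventions at all; frankly, it is a mess. So I tidied it up here, removed the redundant parts, and moved some of the code around.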
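
For readers who want to see the tilt computation from the script in isolation: it comes down to the angle of the line through two eye landmarks. The sketch below is a standalone variant, not the code above. It swaps `np.arctan(dy / dx)` for `math.atan2(dy, dx)`, which is my substitution to avoid dividing by zero when the two points are vertically aligned. The two agree when `dx > 0` and differ by 180 degrees when `dx < 0`. `tilt_degrees` is a hypothetical name.

```python
import math


def tilt_degrees(point1, point2):
    """Angle in degrees of the line from point1 to point2, relative to the x-axis."""
    dx = point2[0] - point1[0]
    dy = point2[1] - point1[1]
    # atan2 handles dx == 0, where dy / dx would divide by zero.
    return math.degrees(math.atan2(dy, dx))


if __name__ == "__main__":
    # Two made-up eye positions: the second is 10 px right of and 10 px below the first.
    print(tilt_degrees((100, 120), (110, 130)))
```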

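Next comes the code for checking whether an ID number is valid. There is plenty of such code online, of varying quality. I tried a lot of it, and below is the version that I found worked best: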

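# -*- coding: utf-8 -*-
import re
def checkIdcard(idcard):
    Errors=['Validation passed!','Wrong number of digits in the ID number!','Birth date in the ID number is out of range or contains illegal characters!','ID number checksum error!','Illegal region code in the ID number!']
    area={"11":"Beijing","12":"Tianjin","13":"Hebei","14":"Shanxi","15":"Inner Mongolia","21":"Liaoning","22":"Jilin","23":"Heilongjiang","31":"Shanghai","32":"Jiangsu","33":"Zhejiang","34":"Anhui","35":"Fujian","36":"Jiangxi","37":"Shandong","41":"Henan","42":"Hubei","43":"Hunan","44":"Guangdong","45":"Guangxi","46":"Hainan","50":"Chongqing","51":"Sichuan","52":"Guizhou","53":"Yunnan","54":"Tibet","61":"Shaanxi","62":"Gansu","63":"Qinghai","64":"Ningxia","65":"Xinjiang","71":"Taiwan","81":"Hong Kong","82":"Macau","91":"Overseas"}
    idcard=str(idcard)
    idcard=idcard.strip()
    idcard_list=list(idcard)

    # Region check: the first two digits must be a known region code
    # (a plain area[...] lookup would raise KeyError for unknown codes)
    if idcard[0:2] not in area:
        print(Errors[4])
        return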
    # 15-digit ID number check
    if(len(idcard)==15):
        yy = idcard[6:8]
        # Non-digit characters are rejected by the regex below; 1901 is just a non-leap placeholder
        year = 1900 + int(yy) if yy.isdecimal() else 1901
        if (year % 4 == 0 and year % 100 != 0) or year % 400 == 0:
            ereg=re.compile('[1-9][0-9]{5}[0-9]{2}((01|03|05|07|08|10|12)(0[1-9]|[1-2][0-9]|3[0-1])|(04|06|09|11)(0[1-9]|[1-2][0-9]|30)|02(0[1-9]|[1-2][0-9]))[0-9]{3}$')  # leap-year birth dates
        else:
            ereg=re.compile('[1-9][0-9]{5}[0-9]{2}((01|03|05|07|08|10|12)(0[1-9]|[1-2][0-9]|3[0-1])|(04|06|09|11)(0[1-9]|[1-2][0-9]|30)|02(0[1-9]|1[0-9]|2[0-8]))[0-9]{3}$')  # common-year birth dates
        if(re.match(ereg,idcard)):
            print(Errors[0])
        else:
            print(Errors[2])
    # 18-digit ID number check
    elif(len(idcard)==18):
        # Birth date validity check
        # Leap-year month/day: ((01|03|05|07|08|10|12)(0[1-9]|[1-2][0-9]|3[0-1])|(04|06|09|11)(0[1-9]|[1-2][0-9]|30)|02(0[1-9]|[1-2][0-9]))
        # Common-year month/day: ((01|03|05|07|08|10|12)(0[1-9]|[1-2][0-9]|3[0-1])|(04|06|09|11)(0[1-9]|[1-2][0-9]|30)|02(0[1-9]|1[0-9]|2[0-8]))
        yyyy = idcard[6:10]
        # Non-digit characters are rejected by the regex below; 1901 is just a non-leap placeholder
        year = int(yyyy) if yyyy.isdecimal() else 1901
        if (year % 4 == 0 and year % 100 != 0) or year % 400 == 0:
            ereg=re.compile('[1-9][0-9]{5}(19[0-9]{2}|20[0-9]{2})((01|03|05|07|08|10|12)(0[1-9]|[1-2][0-9]|3[0-1])|(04|06|09|11)(0[1-9]|[1-2][0-9]|30)|02(0[1-9]|[1-2][0-9]))[0-9]{3}[0-9Xx]$')  # leap-year birth dates
        else:
            ereg=re.compile('[1-9][0-9]{5}(19[0-9]{2}|20[0-9]{2})((01|03|05|07|08|10|12)(0[1-9]|[1-2][0-9]|3[0-1])|(04|06|09|11)(0[1-9]|[1-2][0-9]|30)|02(0[1-9]|1[0-9]|2[0-8]))[0-9]{3}[0-9Xx]$')  # common-year birth dates
        # Test the birth date
        if(re.match(ereg,idcard)):
            # Compute the check character
            S = (int(idcard_list[0]) + int(idcard_list[10])) * 7 + (int(idcard_list[1]) + int(idcard_list[11])) * 9 + (int(idcard_list[2]) + int(idcard_list[12])) * 10 + (int(idcard_list[3]) + int(idcard_list[13])) * 5 + (int(idcard_list[4]) + int(idcard_list[14])) * 8 + (int(idcard_list[5]) + int(idcard_list[15])) * 4 + (int(idcard_list[6]) + int(idcard_list[16])) * 2 + int(idcard_list[7]) * 1 + int(idcard_list[8]) * 6 + int(idcard_list[9]) * 3
            Y = S % 11
            JYM = "10X98765432"
            M = JYM[Y]  # expected check character
            if(M == idcard_list[17].upper()):  # compare with the ID's check character (accepts lowercase x)
                print(Errors[0])
            else:
                print(Errors[3])
        else:
            print(Errors[2])
    else:
        print(Errors[1])
 
 
if __name__ == "__main__":
    while True:
        cdcard = input("Enter your ID number: ")
        if cdcard == "exit":
            print("Program ended!")
            break
        else:
            checkIdcard(cdcard)
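
The check-character step in the listing is easier to follow when the weights are written out as a list. The sketch below is a compact variant for 18-digit numbers only, under my own assumptions: it returns a boolean instead of printing, it validates the birth date with `datetime.strptime` rather than the regular expressions above, and it does not check the region code or restrict the year range. `WEIGHTS`, `check_char`, and `is_valid_id18` are names I introduced for illustration.

```python
import re
from datetime import datetime

# Per-position weights for the first 17 digits, and the check characters indexed by sum % 11.
WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
CHECK_CHARS = "10X98765432"


def check_char(first17):
    """Compute the check character for the first 17 digits."""
    total = sum(int(d) * w for d, w in zip(first17, WEIGHTS))
    return CHECK_CHARS[total % 11]


def is_valid_id18(idcard):
    """Return True if the birth date is a real date and the check character matches."""
    idcard = str(idcard).strip().upper()
    if not re.fullmatch(r"[0-9]{17}[0-9X]", idcard):
        return False
    try:
        # strptime rejects impossible dates such as 19990229.
        datetime.strptime(idcard[6:14], "%Y%m%d")
    except ValueError:
        return False
    return check_char(idcard[:17]) == idcard[17]


if __name__ == "__main__":
    print(is_valid_id18("11010519491231002X"))
```

The weights are the same ones that appear as multipliers in the long expression for `S` in the listing, just listed in digit order.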

Thanks for reading.