【Python 2.7】火车票查询

程序员文章站 2022-07-08 11:10:07

...

『一』初衷

春节临近，为了能够更加快速进行查询（其实还是懒得在12306上一次次重复点点点）自己想要的火车票的余票信息，就萌生出了想要写一个爬取火车票的小程序的想法，正好公司主要使用的Python 2.7，自己刚接触Python一个多月正需要验证一下学习情况，就自己动手写了这个火车票查询程序。

『二』准备工作

1、我需要完成的是从12306上抓取数据，因此需要使用requests包的get()方法；

2、对抓取到的数据进行解析，因为获取到的数据均为json格式数据，则需要使用json包；

3、我想要在数据获取之后以一个比较美观的形式展示出来，因此使用了prettytable模块用来展示。

4、通过查看12306上查询余票信息时调用的接口信息，如下：

https://kyfw.12306.cn/otn/leftTicket/queryZ?leftTicketDTO.train_date=2018-02-13&leftTicketDTO.from_station=XAY&leftTicketDTO.to_station=BJP&purpose_codes=ADULT

注意其中标红的字段，train_date是乘车日期，from_station为出发站，to_station为到达站，这三个参数是必需的，所以我先从这三个参数入手。

首先，观察到出发站和到达站不是该站点的拼音,而是不同的Code，所以我们要先获取各个车站站点的Code并生成一个字典，这样在之后进行查询时也能对输入数据的合法性进行判断。

新建一个get_station.py文件，编写：

#-*-  coding: utf-8  -*-

import re
import requests
url = 'https://kyfw.12306.cn/otn/resources/js/framework/station_name.js?station_version=1.9042'#获取站点拼音和代码对应的网址

station_html = requests.get(url)
stationtxt = station_html.text
stationtxt = stationtxt.encode('unicode_escape')
stationtxt = stationtxt.encode('utf-8')
list = stationtxt.split('@')
length = len(list)
stationsCP = {}
for i in range(1,length):
    st = list[i]
    list_new = st.split('|')
    Pinyin = list_new[3]#拼音
    Code = list_new[2]#代码
    Chinese = list_new[1]#汉字
    stationsCP[Code] = (Pinyin)
print "stationsCP = ",stationsCP

#执行  python get_stations.py > stations.py   完成后会生成stations.py文件，用来保存站点的拼音和代码

完成上一步之后我们就获取到了一个包含所有站点的字典啦～接下来进行下一步，读取用户输入的出发站、到达站、乘车日期这三个参数并进行合法性判断。我选择新建一个文件，将这些函数都写在这个文件里，命名为Get_and_Check.py，编写：

#-*-  coding: utf-8  -*-

#获取出发站并判定信息合法性
def get_from_station(correct_station):#correct_station为上一步生成的那个stations字典
    import re
    while 1:
        #对用户输入的数据进行预处理，去除前后和中间包含的空格，并将字符置为小写
        get_from_station = (raw_input("Please Input from_Station:")).lower().strip()
        get_from_station = "".join(get_from_station.split())
        if get_from_station == '':
            continue
        if get_from_station in correct_station:#判断是否合法，合法返回，不合法再次输入
            return get_from_station
        elif get_from_station == 'q':
            return 'quit'
        else:
            print "\nThe from_Station IS WRONG! Please TRY AGAIN!\n"

#获取到达站并判断信息合法性
def get_to_station(correct_station,get_from_station):#correct_station为上一步生成的那个stations字典
    import re
    while 1:
        #对用户输入的数据进行预处理，去除前后和中间包含的空格，并将字符置为小写
        get_to_station = (raw_input("Please Input to_Station:")).lower().strip()
        get_to_station = "".join(get_to_station.split())
        if get_to_station == '':
            continue
        if get_to_station in correct_station:#判断是否合法，合法返回，不合法再次输入
            if get_to_station == get_from_station:#出发站和到达站不能相同
                print "\nSAME STATIONS!!!\n"
                continue
            else:
                return get_to_station
        elif get_to_station == 'q':
            return 'quit'
        else:
            print "\nThe to_Station IS WRONG! Please TRY AGAIN!\n"

#获取发车时间并判断信息合法性
def get_search_date(today,lastday):#today为今天的日期，lastday为本系统可查询的最后日期
    import re, datetime
    today = str(today)#获取今天的日期
    today = re.match(r'^(\d{4}).(\d{2}).(\d{2})',today)#将日期进行匹配分割
    now_year = int(today.group(1))
    now_month = int(today.group(2))
    now_day = int(today.group(3))
    today = datetime.date(now_year,now_month,now_day)#重新组合成****-**-**的格式
    while 1:
        #对用户输入的数据进行预处理，去除前后和中间包含的空格
        get_search_date = raw_input("Please Input search_Time:").strip()
        get_search_date = "".join(get_search_date.split())
        if get_search_date == 'q':
            return 'quit'
        else:#用户输入了年月日
            search_date = re.match(r'^(\d{4}).(\d{2}).(\d{2})',get_search_date)
            if search_date:
                year = int(search_date.group(1))
                month = int(search_date.group(2))
                day = int(search_date.group(3))
                try:
                    search_time = datetime.date(year,month,day)
                except ValueError:
                    print "\nWRONG DATE TYPE!\n"
                    continue

                if today <= search_time and search_time <= lastday:#判断查询日期是否在系统可查询时间范围内
                    return search_time
                else:
                    print "\nWRONG DATE RANGE!!\n"
                    continue
            else:#用户只输入了月和日，则默认本年
                search_date = re.match(r'^(\d{2}).(\d{2})',get_search_date)
                if search_date:
                    year = now_year
                    month = int(search_date.group(1))
                    day = int(search_date.group(2))
                    try:
                        search_time = datetime.date(year,month,day)
                    except ValueError:
                        print "\nWRONG DATE TYPE!\n"
                        continue

                    if today <= search_time and search_time <= lastday:
                        return search_time
                    else:
                        print "\nWRONG DATE RANGE!!\n"
                        continue
                else:#用户只输入了日，则默认为本年本月
                    search_date = re.match(r'^(\d{2})',get_search_date)
                    if search_date:
                        year = now_year
                        month = now_month
                        day = int(search_date.group(1))
                        try:
                            search_time = datetime.date(year,month,day)
                        except ValueError:
                            print "\nWRONG DATE TYPE!\n"
                            continue

                        if today <= search_time and search_time <= lastday:
                            return search_time
                        else:
                            print "\nWRONG DATE RANGE!!\n"
                            continue
                    else:#用户什么都没有输入，则默认为本日
                        return today

因为有时查某一天的车票没有余票了，就要再切换其他日期，那么我就再新增了一个可以查询连续30天余票信息的功能：

#获取连续天数
def get_running_days():
    get_running_days = raw_input("Please input the running_days:").strip()
    get_running_days == "".join(get_running_days.split())
    if get_running_days == '':
        pass
    if get_running_days == 'q':
        return 'quit'
    try:
        get_running_days = int(get_running_days)
        if get_running_days is None:
            running_days = 1
        elif get_running_days >= 30:
            running_days = 29
        else:
            running_days = int(get_running_days)
            running_days = running_days
    except:
        running_days = 1
    return running_days

好的，到这一步，需要准备的基本数据已经足够了，那么接下来是要进行构建URL、尝试进行连接并抓取数据，然后进行解析输出展示。

先进行连接这一步吧，还是在Get_and_Check.py中新增一段代码：

#构建url并获取网页数据
def connect_and_get(code_date,code_from_station,code_to_station):#分别是查询日期，出发站Code，到达站Code
    import requests,json,time

    url =  'https://kyfw.12306.cn/otn/leftTicket/queryZ?leftTicketDTO.train_date={}&leftTicketDTO.from_station={}
            &leftTicketDTO.to_station={}&purpose_codes=ADULT'.format(code_date, code_from_station, code_to_station)
    while 1:
        try:
            r = requests.get(url,timeout = 4)
            trains_info = r.json()['data']['result']#解析数据
            if len(trains_info) != '':
                return trains_info
            elif len(trains_info) == '':
                continue
        except:
            return '0'#连接失败

构建URL过程中，使用format将数据填充进去，比拼接字符串简洁明了。

接下来是要将抓取到的车票信息进行分类保存，包括商务座、一等座、二等座等等类型，为使用prettytable模块做铺垫：

#3 车次  4 始发站  5 出发站  6 终点站  7 到达站  8 发车时间  9 到达时间  10 历时
#11 票种（成人，学生，儿童）  13 日期  32 商务座/特等座  31 一等座
#30 二等座  23 软卧  28 硬卧  29 硬座  26 无座

#车次信息
def analyze(trains_info,stations,search_date):
    from prettytable import PrettyTable
    tickets = []
    if trains_info == '0':
        return '0'
    else:
        for t_info in trains_info:
            tickets_info = t_info.split('|')
            train_name = tickets_info[3]
            from_station = tickets_info[6]
            to_station = tickets_info[7]
            from_time = tickets_info[8]
            arrive_time = tickets_info[9]
            duration = tickets_info[10]
            train_date = search_date
            business_seat = tickets_info[32]
            first_seat = tickets_info[31]
            second_seat = tickets_info[30]
            softsleep = tickets_info[23]
            hardsleep = tickets_info[28]
            hard_seat = tickets_info[29]
            without_seat = tickets_info[26]

            station = (' - '.join([stations[from_station],stations[to_station]]))
            time = (' - '.join([from_time,arrive_time]))

            tickets.append([train_date,train_name,station,time,duration,business_seat,first_seat,second_seat,softsleep,hardsleep,hard_seat,without_seat])
    
    return tickets

『三』主程序

到此，所有的准备工作都做完了，接下来就是要进行展示了。我用到的是PrettyTable模块，不是内置模块，所以需要下载安装一下，进入Python 2.7的安装目录下，我的是 C:\\Python27\Scripts\，然后执行命令 pip install prettytable 下载安装。完成之后我们新建一个新的文件作为主程序，命名为Ticket_Check_System.py

#-*-  coding: utf-8  -*-

from stations import stations
import Get_and_Check
#注意：如果以上两个文件：stations.py和Get_and_Check.py与主函数不在同一文件目录下，则需要以下方法增加路径
#import sys
#sys.path.append(r"")#双引号内为文件完整路径，如E:\\test\

import datetime
import time
from datetime import date,timedelta
from prettytable import PrettyTable
import requests,re

today = date.today()
now = datetime.datetime.now()
correct_station = stations.keys()#获取站点的Code
lastday = today + timedelta(days = 29)#规定最终日期

#格式化输出
string_1 = '\t\t\t\t+=======================================+'
string_2 = '\t\t\t\t+       # Tickets Search System #       +'
print string_1,'\n',string_2,'\n','\t\t\t\t+','\t',today,'to',lastday,'\t','+','\n',string_1,'\n\t\t\t\t  Now_Time:',now,'\n'

def get_and_check():
    #获取需要进行查询的信息
    from_station = Get_and_Check.get_from_station(correct_station)
    if from_station == 'quit':
        return 'q'
    to_station = Get_and_Check.get_to_station(correct_station,from_station)
    if to_station == 'quit':
        return 'q'
    search_date = Get_and_Check.get_search_date(today,lastday)
    if search_date == 'quit':
        return 'q'
    running_days = Get_and_Check.get_running_days()
    if running_days == 'quit':
        return 'q'
    print "\n\t\t\t\t\tFrom_Station:",from_station,"\tTo_Station:",to_station,"\tFrom_Date:",search_date,'\n'

    #获取出发站、到达站、发车时间，连接网络获取信息
    code_from_station = stations.get(from_station)
    code_to_station = stations.get(to_station)
    code_date = str(search_date)
    #prettytable设置列
    pt = PrettyTable('No Date Train From_Station/To_Station From_Date/To_Date Duration Business_Seat FSoft_Seat SSoft_Seat Soft_Sleep Hard_Sleep Hard_Seat Without_Seat'.split(' '))
    count = 0#查询到的列车车次计数
    pt.clear_rows()#初始化

    start_time = time.clock()
    #查询
    for i in range(running_days):
        trains_info = Get_and_Check.connect_and_get(code_date,code_from_station,code_to_station)
        tickets = Get_and_Check.analyze(trains_info,stations,search_date)
        if tickets == '0':
            print code_date,"\tCAN NOT GET CONNETION"
            tickets = ""
        for ticket in tickets:
            ticket.insert(0,count+1)
            pt.add_row(ticket)
            count = count + 1
        search_date = search_date + timedelta(days = 1)
        code_date = str(search_date)

    end_time = time.clock()
    spend_time = end_time - start_time#系统查询运行时间

    if count == 0:
        print "\nSORRY, I CAN NOT FIND ANY TICKETS!\n"
    else:
        print '\nTOTAL:',count,'trains\t','Spend Seconds: %d'%spend_time
        print pt.get_string()#打印结果
        pt.clear()#清空缓存

if __name__ == '__main__':
    while 1:
        gc = get_and_check()
        if gc == 'q':
            print "\nSuccessful Quit!"
            break

『四』结果截图

『五』总结

blahblah =。= 太懒了，不想写总结。。。

算了，还是提一下优化问题

1、建议while循环使用 while 1代替 while true，在Python 2.7中，True是全局变量而不是关键字，所以效率上会拖那么一丢丢后腿

2、查询时使用dict代替list，这样会极大地减少查询的时间，但是对内存会有较大需求

3、格式化字符，第一种方式（"字符串"+"字符串"+...+"字符串"），用时最短。这里：

【Python 2.7】火车票查询

这是第二种方式（"字符串{}...字符串{}".format(a,b...,n)），用时比第一种慢一点。第三种：

【Python 2.7】火车票查询

这是最慢的一种方式（"字符串%d字符串%s" % (a,b) ），但是可读性是最高的『%d整数 %s字符串 %f浮点数 %c字符』

4、if判断时尽量使用 if ... is ...替代 if ... == ...

5、使用join合并迭代器中的字符串

6、使用级联比较 a<b<c替代 a<b and b<c

7、可以使用性能分析工具查看程序运行时所有模块的花费时间，使用方法如下：

python -m cProfile Ticket_Check_System.py

以上所有提及的性能优化的都可以在本例中找到，也都可以进行优化。

【Python 2.7】火车票查询

『一』初衷

『二』准备工作

『三』主程序

『四』结果截图

『五』总结

windows+python2.7安装cv2

ubuntu 16.04 安装 python2.7 以及 cv2, dist-package 和 site-package 的区别, import cv2 出问题解答

pip2 python2.7 安装opencv-python cv2遇到问题的可能解决办法 skbuild list(pattern)

教你Python3实现12306火车票自动抢票，小白必学

php实现12306火车票余票查询和价格查询

Python连接postgresql并以字典返回查询结果

解决Python2.7读写文件中的中文乱码问题

关于python查询mysql表的问题

macOS自带Python2.7版本升级

Linux中RedHat下安装Python2.7开发环境的详细介绍