Automating office work with selenium: filling in web forms and clicking submit

程序员文章站 (Programmer Article Site), 2022-04-11 18:53:25


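All articles are reposted from my personal WeChat official account "AI牛逼顿". Scan the QR code at the end of the post to follow!

My recent work has involved so many repetitive operations that my mouse-clicking finger has started to ache.

That's work for you: unfamiliar the first time, familiar the second, and once it is familiar it is time to think about how to be more efficient. Here are two animations first, showing the workflow being completed fully automatically. (For data security, parts of the web pages are blurred out, and the code and data files for this post will not be uploaded either.)

Animation 1: adding similar questions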


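Animation 1 shows the script automatically logging in, entering text, clicking the query button, adding a question, and confirming the submission.

Animation 2: reusing knowledge points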


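Animation 2 shows the script automatically logging in, entering text, clicking the query button, expanding the tree directory, ticking checkboxes, and confirming the submission.

Haha, watching the browser run through the whole workflow by itself while I hold my cup of water, I feel a wave of pure comfort.

How it works

Working in a python environment, the script reads a prepared data table and uses the selenium framework to drive the browser automatically, thereby completing the workflow.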
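The data-driven pattern described here can be sketched without a browser at all. This is only a minimal illustration, not the code from this post: the helper `run_rows`, the stand-in `fake_submit` and the table values are all hypothetical, and the standard-library `csv` module is used in place of pandas so the sketch runs anywhere.

```python
import csv
import io


def run_rows(rows, action):
    """Apply `action` to every row and return the rows for which it raised."""
    failed = []
    for row in rows:
        try:
            action(row)
        except Exception:
            # Remember the row so it can be retried or handled by hand later.
            failed.append(row)
    return failed


# In-memory stand-in for the prepared data table (hypothetical values).
table = io.StringIO(
    "point_name,point_id,topic_id\n"
    "force,101,9001\n"
    "energy,102,\n"
)
rows = list(csv.DictReader(table))


def fake_submit(row):
    # Stand-in for the browser steps; fails when the topic id is missing.
    if not row["topic_id"]:
        raise ValueError("missing topic id")


failed_rows = run_rows(rows, fake_submit)
print(failed_rows)
```

In the real script the `action` would be the sequence of selenium calls for one row, and the failed rows would be appended to a CSV file.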

Implementation

The code is organized as follows:


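(1) the data folder holds the prepared data tables;

(2) configuration.py holds the configuration: the login username and password, plus the xpath expressions used to locate elements in the browser;

(3) insert_first_topic.py implements the function shown in animation 1; the file contains the AddSimilaryTopic class;

(4) multiplex_point.py implements the function shown in animation 2; the file contains the CopyPoint class.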
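Since configuration.py is not published, here is a rough idea of what such a file might look like. The attribute names are the ones the classes below read from `config`; every value (URL, credentials, file path, xpath) is a placeholder invented for illustration and will not match the real site.

```python
# configuration.py (illustrative sketch; all values are placeholders)
website = 'https://example.com/login'   # hypothetical URL
username = 'your_username'
password = 'your_password'

# Data table read by AddSimilaryTopic (hypothetical path)
point_add_topic_file = 'data/point_add_topic.xlsx'

# xpath of a fixed element (hypothetical)
login_btn = "//button[@type='submit']"

# xpath templates that are filled in per row (hypothetical)
point_name_btn = "//a[@data-id='{}']"                    # used with .format()
editon_index = "//ul[@id='tree']/li[%d]/span"            # used with %
module_index = "//ul[@id='tree']/li[%d]/ul/li[%d]/span"  # used with %
```

Keeping credentials and xpaths in one module means a change to the page layout only requires editing this file, not the classes.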

Code overview

Structure of the AddSimilaryTopic class

from selenium import webdriver
from selenium.webdriver.common.by import By
import time
import pandas as pd
import configuration as config

class AddSimilaryTopic(object):
    def __init__(self):
        self.website = config.website
        self.username = config.username
        self.password = config.password
        self.data_file = config.point_add_topic_file
        self.dr = webdriver.Chrome()

    def load_data(self):
        '''Load the prepared data table.'''
        data = pd.read_excel(self.data_file)
        point_name = data.point_name
        point_id = data.point_id
        topic_id = data.topic_id
        data_length = len(data)
        return point_name, point_id, topic_id, data_length

    def login(self):
        '''Log in from the home page and select the user role.'''
        self.dr.get(self.website)
        # Maximize the window
        self.dr.maximize_window()
        time.sleep(1)
        # Find the username input box
        username = self.dr.find_element(By.ID, 'username')
        username.send_keys(self.username)
        # Find the password input box
        password = self.dr.find_element(By.ID, 'password')
        password.send_keys(self.password)
        # Find the login button and click it
        login_btn = self.dr.find_element(By.XPATH, config.login_btn)
        login_btn.click()
        time.sleep(4)
        # Find the button of the module and click it (here the "锚点管理", i.e. anchor point management, button)
        point_manage = self.dr.find_element(By.XPATH, config.point_manager)
        point_manage.click()
        time.sleep(4)
        # Find the subject selection button and click it
        subject = self.dr.find_element(By.XPATH, config.subject)
        subject.click()
        time.sleep(4)
        # Find the subject/school-stage button and click it
        physics_high = self.dr.find_element(By.XPATH, config.pyhsic_high)
        physics_high.click()
        time.sleep(4)

    def input_word(self, xpath, wd, t=2):
        '''Find an input box, type the data into it, then wait t seconds.'''
        word = self.dr.find_element(By.XPATH, xpath)
        # str() so that numeric ids read by pandas are typed as text
        word.send_keys(str(wd))
        time.sleep(t)

    def clear_word(self, xpath):
        '''Clear an input box so that it is ready for the next input.'''
        word = self.dr.find_element(By.XPATH, xpath)
        word.clear()
        time.sleep(1)

    def select_btn_and_click(self, xpath, t=2):
        '''Find a button, click it, then wait t seconds.'''
        btn = self.dr.find_element(By.XPATH, xpath)
        btn.click()
        time.sleep(t)

    def operate(self):
        '''The whole workflow.'''
        point_name, point_id, topic_id, data_length = self.load_data()
        self.login()
        for i in range(data_length):
            try:
                # Each pass searches for a point and then adds a similar question, which opens a new window.
                # To keep searching in the same window, always switch back to the first window here;
                # this works together with the close command at the end of the pass.
                self.dr.switch_to.window(self.dr.window_handles[0])
                # Find the point search box and type the point name
                self.input_word(config.point_name, point_name[i])
                # Find the point query button and click it
                self.select_btn_and_click(config.point_qurey_btn)
                # Clear the point search box, ready for the next input
                self.clear_word(config.point_name)
                # Find the point-name link in the query result and click it
                self.select_btn_and_click(config.point_name_btn.format(point_id[i]))
                # Find the resource edit button and click it
                self.select_btn_and_click(config.resource_edit_btn, 4)
                # The previous step opens a new window; continue in that window
                self.dr.switch_to.window(self.dr.window_handles[1])
                # Find the "手动选题" (choose questions manually) button and click it
                self.select_btn_and_click(config.choose_topic_btn)
                # Find the question id input box and type the question id
                self.input_word(config.input_topic_id, topic_id[i])
                # Find the query button and click it
                self.select_btn_and_click(config.topic_qurey_btn)
                # Find the similar-question button and click it
                self.select_btn_and_click(config.similar_topic_btn)
                # Find the review button and click it
                self.select_btn_and_click(config.check_btn, 4)
                # Find the approve button and click it
                self.select_btn_and_click(config.agree_check_btn)
                # Find the submit button and click it
                self.select_btn_and_click(config.refer_btn)
                # Find the confirm button and click it
                self.select_btn_and_click(config.sure_btn)
                # Close the current window
                self.dr.close()
                time.sleep(2)

            except Exception:
                # If the try block fails halfway, the pop-up window is never closed. Close it here,
                # otherwise window selection gets confused in later passes and several failures
                # would leave several unclosed windows in the browser.
                # Only close when a pop-up exists, so the main window is never closed by mistake.
                if len(self.dr.window_handles) > 1:
                    self.dr.switch_to.window(self.dr.window_handles[1])
                    self.dr.close()
                un_add_data = pd.DataFrame([[point_name[i], point_id[i], topic_id[i]]])
                un_add_data.to_csv('un_add_topic.csv', mode='a', index=False, header=False)
                print('Warning: the question in row ' + str(i + 1) + ' of the data table was not added!')

if __name__ == '__main__':
    add = AddSimilaryTopic()
    add.operate()

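Logging in, entering data and clicking buttons are each wrapped in a method. The core work is done by the operate method.

The operate method follows the manual workflow step by step. Every step has a comment explaining what it does, so I won't repeat that here. Just two points to note:

1. Add a delay after every step wherever possible, for two reasons:

(1) When an unstable network keeps the page from loading in time, locating the element fails.

(2) Mainly, I worry that operating too frequently could get my IP blocked.

2. Exception handling. The company's data back end sometimes acts up and pages become unreachable. When that happens, the rows that were not written are saved, so that after the loop ends they can be written again automatically or handled by hand.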
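On the first point, a fixed time.sleep either waits longer than needed or not long enough. A common alternative is to poll until the element can be found. The sketch below shows the idea in plain Python; `wait_until` and `flaky_find` are hypothetical names used only for illustration, not part of the original code. Selenium itself ships `WebDriverWait` (in `selenium.webdriver.support.ui`) for this purpose. A deliberate pause to avoid hammering the server (reason 2) would still be needed separately.

```python
import time


def wait_until(find, timeout=10.0, poll=0.5):
    """Call `find` until it returns without raising, or re-raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return find()
        except Exception:
            if time.monotonic() >= deadline:
                raise
            time.sleep(poll)


# Stand-in for an element lookup that only succeeds on the third attempt.
attempts = {"count": 0}


def flaky_find():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("element not loaded yet")
    return "element"


result = wait_until(flaky_find, timeout=5.0, poll=0.1)
print(result, attempts["count"])
```

With selenium, `find` could be something like `lambda: driver.find_element(By.XPATH, xpath)`, assuming `driver` and `xpath` are defined as in the class above.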

Structure of the CopyPoint class

from selenium import webdriver
from selenium.webdriver.common.by import By
import time
import pandas as pd
import configuration as config

class CopyPoint(object):
    def __init__(self, sheetname):
        self.website = config.website
        self.username = config.username
        self.password = config.password
        self.index_information_file = config.index_information_file
        self.point_copy_file = config.point_copy_file
        self.sheetname = sheetname
        self.dr = webdriver.Chrome()

    def load_data(self):
        '''Load the two data tables and merge them on their common columns.'''
        index_information_data = pd.read_excel(self.index_information_file, sheet_name=self.sheetname)
        point_data = pd.read_excel(self.point_copy_file)
        data = pd.merge(point_data, index_information_data)
        return data

    def login(self):
        '''Log in from the home page and select the user role.'''
        self.dr.get(self.website)
        # Maximize the window
        self.dr.maximize_window()
        time.sleep(1)
        # Find the username input box
        username = self.dr.find_element(By.ID, 'username')
        username.send_keys(self.username)
        # Find the password input box
        password = self.dr.find_element(By.ID, 'password')
        password.send_keys(self.password)
        # Find the login button and click it
        login_btn = self.dr.find_element(By.XPATH, config.login_btn)
        login_btn.click()
        time.sleep(4)
        # Find the button of the module and click it (here the "锚点管理", i.e. anchor point management, button)
        point_manage = self.dr.find_element(By.XPATH, config.point_manager)
        point_manage.click()
        time.sleep(4)
        # Find the subject selection button and click it
        subject = self.dr.find_element(By.XPATH, config.subject)
        subject.click()
        time.sleep(4)
        # Find the subject/school-stage button and click it
        physics_high = self.dr.find_element(By.XPATH, config.pyhsic_high)
        physics_high.click()
        time.sleep(4)

    def input_word(self, xpath, wd, t=2):
        '''Find an input box, type the data into it, then wait t seconds.'''
        word = self.dr.find_element(By.XPATH, xpath)
        # str() so that numeric values read by pandas are typed as text
        word.send_keys(str(wd))
        time.sleep(t)

    def clear_word(self, xpath):
        '''Clear an input box so that it is ready for the next input.'''
        word = self.dr.find_element(By.XPATH, xpath)
        word.clear()
        time.sleep(1)

    def select_btn_and_click(self, xpath, t=2):
        '''Find a button, click it, then wait t seconds.'''
        btn = self.dr.find_element(By.XPATH, xpath)
        btn.click()
        time.sleep(t)

    def operate(self):
        '''The whole workflow.'''
        # Column names come from the spreadsheets: 锚点名称 = point name, 锚点id = point id,
        # 教材版本序号 / 模块序号 / 篇序号 / 章序号 / 节序号 = index of the textbook
        # edition / module / part / chapter / section in the tree.
        data = self.load_data()
        self.login()
        for i in range(len(data)):
            try:
                # Find the point search box and type the point name
                self.input_word(config.point_name, data['锚点名称'][i])
                # Find the point query button and click it
                self.select_btn_and_click(config.point_qurey_btn)
                # Clear the point search box, ready for the next input
                self.clear_word(config.point_name)
                # Find the point-name link in the query result and click it
                self.select_btn_and_click(config.point_name_btn.format(data['锚点id'][i]))
                # Find the edit button and click it
                self.select_btn_and_click(config.edit_btn)
                # Find the add button and click it
                self.select_btn_and_click(config.add_btn)
                # A delay is needed while expanding the tree directory, otherwise "element not found"
                # errors are likely. My guess at the reason: scrolling takes time, and if the scroll bar
                # has not reached the right position yet, the lookup fails.
                # Find the entry of the edition and click the "+" in front of it to expand it
                self.select_btn_and_click(config.editon_index % data['教材版本序号'][i])
                # Find the entry of the module and click the "+" in front of it to expand it
                self.select_btn_and_click(config.module_index % (data['教材版本序号'][i],
                                                                 data['模块序号'][i]))
                # '沪科版' is the sheet for the textbook edition whose tree has an extra "part" level
                if self.sheetname == '沪科版':
                    # Find the entry of the part and click the "+" in front of it to expand it
                    self.select_btn_and_click(config.huke_piece_index % (data['教材版本序号'][i],
                                                                         data['模块序号'][i],
                                                                         data['篇序号'][i]))
                    # Find the entry of the chapter and click its checkbox
                    self.select_btn_and_click(
                        config.huke_chapter_index % (data['教材版本序号'][i],
                                                     data['模块序号'][i],
                                                     data['篇序号'][i],
                                                     data['章序号'][i]))
                    # Find the entry of the section and click its checkbox
                    self.select_btn_and_click(
                        config.huke_section_index % (data['教材版本序号'][i],
                                                     data['模块序号'][i],
                                                     data['篇序号'][i],
                                                     data['章序号'][i],
                                                     data['节序号'][i]))
                else:
                    # Find the entry of the chapter and click the "+" in front of it to expand it
                    self.select_btn_and_click(config.chapter_index % (data['教材版本序号'][i],
                                                                      data['模块序号'][i],
                                                                      data['章序号'][i]))
                    # Find the entry of the section and click its checkbox
                    self.select_btn_and_click(config.section_index % (data['教材版本序号'][i],
                                                                      data['模块序号'][i],
                                                                      data['章序号'][i],
                                                                      data['节序号'][i]))
                # Find the OK button and click it
                self.select_btn_and_click(config.point_sure_btn)
                time.sleep(2)

            except Exception:
                un_copy_data = pd.DataFrame([data.iloc[i]])
                un_copy_data.to_csv('un_copy_topic.csv', mode='a', index=False)
                print('Warning: the point in row ' + str(i + 1) + ' of the data table was not reused!')
        # Close the browser after the loop ends
        self.dr.quit()

if __name__ == '__main__':
    copy = CopyPoint('沪科版')
    copy.operate()

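Animation 2 is implemented by the CopyPoint class. Its structure is almost the same as that of AddSimilaryTopic: only the constructor (which takes a sheet name), load_data and operate differ.

The idea behind operate is also the same. The one difference is that when the tree directory is expanded and a checkbox is located, the xpath expression depends on the input data. So in the code, these xpaths are built by filling the row's index values into template strings with the % operator.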
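To make the idea concrete, here is a small sketch of building such an xpath from a list of indexes. The function `tree_xpath` and the page structure it assumes (nested `<ul>`/`<li>` elements under an element with id `tree`) are hypothetical; the real templates live in configuration.py and depend on the actual page.

```python
def tree_xpath(indices, leaf="span"):
    """Build an xpath that walks a nested <ul>/<li> tree by 1-based positions."""
    root = "//ul[@id='tree']"  # hypothetical root of the tree directory
    # One "/li[n]" step per level, joined by "/ul" to descend into the child list.
    steps = "/ul".join("/li[%d]" % i for i in indices)
    return root + steps + "/" + leaf


# Edition 2 -> module 3: the node whose "+" would be clicked to expand it.
print(tree_xpath([2, 3]))
# Edition 2 -> module 3 -> chapter 1: the checkbox to tick.
print(tree_xpath([2, 3, 1], leaf="input[@type='checkbox']"))
```

A single builder like this handles any depth, so an edition with an extra level only needs a longer index list rather than a separate template.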

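Takeaways

xpath expressions are without doubt the core of this task. Copying an xpath expression straight from the browser's developer tools often ends in elements that cannot be located. The reason is that a copied expression is very rigid: it hard-codes element positions, much like an absolute path, and stops matching as soon as the page structure shifts. Only after carefully analysing the xpath features of the element you want to locate can you write an expression that is general.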
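The difference can be shown on a toy page with the standard library alone. The two page snippets and both expressions below are made up for illustration. Note that `xml.etree.ElementTree` supports only a subset of xpath, whereas the browser evaluates full xpath, so this is a sketch of the principle rather than of selenium's behaviour.

```python
import xml.etree.ElementTree as ET

page_v1 = ET.fromstring(
    "<html><body>"
    "<div><button id='cancel'>Cancel</button></div>"
    "<div><button id='submit'>Submit</button></div>"
    "</body></html>"
)
# The same page after a banner <div> has been inserted at the top.
page_v2 = ET.fromstring(
    "<html><body>"
    "<div>Banner</div>"
    "<div><button id='cancel'>Cancel</button></div>"
    "<div><button id='submit'>Submit</button></div>"
    "</body></html>"
)

copied = "./body/div[2]/button"      # position-based, like a copied xpath
general = ".//button[@id='submit']"  # based on a stable attribute


def button_id(page, xpath):
    """Return the id of the first element matched by `xpath`, or None."""
    found = page.find(xpath)
    return None if found is None else found.get("id")


for name, page in (("v1", page_v1), ("v2", page_v2)):
    print(name, button_id(page, copied), button_id(page, general))
```

On the changed page the position-based expression silently lands on the wrong button, while the attribute-based one still finds the submit button.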


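A journey of a thousand miles begins with a single step! I regularly share practical material on artificial intelligence, explain principles and worked cases in plain language, and explore how these cases can be used in secondary-school physics teaching. There is all sorts of fun popular physics too. I keep sharing original work, and I only reshare what I have understood and absorbed. Follow along and get in touch!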
