Modules and methods needed to open the links in a web page with Python

程序员文章站 2022-04-30 13:58:16

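import urllib.request
from urllib.error import URLError   # URLError is defined in urllib.error

response = urllib.request.urlopen(real_url)   # request the link; raises URLError if it cannot be opened
a_text = url.get_attribute("text")   # link text of an <a> element (url is a Selenium WebElement)
a_url = url.get_attribute("href")    # link target of the same element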

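# coding: utf-8
from selenium import webdriver
from selenium.webdriver.common.by import By
import urllib.request
from urllib.error import URLError
import time

# Start Chrome in headless mode (no visible window)
option = webdriver.ChromeOptions()
option.add_argument('--headless')
driver = webdriver.Chrome(options=option)

# driver = webdriver.Chrome()   # use this instead to watch the browser window
driver.get("http://www.baidu.com/")   # page under test
# find_elements(By.XPATH, ...) replaces find_elements_by_xpath, which current Selenium 4 releases no longer provide
urls = driver.find_elements(By.XPATH, "//a")   # collect every <a> element on the page
print("The page contains %d links:" % len(urls))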

success_count = 0
fail_count = 0
for url in urls:
    real_url = url.get_attribute('href')
    if real_url is None:   # many <a> elements have no href, so get_attribute returns None
        continue
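    try:
        response = urllib.request.urlopen(real_url)   # use urllib to test whether the URL can be opened
        response.close()
        time.sleep(1)   # pause between requests
    except URLError as reason:
        fail_count += 1
        # report the URL that failed, together with its link text and the error
        print('Broken link %d:' % fail_count, real_url, 'link text: ' + str(url.get_attribute("text")), reason)
    else:
        success_count += 1
        print('Working link %d:' % success_count, real_url)   # report the URL that opened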

driver.quit()   # end the session and shut down the browser, not just the current window
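
The script above passes every href straight to urlopen, with no timeout and no check on the URL scheme. Below is a minimal sketch of one way to filter and test links before counting them. The helper names is_checkable and check_link are hypothetical (they are not part of the original script), and the 10-second timeout is an assumed value.

```python
import urllib.request
from urllib.parse import urlsplit


def is_checkable(href):
    """Return True only for absolute http(s) URLs (hypothetical helper)."""
    if not href:  # None or empty string: the <a> element has no usable href
        return False
    return urlsplit(href).scheme in ("http", "https")


def check_link(href, timeout=10):
    """Try to open href and return (ok, detail). The timeout is an assumed value."""
    try:
        with urllib.request.urlopen(href, timeout=timeout) as response:
            return True, str(response.status)
    except (OSError, ValueError) as err:  # URLError and HTTPError are OSError subclasses
        return False, str(err)
```

Inside the for loop, this could be used as `if not is_checkable(real_url): continue`, followed by `ok, detail = check_link(real_url)`, so that links such as javascript: or mailto: are skipped instead of being reported as broken.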
