Getting an IP proxy pool

程序员文章站 2022-05-19 13:32:37

...

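I haven't paid for a proxy service, so I can only pick up what others leave behind: take the free IPs other people publish and try them one by one. The success rate is extremely low.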

Scrape IPs from http://www.xicidaili.com/nt/
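The scraper pulls (ip, port) pairs out of each list page with a single regular expression. The minimal sketch below runs that same pattern on a small HTML snippet to show what `findall` returns. The snippet is hypothetical: it is made up to resemble the table rows the pattern expects and is not copied from the site.

```python
import re

# Same row pattern as the scraper: a country cell whose flag image has alt="Cn",
# followed by two <td> cells that are captured as (ip, port).
ROW_PATTERN = re.compile(
    '<td class="country">.*?alt="Cn" />.*?</td>.*?<td>(.*?)</td>.*?<td>(.*?)</td>',
    re.S,  # let "." match newlines, since each row spans several lines
)

# Hypothetical table markup, for illustration only.
SAMPLE_HTML = """
<tr>
  <td class="country"><img src="cn.png" alt="Cn" /></td>
  <td>203.0.113.10</td>
  <td>8080</td>
</tr>
<tr>
  <td class="country"><img src="cn.png" alt="Cn" /></td>
  <td>203.0.113.11</td>
  <td>3128</td>
</tr>
<tr>
  <td class="country"><img src="us.png" alt="Us" /></td>
  <td>198.51.100.12</td>
  <td>80</td>
</tr>
"""

rows = ROW_PATTERN.findall(SAMPLE_HTML)
print(rows)  # [('203.0.113.10', '8080'), ('203.0.113.11', '3128')]
```

The row whose flag is not `alt="Cn"` is skipped. If the real page markup differs from this shape, the pattern silently returns an empty list, so printing the result per page (as the script does) is a useful sanity check.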

Some other decent free proxy sites:

http://www.66ip.cn/

http://www.coobobo.com/

http://cn-proxy.com/

https://www.kuaidaili.com/free/inha/

Check whether each IP works using http://2017.ip138.com/ic.asp, a page that displays the IP address the request came from.
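The check works like this: send a request through the proxy to a page that prints the caller's IP, extract the first IPv4 address from the response, and compare it with the proxy's IP. If they differ, the request most likely went out from your own IP. Below is a minimal offline sketch of just that comparison. It assumes the page text contains the echoed address, and the sample strings are made up. Unlike the script below, which compares only the first 9 characters, this sketch requires an exact match.

```python
import re

# Dotted-quad IPv4 address such as "203.0.113.10".
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def echoed_ip(page_text):
    """Return the first IPv4 address found in the page text, or None."""
    match = IPV4_RE.search(page_text)
    return match.group() if match else None


def proxy_was_used(page_text, proxy_ip):
    """True if the IP echoed back by the test page equals the proxy's IP."""
    return echoed_ip(page_text) == proxy_ip


if __name__ == "__main__":
    # Made-up page text; the real page wording may differ.
    print(proxy_was_used("Your IP: [203.0.113.10]", "203.0.113.10"))  # True
    print(proxy_was_used("Your IP: [192.0.2.55]", "203.0.113.10"))    # False
```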

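import random
import re
import time

import requests

# A few desktop User-Agent strings to rotate through.
# (fake_useragent's UserAgent().random is an alternative source of these.)
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0",
]

def get_random_header():
    # Random User-Agent plus a Referer pointing at the proxy-list site
    return {
        'User-Agent': random.choice(USER_AGENTS),
        'Referer': 'http://www.xicidaili.com/wt',
    }

def test_ip(ip, test_url='http://2017.ip138.com/ic.asp', time_out=3):
    """Return True if a request sent through proxy ip = (host, port) reports the proxy's address."""
    proxies = {'http': 'http://%s:%s' % (ip[0], ip[1])}
    try_ip = ip[0]
    try:
        r = requests.get(test_url, headers=get_random_header(), proxies=proxies, timeout=time_out)
    except requests.RequestException:
        # Error during the request: refused connection, timeout, broken proxy, ...
        return False
    if r.status_code != 200:
        # Status code is not 200
        return False
    r.encoding = 'gbk'  # decode the ip138 page as GBK
    # The test page shows the IP address it sees the request coming from
    result = re.search(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', r.text)
    if result is None:
        return False
    # Loose check: compare the first 9 characters of the echoed IP and the proxy IP.
    # A mismatch means the proxy was not used and the request went out from the local IP.
    if result.group()[:9] == try_ip[:9]:
        print('Test passed')
        return True
    return False

def scraw_proxies(page_num, scraw_url="http://www.xicidaili.com/nt/"):
    """Scrape pages 1..page_num (inclusive) and return the (ip, port) tuples that pass test_ip."""
    available_ip = []
    # One match per table row; only rows whose country flag is alt="Cn" are kept
    pattern = re.compile('<td class="country">.*?alt="Cn" />.*?</td>.*?<td>(.*?)</td>.*?<td>(.*?)</td>', re.S)
    for page in range(1, page_num + 1):
        print("Scraping proxy IPs from page %d" % page)
        url = scraw_url + str(page)
        r = requests.get(url, headers=get_random_header(), timeout=10)
        r.encoding = 'utf-8'
        scraw_ip = pattern.findall(r.text)
        print(scraw_ip)
        for ip in scraw_ip:
            if test_ip(ip):
                print('%s:%s passed the test, added to the available proxy list' % (ip[0], ip[1]))
                available_ip.append(ip)
        # Pause between pages to avoid being blocked by the list site
        print("Proxy crawler pausing for 10 s")
        time.sleep(10)
        print("Crawler resuming")
    print('Scraping finished')
    return available_ip

if __name__ == "__main__":
    available_ip = scraw_proxies(3)
    print(available_ip)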
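`scraw_proxies` returns a plain list of (ip, port) tuples. Below is a minimal sketch of drawing from that list as a pool. It assumes you simply pick a random entry for each request. `pick_proxy` is a hypothetical helper, not part of the original script.

```python
import random


def pick_proxy(available_ip):
    """Pick a random (ip, port) tuple and return it as a requests-style proxies dict."""
    if not available_ip:
        raise ValueError("the proxy pool is empty")
    ip, port = random.choice(available_ip)
    # Only the "http" scheme is mapped: the proxies were validated against an
    # http:// test page, so HTTPS through them is untested.
    return {"http": "http://%s:%s" % (ip, port)}


if __name__ == "__main__":
    # Hypothetical pool in the same shape that scraw_proxies returns.
    pool = [("203.0.113.10", "8080"), ("203.0.113.11", "3128")]
    print(pick_proxy(pool))
```

The result can be passed straight to requests, e.g. `requests.get(url, proxies=pick_proxy(available_ip), timeout=3)`. Because free proxies stop working quickly, it may be worth removing an entry from the list once a request through it fails.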
