Learning the Python urllib2 module
程序员文章站
2023-12-23 23:12:27
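urllib2 is a library that extends urllib. It supports not only HTTP but also other protocols such as FTP.
Common function 1: urlopen(url, data=None, timeout=...)
When only the url argument is given, urlopen accesses the URL with an HTTP GET request.
If the requested URL does not exist, an HTTPError is raised: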
import urllib2

url = "https://www.jnrain.com/go"
response = urllib2.urlopen(url)
print response.info()
print response.read()
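The resulting error message is:
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
This is a 404 error. By default, any status code outside the 2xx range that the installed handlers do not deal with themselves (redirects, for example, are followed automatically) is treated as an error. The details can be captured by catching HTTPError: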
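import urllib2

url = "https://www.jnrain.com/go"
try:
    response = urllib2.urlopen(url)
    print response.info()
    print response.read()
except urllib2.HTTPError, e:
    print e.getcode()   # numeric status code
    print e.reason      # short reason phrase
    print e.geturl()    # URL that failed
    print "-------------------------"
    print e.info()      # response headers
    print e.read()      # body of the error page
Catching the exception makes it possible to print the detailed error information: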
404
Not Found
https://www.jnrain.com/go
-------------------------
Server: nginx/1.4.1
Date: Wed, 19 Feb 2014 08:51:50 GMT
Content-Type: text/html; charset=gb18030
Content-Length: 168
Connection: close

404 Not Found
404 Not Found
nginx/1.4.1
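The behaviour described above can be reproduced without touching a real site. The following is a minimal sketch, not the article's own code: it assumes a throwaway local server (the hypothetical `OnlyRootHandler`), uses `opener.open` in place of `urlopen` so that any system proxy is bypassed, and falls back to `urllib.request` when `urllib2` is unavailable (Python 3).

```python
import threading

try:
    import urllib2                                   # Python 2, as in the article
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
except ImportError:
    import urllib.request as urllib2                 # Python 3 stand-in (assumption)
    from http.server import BaseHTTPRequestHandler, HTTPServer


class OnlyRootHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: serves "/" and answers 404 for anything else."""

    def do_GET(self):
        if self.path == "/":
            body = b"hello"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), OnlyRootHandler)  # port 0: pick a free port
worker = threading.Thread(target=server.serve_forever)
worker.daemon = True
worker.start()
base = "http://127.0.0.1:%d" % server.server_port

# An empty ProxyHandler makes sure the loopback request is not sent to a proxy.
opener = urllib2.build_opener(urllib2.ProxyHandler({}))

ok_body = opener.open(base + "/").read()      # 2xx: returned normally

try:
    opener.open(base + "/go")                 # missing path: raises HTTPError
    status = None
    failed_url = None
except urllib2.HTTPError as e:
    status = e.code                           # same value as e.getcode()
    failed_url = e.geturl()
    e.read()                                  # the error page body is still readable

server.shutdown()
server.server_close()
print("%r %r %r" % (ok_body, status, failed_url))
```

The successful request returns its body as usual, while the missing path only becomes visible through the `except` branch, which is why the article wraps `urlopen` in `try`/`except`.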
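A request can also be built explicitly as a Request object, which allows headers to be added before it is sent:
import urllib2

request = urllib2.Request("https://www.jnrain.com/wforum/logon.php")
request.add_header('User-Agent', 'internet explorer')
try:
    response = urllib2.urlopen(request)
except urllib2.URLError, e:
    # HTTPError (a subclass of URLError) carries a status code;
    # a plain URLError only has a reason
    print getattr(e, 'code', e.reason)
else:
    # read the response only when the request succeeded
    headers = response.info()
    data = response.read()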
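With this, the User-Agent header of our HTTP request becomes "internet explorer". If the Request is created without data, the GET method is used; if data is supplied, the POST method is used. The relevant methods of the Request class are: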
def get_method(self):
    if self.has_data():
        return "POST"
    else:
        return "GET"

# XXX these helper methods are lame

def add_data(self, data):
    self.data = data

def has_data(self):
    return self.data is not None

def get_data(self):
    return self.data
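Both effects can be checked on Request objects directly, without sending anything over the network. This is a small sketch under the assumption that `urllib.request` stands in for `urllib2` on Python 3; the form data is a placeholder.

```python
try:
    import urllib2                        # Python 2, as in the article
except ImportError:
    import urllib.request as urllib2      # Python 3 stand-in (assumption)

url = "https://www.jnrain.com/wforum/logon.php"

# No data: the request will be sent as GET.
get_request = urllib2.Request(url)
get_request.add_header("User-Agent", "internet explorer")

# With data (placeholder bytes here): the request will be sent as POST.
post_request = urllib2.Request(url, data=b"user=demo")

# Request stores header names capitalized, so the key is "User-agent".
user_agent = get_request.get_header("User-agent")
methods = (get_request.get_method(), post_request.get_method())

print(user_agent)
print(methods)
```

Nothing is transmitted here; `get_method()` only reports which verb `urlopen` would use for each object.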
Debug information can be printed while the request is executed:
import urllib2

httphandler = urllib2.HTTPHandler(debuglevel=1)
httpshandler = urllib2.HTTPSHandler(debuglevel=1)
opener = urllib2.build_opener(httphandler, httpshandler)
urllib2.install_opener(opener)

url = "https://www.baidu.com"
request = urllib2.Request(url)
request.add_header("User-Agent", "firefox")
response = urllib2.urlopen(request)
Setting debuglevel to 1 when creating the handlers is all that is needed.
Using POST to submit a form:
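As a hedged variation on the debug setup, the opener can also be used directly through `opener.open` instead of being installed globally with `install_opener`. The sketch below assumes a throwaway local server (the hypothetical `HelloHandler`) so that it runs offline, and falls back to `urllib.request` on Python 3.

```python
import threading

try:
    import urllib2                                   # Python 2, as in the article
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
except ImportError:
    import urllib.request as urllib2                 # Python 3 stand-in (assumption)
    from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a short body."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the server's own access log


server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: pick a free port
worker = threading.Thread(target=server.serve_forever)
worker.daemon = True
worker.start()

debug_handler = urllib2.HTTPHandler(debuglevel=1)
# The empty ProxyHandler keeps the loopback request away from any system proxy.
opener = urllib2.build_opener(urllib2.ProxyHandler({}), debug_handler)

request = urllib2.Request("http://127.0.0.1:%d/" % server.server_port)
request.add_header("User-Agent", "firefox")

# With debuglevel=1 the HTTP layer prints the request it sends and the
# reply it receives to stdout while this call runs.
response = opener.open(request)
status = response.getcode()
body = response.read()

server.shutdown()
server.server_close()
print("%r %r" % (status, body))
```

Using `opener.open` keeps the debug output limited to requests made through this opener, whereas `install_opener` changes the behaviour of every later `urlopen` call in the process.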
import urllib
import urllib2

httphandler = urllib2.HTTPHandler(debuglevel=1)
httpshandler = urllib2.HTTPSHandler(debuglevel=1)
opener = urllib2.build_opener(httphandler, httpshandler)
urllib2.install_opener(opener)

url = "https://www.account.xiaomi.com/pass/serviceloginauth2"
# urlencode turns the form fields into an application/x-www-form-urlencoded string
postdata = urllib.urlencode({
    "user": "xxxxxxx",
    "_json": "true",
    "pwd": "xxxxxxx",
    "sid": "eshop",
    "_sign": "g7k1hszpyiao4tslhs1xddjbpv8=",
    "callback": "https://order.xiaomi.com/login/callback?followup=http%3a%2f%2fwww.xiaomi.com%2findex.php&sign=zjewmwvloty3mwm1oge3yjyxngrizjq5mzjmyji5nde0zwy0nzy5mw,,",
})
# passing data makes this a POST request
request = urllib2.Request(url, data=postdata)
request.add_header("User-Agent", "mozilla/5.0 (x11; ubuntu; linux x86_64; rv:26.0) gecko/20100101 firefox/26.0")
try:
    response = urllib2.urlopen(request)
    print response.info()
    print response.read()
except urllib2.HTTPError, e:
    print e.getcode()
    print e.reason
The code above submits the form of Xiaomi's automatic login page via POST. Firefox with the Tamper Data extension can be used to capture the parameters the POST requires; after that, fill in the correct username and password.
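To see what urlencode actually produces for such a form, the sketch below encodes a few placeholder fields and builds the Request without sending it. The field values are made up for illustration, a list of pairs is used only to keep the output order fixed, and `urllib.parse` / `urllib.request` are assumed as the Python 3 stand-ins.

```python
try:
    from urllib import urlencode              # Python 2, as in the article
    import urllib2
except ImportError:
    from urllib.parse import urlencode        # Python 3 stand-ins (assumption)
    import urllib.request as urllib2

# Placeholder values, not real credentials.
fields = [
    ("user", "demo_user"),
    ("pwd", "demo_password"),
    ("callback", "https://order.xiaomi.com/login/callback?followup=1"),
]

# Reserved characters such as ":", "/", "?" and "=" inside a value are
# percent-encoded, so the nested callback URL cannot be confused with
# the separators of the form itself.
postdata = urlencode(fields)

url = "https://www.account.xiaomi.com/pass/serviceloginauth2"
# Python 3 requires bytes for data; on Python 2 the encode call is harmless.
request = urllib2.Request(url, data=postdata.encode("ascii"))
method = request.get_method()

print(postdata)
print(method)
```

Because the Request carries data, `get_method()` reports POST, which matches the Request class methods quoted earlier in the article.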