A problem fetching a page via POST
URL: http://www.meituan.com/multiact/default/deal/25814805.html
POST data: "yui_3_16_0_1_1423700000_000:{\"act\":\"deal/dynamiccomponent\",\"args\":25814805,\"__referer\":\"\"}"
Python fetches the page without any trouble. The code is:
import urllib
import urllib2

# A single form field whose value is a JSON string.
values = {
    'yui_3_16_0_1_1423700000_000': '{"act":"deal/dynamiccomponent","args":25814805,"__referer":""}',
}
header = {
    "X-Requested-With": "XMLHttpRequest",
}
url = "http://www.meituan.com/multiact/default/deal/25814805.html"

data = urllib.urlencode(values)  # URL-encode the form body
print data
req = urllib2.Request(url, data, header)  # passing data makes this a POST
response = urllib2.urlopen(req)
the_page = response.read()
print the_page
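The script above targets Python 2, where urllib2 exists. As a rough sketch, the same request in Python 3 could look like the following. The helper names build_request and fetch are made up for illustration, and this sketch cannot confirm whether the site still answers this URL in the same way.

```python
import urllib.parse
import urllib.request

URL = "http://www.meituan.com/multiact/default/deal/25814805.html"
FORM = {
    "yui_3_16_0_1_1423700000_000":
        '{"act":"deal/dynamiccomponent","args":25814805,"__referer":""}',
}


def build_request(url=URL, form=FORM):
    """Build the POST request without sending it (hypothetical helper)."""
    # In Python 3 the request body must be bytes, so encode the form string.
    data = urllib.parse.urlencode(form).encode("ascii")
    headers = {"X-Requested-With": "XMLHttpRequest"}
    # Supplying data makes urllib issue a POST instead of a GET.
    return urllib.request.Request(url, data=data, headers=headers)


def fetch(req, timeout=10):
    """Send the request and return the raw response body as bytes."""
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return response.read()


req = build_request()
print(req.get_method(), len(req.data))
```

Calling fetch(req) would perform the actual network request; it is left out here so the sketch runs offline.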
However, an HTTP request that I build by hand fails to fetch the page. The request is:
POST /multiact/default/deal/25814805.html HTTP/1.1
Host: www.meituan.com
Content-Length: 126
Connection: close
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2
Accept-Encoding: gzip
Accept: */*
X-Requested-With: XMLHttpRequest
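The fetch failed because this header was missing: Content-Type: application/x-www-form-urlencoded
(urllib2 adds this header on its own when a request carries POST data and sets no Content-Type, which is why the Python script worked.)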
Once the header is added, the request works. The full request:
POST /multiact/default/deal/25814805.html HTTP/1.1
Host: www.meituan.com
Content-Length: 126
Connection: close
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2
Accept-Encoding: gzip
Accept: */*
X-Requested-With: XMLHttpRequest
Content-Type: application/x-www-form-urlencoded
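To make the fix concrete, here is a minimal Python 3 sketch that assembles such a request by hand. The function name build_raw_post is hypothetical. Accept-Encoding: gzip is deliberately left out so that a reply would not need decompressing; that is a choice of this sketch, not something the post requires. The sketch only builds the bytes and does not contact the server.

```python
from urllib.parse import urlencode

FORM = {
    "yui_3_16_0_1_1423700000_000":
        '{"act":"deal/dynamiccomponent","args":25814805,"__referer":""}',
}


def build_raw_post(host, path, form):
    """Return a complete HTTP/1.1 form POST as bytes (hypothetical helper)."""
    body = urlencode(form).encode("ascii")
    lines = [
        "POST %s HTTP/1.1" % path,
        "Host: %s" % host,
        # Computed from the encoded body rather than hard-coded.
        "Content-Length: %d" % len(body),
        "Connection: close",
        "Accept: */*",
        "X-Requested-With: XMLHttpRequest",
        # Tells the server how to parse the body as form fields.
        "Content-Type: application/x-www-form-urlencoded",
    ]
    # Each header line ends with CRLF; an empty line separates headers from body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


raw = build_raw_post("www.meituan.com",
                     "/multiact/default/deal/25814805.html", FORM)
print(raw.decode("ascii"))
```

For this form the encoded body comes to 126 bytes, which agrees with the Content-Length: 126 in the requests above. The resulting bytes could then be sent over a plain TCP connection, for example with socket.create_connection((host, 80)) followed by sendall(raw).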