Java web crawler connection timeouts: example solution code
程序员文章站
2023-12-13 11:56:04
This article looks at the problem of connection timeouts in Java web crawlers, as follows.

When crawling, you will often run into the error below, i.e. a connection timeout. The usual fix is twofold: lengthen the connect timeout and the request (socket) timeout, and, if a timeout still occurs, re-issue the request (with a configured maximum number of retries).
Exception in thread "main" java.net.ConnectException: Connection timed out: connect
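Before the full HttpClient sample, the "lengthen the timeouts" half of the fix can be illustrated with nothing but the JDK: java.net.HttpURLConnection exposes the same two knobs. This is a minimal sketch with a hypothetical class name, not part of the article's original code; the values mirror the sample below (6 s connect, 120 s read).

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class TimeoutDemo {
    // Open a connection object and set both timeouts; openConnection()
    // does no network I/O yet, so this only configures the request.
    public static HttpURLConnection configure(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(6000);    // connect timeout: 6 s
        conn.setReadTimeout(6000 * 20);  // read (socket) timeout: 120 s
        return conn;
    }

    public static void main(String[] args) throws Exception {
        HttpURLConnection conn = configure("http://example.com/");
        System.out.println(conn.getConnectTimeout());  // prints 6000
        System.out.println(conn.getReadTimeout());     // prints 120000
    }
}
```

If either limit is exceeded once the request is actually sent, the JDK throws java.net.SocketTimeoutException, which the caller can catch to trigger a retry.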
The code below is a sample program that uses HttpClient to handle connection timeouts.
package daili;

import java.io.IOException;
import java.net.URI;

import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.params.CookiePolicy;
import org.apache.http.client.protocol.ClientContext;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.client.DefaultHttpRequestRetryHandler;
import org.apache.http.impl.cookie.BasicClientCookie2;
import org.apache.http.params.HttpConnectionParams;
import org.apache.http.params.HttpParams;
import org.apache.http.protocol.BasicHttpContext;
import org.apache.http.protocol.HttpContext;
import org.apache.http.util.EntityUtils;

/*
 * Author: Qian Yang, School of Management, Hefei University of Technology
 * 1563178220@qq.com
 */
public class Test1 {
    public static void main(String[] args) throws ClientProtocolException, IOException, InterruptedException {
        getRawHtml("http://club.autohome.com.cn/bbs/forum-c-2098-1.html#pvareaid=103447");
    }

    public static String getRawHtml(String url) throws ClientProtocolException, IOException, InterruptedException {
        // Initialize the client
        DefaultHttpClient httpClient = new DefaultHttpClient();
        httpClient.getParams().setParameter("http.protocol.cookie-policy",
                CookiePolicy.BROWSER_COMPATIBILITY);
        // Configure parameters
        HttpParams params = httpClient.getParams();
        // Connect timeout: 6 s
        HttpConnectionParams.setConnectionTimeout(params, 6000);
        // Socket (read) timeout: 120 s
        HttpConnectionParams.setSoTimeout(params, 6000 * 20);
        // Retry up to 5 times after a timeout
        DefaultHttpRequestRetryHandler retryHandler = new DefaultHttpRequestRetryHandler(5, true);
        httpClient.setHttpRequestRetryHandler(retryHandler);

        HttpContext localContext = new BasicHttpContext();
        BasicCookieStore cookieStore = new BasicCookieStore();
        BasicClientCookie2 cookie = new BasicClientCookie2("content-type", "text/html;charset=utf-8");
        BasicClientCookie2 cookie1 = new BasicClientCookie2("user-agent",
                "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36");
        cookieStore.addCookie(cookie);
        cookieStore.addCookie(cookie1);
        localContext.setAttribute(ClientContext.COOKIE_STORE, cookieStore);

        HttpGet request = new HttpGet();
        request.setURI(URI.create(url));
        String rawHtml = "";
        HttpResponse response = httpClient.execute(request, localContext);
        // Read the response status code
        int statusCode = response.getStatusLine().getStatusCode();
        System.out.println(statusCode);
        if (statusCode == 200) {  // 200 means the request succeeded
            // Read the entity content
            rawHtml = EntityUtils.toString(response.getEntity());
            System.out.println(rawHtml);
            // Release the entity
            EntityUtils.consume(response.getEntity());
        } else {
            // Release the entity, then pause 20 minutes before the next request
            EntityUtils.consume(response.getEntity());
            Thread.sleep(20 * 60 * 1000);
        }
        httpClient.close();
        return rawHtml;
    }
}
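In the sample, the retry behavior is delegated to DefaultHttpRequestRetryHandler(5, true). The same "retry on timeout, up to N attempts" idea can be sketched as a small standalone helper so the control flow is visible. The names here (RetryDemo, withRetry) are hypothetical illustrations, not part of HttpClient; the fake task stands in for an HTTP request that times out twice before succeeding.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetryDemo {
    // Run a task up to maxRetries times, retrying whenever it throws an
    // IOException (e.g. java.net.ConnectException: Connection timed out).
    // Mirrors the spirit of DefaultHttpRequestRetryHandler(5, true).
    public static <T> T withRetry(Callable<T> task, int maxRetries) throws Exception {
        IOException last = null;
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                return task.call();
            } catch (IOException e) {
                last = e;  // remember the failure and try again
            }
        }
        throw last;  // all attempts timed out: propagate the last error
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Fake "request": fails with a timeout-style IOException twice, then succeeds.
        String result = withRetry(() -> {
            if (++calls[0] < 3) throw new IOException("connection timed out");
            return "ok";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");  // prints "ok after 3 attempts"
    }
}
```

A production crawler would usually also sleep between attempts (as the sample does with Thread.sleep on a non-200 response) so that a struggling server is not hammered with immediate retries.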
Result: the program prints the HTTP status code and, for a 200 response, the raw HTML of the page.
Summary

That is all of this article's content on example code for resolving Java web crawler connection timeouts; I hope it is helpful. Interested readers can browse the site's other related topics. If anything is lacking, feel free to point it out in a comment. Thanks for your support!