
Sample code for handling connection timeouts in a Java web crawler

Programmer Article Station, 2023-12-01 08:26:10

This article looks at connection timeouts in Java web crawlers, as follows.

When crawling, you will often hit the error below, i.e. a connection timeout. The usual fix is twofold: lengthen the connection timeout and the request (socket read) timeout, and, if a timeout still occurs, retry the request with a configurable retry count.
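Independently of any HTTP library, the retry idea can be sketched as a small helper. `Fetch`, `fetchWithRetry`, and the attempt/delay values below are illustrative names and numbers, not HttpClient API:

```java
import java.io.IOException;
import java.net.ConnectException;

public class RetryFetch {
    // Stand-in for an HTTP request that may fail with an IOException.
    interface Fetch {
        String run() throws IOException;
    }

    // Run the operation up to maxAttempts times, sleeping delayMillis
    // between attempts; rethrow the last failure if every attempt fails.
    static String fetchWithRetry(Fetch fetch, int maxAttempts, long delayMillis)
            throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetch.run();
            } catch (IOException e) {
                last = e;                      // remember the most recent failure
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // back off before the next try
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulate a request that times out twice, then succeeds.
        final int[] calls = {0};
        String html = fetchWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new ConnectException("Connection timed out: connect");
            }
            return "<html>ok</html>";
        }, 5, 10);
        System.out.println(html + " after " + calls[0] + " attempts");
    }
}
```

HttpClient's `DefaultHttpRequestRetryHandler`, used in the sample below, wraps the same pattern inside the client itself.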

Exception in thread "main" java.net.ConnectException: Connection timed out: connect

The following sample program uses Apache HttpClient to handle connection timeouts.

package daili;

import java.io.IOException;
import java.net.URI;

import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.params.CookiePolicy;
import org.apache.http.client.protocol.ClientContext;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.client.DefaultHttpRequestRetryHandler;
import org.apache.http.params.HttpConnectionParams;
import org.apache.http.params.HttpParams;
import org.apache.http.protocol.BasicHttpContext;
import org.apache.http.protocol.HttpContext;
import org.apache.http.util.EntityUtils;

/*
 * Author: Qian Yang, School of Management, Hefei University of Technology
 * 1563178220@qq.com
 */
public class Test1 {
	public static void main(String[] args) throws ClientProtocolException, IOException, InterruptedException {
		getRawHtml("http://club.autohome.com.cn/bbs/forum-c-2098-1.html#pvareaid=103447");
	}

	public static String getRawHtml(String url) throws ClientProtocolException, IOException, InterruptedException {
		// Initialize the client
		DefaultHttpClient httpClient = new DefaultHttpClient();
		httpClient.getParams().setParameter("http.protocol.cookie-policy",
				CookiePolicy.BROWSER_COMPATIBILITY);
		// Configure the timeouts
		HttpParams params = httpClient.getParams();
		// Connection timeout: 6 seconds
		HttpConnectionParams.setConnectionTimeout(params, 6000);
		// Socket (read) timeout: 120 seconds
		HttpConnectionParams.setSoTimeout(params, 6000 * 20);
		// Retry a failed request up to 5 times
		DefaultHttpRequestRetryHandler dhr = new DefaultHttpRequestRetryHandler(5, true);
		httpClient.setHttpRequestRetryHandler(dhr);
		// Keep cookies across requests in a shared context
		HttpContext localContext = new BasicHttpContext();
		BasicCookieStore cookieStore = new BasicCookieStore();
		localContext.setAttribute(ClientContext.COOKIE_STORE, cookieStore);
		HttpGet request = new HttpGet();
		request.setURI(URI.create(url));
		// User-Agent belongs in a request header (the original set it as a cookie)
		request.setHeader("User-Agent",
				"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36");
		String rawHtml = "";
		HttpResponse response = httpClient.execute(request, localContext);
		// Print the response status code
		int statusCode = response.getStatusLine().getStatusCode();
		System.out.println(statusCode);
		if (statusCode == 200) {
			// 200 means success: read the entity content
			rawHtml = EntityUtils.toString(response.getEntity());
			System.out.println(rawHtml);
			// Release the entity
			EntityUtils.consume(response.getEntity());
		} else {
			// Release the entity, then back off for 20 minutes
			EntityUtils.consume(response.getEntity());
			Thread.sleep(20 * 60 * 1000);
		}
		httpClient.close();
		return rawHtml;
	}
}
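For comparison, the same two timeouts can also be set with the JDK's built-in `HttpURLConnection`, with no external dependency. A minimal sketch (the URL is a placeholder, and the request itself is never sent, so no network access is needed):

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class TimeoutDemo {
    public static void main(String[] args) throws Exception {
        // openConnection() only creates the connection object; nothing is sent yet.
        URL url = new URL("http://example.com/");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(6000);   // give up if the TCP connect takes over 6 s
        conn.setReadTimeout(120000);    // give up if a read blocks for over 120 s
        // Calling conn.getInputStream() would perform the request and throw
        // java.net.SocketTimeoutException when either limit is exceeded.
        System.out.println(conn.getConnectTimeout() + " " + conn.getReadTimeout());
    }
}
```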

Result:

[figure: screenshot of the program's output]

Summary

That concludes this article's sample code for handling connection timeouts in a Java web crawler; I hope it helps. Interested readers can browse this site's other related topics. If anything is lacking, please leave a comment. Thanks for your support!