
Sample code for handling connection timeouts in a Java web crawler


This article looks mainly at the connection-timeout problem in Java web crawlers, as follows.

In a web crawler you will often run into the error shown below, i.e. a connection timeout. The usual way to deal with it is to set longer connect and read timeouts and, if a timeout still occurs, to re-issue the request up to a configured number of retries.

Exception in thread "main" java.net.ConnectException: Connection timed out: connect
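
Before the full HttpClient program further below, here is a minimal sketch of that retry idea: catch the timeout, pause, and re-issue the request up to a fixed number of attempts. The fetch(String) helper, the class name RetryFetcher, and the retry count and delay values are illustrative placeholders, not part of the original program.

import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;

public class RetryFetcher {
	// Placeholder values: number of re-requests and pause between attempts
	private static final int MAX_RETRIES = 5;
	private static final long RETRY_DELAY_MS = 10_000;

	public static String fetchWithRetry(String url) throws IOException, InterruptedException {
		IOException last = null;
		for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
			try {
				return fetch(url); // hypothetical method that performs the actual HTTP request
			} catch (ConnectException | SocketTimeoutException e) {
				last = e; // connect or read timed out: wait, then try again
				Thread.sleep(RETRY_DELAY_MS);
			}
		}
		throw last; // give up after MAX_RETRIES attempts
	}

	// Placeholder for the real request logic (e.g. the HttpClient code below)
	private static String fetch(String url) throws IOException {
		throw new UnsupportedOperationException("replace with a real HTTP request");
	}
}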

The code below is a sample program that uses HttpClient to handle connection timeouts.

package daili;

import java.io.IOException;
import java.net.URI;

import org.apache.http.HttpRequest;
import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.params.CookiePolicy;
import org.apache.http.client.protocol.ClientContext;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.client.DefaultHttpRequestRetryHandler;
import org.apache.http.impl.cookie.BasicClientCookie2;
import org.apache.http.params.HttpConnectionParams;
import org.apache.http.params.HttpParams;
import org.apache.http.protocol.BasicHttpContext;
import org.apache.http.protocol.ExecutionContext;
import org.apache.http.protocol.HttpContext;
import org.apache.http.util.EntityUtils;

/*
 * Author: Qian Yang, School of Management, Hefei University of Technology
 * 1563178220@qq.com
 */
public class Test1 {
	public static void main(String[] args) throws ClientProtocolException, IOException, InterruptedException {
		getRawHtml("http://club.autohome.com.cn/bbs/forum-c-2098-1.html#pvareaid=103447");
	}

	public static String getRawHtml(String url) throws ClientProtocolException, IOException, InterruptedException {
		// Initialize the client (DefaultHttpClient is the legacy pre-4.3 API)
		DefaultHttpClient httpClient = new DefaultHttpClient();
		httpClient.getParams().setParameter("http.protocol.cookie-policy",
				CookiePolicy.BROWSER_COMPATIBILITY);
		// Configure the timeout parameters
		HttpParams params = httpClient.getParams();
		// Connect timeout: 6 seconds
		HttpConnectionParams.setConnectionTimeout(params, 6000);
		// Socket (read) timeout: 120 seconds
		HttpConnectionParams.setSoTimeout(params, 6000 * 20);
		// Retry a failed request up to 5 times
		DefaultHttpRequestRetryHandler dhr = new DefaultHttpRequestRetryHandler(5, true);
		HttpContext localContext = new BasicHttpContext();
		HttpRequest request2 = (HttpRequest) localContext.getAttribute(
				ExecutionContext.HTTP_REQUEST);
		httpClient.setHttpRequestRetryHandler(dhr);
		// Cookies named "content-type" and "user-agent" (note: these are sent as cookies, not as request headers)
		BasicCookieStore cookieStore = new BasicCookieStore();
		BasicClientCookie2 cookie = new BasicClientCookie2("content-type", "text/html;charset=utf-8");
		BasicClientCookie2 cookie1 = new BasicClientCookie2("user-agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36");
		cookieStore.addCookie(cookie);
		cookieStore.addCookie(cookie1);
		localContext.setAttribute(ClientContext.COOKIE_STORE, cookieStore);
		HttpGet request = new HttpGet();
		request.setURI(URI.create(url));
		HttpResponse response = null;
		String rawHtml = "";
		response = httpClient.execute(request, localContext);
		// Get and print the response status code
		int statusCode = response.getStatusLine().getStatusCode();
		System.out.println(statusCode);
		if (statusCode == 200) {
			// Status code 200 means the request succeeded: read and print the entity content
			rawHtml = EntityUtils.toString(response.getEntity());
			System.out.println(rawHtml);
			// Consume the entity to release the connection
			EntityUtils.consume(response.getEntity());
		} else {
			// Consume the entity to close its underlying stream
			EntityUtils.consume(response.getEntity());
			// On failure, pause for 20 minutes before any further requests
			Thread.sleep(20 * 60 * 1000);
		}
		httpClient.close();
		System.out.println(rawHtml);
		return rawHtml;
	}
}
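
The program above uses the legacy DefaultHttpClient / HttpParams API, which is deprecated in newer HttpClient releases. As a hedged sketch (not part of the original article), the same timeout and retry settings can be written with the builder-style API available since HttpClient 4.3, using RequestConfig and HttpClients.custom(). The class name Test2 and the setUserAgent call are illustrative additions. Note that DefaultHttpRequestRetryHandler by default does not retry java.net.ConnectException or interrupted-I/O (timeout) failures, so an explicit retry loop such as the earlier sketch may still be needed for "connection timed out" errors.

import java.io.IOException;

import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.DefaultHttpRequestRetryHandler;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class Test2 {
	public static void main(String[] args) throws IOException {
		// Same values as above: 6 s connect timeout, 120 s socket (read) timeout
		RequestConfig config = RequestConfig.custom()
				.setConnectTimeout(6000)
				.setSocketTimeout(120000)
				.build();
		// Retry a failed request up to 5 times (subject to the handler's default exclusions)
		CloseableHttpClient httpClient = HttpClients.custom()
				.setDefaultRequestConfig(config)
				.setRetryHandler(new DefaultHttpRequestRetryHandler(5, true))
				.setUserAgent("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36")
				.build();
		HttpGet get = new HttpGet("http://club.autohome.com.cn/bbs/forum-c-2098-1.html#pvareaid=103447");
		try (CloseableHttpResponse response = httpClient.execute(get)) {
			// Print the status code, then the raw HTML on success
			System.out.println(response.getStatusLine().getStatusCode());
			if (response.getStatusLine().getStatusCode() == 200) {
				System.out.println(EntityUtils.toString(response.getEntity()));
			} else {
				EntityUtils.consume(response.getEntity());
			}
		} finally {
			httpClient.close();
		}
	}
}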

Result:

When the request succeeds, the console shows the status code 200 followed by the raw HTML of the requested page.

Summary

That concludes this article's example code for handling connection timeouts in a Java web crawler; I hope it is useful to you. Interested readers can browse the site's other related topics. If anything here falls short, please point it out in the comments. Thank you for your support of the site!
