Multithreaded concurrent requests with Java's HttpClient
Note: the code below is based on HttpClient 4.5.2.
Using Java's HttpClient to fetch a web page with a GET request is fairly easy:
// Creates a new client on every call and closes it when the call returns.
public static String get(String url) {
    String result = "";
    // try-with-resources closes the reader, the response and the client, in that order
    try (CloseableHttpClient httpClient = HttpClients.createDefault();
         CloseableHttpResponse response = httpClient.execute(new HttpGet(url));
         BufferedReader in = new BufferedReader(
                 new InputStreamReader(response.getEntity().getContent()))) {
        StringBuilder sb = new StringBuilder();
        String nl = System.getProperty("line.separator");
        String line;
        while ((line = in.readLine()) != null) {
            sb.append(line).append(nl);
        }
        result = sb.toString();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return result;
}
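The method above is also usable when GET requests have to be issued from multiple threads. However, that kind of multithreading relies on creating a new HttpClient instance on every call to get, and each instance is discarded after a single use. This is clearly not the best implementation.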
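To make "issuing GET requests from multiple threads" concrete, here is a minimal sketch of the calling side. It uses only the JDK, and the names `ConcurrentFetchSketch`, `fetchAll`, `fetcher` and the thread count are hypothetical, not part of HttpClient. The `fetcher` parameter is a stand-in for the `get` method above, so the sketch can be run without a network.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: runs one fetch per URL on a fixed-size thread pool.
final class ConcurrentFetchSketch {

    // Returns the fetched bodies keyed by URL, in the order the URLs were given.
    static Map<String, String> fetchAll(List<String> urls,
                                        Function<String, String> fetcher,
                                        int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every request first so that they can run concurrently.
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urls) {
                Callable<String> task = () -> fetcher.apply(url);
                futures.add(pool.submit(task));
            }
            // Then collect the results in input order.
            Map<String, String> results = new LinkedHashMap<>();
            for (int i = 0; i < urls.size(); i++) {
                results.put(urls.get(i), futures.get(i).get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while fetching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a fetch task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With the `get` method above in scope, a call might look like `fetchAll(urls, url -> get(url), 8)`. Every task then creates and discards its own client, which is the overhead this article is concerned with.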
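HttpClient provides its own solution for multithreaded requests; see the "Pooling connection manager" section of the official documentation. Multithreaded requests in HttpClient are built on its connection pool, and the key class is PoolingHttpClientConnectionManager, which manages that pool. PoolingHttpClientConnectionManager offers two key methods: setMaxTotal and setDefaultMaxPerRoute. setMaxTotal sets the maximum number of connections in the pool, and setDefaultMaxPerRoute sets the default maximum number of connections per route. There is also a setMaxPerRoute method, which sets the maximum number of connections for one specific site, like this: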
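// cm is a PoolingHttpClientConnectionManager;
// HttpHost is in org.apache.http, HttpRoute in org.apache.http.conn.routing
HttpHost host = new HttpHost("localhost", 80);
cm.setMaxPerRoute(new HttpRoute(host), 50);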
Following the documentation, we adjust our GET implementation slightly:
package com.zhyea.robin;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class HttpUtil {

    // A single client backed by a connection pool, shared by all threads
    private static final CloseableHttpClient httpClient;

    static {
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        cm.setMaxTotal(200);           // maximum connections in the whole pool
        cm.setDefaultMaxPerRoute(50);  // default maximum connections per route
        httpClient = HttpClients.custom().setConnectionManager(cm).build();
    }

    public static String get(String url) {
        String result = "";
        // Closing the response releases its connection; the shared client stays open
        try (CloseableHttpResponse response = httpClient.execute(new HttpGet(url));
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(response.getEntity().getContent()))) {
            StringBuilder sb = new StringBuilder();
            String nl = System.getProperty("line.separator");
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append(nl);
            }
            result = sb.toString();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(get("https://www.baidu.com/"));
    }
}
That is about it. Personally, though, I prefer HttpClient's fluent API. The HTTP GET request we just implemented can be written as simply as this:
package com.zhyea.robin;

import org.apache.http.client.fluent.Request;

import java.io.IOException;

public class HttpUtil {

    public static String get(String url) {
        String result = "";
        try {
            result = Request.Get(url)
                    .connectTimeout(1000)   // milliseconds
                    .socketTimeout(1000)    // milliseconds
                    .execute().returnContent().asString();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(get("https://www.baidu.com/"));
    }
}
All we have to do is replace the previous httpclient dependency with the fluent-hc dependency:
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>fluent-hc</artifactId>
    <version>4.5.2</version>
</dependency>
On top of that, the fluent implementation uses PoolingHttpClientConnectionManager out of the box. It sets maxTotal and defaultMaxPerRoute to 200 and 100 respectively:
CONNMGR = new PoolingHttpClientConnectionManager(sfr);
CONNMGR.setDefaultMaxPerRoute(100);
CONNMGR.setMaxTotal(200);
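The one annoyance is that Executor offers no methods for adjusting these two values. The defaults are more than enough for most uses, though. If they really are not, you can consider rewriting Executor and then executing the GET request through the Executor directly: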
Executor.newInstance().execute(Request.Get(url))
        .returnContent().asString();
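As a possible alternative to rewriting Executor, fluent-hc also has an `Executor.newInstance(HttpClient)` overload that takes a client you have built yourself. The sketch below assumes that overload is acceptable for your use case and builds the client on a pool with its own limits. The class name `FluentPoolSketch` and the limit values are only illustrative.

```java
import org.apache.http.client.fluent.Executor;
import org.apache.http.client.fluent.Request;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

import java.io.IOException;

// Hypothetical helper: a fluent Executor backed by a pool with custom limits.
public class FluentPoolSketch {

    private static final Executor EXECUTOR;

    static {
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        cm.setMaxTotal(200);           // illustrative value
        cm.setDefaultMaxPerRoute(50);  // illustrative value
        CloseableHttpClient client =
                HttpClients.custom().setConnectionManager(cm).build();
        // Hand the custom client to the fluent Executor instead of its built-in one
        EXECUTOR = Executor.newInstance(client);
    }

    public static String get(String url) throws IOException {
        return EXECUTOR.execute(Request.Get(url)).returnContent().asString();
    }
}
```

Requests sent through `FluentPoolSketch.get` go through the custom pool, while plain `Request.Get(url).execute()` calls keep using the fluent defaults.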
That's it!