Java Crawlers: Basic Usage of HttpClient
2022-05-05 15:10:19
I: Commonly used libraries
- HttpClient
- Jsoup (usually used to parse the returned HTML page)
II: Commonly used frameworks
- WebMagic
III: Using HttpClient
1: Maven dependency
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.13</version>
</dependency>
2: GET request without parameters
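import java.io.IOException;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public static void get() throws IOException {
    // Create the HttpClient object; try-with-resources closes it when done
    try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
        // Set the URL
        HttpGet httpGet = new HttpGet("http://www.ming3.top/");
        // Send the request; the response is closed automatically as well
        try (CloseableHttpResponse response = httpClient.execute(httpGet)) {
            // Print the response object (status line and headers)
            System.out.println(response);
            // Read the response entity (it holds the HTML page) as a string
            String content = EntityUtils.toString(response.getEntity(), "UTF-8");
            System.out.println(content);
        }
    }
}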
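Once the page is available as a string, Jsoup (listed above for parsing the returned HTML) can pull data out of it. The following is only a minimal sketch and not part of the original example: it assumes the org.jsoup:jsoup artifact has been added to the project next to HttpClient, and the class name, method names, and sample HTML are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class; assumes the jsoup library is on the classpath
public class JsoupParseSketch {

    // Returns the text of the page's <title> element
    static String extractTitle(String html) {
        Document doc = Jsoup.parse(html);
        return doc.title();
    }

    // Collects the href attribute of every <a> element that has one
    static List<String> extractLinks(String html) {
        Document doc = Jsoup.parse(html);
        List<String> links = new ArrayList<>();
        for (Element a : doc.select("a[href]")) {
            links.add(a.attr("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        // Stand-in for the "content" string returned by EntityUtils.toString(...)
        String html = "<html><head><title>Demo</title></head>"
                + "<body><a href=\"/a\">A</a><a href=\"/b\">B</a></body></html>";
        System.out.println(extractTitle(html));
        System.out.println(extractLinks(html));
    }
}
```

In a real crawler the html argument would be the content string read from the response entity rather than a hard-coded sample.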
3: POST request with parameters
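import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

public static void post() throws IOException {
    // Create the HttpClient object; try-with-resources closes it when done
    try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
        // Create the HTTP POST request
        HttpPost httpPost = new HttpPost("http://www.ming3.top/wp-login.php");
        // Set two POST parameters: log and pwd
        List<NameValuePair> parameters = new ArrayList<>();
        parameters.add(new BasicNameValuePair("log", "eighteen"));
        parameters.add(new BasicNameValuePair("pwd", "[email protected]"));
        // Build a form-style entity, encoded as UTF-8
        UrlEncodedFormEntity formEntity = new UrlEncodedFormEntity(parameters, StandardCharsets.UTF_8);
        // Attach the request entity to the httpPost object
        httpPost.setEntity(formEntity);
        // Disguise the request as coming from a browser
        httpPost.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36");
        // Execute the request; the response is closed automatically as well
        try (CloseableHttpResponse response = httpClient.execute(httpPost)) {
            System.out.println(response);
            String content = EntityUtils.toString(response.getEntity(), "UTF-8");
            System.out.println(content);
        }
    }
}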
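4: GET requests with parameters and POST requests without parameters follow the same pattern as the two examples above and are not walked through in detail here.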
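As a rough illustration of the first case, a GET request with parameters can be built with URIBuilder from the same httpclient artifact. This is a sketch under assumptions: the class name and the query parameter "s" are hypothetical and chosen only for the example; the real parameter names depend on the target site.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.utils.URIBuilder;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

// Hypothetical example class
public class GetWithParamsSketch {

    // Builds the request URI; "s" is an assumed parameter name
    static URI buildSearchUri(String keyword) {
        try {
            return new URIBuilder("http://www.ming3.top/")
                    .setParameter("s", keyword) // URIBuilder URL-encodes the value
                    .build();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        URI uri = buildSearchUri("java");
        // Both the client and the response are closed by try-with-resources
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(new HttpGet(uri))) {
            System.out.println(response.getStatusLine());
            System.out.println(EntityUtils.toString(response.getEntity(), "UTF-8"));
        }
    }
}
```

Letting URIBuilder assemble the query string avoids hand-written URL encoding. A POST request without parameters is simply the POST example with the setEntity call left out.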
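5: After logging in with a POST request, the response often asks for a redirect
As shown in the figure: the status code is 302, so the redirect has to be followed and the cookie has to be set.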
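One possible way to handle this with HttpClient 4.5 is sketched below. By default the client follows redirects automatically only for GET and HEAD requests, so a 302 answer to a POST is handed back to the caller; LaxRedirectStrategy also follows it. A BasicCookieStore keeps the cookies set by the login response so they are sent with later requests from the same client. The class name and the credentials are placeholders, and this is an assumption about how the login could be completed, not the original author's code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.http.NameValuePair;
import org.apache.http.client.CookieStore;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.cookie.Cookie;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.client.LaxRedirectStrategy;
import org.apache.http.message.BasicNameValuePair;

// Hypothetical example class
public class LoginRedirectSketch {

    // Builds a client that keeps cookies in the given store and also
    // follows redirects returned in response to POST requests
    static CloseableHttpClient buildClient(CookieStore cookieStore) {
        return HttpClients.custom()
                .setDefaultCookieStore(cookieStore)
                .setRedirectStrategy(new LaxRedirectStrategy())
                .build();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder credentials; replace with real values
        String user = "your-username";
        String password = "your-password";

        CookieStore cookieStore = new BasicCookieStore();
        try (CloseableHttpClient httpClient = buildClient(cookieStore)) {
            HttpPost httpPost = new HttpPost("http://www.ming3.top/wp-login.php");
            List<NameValuePair> parameters = new ArrayList<>();
            parameters.add(new BasicNameValuePair("log", user));
            parameters.add(new BasicNameValuePair("pwd", password));
            httpPost.setEntity(new UrlEncodedFormEntity(parameters, StandardCharsets.UTF_8));

            try (CloseableHttpResponse response = httpClient.execute(httpPost)) {
                // Status of the final response, after any redirect was followed
                System.out.println(response.getStatusLine());
            }
            // Cookies received during the exchange, reused on later requests
            for (Cookie cookie : cookieStore.getCookies()) {
                System.out.println(cookie.getName() + "=" + cookie.getValue());
            }
        }
    }
}
```

Pages that require a login should then be requested with the same httpClient instance, since the cookies live in the cookie store attached to that client.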