Java Web Crawler (2): GET Requests
程序员文章站
2024-01-19 11:37:52
HttpClient
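- A web crawler is a program that accesses resources on the network on our behalf. We normally browse web pages over the HTTP protocol, so a crawler is written to request pages over that same protocol.
- Here we use HttpClient, an HTTP client library for Java, to fetch web page data.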
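To make "the same HTTP protocol" concrete, the following is a minimal sketch of the text a client sends for a simple GET request. It uses only the JDK, not HttpClient. The class name `RawGetRequest` and the method `build` are hypothetical names chosen for illustration, and the sketch emits only the `Host` and `Connection` headers, whereas a real client such as HttpClient adds more (for example `User-Agent`).

```java
import java.net.URI;

public class RawGetRequest {
    /** Builds the text of a minimal HTTP/1.1 GET request for the given URL. */
    public static String build(String url) {
        URI uri = URI.create(url);
        // An empty path refers to the root resource "/"
        String path = (uri.getRawPath() == null || uri.getRawPath().isEmpty())
                ? "/" : uri.getRawPath();
        // Append the query string, if the URL has one
        if (uri.getRawQuery() != null) {
            path += "?" + uri.getRawQuery();
        }
        // Include the port in the Host header only when the URL names one explicitly
        String host = uri.getPort() == -1
                ? uri.getHost() : uri.getHost() + ":" + uri.getPort();
        // Request line, headers, then an empty line that ends the header section
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    public static void main(String[] args) {
        System.out.print(build("http://www.baidu.com"));
    }
}
```

A library like HttpClient builds and sends this kind of message for us, and then parses the status line, headers, and body of the reply.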
GET Requests
- Code:
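import java.io.IOException;

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HttpGetTest {
    public static void main(String[] args) {
        // Create the HttpGet object and set the URL to request
        HttpGet httpGet = new HttpGet("http://www.baidu.com");
        // try-with-resources closes the response and then the HttpClient,
        // and avoids a NullPointerException when execute() fails before a response exists
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             // Use the HttpClient to send the request and obtain the response
             CloseableHttpResponse response = httpClient.execute(httpGet)) {
            // Process the response only if the status code is 200
            if (response.getStatusLine().getStatusCode() == 200) {
                // Get the response body
                HttpEntity httpEntity = response.getEntity();
                // Decode the response body as UTF-8 text
                String content = EntityUtils.toString(httpEntity, "utf-8");
                System.out.println(content);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}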
- Result of the request
- Request information
- Response data (partial)
GET Requests with Parameters
- Simulated request with parameters: searching for the keyword java in the CSDN search box
- Implementation:
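import java.io.IOException;
import java.net.URISyntaxException;

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.utils.URIBuilder;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HttpGetParamTest {
    public static void main(String[] args) throws URISyntaxException {
        // Method one:
        // Create a URIBuilder for the base address
        URIBuilder uriBuilder = new URIBuilder("https://so.csdn.net/so/search/s.do");
        // Set the parameters; empty strings produce "t=&u=", the same form as method two
        uriBuilder.setParameter("q", "java");
        uriBuilder.setParameter("t", "");
        uriBuilder.setParameter("u", "");
        // Create the HttpGet object from the built URI
        HttpGet httpGet = new HttpGet(uriBuilder.build());
        /*
        Method two:
        // Write the parameters directly into the request address
        String uri = "https://so.csdn.net/so/search/s.do?q=java&t=&u=";
        HttpGet httpGet = new HttpGet(uri);
        */
        System.out.println("Request being sent: " + httpGet);
        // try-with-resources closes the response and then the HttpClient,
        // and avoids a NullPointerException when execute() fails before a response exists
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             // Use the HttpClient to send the request and obtain the response
             CloseableHttpResponse response = httpClient.execute(httpGet)) {
            // Process the response only if the status code is 200
            if (response.getStatusLine().getStatusCode() == 200) {
                // Get the response body
                HttpEntity httpEntity = response.getEntity();
                // Decode the response body as UTF-8 text
                String content = EntityUtils.toString(httpEntity, "utf-8");
                System.out.println(content);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}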
- Result
- Request information
- Response data (partial)
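The two methods above differ mainly in who assembles and encodes the query string. As a rough illustration of that step, here is a JDK-only sketch that joins parameters into a URL with `java.net.URLEncoder`. It is not how URIBuilder is implemented; the class `QueryStringSketch` and the method `buildUrl` are hypothetical names, and `URLEncoder` applies form encoding, so a space becomes `+`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class QueryStringSketch {
    /** Appends the parameters to the base address as an encoded query string. */
    public static String buildUrl(String base, Map<String, String> params) {
        if (params.isEmpty()) {
            return base;
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            // Treat a missing value as empty, which yields "name="
            String value = entry.getValue() == null ? "" : entry.getValue();
            query.add(encode(entry.getKey()) + "=" + encode(value));
        }
        return base + "?" + query;
    }

    private static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "java");
        params.put("t", "");
        params.put("u", "");
        // prints https://so.csdn.net/so/search/s.do?q=java&t=&u=
        System.out.println(buildUrl("https://so.csdn.net/so/search/s.do", params));
    }
}
```

Building the query from a map rather than by hand matters once a keyword contains spaces, `&`, or non-ASCII characters, since those must be encoded before they can appear in a URL.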