
Java crawler: basic usage of HttpClient

程序员文章站 2022-05-05 15:10:19

I: Commonly used libraries

  • HttpClient
  • Jsoup (usually used to parse the returned HTML page)

II: Commonly used frameworks

  • WebMagic

III: Using HttpClient

1: Dependency

        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
            <version>4.5.13</version>
        </dependency>

2: GET request without parameters

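    import java.io.IOException;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.util.EntityUtils;

    public static void get() throws IOException {
        // Create the HttpClient; try-with-resources closes it when done
        try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
            // Set the URL
            HttpGet httpGet = new HttpGet("http://www.ming3.top/");
            // Send the request; the response is closed automatically as well
            try (CloseableHttpResponse response = httpClient.execute(httpGet)) {
                System.out.println(response);
                // Read the response entity (it holds the HTML page) as a UTF-8 string
                String content = EntityUtils.toString(response.getEntity(), "UTF-8");
                System.out.println(content);
            }
        }
    }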

3: POST request with parameters

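    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.http.NameValuePair;
    import org.apache.http.client.entity.UrlEncodedFormEntity;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.message.BasicNameValuePair;
    import org.apache.http.util.EntityUtils;

    public static void post() throws IOException {
        // Create the HttpClient; try-with-resources closes it when done
        try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
            // Create the HTTP POST request
            HttpPost httpPost = new HttpPost("http://www.ming3.top/wp-login.php");
            // Set two POST parameters: log and pwd
            List<NameValuePair> parameters = new ArrayList<>();
            parameters.add(new BasicNameValuePair("log", "eighteen"));
            parameters.add(new BasicNameValuePair("pwd", "[email protected]"));
            // Build a form-style (application/x-www-form-urlencoded) entity, encoded as UTF-8
            UrlEncodedFormEntity formEntity = new UrlEncodedFormEntity(parameters, "UTF-8");
            // Attach the request entity to the httpPost object
            httpPost.setEntity(formEntity);
            // Present the request as coming from a browser
            httpPost.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36");
            // Execute the request; the response is closed automatically as well
            try (CloseableHttpResponse response = httpClient.execute(httpPost)) {
                System.out.println(response);
                String content = EntityUtils.toString(response.getEntity(), "UTF-8");
                System.out.println(content);
            }
        }
    }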

4: There are also GET requests with parameters and POST requests without parameters; no examples are given for them here
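For a GET request with parameters, the parameters travel in the query string of the URL, so the main extra step is URL-encoding them. The sketch below is one possible way to do that using only the JDK (it assumes Java 10 or later); the class name `QueryUrlSketch`, the helper `withQuery`, and the parameter names `s` and `paged` are hypothetical and do not come from the original article. HttpClient 4.5 also provides `org.apache.http.client.utils.URIBuilder` for the same purpose.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class QueryUrlSketch {

    // Appends URL-encoded query parameters to a base URL that has no query string yet.
    // Hypothetical helper, not part of HttpClient.
    static String withQuery(String baseUrl, Map<String, String> params) {
        if (params.isEmpty()) {
            return baseUrl;
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // URLEncoder.encode(String, Charset) requires Java 10 or later
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "?" + query;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order
        Map<String, String> params = new LinkedHashMap<>();
        params.put("s", "java crawler"); // assumed search parameter
        params.put("paged", "2");        // assumed page parameter
        String url = withQuery("http://www.ming3.top/", params);
        System.out.println(url); // prints http://www.ming3.top/?s=java+crawler&paged=2
    }
}
```

The resulting string can be passed to `new HttpGet(url)` exactly as in the GET example of section 2. A POST without parameters is simply the POST example of section 3 with the `setEntity` call left out.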

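5: After logging in with a POST request, the server often returns a response that requires a redirect
As shown in the figure: the returned status code is 302, so the request must be redirected and the cookie must be set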
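A 302 returned after a POST can be followed by hand: read the `Location` header, resolve it against the login URL, and send a GET to the result while carrying the session cookie. The sketch below shows only that bookkeeping, using the JDK alone. The class `RedirectSketch`, its helpers, the `/wp-admin/` location, and the cookie value are all hypothetical illustrations, not taken from the original article or from the real site.

```java
import java.net.URI;

class RedirectSketch {

    // True for the status codes that normally carry a Location header to follow.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Resolves a Location header (absolute or relative) against the request URL.
    static String resolveLocation(String requestUrl, String location) {
        return URI.create(requestUrl).resolve(location).toString();
    }

    // Keeps only the name=value part of a Set-Cookie header, dropping
    // attributes such as Path or HttpOnly.
    static String cookiePair(String setCookie) {
        int semi = setCookie.indexOf(';');
        return (semi < 0 ? setCookie : setCookie.substring(0, semi)).trim();
    }

    public static void main(String[] args) {
        int status = 302;                                      // status from the login POST
        String location = "/wp-admin/";                        // assumed Location header
        String setCookie = "session=abc123; path=/; HttpOnly"; // assumed Set-Cookie header

        if (isRedirect(status)) {
            String next = resolveLocation("http://www.ming3.top/wp-login.php", location);
            System.out.println(next);                  // prints http://www.ming3.top/wp-admin/
            System.out.println(cookiePair(setCookie)); // prints session=abc123
        }
    }
}
```

With HttpClient 4.5 the inputs would come from `response.getStatusLine().getStatusCode()` and `response.getFirstHeader("Location")`, and the follow-up request would be an `HttpGet` to the resolved URL. As far as I know, the default redirect strategy in 4.5 does not follow a 302 that answers a POST automatically, while reusing the same `CloseableHttpClient` instance for the follow-up request lets its cookie store send the login cookie without setting the header by hand; treat both points as assumptions to check against the version in use.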

PS:

"HttpClient简易使用" (Simple Usage of HttpClient) is a very well written reference
