欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页  >  IT编程

java爬虫爬取的bug记录

程序员文章站 2022-03-26 17:11:14
java爬虫第一天-bug记录注意:实现爬虫要使用idea自带的maven。bug1:Cannot resolve symbol 'response'原因:try { CloseableHttpResponse response = httpClient.execute(httpGet); if(response.getStatusLine().getStatusCode() == 200) { String c...

java爬虫第一天-bug记录

注意:实现爬虫要使用idea自带的maven。
bug1:

Cannot resolve symbol 'response'

原因:

try {
             CloseableHttpResponse response = httpClient.execute(httpGet);

            if(response.getStatusLine().getStatusCode() == 200) {
                String content = EntityUtils.toString(response.getEntity() , "utf-8");
                System.out.println(content.length());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }finally {
            try {
                response.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
            try {
                httpClient.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

解决办法:

 CloseableHttpResponse response = null;
        try {
             response = httpClient.execute(httpGet);

            if(response.getStatusLine().getStatusCode() == 200) {
                String content = EntityUtils.toString(response.getEntity() , "utf-8");
                System.out.println(content.length());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }finally {
            try {
                response.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
            try {
                httpClient.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

bug2:

org.apache.http.client.ClientProtocolException
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:187)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
	at cn.itcast.crawler.test.HttpGetTest.main(HttpGetTest.java:21)
Caused by: org.apache.http.ProtocolException: Target host is not specified
	at org.apache.http.impl.conn.DefaultRoutePlanner.determineRoute(DefaultRoutePlanner.java:71)
	at org.apache.http.impl.client.InternalHttpClient.determineRoute(InternalHttpClient.java:125)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
	... 3 more
Exception in thread "main" java.lang.NullPointerException
	at cn.itcast.crawler.test.HttpGetTest.main(HttpGetTest.java:31)

原因:

CloseableHttpClient httpClient = HttpClients.createDefault();
        HttpGet httpGet = new HttpGet("");
        CloseableHttpResponse response = null;
        try {
             response = httpClient.execute(httpGet);

            if(response.getStatusLine().getStatusCode() == 200) {
                String content = EntityUtils.toString(response.getEntity() , "utf-8");
                System.out.println(content.length());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

解决办法:

CloseableHttpClient httpClient = HttpClients.createDefault();
        HttpGet httpGet = new HttpGet("http://www.itcast.cn");
        CloseableHttpResponse response = null;
        try {
             response = httpClient.execute(httpGet);

            if(response.getStatusLine().getStatusCode() == 200) {
                String content = EntityUtils.toString(response.getEntity() , "utf-8");
                System.out.println(content.length());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

bug3:注释掉test仍看不到日志输出。

log4j:WARN No appenders could be found for logger (org.apache.http.client.protocol.RequestAddCookies).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
</dependency>
        <!-- https://mvnrepository.com/artifact/org.slf4j/slf4j-log4j12 -->
        <dependency >
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>1.7.25</version>
<!--            <scope>test</scope>-->
        </dependency>

解决:
因为1.2版本的要在 log4j.properties 文件里配置输出

# Global logging configuration 这个配置是调试用的配置,生产环境要改成INFO或更高级别
log4j.rootLogger=DEBUG, stdout
# Console output...
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern= %-d{yyyy-MM-dd HH:mm:ss} [ %t:%r ] - [ %p ] %m%n

源代码:

package cn.itcast.crawler.test;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

import java.io.IOException;


public class HttpGetTest {
    public static void main(String[] args)  {
        CloseableHttpClient httpClient = HttpClients.createDefault();
        HttpGet httpGet = new HttpGet("http://www.itcast.cn");
        CloseableHttpResponse response = null;
        try {
            response = httpClient.execute(httpGet);

            if(response.getStatusLine().getStatusCode() == 200) {
                String content = EntityUtils.toString(response.getEntity() , "utf-8");
                System.out.println(content.length());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }finally {
            try {
                response.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
            try {
                httpClient.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }


    }
}

本文地址:https://blog.csdn.net/m0_48333563/article/details/109564620