
Hadoop: Accessing HDFS via URL in Java - yellowcong


This post briefly shows how to operate on HDFS through Java's java.net URL API. While putting the example together I ran into quite a few errors, such as the hdfs protocol not being recognized, the connection to the server being refused, and the client not being able to load the native Hadoop libraries.

Reading a File with Java

Read the /NOTICE.txt file stored on Hadoop using the java.net package provided by the JDK.


Environment Setup

1. Configure the JAR packages

Import the JAR packages; the directories containing the required JARs are listed below.

# JAR files to import from the common directory
hadoop-2.8.1\share\hadoop\common\*.jar
hadoop-2.8.1\share\hadoop\common\hadoop-common-2.8.1.jar

# Required on the client side; it contains the HDFS client classes, and leaving it out causes errors
hadoop-2.8.1\share\hadoop\hdfs\hadoop-hdfs-client-2.8.1.jar

The share directory of the Hadoop distribution contains all the libraries we need for development.
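
If the project is built with Maven instead of importing the JARs by hand, roughly the following dependencies should pull in the same classes (a sketch; the versions should match the installed Hadoop distribution):

<!-- Hadoop common libraries -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.8.1</version>
</dependency>
<!-- HDFS client classes used to talk to the NameNode and DataNodes -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs-client</artifactId>
  <version>2.8.1</version>
</dependency>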


2. Configure log4j logging

log4j.rootLogger=WARN, stdout, R
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
# Pattern to output the caller's file name and line number.
#log4j.appender.stdout.layout.ConversionPattern=%5p [%t] (%F:%L) - %m%n
# Print the date in ISO 8601 format
log4j.appender.stdout.layout.ConversionPattern=%d [%t] %-5p %c - %m%n
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=example.log
log4j.appender.R.MaxFileSize=100KB
# Keep one backup file
log4j.appender.R.MaxBackupIndex=1
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n
# Print only messages of level WARN or above in the package com.foo.
log4j.logger.com.foo=WARN

Code to Access HDFS

package com.yellowcong.demo;

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;

public class Demo1 {

    // Path of the file on HDFS
    private static final String PATH = "hdfs://192.168.110.110:9000/NOTICE.txt";

    public static void main(String[] args) throws Exception {
        // Register the hdfs:// protocol handler so java.net.URL understands it
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());

        // Open the URL
        InputStream in = new URL(PATH).openStream();

        // Read the file line by line
        BufferedReader reader = new BufferedReader(new InputStreamReader(in));
        String line = null;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
        reader.close();

        // Alternatively, Hadoop's IOUtils can copy the stream directly:
//      org.apache.hadoop.io.IOUtils.copyBytes(in, System.out, 1024, true);
    }
}
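
As noted in the commented-out line above, Hadoop's own IOUtils can replace the manual read loop. A minimal sketch of that variant (same NameNode address and file as above):

package com.yellowcong.demo;

import java.io.InputStream;
import java.net.URL;

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

public class Demo2 {

    public static void main(String[] args) throws Exception {
        // Register the hdfs:// protocol handler
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());

        InputStream in = new URL("hdfs://192.168.110.110:9000/NOTICE.txt").openStream();

        // copyBytes(in, out, bufferSize, close) streams the file to stdout and,
        // because the last argument is true, closes the input stream afterwards
        IOUtils.copyBytes(in, System.out, 1024, true);
    }
}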

Common Errors

java.net.MalformedURLException: unknown protocol: hdfs

The hdfs protocol is not recognized because, by default, java.net.URL only understands built-in protocols such as http. Adding the following line fixes the problem:

// Register the HDFS protocol handler
URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
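
Note that the JDK allows URL.setURLStreamHandlerFactory to be called at most once per JVM; calling it a second time throws an Error. A common pattern (a sketch, not taken from the original code) is to register the factory in a static initializer:

import java.net.URL;

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;

public class HdfsUrlSetup {

    static {
        // May only be set once per JVM; any later call will fail
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    // code that opens hdfs:// URLs can then run anywhere in this JVM
}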

Unable to load native-hadoop library for your platform

Because the Hadoop client is being run from a Windows machine, the native Hadoop libraries cannot be loaded. Importing hadoop-hdfs-client-2.8.1.jar resolves the problem.

2017-08-19 12:55:52,374 [main] WARN  org.apache.hadoop.util.Shell - Did not find winutils.exe: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
2017-08-19 12:55:52,729 [main] WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.net.MalformedURLException: unknown protocol: hdfs
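
The winutils.exe warning itself can be silenced by pointing Hadoop at a local Windows installation directory that contains bin\winutils.exe. A sketch, assuming the distribution is unpacked at D:\hadoop-2.8.1 (the path is only an example):

// Must be set before any Hadoop class is loaded
System.setProperty("hadoop.home.dir", "D:\\hadoop-2.8.1");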

ConnectException: Connection refused (拒绝连接)

When accessing HDFS resources by hostname, the connection fails. There are two main fixes; after making the change, restart Hadoop with stop-all.sh and start-all.sh for it to take effect:
1. Modify core-site.xml and set fs.defaultFS to the IP address (this works well and gives IP-based access).
2. Add an entry to /etc/hosts mapping the DHCP-assigned or manually configured IP address to the hostname (this gives hostname-based access).

cat: Call From localhost.localdomain/127.0.0.1 to 192.168.110.110:9000 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
# Check which ports are listening
netstat -tpnl


1. Solution

Modify core-site.xml and set fs.defaultFS to the IP address:

<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
 <property>
    <name>fs.defaultFS</name>
    <!-- Change this to the machine's IP address -->
    <value>hdfs://192.168.110.110:9000</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/hadoop/tmp</value>
  </property>
</configuration>


Modify /etc/hosts

2. Alternatively, add an entry to /etc/hosts mapping the DHCP-assigned (or manually configured) IP address to the hostname. This approach enables hostname-based access and has no effect when the cluster is accessed by IP. Note that the mapping must use the actual hostname: an entry that only maps 192.168.110.110 to localhost will not work; see the example entry after the command below.

vim /etc/hosts
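
A minimal example entry, assuming the NameNode machine's hostname is hadoop-master (the hostname is hypothetical; use the machine's real hostname):

192.168.110.110   hadoop-master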


Open port 9000

Whichever approach you take, port 9000 must be opened (or the firewall turned off). If the port stays blocked, connecting to the Linux machine directly is guaranteed to fail.

# Open port 9000
iptables -I INPUT -p tcp -m tcp --dport 9000 -m state --state NEW,ESTABLISHED -j ACCEPT

# Save the iptables rules
service iptables save

# Restart the firewall
service iptables restart
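
On systems that use firewalld instead of the iptables service (CentOS 7 and later, for example), the equivalent steps would roughly be (a sketch, not part of the original setup):

# Open port 9000 permanently and reload the firewall rules
firewall-cmd --permanent --add-port=9000/tcp
firewall-cmd --reload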


java.net.ConnectException: Connection timed out: no further information

Connection timed out. The slow response causes the connection to time out; when this happens, it is usually because a node is down and cannot be reached.

2017-08-19 13:38:00,030 [main] WARN  org.apache.hadoop.hdfs.DFSClient - No live nodes contain block BP-1843922645-127.0.0.1-1503069127033:blk_1073741827_1003 after checking nodes = [DatanodeInfoWithStorage[192.168.110.110:50010,DS-8f5c9529-30b0-4e1e-879a-85e56b5ed9b7,DISK]], ignoredNodes = null
2017-08-19 13:38:00,031 [main] WARN  org.apache.hadoop.hdfs.DFSClient - DFS chooseDataNode: got # 1 IOException, will wait for 2032.56248782335 msec.
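
To confirm that the DataNodes are actually alive, a quick check (assuming the Hadoop binaries are on the PATH) is:

# List live and dead DataNodes as reported by the NameNode
hdfs dfsadmin -report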