
First Experience with Zeppelin: Installation and Hive on Zeppelin


Introduction

Zeppelin is a web-based notebook for interactive data analysis and visualization. On the backend it can connect to multiple data processing engines, such as Spark and Hive, and it supports several languages: Scala (Apache Spark), Python (Apache Spark), SparkSQL, Hive, Markdown, Shell, and more. This article walks through installing Zeppelin and running Hive on Zeppelin.

  • Official website: http://zeppelin.apache.org/

  • Download Zeppelin:
    wget https://mirror.bit.edu.cn/apache/zeppelin/zeppelin-0.8.2/zeppelin-0.8.2-bin-all.tgz

  • Extract and install Zeppelin

# Extract
[hadoop@hadoop001 software]$ tar -zxvf zeppelin-0.8.2-bin-all.tgz -C ~/app/
# Edit the configuration files
[hadoop@hadoop001 ~]$ cd app/zeppelin-0.8.2-bin-all/conf
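# If zeppelin-site.xml does not yet exist, copy the shipped templates first
# (assumption: this distribution ships only the *.template files under conf/)
[hadoop@hadoop001 conf]$ cp zeppelin-site.xml.template zeppelin-site.xml
[hadoop@hadoop001 conf]$ cp zeppelin-env.sh.template zeppelin-env.sh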

[hadoop@hadoop001 conf]$ vi zeppelin-site.xml 
# Change these two properties; leave the rest at their defaults
<property>
  <name>zeppelin.server.addr</name>
  <!-- your host's IP/hostname, or 0.0.0.0 -->
  <value>hadoop001</value>
  <description>Server binding address</description>
</property>

<property>
  <name>zeppelin.server.port</name>
  <!-- make sure this port is free; the default is 8080 -->
  <value>8084</value>
  <description>Server port.</description>
</property>

[hadoop@hadoop001 conf]$ vi zeppelin-env.sh
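# Adjust the paths below to match your own JDK, Spark, and Hadoop installations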
export JAVA_HOME=/root/apps/jdk1.8.0_221
export SPARK_HOME=/home/hadoop/app/spark-2.4.4-bin-2.6.0-cdh5.15.1
export SPARK_APP_NAME="ZeppelinAaron"
export HADOOP_CONF_DIR=/root/apps/hadoop/etc/hadoop

  • Start Zeppelin
    ./zeppelin-daemon.sh start
    At this point, Zeppelin is installed.
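To confirm that Zeppelin actually came up, a quick check (the hostname and port follow the zeppelin-site.xml settings above; adjust to your own values):

[hadoop@hadoop001 bin]$ ./zeppelin-daemon.sh status
# Then open the web UI in a browser:
#   http://hadoop001:8084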

Hive on Zeppelin

  • Configure the Hive Interpreter

      The Interpreter is the most important concept in Zeppelin; each Interpreter corresponds to one engine. Hive's Interpreter is the JDBC Interpreter, because Zeppelin runs Hive SQL through Hive's JDBC interface.

      Next, you configure the JDBC Interpreter on Zeppelin's Interpreter page to enable Hive. Note that Zeppelin's JDBC Interpreter can talk to any database that exposes a JDBC driver; out of the box it is configured to connect to PostgreSQL.

There are two ways to enable Hive:

  • Modify the default jdbc interpreter's settings (with this setup, you run Hive in a note by starting a paragraph with %jdbc)
  • Create a new JDBC interpreter named hive (with this setup, you run Hive in a note by starting a paragraph with %hive)

Here I will take the second approach: create a new hive interpreter and configure the following basic properties (adjust them to your own environment).

Settings
default.driver    org.apache.hive.jdbc.HiveDriver   (note: Zeppelin does not ship this jar; you must add the dependency yourself)
default.url       jdbc:hive2://hadoop001:10000      (port 10000 is the HiveServer2 default)
default.user      hadoop
default.password  ***********
  • Add the dependencies (note: the jar versions must match your Hive version); see the sketch after the next paragraph.
       The default form of the hive URL is jdbc:hive2://host:port/<db_name>, where host is the machine running your hiveserver2 and port is hiveserver2's thrift port. If hiveserver2 runs in binary mode, the corresponding Hive setting is hive.server2.thrift.port (default 10000); if it runs in http mode, the setting is hive.server2.thrift.http.port (default 10001). db_name is the Hive database to connect to; if omitted it is default.
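As an illustration only (the host name, ports, and jar paths are assumptions based on the environment used above; adjust them to your own setup), the default.url values and the jars registered under the interpreter's Dependencies section might look like this:

# default.url, binary mode (hive.server2.thrift.port, default 10000):
jdbc:hive2://hadoop001:10000/default
# default.url, http mode (hive.server2.thrift.http.port, default 10001):
jdbc:hive2://hadoop001:10001/default;transportMode=http;httpPath=cliservice

# Jars typically added as artifacts in the hive interpreter's Dependencies field,
# assuming Hive 1.1.0-cdh5.15.1 installed under /home/hadoop/app/hive:
/home/hadoop/app/hive/lib/hive-jdbc-1.1.0-cdh5.15.1.jar
/home/hadoop/app/hive/lib/hive-service-1.1.0-cdh5.15.1.jar
/home/hadoop/app/hive/lib/hive-common-1.1.0-cdh5.15.1.jar
/home/hadoop/app/hive/lib/guava-14.0.1.jar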

  • Start HiveServer2

[hadoop@hadoop001 bin]$ ./hiveserver2 start
[hadoop@hadoop001 lib]$ jps -m
9984 RunJar /home/hadoop/app/hive/lib/hive-service-1.1.0-cdh5.15.1.jar org.apache.hive.service.server.HiveServer2 --hiveconf hive.aux.jars.path=file:///home/hadoop/app/hive/auxlib/hive-exec-1.1.0-cdh5.15.1-core.jar start

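Before pointing Zeppelin at it, it is worth confirming that HiveServer2 accepts JDBC connections. A quick check with beeline (a sketch; the URL and user mirror the interpreter settings above):

[hadoop@hadoop001 bin]$ ./beeline -u jdbc:hive2://hadoop001:10000 -n hadoop -e "show databases;"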

  • Working with Hive tables from Zeppelin
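For example, a note paragraph prefixed with %hive (the interpreter created above) runs Hive SQL. The table name below is purely illustrative:

%hive
show databases;

%hive
-- 'emp' is a hypothetical table; substitute one of your own
select * from default.emp limit 10;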

Troubleshooting Hive on Zeppelin:

  • java.lang.ClassNotFoundException: org.apache.hive.service.rpc.thrift.TCLIService$Iface
  • Fix: add the jar hive-service-1.1.0.jar
java.lang.ClassNotFoundException: org.apache.hive.service.rpc.thrift.TCLIService$Iface
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107)
	at java.sql.DriverManager.getConnection(DriverManager.java:664)
	at java.sql.DriverManager.getConnection(DriverManager.java:208)
	at org.apache.commons.dbcp2.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:79)
	at org.apache.commons.dbcp2.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:205)
	at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
	at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
	at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
	at org.apache.commons.dbcp2.PoolingDriver.connect(PoolingDriver.java:129)
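One way to resolve this kind of ClassNotFoundException (a sketch, assuming the CDH paths used earlier) is to register the missing jar on the Interpreter page, or to copy it next to the jdbc interpreter and restart that interpreter from the UI:

# Option 1: add it as an artifact under the hive interpreter's Dependencies, e.g.
#   /home/hadoop/app/hive/lib/hive-service-1.1.0-cdh5.15.1.jar
# Option 2: copy the jar into the jdbc interpreter directory, then restart the interpreter
[hadoop@hadoop001 ~]$ cp /home/hadoop/app/hive/lib/hive-service-1.1.0-cdh5.15.1.jar ~/app/zeppelin-0.8.2-bin-all/interpreter/jdbc/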

  • java.lang.ClassNotFoundException: org.apache.hadoop.hive.common.auth.HiveAuthUtils
  • Fix: add hive-common-1.1.0.jar
java.lang.ClassNotFoundException: org.apache.hadoop.hive.common.auth.HiveAuthUtils
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at org.apache.hive.jdbc.HiveConnection.createUnderlyingTransport(HiveConnection.java:376)
	at org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:396)
	at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:201)
	at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:168)
	at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
	at java.sql.DriverManager.getConnection(DriverManager.java:664)
	at java.sql.DriverManager.getConnection(DriverManager.java:208)
	at org.apache.commons.dbcp2.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:79)
	at org.apache.commons.dbcp2.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:205)
	at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
	at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
	at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
	at org.apache.commons.dbcp2.PoolingDriver.connect(PoolingDriver.java:129)
	at java.sql.DriverManager.getConnection(DriverManager.java:664)
	at java.sql.DriverManager.getConnection(DriverManager.java:270)
	at org.apache.zeppelin.jdbc.JDBCInterpreter.getConnectionFromPool(JDBCInterpreter.java:425)
	at org.apache.zeppelin.jdbc.JDBCInterpreter.getConnection(JDBCInterpreter.java:443)
	at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:692)
	at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:820)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:632)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
	at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
  • java.lang.NoClassDefFoundError: com/google/common/primitives/Ints
  • Fix: add guava-14.0.1.jar
java.lang.NoClassDefFoundError: com/google/common/primitives/Ints
	at org.apache.hive.service.cli.Column.<init>(Column.java:150)
	at org.apache.hive.service.cli.ColumnBasedSet.<init>(ColumnBasedSet.java:51)
	at org.apache.hive.service.cli.RowSetFactory.create(RowSetFactory.java:37)
	at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:367)
	at org.apache.commons.dbcp2.DelegatingResultSet.next(DelegatingResultSet.java:191)
	at org.apache.commons.dbcp2.DelegatingResultSet.next(DelegatingResultSet.java:191)
	at org.apache.zeppelin.jdbc.JDBCInterpreter.getResults(JDBCInterpreter.java:567)
	at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:749)
	at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:820)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:632)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
	at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.google.common.primitives.Ints
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 20 more
  • java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
  • Fix: checking the HiveServer2 log revealed a permission problem:
  • Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=hive, access=EXECUTE, inode="/tmp/hadoop-yarn/staging":hadoop:supergroup:drwx------
	java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
	at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:295)
	at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
	at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
	at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:737)
	at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:820)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:632)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
	at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
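The error above means the hive user has no execute permission on the MapReduce staging directory owned by hadoop. Under these assumptions (a non-secured test cluster), one way past it is either to run the JDBC connection as the directory's owner, i.e. default.user=hadoop as configured earlier, or to relax the HDFS permissions on the staging directory:

[hadoop@hadoop001 ~]$ hdfs dfs -chmod -R 777 /tmp/hadoop-yarn/staging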