First Experience with Zeppelin: Installation and Hive on Zeppelin
Introduction
Zeppelin is a web-based notebook for interactive data analysis and visualization. On the backend it can connect to multiple data processing engines, such as Spark and Hive, and it supports multiple languages: Scala (Apache Spark), Python (Apache Spark), SparkSQL, Hive, Markdown, Shell, and more. This article walks through installing Zeppelin and running Hive on Zeppelin.
- Zeppelin download address:
wget https://mirror.bit.edu.cn/apache/zeppelin/zeppelin-0.8.2/zeppelin-0.8.2-bin-all.tgz
- Extract and install Zeppelin
# extract
[hadoop@hadoop001 software]$ tar -zxvf zeppelin-0.8.2-bin-all.tgz -C ~/app/
# configuration files
[hadoop@hadoop001 ~]$ cd app/zeppelin-0.8.2-bin-all/conf
[hadoop@hadoop001 conf]$ vi zeppelin-site.xml
# change these two properties and keep the rest at their defaults
<property>
  <name>zeppelin.server.addr</name>
  <value>hadoop001</value>   <!-- your own host's IP/hostname, or 0.0.0.0 -->
  <description>Server binding address</description>
</property>
<property>
  <name>zeppelin.server.port</name>
  <value>8084</value>        <!-- make sure the port is free; the default is 8080 -->
  <description>Server port.</description>
</property>
[hadoop@hadoop001 conf]$ vi zeppelin-env.sh
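# adjust the JAVA_HOME / SPARK_HOME / HADOOP_CONF_DIR paths below to your own environment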
export JAVA_HOME=/root/apps/jdk1.8.0_221
export SPARK_HOME=/home/hadoop/app/spark-2.4.4-bin-2.6.0-cdh5.15.1
export SPARK_APP_NAME="ZeppelinAaron"
export HADOOP_CONF_DIR=/root/apps/hadoop/etc/hadoop
- Start Zeppelin
./zeppelin-daemon.sh start
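If the start succeeds, the daemon script can report its status, and the web UI should be reachable at the address and port configured above (the commands below assume the values used in this article):
[hadoop@hadoop001 bin]$ ./zeppelin-daemon.sh status
# then open http://hadoop001:8084 in a browser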
---------------------------------------- Zeppelin installation is now complete ----------------------------------------
Hive on Zeppelin
- Configure the Hive Interpreter
Interpreter is the most important concept in Zeppelin; each Interpreter corresponds to one engine. The Interpreter used for Hive is the JDBC Interpreter, because Zeppelin runs Hive SQL through Hive's JDBC interface.
Next, you can enable Hive by configuring the JDBC Interpreter on Zeppelin's Interpreter page. Note that Zeppelin's JDBC Interpreter can talk to any database that speaks JDBC; by default it connects to PostgreSQL.
To enable Hive there are two options:
- Modify the settings of the default jdbc interpreter (with this setup, Hive paragraphs in a note start with %jdbc)
- Create a new JDBC interpreter named hive (with this setup, Hive paragraphs in a note start with %hive)
Here I use the second approach: create a new hive interpreter and configure the following basic properties (adjust them to your own environment):
Property | Value |
---|---|
default.driver | org.apache.hive.jdbc.HiveDriver (note: this jar does not ship with Zeppelin; you need to add the dependency yourself) |
default.url | jdbc:hive2://hadoop001:10000 (port 10000 is the HiveServer2 default) |
default.user | hadoop |
default.password | *********** |
- Add dependencies (note: the jar versions must match your Hive version)
The default form of default.url is jdbc:hive2://host:port/<db_name>, where host is the machine running HiveServer2 and port is its Thrift port. If HiveServer2 runs in binary mode, the port comes from hive.server2.thrift.port (default 10000); if it runs in http mode, it comes from hive.server2.thrift.http.port (default 10001). db_name is the Hive database to connect to; it defaults to default, so a full example is jdbc:hive2://hadoop001:10000/default.
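One way to add the driver is to copy the hive-jdbc jar from your Hive installation into Zeppelin's interpreter/jdbc directory (adding it as an artifact on the Interpreter settings page also works). This is only a sketch: the paths and the CDH 5.15.1 jar name below are assumptions based on the environment used earlier; match them to your own Hive version, then restart the interpreter.
[hadoop@hadoop001 ~]$ cp ~/app/hive/lib/hive-jdbc-1.1.0-cdh5.15.1.jar ~/app/zeppelin-0.8.2-bin-all/interpreter/jdbc/
# restart the hive (jdbc) interpreter from the Interpreter page, or restart Zeppelin:
[hadoop@hadoop001 ~]$ ~/app/zeppelin-0.8.2-bin-all/bin/zeppelin-daemon.sh restart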
Start HiveServer2
[hadoop@hadoop001 bin]$ ./hiveserver2 start
[hadoop@hadoop001 lib]$ jps -m
9984 RunJar /home/hadoop/app/hive/lib/hive-service-1.1.0-cdh5.15.1.jar org.apache.hive.service.server.HiveServer2 --hiveconf hive.aux.jars.path=file:///home/hadoop/app/hive/auxlib/hive-exec-1.1.0-cdh5.15.1-core.jar start
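Before wiring this up in Zeppelin, it can be worth checking that HiveServer2 actually accepts JDBC connections. beeline ships with Hive; the URL and user below are the ones assumed in this article:
[hadoop@hadoop001 bin]$ ./beeline -u jdbc:hive2://hadoop001:10000/default -n hadoop -e "show databases;"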
- Work with Hive tables from Zeppelin, for example:
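In a note, a paragraph that starts with %hive (the interpreter created above) is sent to Hive as SQL. A minimal smoke test that does not assume any existing tables:
%hive
show databases;
show tables;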
Troubleshooting Hive on Zeppelin:
- java.lang.ClassNotFoundException: org.apache.hive.service.rpc.thrift.TCLIService$Iface
- Fix: add hive-service-1.1.0.jar (a consolidated example of adding these jars follows the stack traces below)
java.lang.ClassNotFoundException: org.apache.hive.service.rpc.thrift.TCLIService$Iface
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at org.apache.commons.dbcp2.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:79)
at org.apache.commons.dbcp2.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:205)
at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
at org.apache.commons.dbcp2.PoolingDriver.connect(PoolingDriver.java:129)
- java.lang.ClassNotFoundException: org.apache.hadoop.hive.common.auth.HiveAuthUtils
- Fix: add hive-common-1.1.0.jar
java.lang.ClassNotFoundException: org.apache.hadoop.hive.common.auth.HiveAuthUtils
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.hive.jdbc.HiveConnection.createUnderlyingTransport(HiveConnection.java:376)
at org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:396)
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:201)
at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:168)
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at org.apache.commons.dbcp2.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:79)
at org.apache.commons.dbcp2.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:205)
at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
at org.apache.commons.dbcp2.PoolingDriver.connect(PoolingDriver.java:129)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:270)
at org.apache.zeppelin.jdbc.JDBCInterpreter.getConnectionFromPool(JDBCInterpreter.java:425)
at org.apache.zeppelin.jdbc.JDBCInterpreter.getConnection(JDBCInterpreter.java:443)
at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:692)
at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:820)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:632)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
- java.lang.NoClassDefFoundError: com/google/common/primitives/Ints
- Fix: add guava-14.0.1.jar
java.lang.NoClassDefFoundError: com/google/common/primitives/Ints
at org.apache.hive.service.cli.Column.<init>(Column.java:150)
at org.apache.hive.service.cli.ColumnBasedSet.<init>(ColumnBasedSet.java:51)
at org.apache.hive.service.cli.RowSetFactory.create(RowSetFactory.java:37)
at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:367)
at org.apache.commons.dbcp2.DelegatingResultSet.next(DelegatingResultSet.java:191)
at org.apache.commons.dbcp2.DelegatingResultSet.next(DelegatingResultSet.java:191)
at org.apache.zeppelin.jdbc.JDBCInterpreter.getResults(JDBCInterpreter.java:567)
at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:749)
at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:820)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:632)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.google.common.primitives.Ints
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 20 more
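All three ClassNotFoundException errors above come from jars missing on the JDBC interpreter's classpath. One way to fix them, sketched here under the assumption of the CDH 5.15.1 jar names and paths used earlier in this article (adjust them to your own Hive version), is to copy the jars next to the interpreter and restart:
[hadoop@hadoop001 ~]$ cp ~/app/hive/lib/hive-service-1.1.0-cdh5.15.1.jar \
    ~/app/hive/lib/hive-common-1.1.0-cdh5.15.1.jar \
    ~/app/hive/lib/guava-14.0.1.jar \
    ~/app/zeppelin-0.8.2-bin-all/interpreter/jdbc/
[hadoop@hadoop001 ~]$ ~/app/zeppelin-0.8.2-bin-all/bin/zeppelin-daemon.sh restart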
- java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
- Fix: check the HiveServer2 logs; the root cause is a permission problem (a possible fix is shown after the stack trace below)
- Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=hive, access=EXECUTE, inode="/tmp/hadoop-yarn/staging":hadoop:supergroup:drwx------
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:295)
at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:737)
at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:820)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:632)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
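The staging directory /tmp/hadoop-yarn/staging is owned by hadoop with mode drwx------, so the hive user running the query cannot even traverse it. One possible fix on a dev cluster, an assumption rather than the only option (relaxing permissions may not suit a secured cluster; alternatively look at hive.server2.enable.doAs so jobs run as the connecting user), is:
[hadoop@hadoop001 ~]$ hdfs dfs -chmod -R 777 /tmp/hadoop-yarn/staging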