1.Hive介绍
- 数据库OLTP 在线事务处理
- 数据仓库OLAP 在线分析处理 延迟高
- 类sql方式(HQL)
- 使用sql方式,用来读写,管理位于分布式存储系统上的大型数据集的数据仓库技术
- hive是基于Hadoop的一个数据仓库工具,可以将结构化的数据文件映射为一张数据库表,并提供完整的sql查询功能,可以将sql语句转换为MapReduce任务进行运行。其优点是学习成本低,可以通过类SQL语句快速实现简单的MapReduce统计,不必开发专门的MapReduce应用,十分适合数据仓库的统计分析。
- hive使用的是hdfs做为存储
- 使用maprecude做为计算模型
- 用于海量数据计算分析
2.安装
基于hadoop完全分布式环境(搭建过程略)
-
下载hive包
#wget https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.1.1/apache-hive-2.1.1-bin.tar.gz
-
解压到指定路径和创建软链接
#tar xf apache-hive-2.1.1-bin.tar.gz -C /soft/ #ln -s /soft/apache-hive-2.1.1-bin /soft/hive
-
配置环境变量
#vim /etc/profile 添加一下内容 HIVE_HOME=/soft/hive PATH=$PATH:$HIVE_HOME/bin #source /etc/profile
-
测试
#hive --version
3.配置mysql相关
- 安装mysql(过程省略)
-
创建相关数据库并授权
mysql>create database dbhive; mysql>use hive; mysql>grant all on dbhive.* to "hive"@"%" identified by "123456";
5. 配置hive
-
修改hive-site.xml(修改mysql相关配置)
#cd /soft/hive/conf #vim hive-site.xml <property> <name>javax.jdo.option.ConnectionPassword</name> <value>123456</value> <description>password to use against metastore database</description> </property> <property> <name>javax.jdo.option.ConnectionUserName</name> <value>hive</value> <description>Username to use against metastore database</description> </property> <property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://192.168.10.103:3306/dbhive</value> </property>
-
复制mysql驱动程序
#cp /root/mysql-connector-java-5.1.38-bin.jar /soft/hive/lib/
-
在mysql中初始化hive的schema
#cd /soft/hive/bin/ #./schematool -dbType mysql -initSchema which: no hbase in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/soft/jdk/bin:/soft/hadoop/bin:/soft/hadoop/sbin:/soft/hive/bin:/root/bin) SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] Metastore connection URL: jdbc:mysql://192.168.10.103:3306/dbhive Metastore Connection Driver : com.mysql.jdbc.Driver Metastore connection User: hive Starting metastore schema initialization to 2.1.0 Initialization script hive-schema-2.1.0.mysql.sql Initialization script completed schemaTool completed
-
查看mysql
mysql> use dbhive; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed mysql> show tables; +---------------------------+ | Tables_in_dbhive | +---------------------------+ | AUX_TABLE | | BUCKETING_COLS | | CDS | | COLUMNS_V2 | | COMPACTION_QUEUE | | COMPLETED_COMPACTIONS | | COMPLETED_TXN_COMPONENTS | | DATABASE_PARAMS | | DBS | | DB_PRIVS | | DELEGATION_TOKENS | | FUNCS | | FUNC_RU | | GLOBAL_PRIVS | | HIVE_LOCKS | | IDXS | | INDEX_PARAMS | | KEY_CONSTRAINTS | | MASTER_KEYS | | NEXT_COMPACTION_QUEUE_ID | | NEXT_LOCK_ID | | NEXT_TXN_ID | | NOTIFICATION_LOG | | NOTIFICATION_SEQUENCE | | NUCLEUS_TABLES | | PARTITIONS | | PARTITION_EVENTS | | PARTITION_KEYS | | PARTITION_KEY_VALS | | PARTITION_PARAMS | | PART_COL_PRIVS | | PART_COL_STATS | | PART_PRIVS | | ROLES | | ROLE_MAP | | SDS | | SD_PARAMS | | SEQUENCE_TABLE | | SERDES | | SERDE_PARAMS | | SKEWED_COL_NAMES | | SKEWED_COL_VALUE_LOC_MAP | | SKEWED_STRING_LIST | | SKEWED_STRING_LIST_VALUES | | SKEWED_VALUES | | SORT_COLS | | TABLE_PARAMS | | TAB_COL_STATS | | TBLS | | TBL_COL_PRIVS | | TBL_PRIVS | | TXNS | | TXN_COMPONENTS | | TYPES | | TYPE_FIELDS | | VERSION | | WRITE_SET | +---------------------------+ 57 rows in set (0.00 sec)
-
登录hive
#hive