Hive/Impala: Load Balancing Impala and HiveServer2 with HAProxy
2022-07-11 11:42:03
Installing HAProxy
1. Pick one node in the cluster and install the HAProxy service with yum:
yum -y install haproxy
2. Start and stop the HAProxy service, and add it to the list of services started at boot:
service haproxy start
service haproxy stop
chkconfig haproxy on
Impala Configuration
Back up the haproxy.cfg file in the /etc/haproxy directory, create a new haproxy.cfg, and add the following configuration:
#---------------------------------------------------------------------
# Example configuration for a possible web application. See the
# full configuration options online.
#
# http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
#
#---------------------------------------------------------------------
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    log 127.0.0.1 local2
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 4000
    user haproxy
    group haproxy
    daemon
    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode http
    log global
    option httplog
    option dontlognull
    #option http-server-close
    #option forwardfor except 127.0.0.0/8
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 3000
listen stats
    bind 0.0.0.0:1080
    mode http
    option httplog
    maxconn 5000
    stats refresh 30s
    stats uri /stats
listen impalashell
    bind 0.0.0.0:25003
    mode tcp
    option tcplog
    balance leastconn
    server hadoop1 hadoop1:21000 check
    server hadoop2 hadoop2:21000 check
    server hadoop3 hadoop3:21000 check
listen impalajdbc
    bind 0.0.0.0:25004
    mode tcp
    option tcplog
    balance leastconn
    server hadoop1 hadoop1:21050 check
    server hadoop2 hadoop2:21050 check
    server hadoop3 hadoop3:21050 check
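This configuration sets up three things: HAProxy's HTTP status page, and load balancing for impalashell (port 25003) and impalajdbc (port 25004).
Once the configuration is in place, restart HAProxy: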
service haproxy restart
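The backup, replace, and restart steps can be combined so that a broken file never reaches the running service. The following is a minimal sketch, not part of the original procedure: `apply_haproxy_cfg` and the `HAPROXY_CFG` override are hypothetical names introduced here, and the only HAProxy feature relied on is the check mode `haproxy -c -f FILE`, which parses a configuration file without starting the proxy.

```shell
# Hypothetical helper: install a new haproxy.cfg only if it validates.
# HAPROXY_CFG is an assumed override; it defaults to the path used in this article.
apply_haproxy_cfg() {
    new_cfg=$1
    cfg=${HAPROXY_CFG:-/etc/haproxy/haproxy.cfg}

    # Check mode: parse the candidate file and report errors, start nothing.
    if ! haproxy -c -f "$new_cfg"; then
        echo "invalid config, nothing changed" >&2
        return 1
    fi

    # Keep a timestamped copy of the current file before overwriting it.
    if [ -f "$cfg" ]; then
        cp -p "$cfg" "$cfg.$(date +%Y%m%d%H%M%S).bak" || return 1
    fi

    cp "$new_cfg" "$cfg" || return 1
    service haproxy restart
}
```

A call might look like `apply_haproxy_cfg /root/haproxy.cfg.new`, where the path of the candidate file is again only an example.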
Open http://{hostname}:1080/stats in a browser to view the status page.
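The same status information can be read from a script, because HAProxy serves the stats page as CSV when `;csv` is appended to the stats URI. The sketch below is an illustration only: `stats_status` is a hypothetical helper, and it looks up the `pxname`, `svname`, and `status` columns by header name rather than assuming fixed positions.

```shell
# Hypothetical helper: read HAProxy stats CSV on stdin and print
# "proxy/server status" for every data row.
stats_status() {
    awk -F, '
        NR == 1 {
            sub(/^# /, "", $1)              # the header line starts with "# pxname"
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        $0 != "" { print $col["pxname"] "/" $col["svname"], $col["status"] }
    '
}
```

Assuming HAProxy runs on hadoop1 as in the later examples, it could be used as `curl -s 'http://hadoop1:1080/stats;csv' | stats_status`.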
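Impala Shell Test
Connect from several terminals at the same time and run SQL statements to check whether HAProxy automatically distributes the connections across the Impala Daemon nodes.
Use Impala shell to connect to port 25003 of the HAProxy service with the following command: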
impala-shell -i hadoop1:25003
Open the first terminal, connect, and run the SQL:
impala-shell -i hadoop1:25003
...
select * from my_first_table;
...
+----+------+
| id | name |
+----+------+
| 1 | john |
| 2 | tom |
| 3 | jim |
+----+------+
Fetched 3 row(s) in 7.20s
At the same time, open a second terminal, connect, and run the same SQL:
impala-shell -i hadoop1:25003
...
select * from my_first_table;
...
+----+------+
| id | name |
+----+------+
| 1 | john |
| 2 | tom |
| 3 | jim |
+----+------+
Fetched 3 row(s) in 7.20s
The test above shows that the SQL run from the two terminals was not handled by the same Impala Daemon, so load balancing across the Impala Daemon service is working.
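Opening several terminals by hand can be approximated with a small script. The sketch below is an assumption-laden illustration: `run_parallel` is a hypothetical helper that starts the same command several times in the background and waits for all copies, so the connections overlap in time, which is the case `balance leastconn` decides on.

```shell
# Hypothetical helper: run the given command N times concurrently.
# Returns 0 only if every copy exits successfully.
run_parallel() {
    n=$1
    shift
    pids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" &                      # start one copy in the background
        pids="$pids $!"
        i=$((i + 1))
    done
    rc=0
    for pid in $pids; do
        wait "$pid" || rc=1         # remember any failure
    done
    return "$rc"
}
```

For example, `run_parallel 3 impala-shell -i hadoop1:25003 -q 'select * from my_first_table'` would open three concurrent sessions through the proxy; the session counts per server can then be compared on the stats page.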
Impala JDBC Access
Change the URL to the HAProxy host and the port configured for Impala JDBC load balancing:
jdbc:impala://hadoop1:25004
HiveServer2 Load Balancing
Edit /etc/haproxy/haproxy.cfg and append the following configuration to the end of the file:
listen hivejdbc
    bind 0.0.0.0:25005
    mode tcp
    option tcplog
    balance leastconn
    server hadoop1 hadoop1:10000 check
    server hadoop2 hadoop2:10000 check
Restart the HAProxy service:
service haproxy restart
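The impalashell, impalajdbc, and hivejdbc sections all follow the same pattern: a bind port, TCP mode, leastconn balancing, and one `server` line per backend host. As a rough sketch, a section like these could be generated instead of typed by hand; `gen_listen` is a hypothetical helper, and it assumes every backend listens on the same port and that the server name equals the host name, as in the sections above.

```shell
# Hypothetical helper: print a TCP "listen" section in the style used above.
# Usage: gen_listen NAME BIND_PORT BACKEND_PORT HOST...
gen_listen() {
    name=$1
    bind_port=$2
    backend_port=$3
    shift 3
    printf 'listen %s\n' "$name"
    printf '    bind 0.0.0.0:%s\n' "$bind_port"
    printf '    mode tcp\n'
    printf '    option tcplog\n'
    printf '    balance leastconn\n'
    for host in "$@"; do
        printf '    server %s %s:%s check\n' "$host" "$host" "$backend_port"
    done
}
```

Running `gen_listen hivejdbc 25005 10000 hadoop1 hadoop2` prints the same settings as the hivejdbc section, which can be reviewed and then appended to haproxy.cfg.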
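If the cluster uses Kerberos authentication, search for "HiveServer2 Load Balancer" on the Hive configuration page in CM (Cloudera Manager).
Set the parameter value to: hadoop1:25005
Save the configuration, return to the CM home page, and restart the affected services as prompted.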
Beeline Test
Use Beeline to connect to port 25005 of the HAProxy service with the following commands:
beeline
beeline> !connect jdbc:hive2://hadoop1:25005
...
Enter username for jdbc:hive2://hadoop1:25005: hive
Enter password for jdbc:hive2://hadoop1:25005:
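The interactive session above can also be run as a single command, which is convenient for a quick check after changing the proxy. This is a sketch under assumptions: `hs2_smoke_test` is a hypothetical wrapper, the query `show databases;` is only an example, and the cluster is assumed not to require a password or Kerberos principal in the URL. It uses Beeline's `-u` (JDBC URL), `-n` (user name), and `-e` (query) options.

```shell
# Hypothetical helper: run one query through the HAProxy front end.
# Usage: hs2_smoke_test HOST PORT USER
hs2_smoke_test() {
    host=$1
    port=$2
    user=$3
    beeline -u "jdbc:hive2://$host:$port" -n "$user" -e 'show databases;'
}
```

With the values from this article the call would be `hs2_smoke_test hadoop1 25005 hive`.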
Hive JDBC Connection
Change the URL to the HAProxy host and the port configured for Hive JDBC load balancing:
jdbc:hive2://hadoop1:25005