Impala Load Balancing

程序员文章站 2022-07-11 09:09:10
If anything here is incorrect, feel free to leave a comment. Thanks!

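Impala consists of three components: statestored, catalogd, and impalad. statestored and catalogd each run as a single instance, and high availability is not a hard requirement for them because both are stateless and persist no data of their own: catalogd's metadata lives in a third-party database (for example MySQL), and statestored keeps all of its data in memory. If needed, they can be made highly available with a simple active/standby pair, which is covered at the end of this article. Under normal conditions only the active instance serves requests; the standby is running but accepts no requests, and it is promoted when the active instance fails.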

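Every impalad node, on the other hand, can serve JDBC, Thrift, and other client connections, and it acts as the coordinator for every query submitted through it, which costs some memory and CPU. To keep the load even across nodes, the impalad instances should sit behind a load balancer. Load balancing comes in two flavors: layer 4, which works at the transport layer, and layer 7, which works at the application layer. A layer-4 balancer does not need to understand the application protocol; it only forwards the TCP traffic it receives. A layer-7 balancer has to understand the application protocol, so for a SQL server such as impalad it would need a proxy that speaks the SQL protocol. That makes layer-7 proxying impractical for impalad.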

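The setup below therefore relies on HAProxy's layer-4 forwarding: every request sent to the HAProxy host and port is forwarded to the corresponding backend host:port.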

1. Install HAProxy

yum install haproxy

2. Configuration file

vim /etc/haproxy/haproxy.cfg

File contents:

global
    # To have these messages end up in /var/log/haproxy.log you will
    # need to:
    #
    # 1) configure syslog to accept network log events.  This is done
    #    by adding the '-r' option to the SYSLOGD_OPTIONS in
    #    /etc/sysconfig/syslog
    #
    # 2) configure local2 events to go to the /var/log/haproxy.log
    #   file. A line like the following can be added to
    #   /etc/sysconfig/syslog
    #
    #    local2.*                       /var/log/haproxy.log
    #
    log         127.0.0.1 local0
    log         127.0.0.1 local1 notice
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    # turn on stats unix socket
    #stats socket /var/lib/haproxy/stats

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#
# You might need to adjust timing values to prevent timeouts.
#---------------------------------------------------------------------
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    maxconn                 3000

    # Raise the timeouts: if they are too short, the connection can be dropped
    # while a query is still running, and the result can no longer be fetched.
    timeout connect 3600000ms
    timeout client 3600000ms
    timeout server 3600000ms

#
# This sets up the admin page for HA Proxy at port 25002.
#
listen stats :25002
    balance
    mode http
    stats enable
    stats auth admin:admin

# Modified section
listen impala-shell :25003
    mode tcp
    option tcplog
    balance leastconn

    server impala2 impala-host02:21000
    server impala3 impala-host03:21000
    server impala4 impala-host04:21000
    server impala5 impala-host05:21000

# This is the setup for Impala. Impala clients connect to load_balancer_host:25003.
# HAProxy balances connections among the servers listed above.
# The impalad daemons listen on port 21000 for beeswax (impala-shell) or the original ODBC driver.
# For JDBC or the ODBC version 2.x driver, use port 21050 instead of 21000.

# Modified section
listen impala-jdbc :25001

    # Impala needs layer-4 load balancing, so the mode is tcp
    mode tcp
    option tcplog
    balance leastconn

    # Backend host list
    # impala-host02 through impala-host05 are the Impala hosts
    # JDBC requests are forwarded here, so the port is 21050
    server impala2 impala-host02:21050
    server impala3 impala-host03:21050
    server impala4 impala-host04:21050
    server impala5 impala-host05:21050
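
The `balance leastconn` directive used in the listeners above sends each new connection to the backend that currently holds the fewest active connections. The Python sketch below only illustrates that selection rule. It is not HAProxy's implementation, which also takes server weights into account and rotates among equally loaded servers. `pick_leastconn` and the connection counts are hypothetical.

```python
def pick_leastconn(active):
    """Return the server name with the fewest active connections.

    `active` maps server name -> current connection count. Ties are broken
    by dictionary order here, which is a simplification.
    """
    return min(active, key=active.get)


# Hypothetical connection counts for the four backends in the config.
active = {"impala2": 3, "impala3": 1, "impala4": 2, "impala5": 1}

chosen = pick_leastconn(active)
active[chosen] += 1  # the new connection now counts against that server
print(chosen, active)
```

Because Impala connections can stay open for the whole duration of a long query, counting open connections is usually a better load signal here than plain round-robin.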
                                    

3. Start HAProxy

/usr/sbin/haproxy  -f   /etc/haproxy/haproxy.cfg
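
Before starting HAProxy it can help to validate the file first. HAProxy's `-c` flag parses the configuration and exits without starting the proxy. The following is a small Python sketch, assuming `haproxy` is on the PATH; `check_command` and `config_is_valid` are hypothetical helper names, not part of HAProxy.

```python
import shutil
import subprocess

CONFIG = "/etc/haproxy/haproxy.cfg"


def check_command(config_path=CONFIG):
    """Build the HAProxy command that only validates the configuration."""
    return ["haproxy", "-c", "-f", config_path]


def config_is_valid(config_path=CONFIG):
    """Return True if `haproxy -c` exits with status 0 for the given file."""
    result = subprocess.run(check_command(config_path),
                            capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    if shutil.which("haproxy") is None:
        print("haproxy not found on PATH; skipping check")
    else:
        print("config valid:", config_is_valid())
```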

4. Connect through the load balancer

haproxy-host-ip is the IP address of the HAProxy host (a hostname works too). Use it when connecting through the load balancer:

impala-shell: haproxy-host-ip:25003
impala-jdbc: haproxy-host-ip:25001
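
A quick way to confirm that both listeners are up is to attempt a plain TCP connection to each port. The Python sketch below does that with the standard library only; `port_is_open` and `check_frontends` are hypothetical helpers, and the ports are the ones from the haproxy.cfg above.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Front-end ports taken from the haproxy.cfg in this article.
FRONTENDS = {"impala-shell": 25003, "impala-jdbc": 25001}


def check_frontends(host):
    """Map each listener name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in FRONTENDS.items()}
```

Calling `check_frontends("haproxy-host-ip")` returns a dictionary with one entry per listener. A successful connect only shows that HAProxy is listening on that port; it does not prove that an impalad behind it is healthy, so running a real query is still the definitive test.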


On the HAProxy host, run impala-shell (localhost can be replaced with haproxy-host-ip):

$ impala-shell -i localhost:25003
Starting Impala Shell without Kerberos authentication
Connected to localhost:25003
Server version: impalad version 2.9.0-cdh5.12.0 RELEASE (build 03c6ddbdcec39238be4f5b14a300d5c4f576097e)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.11.0-cdh5.14.2 (ed85dce) built on Tue Mar 27 13:39:48 PDT 2018)

You can run a single query from the command line using the '-q' option.
***********************************************************************************
[localhost:25003] > select * from t_sec_hurst_bs_dim limit 1;
Query: select * from t_sec_hurst_bs_dim limit 1
Query submitted at: 2018-07-25 20:16:07 (Coordinator: http://bigdata04:25000)
ERROR: AnalysisException: Could not resolve table reference: 't_sec_hurst_bs_dim'

The query went through the proxy and was handled by a coordinator (bigdata04 in this session). The AnalysisException only means that the table could not be found in the current database; it is not a load-balancing failure.