欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

[HBase]记一个state异常 hbaserpcstate 

程序员文章站 2022-06-17 13:33:49
...

昨天联调的时候发现一个问题,请求发送后,页面上一直不返回数据,好像一直在waiting。用stack看了下,第一个线程在等待hbase RPC请求返回数据

"http-bio-8080-exec-5" daemon prio=10 tid=0x00002aaab3661000 nid=0x2bb6 in Object.wait() [0x0000000045258000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000d2067280> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:907)
        - locked <0x00000000d2067280> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
        at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Invoker.invoke(SecureRpcEngine.java:157)
        at $Proxy9.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.hbase.ipc.SecureRpcEngine.getProxy(SecureRpcEngine.java:198)
        at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:335)
        at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:312)
        at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:364)
        at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1293)
        - locked <0x00000000d1d791e0> (a java.lang.String)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1

 而其他线程则处于block状态

"http-bio-8080-exec-8" daemon prio=10 tid=0x0000000041d07800 nid=0x2f5b waiting for monitor entry [0x0000000046d74000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1283)
        - waiting to lock <0x00000000d1d791e0> (a java.lang.String)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1249)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1236)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:945)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:841)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:810)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:942)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:845)

因为之前看过对应的代码,请见 http://iwinit.iteye.com/blog/1815347 和 http://iwinit.iteye.com/blog/1811265

看代码是region location的时候RPC meta表所在regionserver的时候卡住了。因为系统刚起来的时候没有cache,所以请求都需要region location,.meta.只在一台rs上,所以都block住了。再看之前的RPC分析文章,第一个请求确实已经发出去了,处于wait状态,也就是等待IO线程notify。奇怪的是,jstack里并没有IO线程(IPC关键字)。也就是说第一个请求线程永远无法被唤醒了:(。Log也没有任何记录,开始各种debug,因为我们客户端是自己的client,SecureConnection,其receiveResponse方法如下

try {
        int id = in.readInt();                    // try to read an id
 
        if (LOG.isDebugEnabled())
          LOG.debug(getName() + " got value #" + id);
//这里把call删除掉了,后续IO线程会因为空闲而自动关闭退出
        Call call = calls.remove(id);
//协议中的state字段
        int state = in.readInt();     // read call status
        if (LOG.isDebugEnabled()) {
          LOG.debug("call #"+id+" state is " + state);
        }
//对state处理,注意这里没有从的else处理,也就是说如果state不是这3种的话,直接就不处理了,但是call请求确已经被删除掉了。。。
        if (state == Status.SUCCESS.state) {
          Writable value = ReflectionUtils.newInstance(valueClass, conf);
          value.readFields(in);                 // read value
          if (LOG.isDebugEnabled()) {
            LOG.debug("call #"+id+", response is:\n"+value.toString());
          }
          call.setValue(value);
        } else if (state == Status.ERROR.state) {
          call.setException(new RemoteException(WritableUtils.readString(in),
                                                WritableUtils.readString(in)));
        } else if (state == Status.FATAL.state) {
          // Close the connection
          markClosed(new RemoteException(WritableUtils.readString(in),
                                         WritableUtils.readString(in)));
        }
      } catch (IOException e) {

 debug的时候发现server端范围的state是5xxxxxx的数字。。。原因是sever段没有启用secure导致协议不一致了。。client没有容错导致。

 

 

相关标签: hbase rpc state