
Problems Encountered with Logstash


The problems listed below are all ones I ran into while using Logstash for a cross-database synchronization job; I am recording them here for everyone's reference. Looking back after solving them, several felt maddening, especially the second one.

1.

[FATAL][logstash.runner          ] An unexpected error occurred! {:error=>#<NoMethodError: undefined method `[]' for nil:NilClass>, :backtrace=>["/export/servers/logstash-6.2.2/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/buffer.rb:187:in `buffer_flush'", "/export/servers/logstash-6.2.2/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/buffer.rb:112:in `block in buffer_initialize'", "org/jruby/RubyKernel.java:1292:in `loop'", "/export/servers/logstash-6.2.2/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/buffer.rb:110:in `block in buffer_initialize'"]}

[INFO ][logstash.pipeline        ] Pipeline started succesfully {:pipeline_id=>"main", :thread=>"#<Thread:0x1b2da3f0 run>"}
[INFO ][logstash.agent           ] Pipelines running {:count=>1, :pipelines=>["main"]}
[ERROR][org.logstash.Logstash    ] java.lang.IllegalStateException: org.jruby.exceptions.RaiseException: (SystemExit) exit

This problem is quite obscure. When the Logstash config file is checked with -t (--config.test_and_exit), everything passes, but the error appears once the pipeline actually runs. After repeated testing I found that it was still a problem in the config file; Logstash itself simply cannot detect it at check time. So if you hit this error, don't panic: go through your configuration file carefully.
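
For reference, a minimal sketch of a jdbc-to-webhdfs pipeline of the kind discussed here; every host, credential, table, and path below is a hypothetical placeholder, not my actual configuration. Keep in mind that bin/logstash -f sync.conf -t only validates syntax and plugin options, so a config that passes the check can still fail once real events flow through it.

input {
  jdbc {
    # hypothetical source database; replace with your own connection details
    jdbc_driver_library => "/opt/drivers/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://db-host:3306/source_db"
    jdbc_user => "sync_user"
    jdbc_password => "sync_password"
    # incremental pull keyed on the position recorded from the previous run
    statement => "SELECT * FROM orders WHERE updated_at > :sql_last_value"
    schedule => "* * * * *"
  }
}

output {
  webhdfs {
    # hypothetical HDFS namenode reached over WebHDFS
    host => "namenode-host"
    port => 50070
    user => "hdfs"
    path => "/hive/logs/output.txt"
  }
}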

2.

[WARN ][logstash.outputs.webhdfs ] Failed to flush outgoing items {:outgoing_count=>1, :exception=>"LogStash::Error", :backtrace=>["org/logstash/ext/JrubyEventExtLibrary.java:202:in `sprintf'", "/export/servers/logstash-5.5.3/vendor/bundle/jruby/1.9/gems/logstash-output-webhdfs-3.0.4/lib/logstash/outputs/webhdfs.rb:194:in `flush'", "org/jruby/RubyArray.java:2409:in `collect'", "/export/servers/logstash-5.5.3/vendor/bundle/jruby/1.9/gems/logstash-output-webhdfs-3.0.4/lib/logstash/outputs/webhdfs.rb:189:in `flush'", "/export/servers/logstash-5.5.3/vendor/bundle/jruby/1.9/gems/stud-0.0.23/lib/stud/buffer.rb:219:in `buffer_flush'", "org/jruby/RubyHash.java:1342:in `each'", "/export/servers/logstash-5.5.3/vendor/bundle/jruby/1.9/gems/stud-0.0.23/lib/stud/buffer.rb:216:in `buffer_flush'", "/export/servers/logstash-5.5.3/vendor/bundle/jruby/1.9/gems/stud-0.0.23/lib/stud/buffer.rb:193:in `buffer_flush'", "/export/servers/logstash-5.5.3/vendor/bundle/jruby/1.9/gems/stud-0.0.23/lib/stud/buffer.rb:159:in `buffer_receive'", "/export/servers/logstash-5.5.3/vendor/bundle/jruby/1.9/gems/logstash-output-webhdfs-3.0.4/lib/logstash/outputs/webhdfs.rb:182:in `receive'", "/export/servers/logstash-5.5.3/logstash-core/lib/logstash/outputs/base.rb:92:in `multi_receive'", "org/jruby/RubyArray.java:1613:in `each'", "/export/servers/logstash-5.5.3/logstash-core/lib/logstash/outputs/base.rb:92:in `multi_receive'", "/export/servers/logstash-5.5.3/logstash-core/lib/logstash/output_delegator_strategies/legacy.rb:22:in `multi_receive'", "/export/servers/logstash-5.5.3/logstash-core/lib/logstash/output_delegator.rb:47:in `multi_receive'", "/export/servers/logstash-5.5.3/logstash-core/lib/logstash/pipeline.rb:420:in `output_batch'", "org/jruby/RubyHash.java:1342:in `each'", "/export/servers/logstash-5.5.3/logstash-core/lib/logstash/pipeline.rb:419:in `output_batch'", "/export/servers/logstash-5.5.3/logstash-core/lib/logstash/pipeline.rb:365:in `worker_loop'", "/export/servers/logstash-5.5.3/logstash-core/lib/logstash/pipeline.rb:330:in `start_workers'"]}

This is the problem I spent the longest time on without figuring it out. It was only resolved with the help of Yazhou and Brother Tao. I don't think it was a lack of ability on my part; it was simply too unexpected.

The cause was my output path. Look first at the official example for path: e.g. /user/logstash/dt=%{+YYYY-MM-dd}/%{@source_host}-%{+HH}.log. I used exactly the same style in my config file, only replacing log with TXT. In the initial tests it worked perfectly, so when the error appeared, this was the last place I thought to look. I eventually changed the path to /hive/logs/output.txt, which finally solved the problem. Later I tried dt=%{+YYYY-MM-dd}/%{+HH} again; several tests passed, but close to go-live the error above came back, so to be safe I decided to drop that format entirely. I have reported this issue upstream and will keep an eye on how it is resolved.
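
A sketch of the two path styles on the webhdfs output, assuming a hypothetical namenode host. The commented-out interpolated path mirrors the documentation example discussed above; the static path is the form that ended up being reliable:

output {
  webhdfs {
    host => "namenode-host"   # hypothetical namenode
    port => 50070
    user => "hdfs"
    # date-interpolated path in the style of the official example;
    # this is the form that intermittently triggered the failure above
    # path => "/user/logstash/dt=%{+YYYY-MM-dd}/%{@source_host}-%{+HH}.log"
    # plain static path that avoided the problem
    path => "/hive/logs/output.txt"
  }
}

The date and field patterns in path are expanded per event via sprintf, which is the frame at the top of the backtrace, so a pattern that works in small tests can still fail for particular events; the exact trigger is only my speculation, not something confirmed upstream.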

3.

[WARN ][logstash.outputs.webhdfs ] webhdfs write caused an exception: 
{"RemoteException":{"exception":"AlreadyBeingCreatedException",
"javaClassName":"org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException",
"message":"Failed to APPEND_FILE /output for DFSClient_NONMAPREDUCE_-688998419_40 on 10.66.90.167 
because this file lease is currently owned by DFSClient_NONMAPREDUCE_-380528477_38 on 10.66.90.167\n\tat 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2932)\n\tat 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2683)\n\tat 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2982)\n\tat 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2950)\n\tat 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:654)\n\tat 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:421)\n\tat 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)\n\tat 
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)\n\tat 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1727)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)\n"}}. 
Maybe you should increase retry_interval or reduce number of workers. Retrying...

This one is straightforward. In my output configuration I had set:
retry_interval => 30  # how long to wait before retrying a write to HDFS

If the problem above is not handled properly, it will eventually trigger the error below:

[ERROR][logstash.outputs.webhdfs ] Max write retries reached. Events will be discarded. Exception: {"RemoteException":{"exception":"IOException","javaClassName":"java.io.IOException","message":"append: lastBlock=blk_1073768614_56374 of src=/output is not sufficiently replicated yet.\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2690)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2982)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2950)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:654)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:421)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1727)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)\n"}}
In the end the write request to HDFS fails and the events are dropped.

At the time I had configured a very short interval, so whenever a relatively large amount of data was being written, the warning would appear. It did not actually affect anything, but it was unpleasant to look at, so I increased this setting.
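
A hedged sketch of how the retry-related knobs on the webhdfs output fit together; the numbers are illustrative rather than my exact production values:

output {
  webhdfs {
    host => "namenode-host"   # hypothetical namenode
    port => 50070
    user => "hdfs"
    path => "/hive/logs/output.txt"
    retry_interval => 30      # seconds between retries of a failed write, raised from the default
    retry_times => 10         # illustrative: retries allowed before events are discarded
    flush_size => 5000        # illustrative: batch more events into each append
    idle_flush_time => 10     # illustrative: still flush every 10s when the buffer is not full
  }
}

As the warning itself suggests, reducing the number of pipeline workers also helps, since concurrent appends to the same HDFS file compete for its lease; fewer, larger appends mean less contention.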

I will keep updating this article; if you run into interesting problems in your own development, you are welcome to discuss them here as well.
