
Notes on the disappearing node.lock problem, installing ELKB 6.0.0, and updating the license


Problem

The disk was found to be completely full, packed with Elasticsearch error logs. The specific errors were:

[2017-11-24T15:59:53,590][WARN ][o.e.e.NodeEnvironment    ] [es_node1] lock assertion failed
java.nio.file.NoSuchFileException: /tmp/elasticsearch/data/nodes/0/node.lock
[2017-11-24T15:59:53,755][DEBUG][o.e.a.a.c.s.TransportClusterStatsAction] [es_node1] failed to execute on node [PodBWQxJTGqJioeYk7TBsw]
org.elasticsearch.transport.RemoteTransportException: [es_node1][192.168.5.233:9300][cluster:monitor/stats[n]]
Caused by: java.lang.IllegalStateException: environment is not locked
Caused by: java.nio.file.NoSuchFileException: /tmp/elasticsearch/data/nodes/0/node.lock
[2017-11-24T15:59:53,935][ERROR][o.e.x.m.c.n.NodeStatsCollector] [es_node1] collector [node-stats] failed to collect data
org.elasticsearch.action.FailedNodeException: Failed node [PodBWQxJTGqJioeYk7TBsw]
Caused by: org.elasticsearch.transport.RemoteTransportException: [es_node1][192.168.5.233:9300][cluster:monitor/nodes/stats[n]]
Caused by: java.lang.IllegalStateException: environment is not locked
Caused by: java.nio.file.NoSuchFileException: /tmp/elasticsearch/data/nodes/0/node.lock

It is worth noting that the other slave node of the ELK cluster hit the same problem, but it only started reporting the error on the 26th.

Meanwhile, Kibana crashed two days later; its last log entries showed that Elasticsearch was still initializing Monitoring, followed by an RSP (response) timeout.

In addition, the Elasticsearch logs showed a license expiration warning. Without registering a license, Elasticsearch only gets a 30-day trial, and after it expires the following restrictions apply:

# License [will expire] on [Thursday, November 30, 2017]. If you have a new license, please update it.
# Otherwise, please reach out to your support contact.
#
# Commercial plugins operate with reduced functionality on license expiration:
# - security
# - Cluster health, cluster stats and indices stats operations are blocked
# - All data operations (read and write) continue to work
# - watcher
# - PUT / GET watch APIs are disabled, DELETE watch API continues to work
# - Watches execute and write to the history
# - The actions of the watches don’t execute
# - monitoring
# - The agent will stop collecting cluster and indices metrics
# - The agent will stop automatically cleaning indices older than [xpack.monitoring.history.duration]
# - graph
# - Graph explore APIs are disabled
# - ml
# - Machine learning APIs are disabled
# - deprecation
# - Deprecation APIs are disabled
# - upgrade
# - Upgrade API is disabled

Resolution

First, the Kibana crash was probably caused by the Elasticsearch failures, and the license issue could wait until the 30th; the most pressing task was to track down the endless errors. From the stack traces the cause was clearly the disappearance of the node.lock file, and looking inside the data directory confirmed that the file was indeed gone.
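For reference, confirming this is just a matter of listing the node directory from the error log above:

ls -l /tmp/elasticsearch/data/nodes/0/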
I then looked for the cause of the problem but found nothing. On Google only one other person reported the same issue; it had started for him in March and was still unsolved in July. His version was 5.2.2, mine 5.6.2.
Since the problem could not be solved, I decided to try installing 6.0.0.

Installing ELKB 6.0.0

First, download the latest matching packages from the Elastic site at https://www.elastic.co/cn/products . Logstash is not used for now; the lightweight Filebeat is used directly:
elasticsearch-6.0.0.tar.gz
filebeat-6.0.0-linux-x86_64.tar.gz
kibana-6.0.0-linux-x86_64.tar.gz
Then extract them on Linux and edit the ELKB configuration files.
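Unpacking is the usual tar invocation:

tar -zxvf elasticsearch-6.0.0.tar.gz
tar -zxvf filebeat-6.0.0-linux-x86_64.tar.gz
tar -zxvf kibana-6.0.0-linux-x86_64.tar.gz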

First, Elasticsearch:

vim config/elasticsearch.yml

The contents are basically the same as in 5.6. The following settings were configured:

cluster.name: es_clusterNew

node.name: es_node1

path.data: /tmp/elasticsearch/data
path.logs: /tmp/elasticsearch/logs

network.host: Master
discovery.zen.ping.unicast.hosts: ["Master", "Slave1"]
discovery.zen.ping_timeout: 120s
client.transport.ping_timeout: 60s

xpack.security.enabled: false
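Note that Master and Slave1 here are hostnames, so they must resolve on every node, for example via /etc/hosts. A sketch (192.168.5.233 is taken from the error log above as the master; the Slave1 address is only a placeholder):

192.168.5.233   Master
192.168.5.234   Slave1    # placeholder, replace with the real address of the slave node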

Then the JVM memory settings:

vim config/jvm.options

The default JVM heap in this version is already 1 GB:

-Xms1g
-Xmx1g
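If the machine has memory to spare, both values can be raised; keep -Xms and -Xmx equal, for example:

-Xms2g
-Xmx2g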

Install the x-pack plugin while we are at it:

./bin/elasticsearch-plugin install x-pack

Next, Filebeat:

vim filebeat.yml

The biggest change in 6.0 is probably Filebeat. There are many new configuration options, such as the Filebeat modules section below: Filebeat now ships with a set of modules, each with its own configuration file under modules.d/. To make those files take effect, reload.enabled has to be changed to true; a sketch of the edited block follows the default shown here.

#============================= Filebeat modules ===============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
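
For reference, a sketch of that block after the change (reload.period is optional; the 10s below is just an assumed value):

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: true
  reload.period: 10s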

There is also a new Elasticsearch template setting section:

#==================== Elasticsearch template setting ==========================

setup.template.settings:
  index.number_of_shards: 3
  #index.codec: best_compression
  #_source.enabled: false

This appears to control how many shards the data is split into and data-compression options; before 6.0, Elasticsearch split an index into 5 shards by default. No change is made here for now.
Checking later, the indices written by Filebeat were indeed split into 3 shards; whether Elasticsearch's own default shard count changed has not been verified.
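One way to check this (and the Elasticsearch default mentioned above) is an ordinary index settings query, e.g.:

curl -XGET 'Master:9200/filebeat-*/_settings?pretty'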
There is also a new Kibana section: starting with Beats 6.0.0 the dashboards are loaded through the Kibana API, so just point this at the address where your Kibana runs:

#============================== Kibana =====================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  host: "Master:5601"
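
To actually push the bundled dashboards through that Kibana endpoint, the setup command can be used; a sketch, run from the Filebeat directory:

./filebeat setup --dashboards -c filebeat.yml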

The other settings are the same as in previous versions.
The input configuration now has an extra enabled option, which must be changed to true for the prospector to take effect:

#=========================== Filebeat prospectors =============================

filebeat.prospectors:

# Each - is a prospector. Most options can be set at the prospector level, so
# you can use different prospectors for various configurations.
# Below are the prospector specific configurations.

- type: log

  # Change to true to enable this prospector configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /home/hadoop/logs/test/*.log
    #- c:\programdata\elasticsearch\logs\*

Then the multiline settings; here, lines starting with a tab are appended to the previous event, which suits multi-line entries such as Java stack traces:

  multiline.pattern: ^\t.*$
  multiline.negate: false
  multiline.match: after
  multiline.max_lines: 10

Output configuration:

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["Master:9200"]

Then configure Kibana:

vim config/kibana.yml 

Basically the same as in 5.6; just a few simple settings:

server.host: "Master"

server.name: "LogAnalysis"

elasticsearch.url: "http://Master:9200"

xpack.security.enabled: false

Then install x-pack:

./bin/kibana-plugin install x-pack

Start Elasticsearch, Filebeat, and Kibana.
Note that Filebeat can no longer be started with the start subcommand, and the official start command now includes a -c [config file] argument.
The first start of Kibana's x-pack in 6.0 is noticeably faster than in 5.6.
Be aware that if you upgrade from a pre-6.0 version and keep the old data and log directories unchanged, you will get an error prompting you to upgrade Kibana.
This can be fixed by deleting the old data and log directories or switching to new ones.
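For reference, a sketch of the start commands, run from each component's own directory (nothing project-specific, just the standard flags):

# in the elasticsearch-6.0.0 directory
./bin/elasticsearch -d

# in the filebeat-6.0.0-linux-x86_64 directory
nohup ./filebeat -c filebeat.yml &

# in the kibana-6.0.0-linux-x86_64 directory
nohup ./bin/kibana &

Once Elasticsearch is up, a quick health check confirms that both nodes have joined the cluster:

curl -XGET 'Master:9200/_cluster/health?pretty'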

License

The installation is now complete. Whether the node.lock problem is solved remains to be seen, so first deal with the license. A fresh Elasticsearch install comes with a 30-day trial, and as expiration approaches a reminder is written to the log every 10 minutes.
After this reinstall there is no license reminder in the log yet; the license can be queried with the following command:

curl -XGET 'master:9200/_license'

Result:
[screenshot: license query result showing a one-month trial license]
It shows that the trial is indeed only valid for one month.
Register for a license on the official site; it is free, valid for one year, and can be renewed for another year when it expires (registration link).
You will then receive an email; follow it to complete the registration and obtain the license, then download the license file matching your version. Assume the file is license.json.
Following the official documentation, apply the license to Elasticsearch:

curl -XPUT -u elastic 'master:9200/_xpack/license' -H "Content-Type: application/json" -d @license.json

If x-pack is installed you will be prompted for a password here; the default is changeme.
If the request fails with:

{"acknowledged":false,"license_status":"valid","acknowledge":{"message":"This license update requires acknowledgement. To acknowledge the license, please read the following messages and update the license again, this time with the \"acknowledge=true\" parameter:","watcher":["Watcher will be disabled"],"logstash":["Logstash specific APIs will be disabled, but you can continue to manage and poll stored configurations"],"security":["The following X-Pack security functionality will be disabled: authentication, authorization, ip filtering, and auditing. Please restart your node after applying the license.","Field and document level access control will be disabled.","Custom realms will be ignored."],"monitoring":["Multi-cluster support is disabled for clusters with [BASIC] license. If you are\nrunning multiple clusters, users won't be able to access the clusters with\n[BASIC] licenses from within a single X-Pack Kibana instance. You will have to deploy a\nseparate and dedicated X-pack Kibana instance for each [BASIC] cluster you wish to monitor.","Automatic index cleanup is locked to 7 days for clusters with [BASIC] license."],"graph":["Graph will be disabled"],"ml":["Machine learning will be disabled"]}}

then the request has to be resubmitted with acknowledge set to true:

curl -XPUT -u elastic 'master:9200/_xpack/license?acknowledge=true' -H "Content-Type: application/json" -d @license.json

Check the license again:
[screenshot: license query result showing a one-year license]
It now shows a one-year validity period.

This completes the installation of 6.0.0 and the license update.

The node.lock problem

As for the node.lock and Kibana problems, they still need watching. As of 2017-12-02 the error had not recurred. To keep a future occurrence from filling the disk, a simple shell script can be added to cron as a stopgap: "when node.lock is detected as missing, create a new node.lock".

By December 11 it turned out that node.lock had disappeared again, once on December 8 and once on December 9. The script created a stand-in lock file, but that file cannot pass Elasticsearch's validation, so the errors kept coming, even after restarting Elasticsearch. In this version, however, the errors no longer fill the disk: once a certain amount of lock-file errors has been written, logging stops, and several days of error logs amounted to only a few 1 MB zip archives. The cause of the disappearing lock file was still not found.

Since the stand-in lock file cannot pass Elasticsearch's validation and has to be deleted before a restart anyway, the script was changed to simply restart Elasticsearch whenever the file is found missing:

#!/bin/bash
# Directory that should contain node.lock, e.g. /tmp/elasticsearch/data/nodes/0
path=
now=`date "+%Y-%m-%d %H:%M:%S"`
# Log file that records the result of each check
logpath=
cd $path
echo "${now}" >> $logpath
if test ! -f 'node.lock' ;then
  echo "Lockfile disappeared" >> $logpath
  # Kill the running Elasticsearch process (exclude the grep itself from the match)
  ps aux | grep elasticsearch | grep -v grep | awk '{print $2}' | xargs kill
  echo "kill elastic" >> $logpath
  sleep 1m
  # Elasticsearch installation directory
  cd /home/hadoop/elk/elasticsearch-6.0.0
  ./bin/elasticsearch -d
  echo "elastic restart" >> $logpath
else
  echo "All right" >> $logpath
fi

path is the directory where my node.lock kept disappearing.
logpath is the log file that records the result of each check.
cd /home/hadoop/elk/elasticsearch-6.0.0 should be replaced with your own Elasticsearch directory.
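Remember to make the script executable; the path below matches the crontab entry that follows:

chmod +x /tmp/elasticsearch/prevent.sh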
Add it to the crontab:
crontab -e

0 */4 * * * /tmp/elasticsearch/prevent.sh

The check runs every 4 hours.

Postscript

1

It turned out that after installing Elasticsearch on the other nodes of the cluster, the license is automatically propagated to them; there is no need to repeat the PUT step manually.

2

Later, another problem appeared: Kibana would not start properly.

When Kibana starts, several plugins report the error "Elasticsearch is still initializing the kibana index": the Kibana data had become corrupted.
Deleting the .kibana index, as suggested online, lets Kibana start again, but it causes other problems: all configuration, index patterns, visualizations and dashboards are lost.
The following command causes exactly that loss of configuration, so use it with caution:

curl -XDELETE 127.0.0.1:9200/.kibana
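Before deleting it, a cheap safety net is to copy the .kibana index aside with the reindex API. This is only a sketch (the target name .kibana_backup is arbitrary), and it does not guarantee that the corrupted documents will be usable afterwards:

curl -XPOST '127.0.0.1:9200/_reindex' -H 'Content-Type: application/json' -d '
{
  "source": { "index": ".kibana" },
  "dest":   { "index": ".kibana_backup" }
}'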