Elasticsearch 5.X Configuration Guide

程序员文章站 (Programmer Article Station), 2022-07-09 19:10:28

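elasticsearch.yml Configuration Guide

These notes are drawn from material found online, supplemented with explanations of settings I added in practice. Use them as a reference when tuning your configuration.
Note: many of these settings cause errors in Elasticsearch 5.X and the configuration will fail to load, so apply them with care.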

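Cluster

A cluster consists of multiple nodes, one of which is the master node, chosen by election. The master/non-master distinction matters only inside the cluster; clients do not need to know which node is the master.
Decentralization is a key concept in Elasticsearch: from the outside the cluster is a single whole, and talking to any one node has the same effect as talking to any other.
cluster.name: elasticsearch
cluster.name sets the cluster name. Elasticsearch nodes on the same network segment automatically find the cluster that has the same cluster.name.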

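Node

An Elasticsearch cluster is made up of nodes.
node.name: node1
node.name sets the node name; names must be unique within a cluster.
node.master: true
node.master controls whether the node is eligible to be elected master. The default is true. By default the first machine in the cluster becomes the master, and if that machine stops a new master is elected.
node.data: true
node.data controls whether the node stores data. The default is true.
Combining node.master and node.data gives several configurations for tuning a cluster:
Default: the node is master-eligible and also stores data. This puts extra load on whichever node is elected master:
node.master: true
node.data: true
Data storage: the node only stores data and never competes for master, so it can carry the data load:
node.master: false
node.data: true
Coordination: the node only competes for master and stores no data, which leaves its resources free for coordinating the cluster:
node.master: true
node.data: false
Search: the node neither stores data nor competes for master. It can act as a search node that fetches data from the other nodes and assembles the search results:
node.master: false
node.data: false
Other settings: usually left at their defaults; see the official documentation for their meaning.
node.rack: rack314
node.max_local_storage_nodes: 1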
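
The four node.master / node.data combinations described above can be summarized in a small lookup. This is only an illustrative sketch: the function names `node_role` and `to_yaml` and the role labels are made up for this example and are not part of Elasticsearch.

```python
def node_role(master: bool, data: bool) -> str:
    """Return a descriptive label for a node.master / node.data pair.

    The labels are illustrative names for the four combinations in the text.
    """
    roles = {
        (True, True): "master-eligible data node (default)",
        (False, True): "data-only node",
        (True, False): "dedicated master-eligible node",
        (False, False): "search-only node",
    }
    return roles[(master, data)]


def to_yaml(master: bool, data: bool) -> str:
    """Render the two settings as elasticsearch.yml lines."""
    return f"node.master: {str(master).lower()}\nnode.data: {str(data).lower()}"


if __name__ == "__main__":
    for master in (True, False):
        for data in (True, False):
            print(node_role(master, data))
            print(to_yaml(master, data))
```

Printing all four pairs side by side makes it easy to check that each machine in a cluster plan has the role you intended.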

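Index

Indices make the data searchable so that it can be retrieved quickly.
index.number_of_shards: 5
Sets the number of shards for an index; the default is 5. Choose the shard count according to the number and capacity of your machines so that the hardware is used well. As a guideline, keep the average at three shards or fewer per node.
Note: number_of_shards is fixed when the index is created and cannot be changed afterwards.
index.number_of_replicas: 1
Sets the number of replicas for an index; the default is 1. Replicas provide redundancy, keeping the data intact when a node is lost. This value can be changed later through the API.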
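
As a rough way to check the "three shards or fewer per node" guideline, the sketch below computes the average number of shard copies per data node. It assumes that replicas count toward the average, which the text does not state explicitly; the function names are hypothetical.

```python
def shards_per_node(primaries: int, replicas: int, data_nodes: int) -> float:
    """Average number of shard copies (primaries plus replicas) per data node."""
    total_copies = primaries * (1 + replicas)
    return total_copies / data_nodes


def fits_guideline(primaries: int, replicas: int, data_nodes: int,
                   max_per_node: float = 3) -> bool:
    """True when the average stays within the guideline from the text."""
    return shards_per_node(primaries, replicas, data_nodes) <= max_per_node


if __name__ == "__main__":
    # Default settings (5 shards, 1 replica) on a 5-node cluster.
    print(shards_per_node(5, 1, 5))
    print(fits_guideline(5, 1, 5))
```

With the default 5 shards and 1 replica there are 10 shard copies in total, so a two-node cluster would already average five per node and exceed the guideline.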

Indices

indices.query.bool.max_clause_count: 10240
indices.query.bool.max_clause_count sets the maximum number of clauses allowed in a bool query; the default is 1024. A query with more clauses than this is rejected.

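Paths

Specifies where Elasticsearch stores its data and log files.
path.conf: /path/conf
path.conf is the path to the configuration files. Keep the default unless you have moved them.
path.data: /path/data
path.data is where Elasticsearch stores its data. You can list several locations separated by commas, for example /path/data1,/path/data2
path.work: /path/work
path.work is the path for temporary files; usually left at the default.
path.logs: /path/logs
path.logs is the path for log files.
path.plugins: /path/plugins
path.plugins is the plugin directory. Keep the default unless you have moved the plugins.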

Plugin

Controls plugin loading.
plugin.mandatory: mapper-attachments,lang-groovy
plugin.mandatory makes the listed plugins a startup requirement: if any of them is not installed, the node will not start.

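Memory

bootstrap.mlockall: true
bootstrap.mlockall locks the process memory. The Elasticsearch process must also be permitted to lock memory; on Linux this can be granted with ulimit -l unlimited. In 5.X the setting is named bootstrap.memory_lock, which is the name used in the configuration example at the end of this article.
Elasticsearch performs poorly once the JVM starts swapping, so make sure it never swaps.
JVM memory: Elasticsearch runs on the JVM, so setting the JVM memory options well helps its performance.
In Elasticsearch 5.X the configuration directory contains a jvm.options file for the JVM heap and GC settings. The two main heap options are -Xms and -Xmx:
-Xmx: maximum heap size
-Xms: initial heap size
Note: set these two values to the same size where possible, so that the JVM does not keep requesting more memory at runtime.
Further tuning options: see http://www.cnblogs.com/redcreen/archive/2011/05/04/2037057.html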
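
A minimal sketch of generating the two heap lines for jvm.options, with -Xms and -Xmx kept equal as recommended above. The "about half of the machine's memory" sizing comes from the comment in the stock elasticsearch.yml template quoted in the configuration example; the helper names and the 1 GB floor are assumptions made for illustration.

```python
from typing import List


def heap_gb_for(ram_gb: int) -> int:
    """About half of physical RAM, with an assumed floor of 1 GB."""
    return max(1, ram_gb // 2)


def jvm_heap_options(heap_gb: int) -> List[str]:
    """Return the -Xms/-Xmx lines with identical sizes."""
    return [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]


if __name__ == "__main__":
    # Lines to place in jvm.options for a machine with 16 GB of RAM.
    for line in jvm_heap_options(heap_gb_for(16)):
        print(line)
```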

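Network And HTTP

Settings for Elasticsearch network communication.
network.bind_host: 0.0.0.0
network.bind_host sets the IP address to bind to, either IPv4 or IPv6. Older releases defaulted to 0.0.0.0; since 2.0 Elasticsearch binds to the loopback address by default.
network.publish_host: 0.0.0.0
network.publish_host sets the IP address used to communicate with other nodes. If unset it is chosen automatically. The value must be a real IP address.
network.host: 0.0.0.0
network.host sets both the bind address and the publish address, which is equivalent to setting bind_host and publish_host together. Usually this is the only one you need to set.
transport.tcp.port: 9300
transport.tcp.port sets the TCP port for node-to-node communication; the default is 9300.
transport.tcp.compress: true
transport.tcp.compress controls whether data sent between nodes over TCP is compressed; the default is false (no compression).
http.port: 9200
http.port sets the HTTP port for external clients; the default is 9200.
http.max_content_length: 100mb
http.max_content_length sets the maximum size of a request body; the default is 100mb.
http.enabled: true
http.enabled controls whether the HTTP service is exposed; the default is true (enabled).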

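Gateway

The gateway exists to reuse local data as much as possible, and to control how and when the initial recovery starts after a full cluster restart.
gateway.type: local
gateway.type sets the gateway type; the default is local, meaning the local file system.
gateway.recover_after_nodes: 3
gateway.recover_after_nodes allows recovery to begin only once N nodes in the cluster are up; here N is 3.
gateway.recover_after_time: 5m
gateway.recover_after_time sets how long to wait before the initial recovery starts, counted from the moment the N nodes of the previous setting are up.
gateway.expected_nodes: 2
gateway.expected_nodes sets how many nodes the cluster is expected to have. As soon as that many nodes are up (and recover_after_nodes is also satisfied), recovery starts immediately without waiting for recover_after_time to elapse.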
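
How the three gateway settings interact can be sketched as a single decision function. This models only the rules as described in the text above; it is not Elasticsearch's actual implementation, and the function and parameter names are hypothetical.

```python
def should_start_recovery(nodes_up: int,
                          seconds_since_threshold: float,
                          recover_after_nodes: int,
                          recover_after_time_s: float,
                          expected_nodes: int) -> bool:
    """Decide whether initial recovery may start.

    seconds_since_threshold is the time elapsed since recover_after_nodes
    nodes became available.
    """
    if nodes_up < recover_after_nodes:
        return False  # not enough nodes yet: recovery stays blocked
    if nodes_up >= expected_nodes:
        return True   # every expected node is up: start immediately
    # Enough nodes to proceed, but not all: wait out recover_after_time.
    return seconds_since_threshold >= recover_after_time_s


if __name__ == "__main__":
    # Illustrative values: wait for 3 nodes, expect 5, time out after 5 minutes.
    print(should_start_recovery(3, 60, 3, 300, 5))
    print(should_start_recovery(5, 0, 3, 300, 5))
```

The example values are chosen for illustration; note that an expected_nodes value lower than recover_after_nodes, as in the sample lines above, makes the timeout irrelevant because recovery starts as soon as the threshold is met.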

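Recovery Throttling

Controls shard allocation between nodes during initial recovery, replica allocation, rebalancing, and when nodes are added or removed.
cluster.routing.allocation.node_initial_primaries_recoveries: 4
cluster.routing.allocation.node_initial_primaries_recoveries sets the number of concurrent recoveries during initial data recovery; the default is 4.
cluster.routing.allocation.node_concurrent_recoveries: 2
cluster.routing.allocation.node_concurrent_recoveries sets the number of concurrent recoveries when nodes are added or removed or when shards are rebalanced; the default is 2.
indices.recovery.max_bytes_per_sec: 20mb
indices.recovery.max_bytes_per_sec limits recovery throughput. A value of 0 means no limit; the 5.X default is 40mb. If the machine also runs other workloads, keep a limit in place.
indices.recovery.concurrent_streams: 5
indices.recovery.concurrent_streams limits the number of streams opened concurrently when recovering data from another shard; the default is 5.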

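Discovery

discovery.zen.minimum_master_nodes: 1
discovery.zen.minimum_master_nodes sets how many master-eligible nodes must be visible before a master can be elected. The default is 1. Its main purpose is to prevent split brain, so set it to a majority of the master-eligible nodes (total / 2 + 1); larger clusters need a larger value (for example 3 to 6). If the cluster is too small to reach the configured number of master-eligible nodes, it will fail to start.
discovery.zen.ping.timeout: 3s
discovery.zen.ping.timeout sets the timeout for discovery pings; the default is 3 seconds. Raise it on unreliable networks to help avoid split brain.
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.multicast.enabled controls whether multicast discovery is enabled; the default is true. When multicast is unavailable or the cluster spans network segments, use unicast instead.
discovery.zen.ping.unicast.hosts: ["host1", "host2:9300"]
discovery.zen.ping.unicast.hosts sets the initial list of master nodes in the cluster. A node (master or data) probes this list when it starts.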
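
The majority rule quoted in the stock configuration template (total number of master-eligible nodes / 2 + 1) uses integer division. A minimal sketch, with a hypothetical function name:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum size: a strict majority of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, minimum_master_nodes(n))
```

For the five master-eligible nodes in the configuration example this gives 3, which matches the value set there. Data-only nodes are not counted.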

Various

action.destructive_requires_name: true
action.destructive_requires_name requires indices to be named explicitly when they are deleted. In 5.X the default is false.

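Access for plugins such as head

HTTP access settings for Elasticsearch plugins.
http.cors.enabled: true
http.cors.enabled controls whether plugins are allowed access; it enables cross-origin requests, which browser-based plugins such as head rely on.
http.cors.allow-origin: "*"
http.cors.allow-origin sets the origins that are allowed access; * allows every origin.
http.cors.allow-credentials: true
http.cors.allow-credentials controls whether requests carrying credentials are allowed.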

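Advanced memory tuning settings

These settings cap things such as the query cache size and memory usage, mainly to prevent out-of-memory errors.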

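Filter cache settings

Sets the node-level filter cache size, either as a percentage of the total memory allocated to the process or as an absolute amount. All shards on a node share a single cache (which is why it is called the node cache). The cache uses an LRU eviction policy: when it fills up, the least recently used data is evicted to make room for new data.
Note: this is a node-level setting, not an index-level one (it goes in the node configuration).
indices.cache.filter.size: 30%
indices.cache.filter.size sets the filter cache size; the default is 10%. It accepts a percentage such as 30% or an exact value such as 512mb.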
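
The LRU policy mentioned above evicts the least recently used entry when the cache is full. The sketch below shows the idea with an entry-count capacity; Elasticsearch sizes its cache in bytes, so this is an illustration of the policy only, and the class name `LruCache` is made up for this example.

```python
from collections import OrderedDict


class LruCache:
    """Tiny LRU cache: the oldest-used entry is evicted first."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


if __name__ == "__main__":
    cache = LruCache(2)
    cache.put("filter_a", 1)
    cache.put("filter_b", 2)
    cache.get("filter_a")      # filter_a is now the most recently used
    cache.put("filter_c", 3)   # evicts filter_b
    print(cache.get("filter_b"))
```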

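Settings for avoiding out-of-memory errors

index.cache.field.max_size: 50000
index.cache.field.max_size sets the maximum size of the field cache.
index.cache.field.expire: 10m
index.cache.field.expire sets the expiry time of cache entries.
index.cache.field.type: soft
index.cache.field.type sets the cache to use Java soft references. A softly referenced object is kept as long as memory is plentiful and is reclaimed only when memory runs short. The JVM also guarantees that soft references are cleared (set to null) before it throws OutOfMemoryError, so a soft-reference cache (a cache of frequently used images is the classic example) can use as much memory as is available without itself causing an OutOfMemoryError.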

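Field data cache settings

The field data cache is used mainly for sorting and aggregations: field values are loaded into memory to speed up queries. If the data is too large for the memory allotted, an error is raised, so set a sensible threshold.
indices.fielddata.cache.size: unbounded
indices.fielddata.cache.size sets the field data cache size; the default is unbounded. It accepts a percentage such as 30% or an exact value such as 512mb.
indices.fielddata.cache.expire: -1
indices.fielddata.cache.expire sets the field data cache expiry time; the default is -1. It accepts values such as 5m.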

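Circuit breaker settings

Circuit breakers are how Elasticsearch guards against out-of-memory errors. Each breaker has a memory limit, and an operation that would exceed it is rejected with an error instead of being run.
indices.breaker.total.limit: 70%
indices.breaker.total.limit is the parent-level limit on total memory use across all breakers; the default is 70% of the JVM heap.
indices.breaker.fielddata.limit: 70%
indices.breaker.fielddata.limit is the limit on memory used by fielddata. A load that would push fielddata past this limit is rejected. The default in 5.X is 60% of the JVM heap.
indices.breaker.fielddata.overhead: 1.03
indices.breaker.fielddata.overhead is a constant that every fielddata estimate is multiplied by. Elasticsearch estimates the size before loading fielddata and rejects the load if the adjusted estimate would exceed the limit. The default is 1.03.
indices.breaker.request.limit: 40%
indices.breaker.request.limit is the limit for the request breaker; the default is 40% of the JVM heap.
indices.breaker.request.overhead: 1
indices.breaker.request.overhead is the constant that request estimates are multiplied by, again to guard against running out of memory; the default is 1.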
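
The limit and overhead settings combine roughly as follows: the estimated size of an operation is multiplied by the overhead, and the operation is refused if the result would take memory use past the limit. This is a simplified model for illustration only; `CircuitBreakingError` and `check_breaker` are hypothetical names, not Elasticsearch APIs.

```python
class CircuitBreakingError(Exception):
    """Raised when an operation would exceed the breaker limit."""


def check_breaker(used_bytes: float, requested_bytes: float, heap_bytes: float,
                  limit_fraction: float, overhead: float) -> float:
    """Return the new memory use, or raise if the breaker trips."""
    estimate = requested_bytes * overhead   # pad the raw estimate
    limit = heap_bytes * limit_fraction     # e.g. 0.6 of the heap
    if used_bytes + estimate > limit:
        raise CircuitBreakingError(
            f"would use {used_bytes + estimate:.0f} bytes, limit is {limit:.0f}")
    return used_bytes + estimate


if __name__ == "__main__":
    heap = 1000  # tiny illustrative heap size in bytes
    print(check_breaker(0, 500, heap, 0.6, 1.03))
    try:
        check_breaker(0, 590, heap, 0.6, 1.03)
    except CircuitBreakingError as exc:
        print("rejected:", exc)
```

In the second call the raw estimate of 590 bytes is under the 600-byte limit, but the 1.03 overhead pushes it over, so the operation is rejected before any memory is allocated.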

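Translog settings

Changes to Lucene are persisted to disk only during a Lucene commit, which is a relatively expensive operation and so cannot be performed after every index or delete operation. Changes made after one commit and before the next are lost if the process exits or the hardware fails.
To prevent this data loss, each shard has a transaction log, also called a write-ahead log, associated with it. Every index or delete operation is written to the translog after the internal Lucene index has processed it.
After a crash, recent transactions can be replayed from the transaction log when the shard recovers.
An Elasticsearch flush performs a Lucene commit and starts a new translog. It runs automatically in the background so that the transaction log does not grow too large, which would make replaying its operations during recovery take a long time. It is also exposed through an API, although it rarely needs to be run manually.
These settings are rarely changed, so refer to the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-translog.html#_flush_settings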
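
The relationship between commits, the translog, flush, and crash recovery can be modeled in a few lines. This is a toy in-memory model under simplifying assumptions (the "committed" dictionary stands in for data persisted by a Lucene commit, and the translog list is assumed to survive a crash); the `Shard` class and its methods are hypothetical.

```python
class Shard:
    """Toy model of a shard with a write-ahead transaction log."""

    def __init__(self):
        self.committed = {}  # state persisted by the last commit
        self.pending = {}    # changes since the last commit, lost on crash
        self.translog = []   # durable log of operations since the last commit

    def index(self, doc_id, doc):
        self.pending[doc_id] = doc
        self.translog.append(("index", doc_id, doc))

    def delete(self, doc_id):
        self.pending[doc_id] = None  # None marks a deletion
        self.translog.append(("delete", doc_id, None))

    def flush(self):
        """Commit pending changes and start a new, empty translog."""
        for doc_id, doc in self.pending.items():
            if doc is None:
                self.committed.pop(doc_id, None)
            else:
                self.committed[doc_id] = doc
        self.pending.clear()
        self.translog.clear()

    def crash_and_recover(self):
        """Lose uncommitted state, then replay the translog."""
        self.pending.clear()
        for op, doc_id, doc in self.translog:
            self.pending[doc_id] = doc if op == "index" else None

    def get(self, doc_id):
        if doc_id in self.pending:
            return self.pending[doc_id]
        return self.committed.get(doc_id)


if __name__ == "__main__":
    shard = Shard()
    shard.index("1", {"title": "first"})
    shard.flush()                         # doc 1 is now committed
    shard.index("2", {"title": "second"})
    shard.crash_and_recover()             # doc 2 comes back from the translog
    print(shard.get("2"))
```

The longer the translog grows between flushes, the more operations `crash_and_recover` has to replay, which is why flushes run automatically in the background.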

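Configuration example

This example is based on Elasticsearch 5.6.0 running on 64-bit CentOS 7.
The node configured here acts as master node, data node, query node, and plugin node (all plugins are installed on this node).
Note: the real cluster has many machines, with 5 nodes that are both master-eligible and data nodes plus 10 data-only nodes, so the minimum number of master nodes and the expected number of online nodes are both set fairly high, mainly to prevent split brain.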

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
# The cluster name shown here is a placeholder
cluster.name: test
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
# The node name shown here is a placeholder too
node.name: testA
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
######
node.master: true
node.data: true
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /hhdata/hhdata
#
# Path to log files:
#
path.logs: /hhdata/hhlogs
#
# ----------------------------------- Indices -----------------------------------
#
indices.query.bool.max_clause_count: 10240
#
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 0.0.0.0
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
# The master node list shown here is a placeholder as well
discovery.zen.ping.unicast.hosts: ["IP1:9300","IP2:9300","IP3:9300","IP4:9300","IP5:9300"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
discovery.zen.minimum_master_nodes: 3
# Cannot be changed for now; setting it raises an error
#discovery.zen.ping.multicast.enabled: false
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
# Cannot be changed for now; setting these raises an error
#gateway.type: local
#gateway.expected_nodes: 9
# Number of nodes expected to be online
gateway.recover_after_nodes: 5
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

# Access settings for plugins
http.cors.enabled: true
http.cors.allow-origin: "*"
