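ES Cluster Monitoring Example
I. Preface
I have recently been looking into ES cluster monitoring. This post organizes and summarizes the key points, drawing on my company's current monitoring setup and some outside material.
II. Content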
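The main purpose of ES monitoring is to keep ES-based services running normally and, when something goes wrong, to give engineers the evidence they need to resolve it. Across the monitoring solutions I surveyed, ES monitoring currently targets three levels: cluster, node, and index. Cluster-level monitoring covers the ES cluster as a whole, including its health and its overall statistics. Node-level monitoring covers each ES instance, including its search and indexing metrics and its physical resource usage. Index-level monitoring covers each index, mainly its performance metrics. Because these metrics are collected per index, a cluster with many indices produces a very large volume of index-level monitoring data.
Of these three classes, cluster-level metrics are far fewer than node-level or index-level metrics, but every one of them matters, and they alone are enough to tell whether the cluster is running normally. Node-level metrics are used mostly for troubleshooting: when the cluster has a problem, the investigation usually narrows down to a specific ES instance, whose resource usage and other metrics are then examined. Index-level metrics mainly serve applications: for example, if queries against an index used by some application slow down, index-level monitoring can help determine whether an unreasonable setting chosen when the index was created is the cause. The three classes are summarized in more detail below.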
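1. Cluster monitoring
Cluster monitoring covers two things: cluster health and cluster statistics.
Cluster health
Cluster health can be obtained from the following API: http://ip:9200/_cluster/health?pretty
Sample response: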
{ "cluster_name" : "**** ", "status" : "yellow", "timed_out" : false, "number_of_nodes" : 2, "number_of_data_nodes" : 2, "active_primary_shards" : 1280, "active_shards" : 2549, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 3, "delayed_unassigned_shards" : 0, "number_of_pending_tasks" : 0, "number_of_in_flight_fetch" : 0, "task_max_waiting_in_queue_millis" : 0, "active_shards_percent_as_number" : 99.88244514106583 }
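Key metrics:
status: cluster status, one of green, yellow, or red. Green means all primary and replica shards are allocated, yellow means all primaries are allocated but at least one replica is not, and red means at least one primary shard is not allocated.
number_of_nodes/number_of_data_nodes: number of nodes and number of data nodes in the cluster.
active_primary_shards: number of active primary shards in the cluster.
active_shards: number of active shards in the cluster, primaries and replicas together.
relocating_shards: number of shards currently being moved from one node to another. Usually 0; it rises when a node joins or leaves the cluster.
initializing_shards: number of shards that are being initialized.
unassigned_shards: number of shards that are not allocated to any node. Usually 0; it rises when, for example, the replica shards on some node are lost.
number_of_pending_tasks: number of cluster-level tasks, such as creating an index and allocating its shards to nodes, that are waiting to be processed. Only the master node can process pending tasks. If this value stays high without decreasing, the cluster has a source of instability.
active_shards_percent_as_number: shard health of the cluster, that is, active shards as a percentage of all shards.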
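As a rough illustration of how these health fields might drive an alert, here is a minimal Python sketch. The function names, the `max_pending_tasks` threshold, and the warning texts are my own assumptions, not part of ES; only the JSON field names and the `/_cluster/health` path come from the API.

```python
import json
from urllib.request import urlopen


def fetch_cluster_health(base_url):
    """Fetch the health document; base_url is e.g. 'http://ip:9200'."""
    with urlopen(base_url + "/_cluster/health") as resp:
        return json.load(resp)


def check_cluster_health(health, max_pending_tasks=0):
    """Return a list of warnings for a parsed health document.

    max_pending_tasks is a hypothetical threshold chosen for illustration.
    """
    warnings = []
    status = health.get("status")
    if status != "green":
        warnings.append("cluster status is %s" % status)
    if health.get("unassigned_shards", 0) > 0:
        warnings.append("%d unassigned shards" % health["unassigned_shards"])
    if health.get("number_of_pending_tasks", 0) > max_pending_tasks:
        warnings.append("%d pending tasks" % health["number_of_pending_tasks"])
    return warnings


# Values taken from the sample response above.
sample = {"status": "yellow", "unassigned_shards": 3, "number_of_pending_tasks": 0}
print(check_cluster_health(sample))
# prints ['cluster status is yellow', '3 unassigned shards']
```

In a real collector, `fetch_cluster_health` would be called on a schedule and its result passed to `check_cluster_health`; keeping the two apart makes the check easy to test without a running cluster.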
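Cluster statistics
Cluster statistics are aggregate figures for the whole cluster, such as document count, shard count, and resource usage. They can be obtained from the following API: http://ip:9200/_cluster/stats?pretty
Sample response: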
{ "_nodes" : { "total" : 2, "successful" : 2, "failed" : 0 }, "cluster_name" : "****", "timestamp" : 1511246201848, "status" : "yellow", "indices" : { "count" : 268, "shards" : { "total" : 2629, "primaries" : 1320, "replication" : 0.9916666666666667, "index" : { "shards" : { "min" : 1, "max" : 48, "avg" : 9.809701492537313 }, "primaries" : { "min" : 1, "max" : 24, "avg" : 4.925373134328358 }, "replication" : { "min" : 0.0, "max" : 1.0, "avg" : 0.9888059701492538 } } }, "docs" : { "count" : 24331382, "deleted" : 1275153 }, "store" : { "size_in_bytes" : 14053778191, "throttle_time_in_millis" : 0 }, "fielddata" : { "memory_size_in_bytes" : 1172464, "evictions" : 0 }, "query_cache" : { "memory_size_in_bytes" : 39586256, "total_count" : 2292448334, "hit_count" : 28324446, "miss_count" : 2264123888, "cache_size" : 15576, "cache_count" : 484739, "evictions" : 469163 }, "completion" : { "size_in_bytes" : 0 }, "segments" : { "count" : 6575, "memory_in_bytes" : 112649529, "terms_memory_in_bytes" : 90138494, "stored_fields_memory_in_bytes" : 6917880, "term_vectors_memory_in_bytes" : 0, "norms_memory_in_bytes" : 3823616, "points_memory_in_bytes" : 3450143, "doc_values_memory_in_bytes" : 8319396, "index_writer_memory_in_bytes" : 0, "version_map_memory_in_bytes" : 0, "fixed_bit_set_memory_in_bytes" : 143704, "max_unsafe_auto_id_timestamp" : 1510727090177, "file_sizes" : { } } }, "nodes" : { "count" : { "total" : 2, "data" : 2, "coordinating_only" : 0, "master" : 2, "ingest" : 2 }, "versions" : [ "5.4.1" ], "os" : { "available_processors" : 64, "allocated_processors" : 64, "names" : [ { "name" : "Linux", "count" : 2 } ], "mem" : { "total_in_bytes" : 269956005888, "free_in_bytes" : 1114628096, "used_in_bytes" : 268841377792, "free_percent" : 0, "used_percent" : 100 } }, "process" : { "cpu" : { "percent" : 0 }, "open_file_descriptors" : { "min" : 4189, "max" : 4321, "avg" : 4255 } }, "jvm" : { "max_uptime_in_millis" : 1802902700, "versions" : [ { "version" : "1.8.0_92", "vm_name" : 
"Java HotSpot(TM) 64-Bit Server VM", "vm_version" : "25.92-b14", "vm_vendor" : "Oracle Corporation", "count" : 2 } ], "mem" : { "heap_used_in_bytes" : 15525080840, "heap_max_in_bytes" : 68318265344 }, "threads" : 558 }, "fs" : { "total_in_bytes" : 85857402880, "free_in_bytes" : 52003000320, "available_in_bytes" : 52003000320, "spins" : "true" }, "plugins" : [ { "name" : "analysis-ik", "version" : "5.4.1", "description" : "IK Analyzer for Elasticsearch", "classname" : "org.elasticsearch.plugin.analysis.ik.AnalysisIkPlugin", "has_native_controller" : false } ], "network_types" : { "transport_types" : { "netty4" : 2 }, "http_types" : { "netty4" : 2 } } } }
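Key metrics:
indices.count: total number of indices.
indices.shards.total: total number of shards.
indices.shards.primaries: number of primary shards.
indices.docs.count: total number of documents.
indices.store.size_in_bytes: total size of stored data.
indices.segments.count: total number of segments.
nodes.count.total: total number of nodes.
nodes.count.data: number of data nodes.
nodes.process.cpu.percent: CPU usage of the node processes, as a percentage.
nodes.fs.total_in_bytes: total capacity of the file systems.
nodes.fs.free_in_bytes: total free space on the file systems.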
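The key metrics above sit at different depths of the response, so a small dotted-path helper keeps the extraction readable. The sketch below is only one way to do it; `get_path`, `extract_cluster_metrics`, and `fs_used_percent` are hypothetical helper names, and the derived disk-usage percentage is my own addition built from the two `fs` fields.

```python
KEY_PATHS = [
    "indices.count",
    "indices.shards.total",
    "indices.shards.primaries",
    "indices.docs.count",
    "indices.store.size_in_bytes",
    "indices.segments.count",
    "nodes.count.total",
    "nodes.count.data",
    "nodes.process.cpu.percent",
    "nodes.fs.total_in_bytes",
    "nodes.fs.free_in_bytes",
]


def get_path(doc, dotted_path):
    """Walk a nested dict along 'a.b.c'; return None if any key is missing."""
    node = doc
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def extract_cluster_metrics(stats):
    """Pick the key metrics out of a parsed /_cluster/stats document."""
    return {path: get_path(stats, path) for path in KEY_PATHS}


def fs_used_percent(stats):
    """Derive disk usage (%) from the total and free byte counts."""
    total = get_path(stats, "nodes.fs.total_in_bytes")
    free = get_path(stats, "nodes.fs.free_in_bytes")
    if not total or free is None:
        return None
    return 100.0 * (total - free) / total


# A trimmed-down document using values from the sample response above.
sample = {
    "indices": {"count": 268, "docs": {"count": 24331382}},
    "nodes": {"fs": {"total_in_bytes": 85857402880, "free_in_bytes": 52003000320}},
}
metrics = extract_cluster_metrics(sample)
print(metrics["indices.count"], metrics["indices.docs.count"])
print(round(fs_used_percent(sample), 1))
```

Paths missing from the document come back as None rather than raising, which is convenient when the same collector runs against ES versions whose responses differ slightly.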
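2. Node monitoring
Node monitoring targets the individual nodes, and many of its metrics are very important for keeping an ES cluster running stably. The node metrics are described below. They can be obtained from the following API: http://ip:9200/_nodes/stats?pretty
Sample response: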
{ "_nodes" : { "total" : 2, "successful" : 2, "failed" : 0 }, "cluster_name" : "****", "nodes" : { "GoFMtzcMSBq14uLgMjUgXQ" : { "timestamp" : 1511247337198, "name" : "es-node1", "transport_address" : "10.202.77.206:9300", "host" : "10.202.77.206", "ip" : "10.202.77.206:9300", "roles" : [ "master", "data", "ingest" ], "attributes" : { "rack" : "r1" }, "indices" : { "docs" : { "count" : 24315519, "deleted" : 1285643 }, "store" : { "size_in_bytes" : 7062401888, "throttle_time_in_millis" : 0 }, "indexing" : { "index_total" : 8203153, "index_time_in_millis" : 4879194, "index_current" : 0, "index_failed" : 2, "delete_total" : 2683650, "delete_time_in_millis" : 86088, "delete_current" : 0, "noop_update_total" : 2160809, "is_throttled" : false, "throttle_time_in_millis" : 0 }, "get" : { "total" : 11584659, "time_in_millis" : 3805852, "exists_total" : 7583121, "exists_time_in_millis" : 2006327, "missing_total" : 4001538, "missing_time_in_millis" : 1799525, "current" : 0 }, "search" : { "open_contexts" : 0, "query_total" : 86168324, "query_time_in_millis" : 816522806, "query_current" : 0, "fetch_total" : 27242455, "fetch_time_in_millis" : 11770783, "fetch_current" : 0, "scroll_total" : 19528, "scroll_time_in_millis" : 467760567, "scroll_current" : 2, "suggest_total" : 0, "suggest_time_in_millis" : 0, "suggest_current" : 0 }, "merges" : { "current" : 0, "current_docs" : 0, "current_size_in_bytes" : 0, "total" : 42839, "total_time_in_millis" : 12944553, "total_docs" : 921206028, "total_size_in_bytes" : 143903037197, "total_stopped_time_in_millis" : 0, "total_throttled_time_in_millis" : 44898, "total_auto_throttle_in_bytes" : 38083193135 }, "refresh" : { "total" : 474861, "total_time_in_millis" : 6266996, "listeners" : 0 }, "flush" : { "total" : 22969, "total_time_in_millis" : 1351372 }, "warmer" : { "current" : 0, "total" : 237713, "total_time_in_millis" : 62320 }, "query_cache" : { "memory_size_in_bytes" : 19612568, "total_count" : 1206027910, "hit_count" : 13501700, 
"miss_count" : 1192526210, "cache_size" : 7732, "cache_count" : 238859, "evictions" : 231127 }, "fielddata" : { "memory_size_in_bytes" : 549088, "evictions" : 0 }, "completion" : { "size_in_bytes" : 0 }, "segments" : { "count" : 3263, "memory_in_bytes" : 56005267, "terms_memory_in_bytes" : 44838300, "stored_fields_memory_in_bytes" : 3456464, "term_vectors_memory_in_bytes" : 0, "norms_memory_in_bytes" : 1816320, "points_memory_in_bytes" : 1725007, "doc_values_memory_in_bytes" : 4169176, "index_writer_memory_in_bytes" : 0, "version_map_memory_in_bytes" : 380, "fixed_bit_set_memory_in_bytes" : 71648, "max_unsafe_auto_id_timestamp" : -1, "file_sizes" : { } }, "translog" : { "operations" : 906, "size_in_bytes" : 330172 }, "request_cache" : { "memory_size_in_bytes" : 341576706, "evictions" : 4514215, "hit_count" : 11732783, "miss_count" : 11284220 }, "recovery" : { "current_as_source" : 0, "current_as_target" : 0, "throttle_time_in_millis" : 127 } }, "os" : { "timestamp" : 1511247337312, "cpu" : { "percent" : 2, "load_average" : { "1m" : 0.83, "5m" : 0.63, "15m" : 0.54 } }, "mem" : { "total_in_bytes" : 134978002944, "free_in_bytes" : 528519168, "used_in_bytes" : 134449483776, "free_percent" : 0, "used_percent" : 100 }, "swap" : { "total_in_bytes" : 21474832384, "free_in_bytes" : 522665984, "used_in_bytes" : 20952166400 }, "cgroup" : { "cpuacct" : { "control_group" : "/user.slice", "usage_nanos" : 4515277013133505 }, "cpu" : { "control_group" : "/user.slice", "cfs_period_micros" : 100000, "cfs_quota_micros" : -1, "stat" : { "number_of_elapsed_periods" : 0, "number_of_times_throttled" : 0, "time_throttled_nanos" : 0 } } } }, "process" : { "timestamp" : 1511247337312, "open_file_descriptors" : 4191, "max_file_descriptors" : 524288, "cpu" : { "percent" : 0, "total_in_millis" : 1098851240 }, "mem" : { "total_virtual_in_bytes" : 61150134272 } }, "jvm" : { "timestamp" : 1511247337314, "uptime_in_millis" : 1804038292, "mem" : { "heap_used_in_bytes" : 10168489464, 
"heap_used_percent" : 29, "heap_committed_in_bytes" : 34159132672, "heap_max_in_bytes" : 34159132672, "non_heap_used_in_bytes" : 171324200, "non_heap_committed_in_bytes" : 181989376, "pools" : { "young" : { "used_in_bytes" : 794711064, "max_in_bytes" : 1605304320, "peak_used_in_bytes" : 1605304320, "peak_max_in_bytes" : 1605304320 }, "survivor" : { "used_in_bytes" : 26434328, "max_in_bytes" : 200605696, "peak_used_in_bytes" : 200605696, "peak_max_in_bytes" : 200605696 }, "old" : { "used_in_bytes" : 9347344072, "max_in_bytes" : 32353222656, "peak_used_in_bytes" : 24281906576, "peak_max_in_bytes" : 32353222656 } } }, "threads" : { "count" : 302, "peak_count" : 392 }, "gc" : { "collectors" : { "young" : { "collection_count" : 114294, "collection_time_in_millis" : 7084525 }, "old" : { "collection_count" : 4, "collection_time_in_millis" : 13639 } } }, "buffer_pools" : { "direct" : { "count" : 632, "used_in_bytes" : 1084444548, "total_capacity_in_bytes" : 1084444547 }, "mapped" : { "count" : 5750, "used_in_bytes" : 6970687717, "total_capacity_in_bytes" : 6970687717 } }, "classes" : { "current_loaded_count" : 13247, "total_loaded_count" : 13550, "total_unloaded_count" : 303 } }, "thread_pool" : { "bulk" : { "threads" : 8, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 8, "completed" : 1283041 }, "fetch_shard_started" : { "threads" : 1, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 64, "completed" : 1116 }, "fetch_shard_store" : { "threads" : 1, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 64, "completed" : 1105 }, "flush" : { "threads" : 4, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 5, "completed" : 45930 }, "force_merge" : { "threads" : 0, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 0, "completed" : 0 }, "generic" : { "threads" : 4, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 44, "completed" : 206262 }, "get" : { "threads" : 32, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 32, "completed" : 8690669 }, 
"index" : { "threads" : 32, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 32, "completed" : 35 }, "listener" : { "threads" : 0, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 0, "completed" : 0 }, "management" : { "threads" : 5, "queue" : 0, "active" : 1, "rejected" : 0, "largest" : 5, "completed" : 1795360 }, "refresh" : { "threads" : 10, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 10, "completed" : 444326281 }, "search" : { "threads" : 49, "queue" : 0, "active" : 0, "rejected" : 19, "largest" : 49, "completed" : 127485601 }, "snapshot" : { "threads" : 0, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 0, "completed" : 0 }, "warmer" : { "threads" : 2, "queue" : 0, "active" : 0, "rejected" : 0, "largest" : 5, "completed" : 578631 } }, "fs" : { "timestamp" : 1511247337315, "total" : { "total_in_bytes" : 85857402880, "free_in_bytes" : 51980320768, "available_in_bytes" : 51980320768, "spins" : "true" }, "data" : [ { "path" : "/DATA/esdir/nodes/0", "mount" : "/ (rootfs)", "type" : "rootfs", "total_in_bytes" : 32196526080, "free_in_bytes" : 12431843328, "available_in_bytes" : 12431843328 }, { "path" : "/kdump/esdir/nodes/0", "mount" : "/kdump (/dev/mapper/VolGroup00-LVkdump)", "type" : "xfs", "total_in_bytes" : 53660876800, "free_in_bytes" : 39548477440, "available_in_bytes" : 39548477440, "spins" : "true" } ], "io_stats" : { "devices" : [ { "device_name" : "dm-2", "operations" : 45465602, "read_operations" : 24338409, "write_operations" : 21127193, "read_kilobytes" : 253166215, "write_kilobytes" : 361639346 } ], "total" : { "operations" : 45465602, "read_operations" : 24338409, "write_operations" : 21127193, "read_kilobytes" : 253166215, "write_kilobytes" : 361639346 } } }, "transport" : { "server_open" : 429, "rx_count" : 220683999, "rx_size_in_bytes" : 205215158152, "tx_count" : 220683992, "tx_size_in_bytes" : 417290114794 }, "http" : { "current_open" : 5, "total_opened" : 256040 }, "breakers" : { "request" : { "limit_size_in_bytes" 
: 20495479603, "limit_size" : "19gb", "estimated_size_in_bytes" : 0, "estimated_size" : "0b", "overhead" : 1.0, "tripped" : 0 }, "fielddata" : { "limit_size_in_bytes" : 20495479603, "limit_size" : "19gb", "estimated_size_in_bytes" : 549088, "estimated_size" : "536.2kb", "overhead" : 1.03, "tripped" : 0 }, "in_flight_requests" : { "limit_size_in_bytes" : 34159132672, "limit_size" : "31.8gb", "estimated_size_in_bytes" : 0, "estimated_size" : "0b", "overhead" : 1.0, "tripped" : 0 }, "parent" : { "limit_size_in_bytes" : 23911392870, "limit_size" : "22.2gb", "estimated_size_in_bytes" : 549088, "estimated_size" : "536.2kb", "overhead" : 1.0, "tripped" : 0 } }, "script" : { "compilations" : 20, "cache_evictions" : 0 }, "discovery" : { "cluster_state_queue" : { "total" : 0, "pending" : 0, "committed" : 0 } }, "ingest" : { "total" : { "count" : 0, "time_in_millis" : 0, "current" : 0, "failed" : 0 }, "pipelines" : { "xpack_monitoring_2" : { "count" : 0, "time_in_millis" : 0, "current" : 0, "failed" : 0 } } } }, "vcMKsI-mR8qdqaLpbObL2Q" : { } } }
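Key metrics:
name: node name.
roles: node roles.
indices.docs.count: number of documents on the node.
indices.segments.count: total number of segments.
jvm.mem.heap_used_percent: JVM heap usage, as a percentage.
thread_pool.{bulk, index, get, search}.{active, queue, rejected}: thread pool information for the bulk, index, get, and search pools. The main metrics are active (number of active threads), queue (number of tasks waiting in the queue), and rejected (number of rejected tasks).
The following metrics are cumulative counters; they reset to zero when the node restarts.
indices.indexing.index_total: number of documents indexed.
indices.indexing.index_time_in_millis: total time spent indexing.
indices.get.total: number of get requests.
indices.get.time_in_millis: total time spent on get requests.
indices.search.query_total: total number of search (query phase) requests.
indices.search.query_time_in_millis: total time spent on search (query phase) requests.
indices.search.fetch_total: total number of fetch operations.
indices.search.fetch_time_in_millis: total time spent on fetch operations.
jvm.gc.collectors.young.collection_count: number of young-generation garbage collections.
jvm.gc.collectors.young.collection_time_in_millis: total time spent on young-generation garbage collection.
jvm.gc.collectors.old.collection_count: number of old-generation garbage collections.
jvm.gc.collectors.old.collection_time_in_millis: total time spent on old-generation garbage collection.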
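To show how the per-node fields might be read in practice, here is a small Python sketch that walks the "nodes" map of the response and reports heap usage and thread pool rejections per node. The function name `summarize_nodes` and the 85% heap threshold are assumptions made for the example, not ES defaults.

```python
WATCHED_POOLS = ("bulk", "index", "get", "search")


def summarize_nodes(nodes_stats, heap_warn_percent=85):
    """Build one summary per node from a parsed /_nodes/stats document.

    heap_warn_percent is a hypothetical threshold, not an ES default.
    """
    summaries = []
    for node_id, node in nodes_stats.get("nodes", {}).items():
        if not node:
            continue  # skip entries with no data (the sample above elides one node)
        heap = node.get("jvm", {}).get("mem", {}).get("heap_used_percent")
        pools = node.get("thread_pool", {})
        # "rejected" is cumulative since node start, so a non-zero value means
        # rejections have happened, not that they are happening right now.
        rejected = {
            name: pools[name]["rejected"]
            for name in WATCHED_POOLS
            if pools.get(name, {}).get("rejected", 0) > 0
        }
        summaries.append({
            "id": node_id,
            "name": node.get("name"),
            "heap_used_percent": heap,
            "heap_high": heap is not None and heap >= heap_warn_percent,
            "rejected": rejected,
        })
    return summaries


# A trimmed-down document using values from the sample response above.
sample = {
    "nodes": {
        "GoFMtzcMSBq14uLgMjUgXQ": {
            "name": "es-node1",
            "jvm": {"mem": {"heap_used_percent": 29}},
            "thread_pool": {
                "bulk": {"active": 0, "queue": 0, "rejected": 0},
                "search": {"active": 0, "queue": 0, "rejected": 19},
            },
        },
        "vcMKsI-mR8qdqaLpbObL2Q": {},
    }
}
for summary in summarize_nodes(sample):
    print(summary)
```

Because rejected is itself a cumulative counter, alerting on its growth between two samples is usually more useful than alerting on its absolute value.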
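Metrics that have to be computed:
The computed node metrics fall into two classes: request rates and request-processing latencies. They are described below.
index_per_min: number of index requests per minute, computed as:
index request rate = (difference in index_total between two samples) / (difference in sample timestamps, in ms) × 60000 (Formula 1)
indexAverage_per_min: average processing latency per index request, in ms, computed as:
index latency = (difference in index_time_in_millis between two samples) / (difference in index_total between two samples) (Formula 2)
get_per_min: number of get requests per minute, computed as in Formula 1 with the corresponding fields substituted.
getAverage_per_min: get request processing latency, computed as in Formula 2 with the corresponding fields substituted.
merge_per_min: number of merges per minute, computed as in Formula 1 with the corresponding fields substituted.
mergeAverage_per_min: merge processing latency, computed as in Formula 2 with the corresponding fields substituted.
searchQuery_per_min: number of query requests per minute, computed as in Formula 1 with the corresponding fields substituted.
searchQueryAverage_per_min: query request latency, computed as in Formula 2 with the corresponding fields substituted.
searchFetch_per_min: number of fetch requests per minute, computed as in Formula 1 with the corresponding fields substituted.
searchFetchAverage_per_min: fetch request latency, computed as in Formula 2 with the corresponding fields substituted.
youngGc_per_min: number of young GCs per minute, computed as in Formula 1 with the corresponding fields substituted.
youngGcAverage_per_min: average young GC duration, computed as in Formula 2 with the corresponding fields substituted.
oldGc_per_min: number of old GCs per minute, computed as in Formula 1 with the corresponding fields substituted.
oldGcAverage_per_min: average old GC duration, computed as in Formula 2 with the corresponding fields substituted.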
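Formula 1 and Formula 2 can be sketched in Python as follows. The function names are hypothetical, and the second sample in the example is invented (it is not from a real cluster); the first sample reuses the index_total, index_time_in_millis, and timestamp values from the response above. Returning None when a counter goes backwards is my own choice for handling the reset after a node restart.

```python
def rate_per_min(count_prev, count_curr, ts_prev_ms, ts_curr_ms):
    """Formula 1: counter difference over elapsed ms, scaled to one minute."""
    elapsed_ms = ts_curr_ms - ts_prev_ms
    delta = count_curr - count_prev
    if elapsed_ms <= 0 or delta < 0:
        return None  # bad timestamps, or the counter reset after a restart
    return delta * 60000 / elapsed_ms


def avg_latency_ms(time_prev_ms, time_curr_ms, count_prev, count_curr):
    """Formula 2: time difference divided by request-count difference."""
    delta_time = time_curr_ms - time_prev_ms
    delta_count = count_curr - count_prev
    if delta_time < 0 or delta_count < 0:
        return None  # counters reset after a restart
    if delta_count == 0:
        return 0.0  # no requests in the interval
    return delta_time / delta_count


# First sample: values from the node stats response above.
# Second sample: hypothetical values, assumed to be taken 60 seconds later.
prev = {"ts": 1511247337198, "index_total": 8203153, "index_time_in_millis": 4879194}
curr = {"ts": 1511247397198, "index_total": 8203753, "index_time_in_millis": 4879494}

index_per_min = rate_per_min(
    prev["index_total"], curr["index_total"], prev["ts"], curr["ts"])
index_average = avg_latency_ms(
    prev["index_time_in_millis"], curr["index_time_in_millis"],
    prev["index_total"], curr["index_total"])
print(index_per_min)   # prints 600.0
print(index_average)   # prints 0.5
```

The same two functions cover the get, merge, query, fetch, and GC metrics by passing the matching count and time fields.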
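3. Index monitoring
Index monitoring metrics mainly target a single index, but "_all" can also be used to monitor all indices in the cluster. Index metrics can be obtained from the following API: http://ip:9200/_stats?pretty
Sample response (because there are so many metrics, the per-index data has been removed):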
{ "_shards" : { "total" : 2632, "successful" : 2629, "failed" : 0 }, "_all" : { "primaries" : { "docs" : { "count" : 24331733, "deleted" : 1275141 }, "store" : { "size_in_bytes" : 7059955543, "throttle_time_in_millis" : 0 }, "indexing" : { "index_total" : 5365017, "index_time_in_millis" : 1252523, "index_current" : 0, "index_failed" : 2, "delete_total" : 52127, "delete_time_in_millis" : 15391, "delete_current" : 0, "noop_update_total" : 2603991, "is_throttled" : false, "throttle_time_in_millis" : 0 }, "get" : { "total" : 10460612, "time_in_millis" : 2008642, "exists_total" : 7379342, "exists_time_in_millis" : 1484824, "missing_total" : 3081270, "missing_time_in_millis" : 523818, "current" : 0 }, "search" : { "open_contexts" : 0, "query_total" : 86083291, "query_time_in_millis" : 814756904, "query_current" : 0, "fetch_total" : 27165655, "fetch_time_in_millis" : 10714355, "fetch_current" : 0, "scroll_total" : 10423, "scroll_time_in_millis" : 232782387, "scroll_current" : 0, "suggest_total" : 0, "suggest_time_in_millis" : 0, "suggest_current" : 0 }, "merges" : { "current" : 0, "current_docs" : 0, "current_size_in_bytes" : 0, "total" : 20183, "total_time_in_millis" : 6543256, "total_docs" : 577868277, "total_size_in_bytes" : 74246755005, "total_stopped_time_in_millis" : 0, "total_throttled_time_in_millis" : 0, "total_auto_throttle_in_bytes" : 27682406400 }, "refresh" : { "total" : 208002, "total_time_in_millis" : 2211142, "listeners" : 0 }, "flush" : { "total" : 16736, "total_time_in_millis" : 671392 }, "warmer" : { "current" : 0, "total" : 261008, "total_time_in_millis" : 56753 }, "query_cache" : { "memory_size_in_bytes" : 19880448, "total_count" : 1121482891, "hit_count" : 14419521, "miss_count" : 1107063370, "cache_size" : 7873, "cache_count" : 245098, "evictions" : 237225 }, "fielddata" : { "memory_size_in_bytes" : 536240, "evictions" : 0 }, "completion" : { "size_in_bytes" : 0 }, "segments" : { "count" : 3279, "memory_in_bytes" : 57186862, "terms_memory_in_bytes" 
: 45772185, "stored_fields_memory_in_bytes" : 3463240, "term_vectors_memory_in_bytes" : 0, "norms_memory_in_bytes" : 1958784, "points_memory_in_bytes" : 1725005, "doc_values_memory_in_bytes" : 4267648, "index_writer_memory_in_bytes" : 0, "version_map_memory_in_bytes" : 0, "fixed_bit_set_memory_in_bytes" : 71952, "max_unsafe_auto_id_timestamp" : -1, "file_sizes" : { } }, "translog" : { "operations" : 1077, "size_in_bytes" : 400134 }, "request_cache" : { "memory_size_in_bytes" : 341607885, "evictions" : 4519712, "hit_count" : 11726897, "miss_count" : 11291340 }, "recovery" : { "current_as_source" : 0, "current_as_target" : 0, "throttle_time_in_millis" : 127 } }, "total" : { "docs" : { "count" : 48623000, "deleted" : 2423749 }, "store" : { "size_in_bytes" : 14070449695, "throttle_time_in_millis" : 0 }, "indexing" : { "index_total" : 10693484, "index_time_in_millis" : 2604719, "index_current" : 0, "index_failed" : 2, "delete_total" : 104254, "delete_time_in_millis" : 30121, "delete_current" : 0, "noop_update_total" : 2603991, "is_throttled" : false, "throttle_time_in_millis" : 0 }, "get" : { "total" : 16293442, "time_in_millis" : 2804849, "exists_total" : 10394888, "exists_time_in_millis" : 1959283, "missing_total" : 5898554, "missing_time_in_millis" : 845566, "current" : 0 }, "search" : { "open_contexts" : 0, "query_total" : 172162337, "query_time_in_millis" : 1592080354, "query_current" : 0, "fetch_total" : 54485249, "fetch_time_in_millis" : 21057369, "fetch_current" : 0, "scroll_total" : 20835, "scroll_time_in_millis" : 463340209, "scroll_current" : 0, "suggest_total" : 0, "suggest_time_in_millis" : 0, "suggest_current" : 0 }, "merges" : { "current" : 0, "current_docs" : 0, "current_size_in_bytes" : 0, "total" : 36961, "total_time_in_millis" : 12229220, "total_docs" : 1097422316, "total_size_in_bytes" : 139540796667, "total_stopped_time_in_millis" : 0, "total_throttled_time_in_millis" : 0, "total_auto_throttle_in_bytes" : 55134126080 }, "refresh" : { "total" : 
413646, "total_time_in_millis" : 4321375, "listeners" : 0 }, "flush" : { "total" : 33458, "total_time_in_millis" : 1355230 }, "warmer" : { "current" : 0, "total" : 485695, "total_time_in_millis" : 114052 }, "query_cache" : { "memory_size_in_bytes" : 39548144, "total_count" : 2295122735, "hit_count" : 28356972, "miss_count" : 2266765763, "cache_size" : 15628, "cache_count" : 485134, "evictions" : 469506 }, "fielddata" : { "memory_size_in_bytes" : 1078872, "evictions" : 0 }, "completion" : { "size_in_bytes" : 0 }, "segments" : { "count" : 6581, "memory_in_bytes" : 112538237, "terms_memory_in_bytes" : 90044807, "stored_fields_memory_in_bytes" : 6920432, "term_vectors_memory_in_bytes" : 0, "norms_memory_in_bytes" : 3802944, "points_memory_in_bytes" : 3450186, "doc_values_memory_in_bytes" : 8319868, "index_writer_memory_in_bytes" : 0, "version_map_memory_in_bytes" : 0, "fixed_bit_set_memory_in_bytes" : 143704, "max_unsafe_auto_id_timestamp" : 1510727090177, "file_sizes" : { } }, "translog" : { "operations" : 2154, "size_in_bytes" : 756365 }, "request_cache" : { "memory_size_in_bytes" : 683172171, "evictions" : 9039384, "hit_count" : 23461066, "miss_count" : 22580797 }, "recovery" : { "current_as_source" : 0, "current_as_target" : 0, "throttle_time_in_millis" : 786 } } }, "indices" :{ } }
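Key metrics (indexname stands for the name of an index):
indexname.primaries.docs.count: number of documents in the index.
The following metrics are cumulative counters; they reset to zero when the node restarts.
indexname.primaries.indexing.index_total: number of documents indexed.
indexname.primaries.indexing.index_time_in_millis: total time spent indexing.
indexname.primaries.get.total: number of get requests.
indexname.primaries.get.time_in_millis: total time spent on get requests.
indexname.primaries.search.query_total: total number of search (query phase) requests.
indexname.primaries.search.query_time_in_millis: total time spent on search (query phase) requests.
indexname.primaries.search.fetch_total: total number of fetch operations.
indexname.primaries.search.fetch_time_in_millis: total time spent on fetch operations.
indexname.primaries.refresh.total: total number of refreshes.
indexname.primaries.refresh.total_time_in_millis: total time spent on refreshes.
indexname.primaries.flush.total: total number of flushes.
indexname.primaries.flush.total_time_in_millis: total time spent on flushes.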
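Computed metrics:
As with node monitoring, the computed index metrics fall into two classes, request rates and request-processing latencies, and they are computed in the same way, so the details are not repeated here.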
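For completeness, here is a hedged Python sketch of the same rate and latency calculation applied per index to two /_stats responses. The index name "orders", the function name, and all numbers are hypothetical. Since the /_stats response shown above carries no timestamp field, the sketch assumes the collector records the elapsed time itself.

```python
def index_search_metrics(prev, curr, elapsed_ms):
    """Compare two parsed /_stats documents taken elapsed_ms apart.

    Returns query rate (per minute) and average query latency (ms) per index.
    """
    result = {}
    if elapsed_ms <= 0:
        return result
    prev_indices = prev.get("indices", {})
    for name, now in curr.get("indices", {}).items():
        before = prev_indices.get(name)
        if before is None:
            continue  # index was created between the two samples
        s0 = before["primaries"]["search"]
        s1 = now["primaries"]["search"]
        d_count = s1["query_total"] - s0["query_total"]
        d_time = s1["query_time_in_millis"] - s0["query_time_in_millis"]
        if d_count < 0 or d_time < 0:
            continue  # counters were reset, skip this interval
        result[name] = {
            "searchQuery_per_min": d_count * 60000 / elapsed_ms,
            "searchQueryAverage_per_min": d_time / d_count if d_count else 0.0,
        }
    return result


# Two hypothetical samples for a made-up index, taken 60 seconds apart.
prev = {"indices": {"orders": {"primaries": {"search": {
    "query_total": 1000, "query_time_in_millis": 5000}}}}}
curr = {"indices": {"orders": {"primaries": {"search": {
    "query_total": 1300, "query_time_in_millis": 5900}}}}}
print(index_search_metrics(prev, curr, 60000))
# prints {'orders': {'searchQuery_per_min': 300.0, 'searchQueryAverage_per_min': 3.0}}
```

A sudden rise in an index's average query latency while its rate stays flat is the kind of signal that would prompt a look at that index's settings, as described at the start of this post.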