Configuring DingTalk Alerts for Prometheus
2022-03-03 20:33:19
1. Upload the installation packages
1. Upload the latest binary packages and extract them
tar xf alertmanager-0.20.0-rc.0.linux-amd64.tar.gz
tar xf prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz
2. Rename the extracted directories
mv alertmanager-0.20.0-rc.0.linux-amd64 alertmanager
mv prometheus-webhook-dingtalk-0.3.0.linux-amd64 prometheus-webhook-dingtalk
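2. Start the DingTalk plugin
Create a custom robot in DingTalk and copy its webhook URL (plenty of guides for this step are available online).
nohup ./prometheus-webhook-dingtalk --ding.profile="ops_dingding=<your DingTalk webhook URL>" &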
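Before wiring up Alertmanager, it can help to confirm that the plugin actually forwards messages to DingTalk. The following is a minimal sketch of one way to do that, assuming the plugin is reachable at 10.10.9.200:8060 (the address used in the Alertmanager configuration in section 3) and accepts the standard Alertmanager webhook payload (version "4"). The function names, the TestAlert name, and the localhost URLs inside the payload are illustrative assumptions, not part of the plugin.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Assumed plugin endpoint: host, port and profile name as used in this article.
PLUGIN_URL = "http://10.10.9.200:8060/dingtalk/ops_dingding/send"


def build_test_payload(instance="linux", alertname="TestAlert"):
    """Build a minimal payload in the Alertmanager webhook format (version 4)."""
    labels = {"alertname": alertname, "instance": instance, "status": "warning"}
    annotations = {"summary": f"{instance}: test alert"}
    return {
        "version": "4",
        "groupKey": f'{{}}:{{alertname="{alertname}"}}',
        "status": "firing",
        "receiver": "webhook",
        "groupLabels": {"alertname": alertname},
        "commonLabels": labels,
        "commonAnnotations": annotations,
        "externalURL": "http://localhost:9093",  # placeholder value
        "alerts": [{
            "status": "firing",
            "labels": labels,
            "annotations": annotations,
            "startsAt": datetime.now(timezone.utc).isoformat(),
            "endsAt": "0001-01-01T00:00:00Z",
            "generatorURL": "http://localhost:9090/graph",  # placeholder value
        }],
    }


def send_test_payload(url=PLUGIN_URL):
    """POST the test payload to the plugin and return the HTTP status code."""
    body = json.dumps(build_test_payload()).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Calling send_test_payload() from a machine that can reach the plugin should produce a message in the DingTalk group if the webhook URL is correct. If the robot has keyword or signature security settings enabled, the message may be rejected until those are satisfied.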
3. Configure Alertmanager
# 1. Configuration file
vim alertmanager.yml
global:
  resolve_timeout: 5m
route:
  receiver: webhook
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  group_by: [alertname]
  routes:
  - receiver: webhook
    group_wait: 10s
    match:
      team: node
receivers:
- name: webhook
  webhook_configs:
  - url: http://10.10.9.200:8060/dingtalk/ops_dingding/send  # DingTalk plugin address; ops_dingding must match the profile name given when starting the plugin
    send_resolved: true
# 2. Start Alertmanager
nohup ./alertmanager --config.file=alertmanager.yml &
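To make the routing in alertmanager.yml easier to follow, here is a small sketch that models the route tree and shows which receiver and group_wait an alert ends up with. It is a deliberate simplification written for illustration: only exact match rules and one level of child routes are handled, and the function name resolve_route is hypothetical rather than anything Alertmanager exposes.

```python
# Simplified model of the route tree from alertmanager.yml above.
ROOT_ROUTE = {
    "receiver": "webhook",
    "group_wait": "30s",
    "routes": [
        {"receiver": "webhook", "group_wait": "10s", "match": {"team": "node"}},
    ],
}


def resolve_route(labels, route=ROOT_ROUTE):
    """Return (receiver, group_wait) for an alert's labels.

    Only exact `match` rules and one level of child routes are modelled;
    match_re, continue and deeper nesting are ignored.
    """
    for child in route.get("routes", []):
        wanted = child.get("match", {})
        if all(labels.get(k) == v for k, v in wanted.items()):
            # Child routes inherit settings they do not override.
            return (
                child.get("receiver", route["receiver"]),
                child.get("group_wait", route["group_wait"]),
            )
    # No child matched: the root route applies.
    return route["receiver"], route["group_wait"]


print(resolve_route({"alertname": "主机状态", "team": "node"}))       # ('webhook', '10s')
print(resolve_route({"alertname": "主机状态", "status": "warning"}))  # ('webhook', '30s')
```

Both paths deliver to the same webhook receiver here, so the child route only shortens the initial wait. An alert that carries no team: node label (from the rule or from the scrape target) falls through to the root route and its 30s group_wait.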
4. Configure Prometheus alerting rules
# 1. Define the alerting rule
vim rules.yml
groups:
- name: test-rule
  rules:
  - alert: 主机状态  # "host status"
    expr: up == 0
    for: 2m
    labels:
      status: warning
    annotations:
      summary: "{{$labels.instance}}:服务器关闭"  # "server is down"
      description: "{{$labels.instance}}:服务器关闭"
# 2. Edit the Prometheus configuration so the rule takes effect
vim prometheus.yml
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ["10.10.9.200:9093"]  # Alertmanager address
    # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules.yml"  # alerting rule file
  # - "second_rules.yml"
3. Restart Prometheus
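After the restart, one way to confirm that the rule file was picked up is to query the Prometheus HTTP API at /api/v1/rules. The sketch below assumes Prometheus runs on the same host as Alertmanager on the default port 9090; the article does not state the Prometheus address, so adjust PROMETHEUS_URL as needed. The helper names are illustrative, and the sample dictionary is an abridged, hand-written example of the response shape rather than captured output.

```python
import json
import urllib.request

# Assumption: Prometheus is on the Alertmanager host, default port 9090.
PROMETHEUS_URL = "http://10.10.9.200:9090"


def alerting_rule_names(rules_response):
    """Extract alerting rule names from a /api/v1/rules response body."""
    names = []
    for group in rules_response.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            if rule.get("type") == "alerting":
                names.append(rule["name"])
    return names


def fetch_rules(base_url=PROMETHEUS_URL):
    """Fetch the currently loaded rule groups from Prometheus."""
    with urllib.request.urlopen(f"{base_url}/api/v1/rules", timeout=10) as resp:
        return json.load(resp)


# Abridged example of the response shape for the rule defined in rules.yml.
sample = {
    "status": "success",
    "data": {"groups": [{
        "name": "test-rule",
        "rules": [{"name": "主机状态", "query": "up == 0", "type": "alerting"}],
    }]},
}
print(alerting_rule_names(sample))
```

Against a live server, alerting_rule_names(fetch_rules()) should include 主机状态 once the rule is loaded. An empty list suggests that the rule_files entry or the file path needs another look.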
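5. Verify that the configuration works
1. Stop the monitored node exporter
2. DingTalk alert message (the alert name 主机状态 means "host status" and 服务器关闭 means "server is down")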
[FIRING:1] 主机状态
Labels
alertname: 主机状态
instance: linux
job: node_export
status: warning
Annotations
description: linux:服务器关闭
summary: linux:服务器关闭
Source: http://test:9090/graph?g0.expr=up+%3D%3D+0&g0.tab=1
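Prometheus alert states
· Inactive: nothing is happening; the rule expression does not match.
· Pending: the threshold has been crossed, but the condition has not yet held for the required duration (the for field in the rule).
· Firing: the threshold has been crossed and the required duration has elapsed. The alert is sent to the notification pipeline, processed there, and delivered to the receivers.
The point of the Pending stage is that a notification goes out only after the condition has failed on several consecutive evaluations, which cuts down on unnecessary notifications.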
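The three states can be illustrated with a small simulation. This is a sketch of the behaviour described above, not Prometheus's own code: the function name alert_states is hypothetical, and the 30-second evaluation interval is an assumption, since the article does not show the evaluation_interval setting.

```python
def alert_states(samples, for_seconds, interval_seconds):
    """Return the alert state after each rule evaluation.

    samples: booleans, True when the expression (e.g. `up == 0`) matches.
    for_seconds: the rule's `for` duration in seconds.
    interval_seconds: time between evaluations in seconds.
    """
    states = []
    active_since = None  # evaluation time at which the condition first matched
    for i, matched in enumerate(samples):
        now = i * interval_seconds
        if not matched:
            # Any non-matching evaluation resets the alert.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        if now - active_since >= for_seconds:
            states.append("firing")
        else:
            states.append("pending")
    return states


# Target goes down at the second evaluation and stays down.
# Rule uses for: 2m (120 s); evaluations assumed every 30 s.
print(alert_states([False, True, True, True, True, True], 120, 30))
# ['inactive', 'pending', 'pending', 'pending', 'pending', 'firing']
```

With these numbers the alert stays Pending for four evaluations and becomes Firing only once the target has been down for the full two minutes. A single recovery in between would reset it to Inactive.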
Original article: https://blog.csdn.net/weixin_43999932/article/details/107608046