Deploying Prometheus, Grafana, and cAdvisor on CentOS 7

程序员文章站 2022-06-22 16:18:24

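Overview
Prometheus is an open-source monitoring and alerting system with a built-in time series database (TSDB), originally developed at SoundCloud. Since it started in 2012, many companies and organizations have adopted it, and the project has a very active developer and user community. It is now a standalone open-source project. Prometheus joined the CNCF (Cloud Native Computing Foundation) in 2016 as the second project hosted by the foundation, after Kubernetes. Its design was inspired by Google's internal monitoring system, so it pairs well with Kubernetes, which also originated at Google. Compared with InfluxDB-based setups, the author considers its performance stronger, and it ships with built-in alerting. It uses a pull-based collection model designed for large cluster environments: an application only needs to expose a metrics endpoint and register that endpoint with Prometheus for its data to be collected.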

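Prometheus is one of the few monitoring systems well suited to Docker, Mesos, and Kubernetes environments. As Kubernetes (k8s) has grown in popularity over the past few years, so has Prometheus.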

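Prometheus can be used at several layers:
 Business layer, as an instrumentation system: Prometheus has official client libraries for Go, Java, Python, and Ruby, plus third-party open-source clients for other languages. With a client library you can instrument core business flows, such as placing an order or adding an item to the cart.
 Application layer, as an application monitoring system: official or third-party exporters collect the key metrics of mainstream applications such as Redis and MySQL.
 System layer, as a system monitoring system: besides common software, Prometheus also has system-level and network-level exporters for monitoring servers and networks.
 Integration with other monitoring: through exporters, Prometheus can also collect data from other monitoring systems such as AWS CloudWatch, JMX, and Pingdom.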

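Prometheus works by periodically scraping the state of monitored components over HTTP. The advantage is that any component can join the monitoring system simply by exposing an HTTP endpoint, with no SDK or other integration work required. This makes it a good fit for virtualized environments such as VMs or Docker.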

The node-exporter component collects metrics on each node and exposes them over HTTP. Prometheus scrapes and stores the data, and Grafana presents it to users as graphs on web pages.
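
To make the pull model concrete, here is a minimal sketch of picking one metric out of an exporter's plain-text output, which is essentially what a scrape retrieves. The function name metric_samples is made up for this illustration, and the sketch assumes there are no spaces inside label values.

```shell
# Print every sample line of one metric from Prometheus text-format input
# on stdin. Comment lines (# HELP / # TYPE) are skipped.
metric_samples() {
    local name="$1"
    awk -v name="$name" '
        /^#/ { next }                     # skip HELP/TYPE comments
        {
            metric = $0
            sub(/[{ ].*/, "", metric)     # keep the text before the first "{" or space
            if (metric == name) print
        }'
}
```

Once node_exporter is running on a host, something like `curl -s http://192.168.18.103:9100/metrics | metric_samples node_load1` should print the 1-minute load sample.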

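As a new-generation monitoring framework, Prometheus has the following characteristics:
 A multi-dimensional data model (a time series is identified by a metric name and a set of key/value labels)
 Very efficient storage: a sample takes about 3.5 bytes on average; reportedly, 3.2 million time series sampled every 30 seconds and kept for 60 days use roughly 228 GB of disk
 A flexible query language over those dimensions (PromQL), which can multiply, add, join, and take quantiles over multiple metrics
 No dependency on distributed storage; a single server node works on its own
 Time series collection through an HTTP-based pull model
 Pushing time series is supported through the Pushgateway
 Targets are found through service discovery or static configuration
 Support for multiple kinds of graphs and dashboards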
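
The storage figure in the list above can be sanity-checked with the sizing rule from the Prometheus storage documentation: disk space is roughly retention time in seconds, times ingested samples per second, times bytes per sample. The sketch below is only an estimate; the function name estimate_disk_bytes is hypothetical, and bytes per sample is passed in tenths because shell arithmetic is integer-only.

```shell
# Estimate on-disk size in bytes:
#   retention_seconds * (series / scrape_interval) * bytes_per_sample
# bytes_x10 is bytes per sample times ten (35 means 3.5 bytes).
estimate_disk_bytes() {
    local series="$1" interval_s="$2" retention_days="$3" bytes_x10="$4"
    local retention_s=$(( retention_days * 86400 ))
    echo $(( series * retention_s / interval_s * bytes_x10 / 10 ))
}
```

With the quoted parameters, `estimate_disk_bytes 3200000 30 60 35` prints 1935360000000, about 1.9 TB. That is well above the quoted 228 GB, so the quoted disk figure presumably comes from a different workload or compression ratio; treat both numbers as rough guides.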

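Prometheus components. The Prometheus ecosystem consists of several components, many of which are optional:
1. Prometheus Server: the main service, which scrapes and stores time series data. It periodically pulls data from statically configured targets or from targets found through service discovery (mainly DNS, Consul, k8s, Mesos, and so on), and provides the PromQL query language.

2. Client Library: used to instrument application or exporter code (Go, Java, Python, Ruby). It generates the metrics for the service being monitored and exposes them to the Prometheus server. When the server pulls, it returns the current state of the metrics directly.

3. Push Gateway: supports short-lived jobs. Such jobs push their metrics to the gateway, and the Prometheus server then scrapes the gateway. This approach is mainly for service-level metrics; for machine-level metrics, use node exporter.

4. Grafana: the visualization dashboard (the two options were PromDash and Grafana; Grafana is the mainstream choice today).

5. Exporters: expose the metrics of existing third-party services to Prometheus so that data from other sources can be imported. Exporters exist for databases, hardware, message middleware, storage systems, HTTP servers, JMX, and more.

6. Alertmanager: a component separate from the Prometheus server. Alert rules are written as PromQL expressions and evaluated by the server, which allows very flexible alerting. After receiving alerts from the Prometheus server, Alertmanager deduplicates and groups them, routes them to the matching receiver, and sends the notification. Common receivers include email, PagerDuty, OpsGenie, and webhooks.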

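Prerequisites
Environment: VirtualBox, CentOS 7
Host machine IP: 192.168.18.8
VM 3 IP: 192.168.18.103 (VServer3)

The Prometheus node (VM 3) installs: Prometheus, Grafana, Node_exporter
Other monitored nodes install: Node_exporter and cAdvisor (for VMs that run Docker but are not K8s cluster nodes; on K8s cluster nodes the kubelet already bundles cAdvisor, so no installation is needed)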

I. CentOS 7 VM IP configuration
1.#cd /etc/sysconfig/network-scripts

2.#vi ifcfg-enp0s3
TYPE=Ethernet
DEVICE=enp0s3
NAME=enp0s3
ONBOOT=yes
DEFROUTE=yes
BOOTPROTO=static
IPADDR=192.168.18.103
NETMASK=255.255.255.0
DNS1=192.168.18.1
GATEWAY=192.168.18.1
BROADCAST=192.168.18.255
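
The BROADCAST value follows from IPADDR and NETMASK: every host bit is set to 1. A small sketch of that arithmetic for the addresses used here; the helper name broadcast_addr is an assumption for illustration and not part of any CentOS tooling.

```shell
# Derive the IPv4 broadcast address from a dotted-quad address and netmask.
# For each octet: address OR (255 - mask), i.e. all host bits set to 1.
broadcast_addr() {
    local ip="$1" mask="$2" old_ifs="$IFS"
    IFS=.
    set -- $ip $mask          # split both dotted quads into $1..$8
    IFS="$old_ifs"
    echo "$(( $1 | (255 - $5) )).$(( $2 | (255 - $6) )).$(( $3 | (255 - $7) )).$(( $4 | (255 - $8) ))"
}
```

`broadcast_addr 192.168.18.103 255.255.255.0` prints 192.168.18.255, which is the broadcast address for this /24 network.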

3.#service network restart

4.#ip address

II. Set the VM hostname (takes effect after a reboot)
1.#hostname
or
#hostnamectl

2.#vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=VServer3

3.#vi /etc/hosts
Append the new IP address and its hostname as the last line:
192.168.18.103 VServer3

4.#vi /etc/hostname
Change the content to VServer3

5.#reboot         ##reboot the VM

On CentOS 7 there is a systemd-hostnamed service; restarting that service is enough instead of rebooting:
#systemctl restart systemd-hostnamed

6.#hostname
or
#hostnamectl

7.#yum update
Upgrades CentOS (both the system release and the kernel)

8.#reboot     ##reboot the VM

III. Install Prometheus
1. Download and extract
#wget https://github.com/prometheus/prometheus/releases/download/v2.18.1/prometheus-2.18.1.linux-amd64.tar.gz
#tar -zvxf prometheus-2.18.1.linux-amd64.tar.gz
#mv prometheus-2.18.1.linux-amd64 /usr/local/
#ln -s /usr/local/prometheus-2.18.1.linux-amd64/ /usr/local/prometheus

2. Configure prometheus.yml
# Global settings
global:
  scrape_interval:     15s # scrape interval; the default is 1 minute
  evaluation_interval: 15s # how often rules are evaluated; every 15 seconds here, default 1 minute
  # scrape_timeout # scrape timeout; the default is 10s

# Alertmanager settings
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Rule files; the rules in them are evaluated every 'evaluation_interval'
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# Scrape configuration list
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']

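3. Create the prometheus user and the data directory. For safety, the prometheus service runs as an unprivileged user. As a time series database, Prometheus stores its data under the application directory by default; here it is moved to /data/prometheus.
#useradd -s /sbin/nologin -M prometheus
#mkdir /data/prometheus -p
#change the directory owner (the trailing slash makes chown -R descend through the symlink)
#chown -R prometheus:prometheus /usr/local/prometheus/
#chown -R prometheus:prometheus /data/prometheus/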

4. Create a systemd service for Prometheus. Starting Prometheus is simple: just run the prometheus binary in the extracted directory. To make it easier to manage, systemd is used here to start and stop it.

#touch /etc/systemd/system/prometheus.service
#vi /etc/systemd/system/prometheus.service

[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/prometheus-2.18.1.linux-amd64/prometheus --config.file=/usr/local/prometheus-2.18.1.linux-amd64/prometheus.yml --storage.tsdb.retention=5d --storage.tsdb.path=/data/prometheus/
Restart=on-failure
[Install]
WantedBy=multi-user.target

Note: the service file defines the start command and stores data under /data/prometheus; otherwise the data would go to the data directory next to the prometheus binary.

#chmod +x /etc/systemd/system/prometheus.service

5. Start prometheus

#mkdir -p /script      ##no error if the directory already exists
#cd /script

#touch prometheus_service.sh
#vi prometheus_service.sh

systemctl daemon-reload
systemctl restart prometheus
systemctl status prometheus
systemctl enable prometheus

#chmod +x prometheus_service.sh

#sh /script/prometheus_service.sh

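Notes:
Prometheus startup flags:
#./prometheus --web.listen-address=0.0.0.0:9090 --web.read-timeout=5m --web.max-connections=10 --storage.tsdb.min-block-duration=1h --storage.tsdb.max-block-duration=1h --storage.tsdb.retention=15d --storage.tsdb.path=/data/prometheus/ --query.max-concurrency=20 --query.timeout=3m --web.enable-admin-api --web.enable-lifecycle

--web.read-timeout=5m         maximum time before a request read times out and idle connections are closed
--web.max-connections         maximum number of simultaneous connections
--storage.tsdb.retention=15d  how long data is kept; the author suggests 15 days for production use
--storage.tsdb.path           data storage path
--query.max-concurrency       maximum number of queries executed concurrently
--query.timeout               query timeout
--storage.tsdb.min-block-duration minimum time span of a data block, 2h by default. Monitoring data is stored in blocks, and each block holds all samples (data chunks) in its time window
--storage.tsdb.max-block-duration maximum time span of a data block, by default 10% of the retention time
--web.enable-admin-api     enables the admin HTTP API, which includes functions such as deleting time series
--web.enable-lifecycle     enables reload and shutdown over HTTP, so a configuration change can be applied with a curl -X POST request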
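
With --web.enable-lifecycle set, a reload is an HTTP POST to the server's /-/reload endpoint (and /-/quit shuts it down). A minimal sketch, assuming the default port 9090; the function names lifecycle_url and reload_prometheus are illustrative and not part of Prometheus.

```shell
# Build the URL of a Prometheus lifecycle endpoint ("reload" or "quit").
lifecycle_url() {
    local host="$1" port="${2:-9090}" action="${3:-reload}"
    echo "http://${host}:${port}/-/${action}"
}

# Ask a running server to re-read prometheus.yml without a restart.
# Only works when the server was started with --web.enable-lifecycle.
reload_prometheus() {
    curl -fsS -X POST "$(lifecycle_url "$@")"
}
```

For the server in this guide, `reload_prometheus 192.168.18.103` would post to http://192.168.18.103:9090/-/reload.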

6. Access
http://192.168.18.103:9090/graph

# ps -ef | grep prometheus

Installation complete.

Syntax check for the yml configuration file:
#/usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml
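
Since a broken prometheus.yml keeps the service from starting, it can help to run the check before every restart. The following is a sketch only: restart_if_valid is a hypothetical helper, and the promtool path, config path, and restart command are all passed in by the caller.

```shell
# Run "<promtool> check config <file>" and only run the given restart
# command when the check succeeds.
restart_if_valid() {
    local promtool="$1" config="$2"
    shift 2                       # the remaining arguments are the restart command
    if "$promtool" check config "$config"; then
        "$@"
    else
        echo "config check failed, not restarting" >&2
        return 1
    fi
}
```

With the paths used in this guide: `restart_if_valid /usr/local/prometheus/promtool /usr/local/prometheus/prometheus.yml systemctl restart prometheus`.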

IV. Install Grafana
1. Download and extract
#wget https://dl.grafana.com/oss/release/grafana-7.0.1.linux-amd64.tar.gz
#tar -zxvf grafana-7.0.1.linux-amd64.tar.gz
#mv grafana-7.0.1 /usr/local/
#ln -s /usr/local/grafana-7.0.1/ /usr/local/grafana

2. Create the grafana user and the data directory
#useradd -s /sbin/nologin -M grafana
#mkdir -p /data/grafana
#chown -R grafana:grafana /usr/local/grafana/
#chown -R grafana:grafana /data/grafana/

3. Edit the configuration file
Edit /usr/local/grafana/conf/defaults.ini and point the paths at the data directory created above.
data = /data/grafana/data
logs = /data/grafana/log
plugins = /data/grafana/plugins
provisioning = /data/grafana/conf/provisioning

4. Add grafana-server to systemd: create a grafana-server.service file so that systemd manages the Grafana service

#touch /etc/systemd/system/grafana-server.service
#vi /etc/systemd/system/grafana-server.service

[Unit]
Description=Grafana
After=network.target

[Service]
User=grafana
Group=grafana
Type=notify
ExecStart=/usr/local/grafana/bin/grafana-server -homepath /usr/local/grafana
Restart=on-failure

[Install]
WantedBy=multi-user.target

#chmod +x /etc/systemd/system/grafana-server.service

5. Start Grafana and enable it at boot

#mkdir -p /script      ##no error if the directory already exists
#cd /script

#touch grafana_service.sh
#vi grafana_service.sh

systemctl daemon-reload
systemctl restart grafana-server
systemctl status grafana-server
systemctl enable grafana-server

#chmod +x grafana_service.sh

#sh /script/grafana_service.sh

6. Access
http://192.168.18.103:3000/login

# ps -ef | grep grafana

Note: the initial Grafana account and password are admin/admin
http://192.168.18.103:3000 opens the home page

Installation complete.

Note: command to reset the Grafana admin password if it is forgotten

#cd /usr/local/grafana/bin

#./grafana-cli admin reset-admin-password admin1234

V. Install exporters
1. Download and extract
#wget https://github.com/prometheus/node_exporter/releases/download/v1.0.0/node_exporter-1.0.0.linux-amd64.tar.gz
#tar -zxvf node_exporter-1.0.0.linux-amd64.tar.gz
#mkdir -p /usr/local/prometheus_exporter
#mv node_exporter-1.0.0.linux-amd64 /usr/local/prometheus_exporter/
#cd /usr/local/prometheus_exporter/
#ln -s /usr/local/prometheus_exporter/node_exporter-1.0.0.linux-amd64/ /usr/local/node_exporter

#chown -R prometheus:prometheus /usr/local/node_exporter/

2. Add node_exporter as a system service
#touch /etc/systemd/system/node_exporter.service
#vi /etc/systemd/system/node_exporter.service

[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target

#chmod +x /etc/systemd/system/node_exporter.service

3. Start node_exporter and enable it at boot

#mkdir -p /script      ##no error if the directory already exists
#cd /script

#touch node_exporter_service.sh
#vi node_exporter_service.sh

systemctl daemon-reload
systemctl restart node_exporter
systemctl status node_exporter
systemctl enable node_exporter

#chmod +x node_exporter_service.sh

#sh /script/node_exporter_service.sh

4. Check that node_exporter is running; its default port is 9100
#systemctl status node_exporter
#ss -ntl |grep 9100

#/usr/local/node_exporter/node_exporter --version

#/usr/local/node_exporter/node_exporter --help

# ps -ef | grep node

http://192.168.18.103:9100/metrics

5. Configure Prometheus to collect node exporter data (add the clients to Prometheus monitoring).
Note: change the IP addresses in the configuration to those of your monitored clients; the default node_exporter port is 9100.
Edit prometheus.yml and add the last 4 lines shown here:

#cd /usr/local/prometheus/
#vi prometheus.yml

scrape_configs:
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']

  # collect node exporter metrics
  - job_name: 'node1'
    static_configs:      ## static configuration
    - targets: ['localhost:9100','192.168.18.101:9100','192.168.18.102:9100']

Then restart Prometheus and open its web UI to check that the new targets appear and are in the UP state:
http://192.168.18.103:9090/targets
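
When many nodes share the same exporter port, the targets line can be generated instead of typed by hand. This is a small sketch under that assumption; targets_line is a hypothetical helper, and its output still has to be pasted under static_configs in prometheus.yml.

```shell
# Print a static_configs "targets" line for one port and a list of hosts.
targets_line() {
    local port="$1" list="" host
    shift
    for host in "$@"; do
        # append 'host:port', separated by commas after the first entry
        list="${list:+$list,}'${host}:${port}'"
    done
    printf '%s\n' "- targets: [${list}]"
}
```

`targets_line 9100 localhost 192.168.18.101 192.168.18.102` reproduces the targets line of the node1 job.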

[redis_exporter configuration]:

./redis_exporter &

Tip: the default redis_exporter port is 9121

scrape_configs:
  ## config for the multiple Redis targets that the exporter will scrape
  - job_name: 'redis_exporter_targets'
    static_configs:
      - targets:
        - redis://first-redis-host:6379
        - redis://second-redis-host:6380
        - redis://second-redis-host:6381
        - redis://second-redis-host:6382
    metrics_path: /scrape
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: <<REDIS-EXPORTER-HOSTNAME>>:9121

  ## config for scraping the exporter itself
  - job_name: 'redis_exporter'
    static_configs:
      - targets:
        - <<REDIS-EXPORTER-HOSTNAME>>:9121

The same configuration with concrete addresses filled in:

  ## config for the multiple Redis targets that the exporter will scrape
  - job_name: 'redis_exporter_targets'
    static_configs:
      - targets:
        - redis://192.101.11.153:7001
        - redis://192.101.11.153:7002
        - redis://192.101.11.154:7003
        - redis://192.101.11.154:7004
        - redis://192.101.11.155:7005
        - redis://192.101.11.155:7006
    metrics_path: /scrape
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.101.11.153:9121

  ## config for scraping the exporter itself
  - job_name: 'redis_exporter'
    static_configs:
      - targets:
        - 192.101.11.153:9121
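
The relabel_configs in the redis_exporter_targets job can be hard to read at first. In effect, each redis:// target becomes a target query parameter, the instance label keeps the Redis address, and the request itself goes to the exporter. The sketch below only prints the resulting request for one target; effective_scrape_url is a hypothetical name, and the URL is shown without the percent-encoding that Prometheus applies to the parameter.

```shell
# Show the request Prometheus ends up making for one Redis target after
# the three relabel rules have been applied.
effective_scrape_url() {
    local exporter="$1" target="$2"
    # rule 1: __address__ (the redis:// target) is copied to __param_target
    # rule 2: __param_target is copied to the instance label
    # rule 3: __address__ is overwritten with the exporter's host:port
    echo "http://${exporter}/scrape?target=${target}"
}
```

For the first concrete target, `effective_scrape_url 192.101.11.153:9121 redis://192.101.11.153:7001` prints http://192.101.11.153:9121/scrape?target=redis://192.101.11.153:7001. The same request can be tried by hand with `curl -G --data-urlencode "target=redis://192.101.11.153:7001" http://192.101.11.153:9121/scrape`.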

[kafka_exporter configuration]:

Pass the address of one broker; a single IP or hostname of the Kafka cluster is enough:
./kafka_exporter --kafka.server=<kafka IP or hostname>:9092 &
./kafka_exporter --kafka.server=192.101.11.162:9092 &

Tip: the default kafka_exporter port is 9308

Note: one Kafka cluster needs only one exporter, deployed on any one server of the cluster.
  - job_name: 'kafka-cluster'
    static_configs:
      - targets: ['<kafka IP or hostname>:9308']

  - job_name: 'kafka-cluster'
    static_configs:
      - targets: ['192.101.11.162:9308','192.101.11.163:9308','192.101.11.164:9308']

VI. Install cAdvisor
(1) Run as a binary
1. Download
https://github.com/google/cadvisor/releases/latest

2. Add cadvisor to systemd: create a cadvisor.service file so that systemd manages the cAdvisor service

#touch /etc/systemd/system/cadvisor.service
#vi /etc/systemd/system/cadvisor.service

[Unit]
Description=cadvisor
Documentation=https://github.com/google/cadvisor
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/prometheus_exporter/cadvisor
Restart=on-failure

[Install]
WantedBy=multi-user.target

#chmod +x /etc/systemd/system/cadvisor.service

3. Start cadvisor and enable it at boot

#mkdir -p /script      ##no error if the directory already exists
#cd /script

#touch cadvisor_service.sh
#vi cadvisor_service.sh

systemctl daemon-reload
systemctl restart cadvisor
systemctl status cadvisor
systemctl enable cadvisor

#chmod +x cadvisor_service.sh

#sh /script/cadvisor_service.sh

# ps -ef | grep cadvisor

(2) Run as a Docker image
1. Pull the image and start it
#docker pull google/cadvisor

#docker images

#run the container
#docker run --name cadvisor -p 8080:8080 --volume=/:/rootfs:ro --volume=/var/run:/var/run:rw --volume=/sys:/sys:ro --volume=/var/lib/docker/:/var/lib/docker:ro --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro --volume=/dev/disk/:/dev/disk:ro --detach=true docker.io/google/cadvisor

#docker ps -a

2. Open the web UI

http://localhost:8080/containers/

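3. Configure Prometheus to collect cAdvisor data (add the client to Prometheus monitoring).
Note: change the IP address and port in the configuration to those of your monitored client.

Edit prometheus.yml and add the following lines:

  - job_name: 'cadvisor'
    static_configs:
      - targets:
          - '192.168.18.181:8080'
        labels:
          group: 'cadvisor'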

4. The dashboard template needs a change. In the dashboard settings, open Templating and edit label_values(up{job="container"}, instance), replacing container with cadvisor, which is the job_name set in Prometheus:
label_values(up{job="cadvisor"}, instance)

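5. Introduction

cAdvisor (Container Advisor) collects resource usage and performance information of running containers.

cAdvisor is a simple, easy-to-use tool. Compared with the Docker command-line tools, users no longer need to log in to the server; they can view the state of all containers on a host as visual charts.
With multiple hosts, however, running cAdvisor on every node and checking each one's UI separately is clearly inconvenient, and by default cAdvisor keeps only 2 minutes of monitoring data.
The good news is that cAdvisor has built-in Prometheus support: http://localhost:8080/metrics returns samples in the standard Prometheus format.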
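
To get a quick view of which containers cAdvisor reports, the distinct values of one label can be pulled out of that /metrics output. This is a rough sketch: label_values here is a hypothetical shell helper (unrelated to the Grafana function of the same name), and it assumes label values contain no escaped double quotes.

```shell
# List the distinct values of one label across all samples of one metric,
# reading Prometheus text format on stdin.
label_values() {
    local metric="$1" label="$2"
    grep "^${metric}{" \
        | sed -n "s/.*[{,]${label}=\"\([^\"]*\)\".*/\1/p" \
        | sort -u
}
```

For example, `curl -s http://localhost:8080/metrics | label_values container_memory_usage_bytes name` lists the container names that have a memory usage sample.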

VII. Reference documentation

cAdvisor documentation: https://github.com/google/cadvisor
Grafana dashboard templates: https://grafana.com/grafana/dashboards
Prometheus documentation: https://prometheus.io/docs/introduction/overview
