Using Prometheus to Help Solve Performance Problems: Preparation

程序员文章站 2024-03-17 16:22:04

How can Prometheus help solve performance problems?

Problem analysis:

To solve a performance problem, you first need to know when it occurred and what exactly was happening at that moment.

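The TSDB inside Prometheus records both the time and the data for when the problem occurred. But where does that data come from, and which data do we need to give Prometheus?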

Typically we monitor the CPU, memory, network, and JVM of each node.

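These metrics tell you about execution efficiency, memory usage, network throughput, and JVM garbage collection. With them we can start judging whether the server is sick, or rather, whether it has a performance problem.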

Preparation:

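The first thing to monitor is the app, since that is usually where the problem lies; system and hardware issues come second. So we start by writing a Spring Boot app that exposes both normal and abnormal (slow or resource-hungry) endpoints. Monitoring relies on Spring Boot Actuator and Micrometer, integrated with Prometheus, to report the state of the JVM.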

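For the JVM, the JVM metrics that ship with Spring Boot Actuator are enough. To use them with Prometheus, though, you also need its good buddy Micrometer. Like slf4j in the logging world, it does no real work itself; it is just a facade with a unified interface that connects the two. In the code, annotating a method with @Timed is all it takes. The dependencies are as follows: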

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-core</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

The full implementation is at: https://github.com/nealshan2/performance-sample-api
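To make concrete what a timer on a method ends up recording, here is a minimal JDK-only sketch. It is not Micrometer's implementation and not the code from the repository above; the class SimpleTimer and the methods normalSay and slowSay are hypothetical names chosen for illustration. It only shows the idea of tracking a call count and a total elapsed time, which is what lets you tell a normal endpoint from a slow one.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Hypothetical stand-in for a timer: tracks call count and total elapsed time. */
class SimpleTimer {
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();

    /** Runs the work, then records one call and its duration, even if it throws. */
    <T> T record(Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            totalNanos.addAndGet(System.nanoTime() - start);
            count.incrementAndGet();
        }
    }

    long count() { return count.get(); }

    double totalSeconds() { return totalNanos.get() / 1_000_000_000.0; }

    /** Assumed "normal" handler: returns immediately. */
    static String normalSay() { return "hello"; }

    /** Assumed "slow" handler: simulates a slow endpoint by sleeping 200 ms. */
    static String slowSay() {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "hello, eventually";
    }

    public static void main(String[] args) {
        SimpleTimer normal = new SimpleTimer();
        SimpleTimer slow = new SimpleTimer();
        for (int i = 0; i < 3; i++) {
            normal.record(SimpleTimer::normalSay);
            slow.record(SimpleTimer::slowSay);
        }
        System.out.printf("normal: count=%d total=%.3fs%n", normal.count(), normal.totalSeconds());
        System.out.printf("slow:   count=%d total=%.3fs%n", slow.count(), slow.totalSeconds());
    }
}
```

Both timers report the same count, but the total time of the slow one is far larger, and that gap is the kind of signal the monitoring setup is meant to surface.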

Integrating the app with Prometheus before monitoring:

Start the app and Prometheus, then visit http://localhost:8080/normal/say. The app works fine.

The previous article explained how to start and configure Prometheus. If you have not tried it yet, please refer to that article.

Open http://localhost:9090 and go to Targets under the Status menu. The metrics endpoint returns 404 Not Found:

It turns out that as of Spring Boot 2.0, the metrics endpoint has to be explicitly included in the web exposure list:

management:
    endpoints:
        web:
            exposure:
                include: health,info,prometheus

Restart the API. The prometheus endpoint now shows up at http://localhost:8080/actuator, and opening it reveals plenty of JVM information already.

But Prometheus still reports 404, because the path is not http://localhost:8080/metrics but

http://localhost:8080/actuator/prometheus

So the Prometheus configuration also needs to specify the metrics path explicitly:

metrics_path: '/actuator/prometheus'

The full configuration:

scrape_configs:
    - job_name: 'performance-sample-api'
      # Override the global default and scrape targets from this job every 5 seconds.
      scrape_interval: 5s
      metrics_path: '/actuator/prometheus'
      static_configs:
          - targets: ['localhost:8080']
            labels:
                group: 'dev'

Restart Prometheus and visit http://localhost:9090/targets. All targets are now reachable.
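Besides the Targets page, target health can also be read through the Prometheus HTTP API by querying the built-in up metric, whose value is 1 when the last scrape succeeded. The sketch below only builds the request URL for an instant query; the class PromQueryUrl and its method are hypothetical names, and the group="dev" selector assumes the label set in the scrape configuration above.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds an instant-query URL for the Prometheus HTTP API. */
class PromQueryUrl {

    /** Returns baseUrl + /api/v1/query with the PromQL expression URL-encoded. */
    static URI instantQuery(String baseUrl, String promql) {
        String encoded = URLEncoder.encode(promql, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/api/v1/query?query=" + encoded);
    }

    public static void main(String[] args) {
        // Assumes Prometheus on localhost:9090 and the group label from the config above.
        URI uri = instantQuery("http://localhost:9090", "up{group=\"dev\"}");
        System.out.println(uri);
    }
}
```

The resulting URL can be opened in a browser or sent with any HTTP client, for example java.net.http.HttpClient; the response is JSON containing one sample per matching target.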

The first metric:

Now go to the Graph page and pick any metric.

For example: http_server_requests_seconds_count

Execute it to see the request count. If it is 0, first send a request to http://localhost:8080/normal/say

Then execute the metric again, and you will see that the request count is now being recorded.
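The same number can be read straight from the text that the app serves at /actuator/prometheus. The following is a minimal sketch of summing one metric across all of its label sets; the class ExpositionParser is a hypothetical helper, and the SAMPLE lines are illustrative values written for this example, not output captured from the sample app.

```java
import java.util.List;

/** Hypothetical helper: sums every series of one metric in Prometheus text-format output. */
class ExpositionParser {

    /** Illustrative scrape output (made-up values, label sets assumed). */
    static final List<String> SAMPLE = List.of(
        "# TYPE http_server_requests_seconds summary",
        "http_server_requests_seconds_count{method=\"GET\",status=\"200\",uri=\"/normal/say\",} 3.0",
        "http_server_requests_seconds_sum{method=\"GET\",status=\"200\",uri=\"/normal/say\",} 0.042",
        "http_server_requests_seconds_count{method=\"GET\",status=\"200\",uri=\"/actuator/prometheus\",} 12.0",
        "jvm_memory_used_bytes{area=\"heap\",id=\"PS Eden Space\",} 1.0E7"
    );

    static double sumMetric(List<String> lines, String metricName) {
        double total = 0.0;
        for (String raw : lines) {
            String line = raw.trim();
            // Skip blank lines and "# HELP" / "# TYPE" comment lines.
            if (line.isEmpty() || line.startsWith("#")) continue;
            // The name ends at the label block "{" or, without labels, at the first space.
            int brace = line.indexOf('{');
            int space = line.indexOf(' ');
            int nameEnd = (brace >= 0 && (space < 0 || brace < space)) ? brace : space;
            if (nameEnd < 0 || !line.substring(0, nameEnd).equals(metricName)) continue;
            // The value follows the closing "}" (label values may contain spaces).
            int valueStart = (nameEnd == brace) ? line.lastIndexOf('}') + 1 : space + 1;
            String[] parts = line.substring(valueStart).trim().split("\\s+");
            total += Double.parseDouble(parts[0]); // parts[1], if present, is a timestamp
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumMetric(SAMPLE, "http_server_requests_seconds_count"));
    }
}
```

Note that the count is reported per label set (method, status, uri), so the total request count is the sum over all series, which is what the Prometheus expression sum(http_server_requests_seconds_count) computes on the server side.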

Raw numbers alone are not enough, so you can switch to the built-in graph.

Built-in graph not enough for you? Take a look at Grafana.

https://prometheus.io/docs/visualization/grafana/

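Following the official tutorial at the link above, you should be able to get it running. The setup is simple, so take a couple of minutes to start Grafana yourself.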

Create a new Dashboard, then add a Panel of any chart type, and choose http_server_requests_seconds_count as the metric.

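Typing http brings up suggestions; you can also try jvm. In the next article we will look at how to make better use of metrics to monitor endpoints and analyze their problems.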

The only small problem I ran into while integrating Grafana: if a metric is visible in Prometheus but cannot be found in Grafana, re-saving the data source fixes it.

At this point we are only monitoring the app's request count.

In the next article we will write our own Prometheus rules and, together with Grafana's graphical interface, use them to pinpoint problems.