Prometheus + Grafana + Alertmanager: Usage Summary

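Prometheus is an open-source monitoring and alerting system with a built-in time series database. Grafana is commonly used alongside it to visualize the data.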

1. Monitoring System Architecture

1.1 Core Components

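Prometheus Server: scrapes and stores time series data, and also provides querying and alert rule configuration management.
Exporters: metric collectors, for example node_exporter for machine metrics and MongoDB exporter for MongoDB metrics.
Alertmanager: manages alert notifications.
Grafana: renders monitoring data as charts.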
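
To make the exporter role concrete, here is a minimal sketch of one written with only the JDK's built-in HTTP server. It is not how node_exporter is implemented; the class name MiniExporter, the metric name demo_scrapes_total, and port 9200 are assumptions chosen for illustration. It serves a single counter in the Prometheus text exposition format on /metrics, which is the kind of endpoint the Prometheus Server pulls from.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

public class MiniExporter {
    // Hypothetical metric: how many times /metrics has been scraped.
    static final AtomicLong scrapesTotal = new AtomicLong();

    // Render one counter in the Prometheus text exposition format.
    static String render(long value) {
        return "# HELP demo_scrapes_total Number of scrapes served by the demo exporter.\n"
             + "# TYPE demo_scrapes_total counter\n"
             + "demo_scrapes_total " + value + "\n";
    }

    public static void main(String[] args) throws IOException {
        // Port 9200 is an arbitrary choice for this sketch.
        HttpServer server = HttpServer.create(new InetSocketAddress(9200), 0);
        server.createContext("/metrics", exchange -> {
            byte[] body = render(scrapesTotal.incrementAndGet())
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "text/plain; version=0.0.4");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
    }
}
```

With this running, a scrape job targeting port 9200 would see demo_scrapes_total grow by one on every scrape.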

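2. Installing the Base Components

Because this setup is for learning and experimentation, we use Docker to bring the environment up quickly.

2.1 Installing Node Exporter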

docker-compose-node-export.yml

version: '3'
services:
  node-exporter:
    image: prom/node-exporter
    container_name: node-exporter
    hostname: node-exporter
    restart: always
    ports:
      - "9100:9100"

2.2 Installing Alertmanager

docker-compose-alertmanager.yml

version: '3'
services:
  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    hostname: alertmanager
    restart: always
    volumes:
      - /data/docker_file/monitor/conf/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"

alertmanager.yml

global:
  smtp_smarthost: 'smtp.qq.com:25'          # QQ Mail SMTP server
  smtp_from: '793272861@qq.com'             # sender address
  smtp_auth_username: '793272861@qq.com'    # SMTP username, i.e. your mailbox address
  smtp_auth_password: '****************'    # SMTP password
  smtp_require_tls: false                   # do not require TLS
 
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10m
  receiver: live-monitoring

receivers:
- name: 'live-monitoring'
  email_configs:
  - to: '793272861@qq.com'                  # recipient address

2.3 Installing Prometheus

docker-compose-prometheus.yml

version: '3'
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    hostname: prometheus
    restart: always
    volumes:
      - /data/docker_file/prometheus/data:/prometheus
      - /data/docker_file/prometheus/conf/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"

prometheus.yml

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
 
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['alertmanager:9093']
 
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
 
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
# Scrape jobs: poll the targets periodically to pull monitoring data
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']
  - job_name: 'node-exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['node-exporter:9100']
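
The configuration above points Prometheus at Alertmanager, but both rule_files entries are commented out, so no alert would ever be sent. A minimal rule file might look like the sketch below. The file name node_rules.yml, the group name, the severity label, and the one-minute duration are assumptions for illustration; the file would also have to be mounted into the Prometheus container and listed under rule_files.

```yaml
# node_rules.yml (hypothetical file name)
groups:
  - name: node-alerts
    rules:
      - alert: InstanceDown
        # "up" is 0 when the last scrape of a target failed.
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
```

Because the Alertmanager route groups by alertname, all InstanceDown alerts would arrive in one email per group.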

Prometheus service discovery

Automatic service discovery can be implemented with Consul.
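
As a rough sketch of what Consul-based discovery could look like in prometheus.yml, the job below asks a Consul agent for registered services instead of listing targets statically. The address consul:8500 and the job name are assumptions; no Consul container is part of the compose files in this article.

```yaml
scrape_configs:
  - job_name: 'consul-services'
    consul_sd_configs:
      - server: 'consul:8500'   # assumed Consul agent address
    relabel_configs:
      # Use the Consul service name as the job label.
      - source_labels: ['__meta_consul_service']
        target_label: job
```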

Access: http://localhost:9090/

2.4 Installing Grafana

docker-compose-grafana.yml

version: '3'
services:
  grafana:
    image: grafana/grafana
    container_name: grafana
    hostname: grafana
    restart: always
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - /data/docker_file/grafana/data:/var/lib/grafana
      - /data/docker_file/grafana/log:/var/log/grafana
    ports:
      - "3000:3000"

Add a data source (Prometheus)

Access: http://localhost:3000/ , default username: admin, password: admin
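
Instead of adding the data source by hand in the UI, Grafana can also load it from a provisioning file at startup. The sketch below assumes the file is mounted under /etc/grafana/provisioning/datasources/ in the container (a volume that the compose file above does not define) and that Prometheus is reachable as prometheus:9090 on the same Docker network.

```yaml
# prometheus-datasource.yml (hypothetical file name)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                 # Grafana's backend makes the requests
    url: http://prometheus:9090   # assumed container hostname
    isDefault: true
```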

2.5 Docker Compose Script

version: '3'
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    hostname: prometheus
    restart: always
    volumes:
      - /data/docker_file/prometheus/data:/prometheus
      - /data/docker_file/prometheus/conf/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    networks:
      - monitor
  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    hostname: alertmanager
    restart: always
    volumes:
      - /data/docker_file/monitor/conf/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"
    networks:
      - monitor
  grafana:
    image: grafana/grafana
    container_name: grafana
    hostname: grafana
    restart: always
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - /data/docker_file/grafana/data:/var/lib/grafana
      - /data/docker_file/grafana/log:/var/log/grafana
    ports:
      - "3000:3000"
    networks:
      - monitor
  node-exporter:
    image: prom/node-exporter
    container_name: node-exporter
    hostname: node-exporter
    restart: always
    ports:
      - "9100:9100"
    networks:
      - monitor
networks:
  monitor:
    driver: bridge
 

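3. Configuring Grafana Dashboards

Grafana uses PromQL queries to pull data from Prometheus and renders the results in panels. A Grafana dashboard is made up of individual panels.

3.1 Downloading Grafana Dashboard Files

Ready-made dashboard files can be downloaded from the official website and imported into our Grafana instance for immediate use.

Recommended Grafana dashboards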

JVM (Micrometer)
Spring Boot 2.1 Statistics
主机基础监控(cpu,内存,磁盘,网络) (basic host monitoring: CPU, memory, disk, network)
Node Exporter for Prometheus Dashboard CN
Druid Connection Pool Dashboard

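Importing a Grafana dashboard

3.2 Adding and Modifying Grafana Panels (Extension)

The official Spring Boot 2.1 Statistics dashboard does not include charts for outbound requests to third-party services. Taking it as an example, we add two panels for those requests: Client Request Count and Client Response Time.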

Client Request Count

irate(http_client_requests_seconds_count{instance="$instance", application="$application", uri!~".*actuator.*"}[5m])

Note: the Meter in the application must be named http.client.requests

Client Response Time

irate(http_client_requests_seconds_sum{instance="$instance", application="$application",uri!~".*actuator.*"}[5m]) / irate(http_client_requests_seconds_count{instance="$instance", application="$application",uri!~".*actuator.*"}[5m])
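
The Client Response Time query divides the per-second growth of total request time by the per-second growth of the request count, which gives the average seconds per request over the most recent interval. The arithmetic can be sketched in plain Java as below. This is a simplified model, not Prometheus code: the class and method names are hypothetical, and counter resets are ignored.

```java
public class AvgLatency {
    // One scrape of a timer: cumulative request count and cumulative seconds spent.
    static final class Sample {
        final double timestampSeconds;
        final double count;
        final double sumSeconds;

        Sample(double timestampSeconds, double count, double sumSeconds) {
            this.timestampSeconds = timestampSeconds;
            this.count = count;
            this.sumSeconds = sumSeconds;
        }
    }

    // Per-second increase between the two most recent samples,
    // which is the quantity irate() is based on.
    static double irate(double previous, double latest, double elapsedSeconds) {
        return (latest - previous) / elapsedSeconds;
    }

    static double averageLatencySeconds(Sample prev, Sample last) {
        double dt = last.timestampSeconds - prev.timestampSeconds;
        double sumRate = irate(prev.sumSeconds, last.sumSeconds, dt);
        double countRate = irate(prev.count, last.count, dt);
        // 0.0 / 0.0 is NaN when no requests arrived between the samples.
        return sumRate / countRate;
    }

    public static void main(String[] args) {
        Sample prev = new Sample(0, 100, 20.0);   // 100 requests, 20 s total
        Sample last = new Sample(15, 130, 29.0);  // 15 s later: 30 more requests, 9 s more
        System.out.println(averageLatencySeconds(prev, last)); // prints 0.3
    }
}
```

In the example, 30 additional requests took 9 additional seconds, so the average is 0.3 seconds per request regardless of the 15-second spacing between samples.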

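4. Integrating Micrometer with Spring Boot

Metrics

Micrometer provides a vendor-neutral interface for timers, gauges, counters, distribution summaries, and long task timers. It has a dimensional data model that, when paired with a dimensional monitoring system, allows efficient access to a particular named metric and the ability to drill down across its dimensions.

4.1 Adding Dependencies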

<dependency>
 	<groupId>io.micrometer</groupId>
   	<artifactId>micrometer-registry-prometheus</artifactId>
   	<version>${micrometer.version}</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

4.2 Enabling the Prometheus Endpoint

spring:
  application:
    name: spring-boot-node

management:
  metrics:
    # 1. Add global tags; they can later be used as variables when querying data
    tags:
      application: ${spring.application.name}
  endpoints:
    web:
      exposure:
        # 2. Expose the prometheus endpoint
        include: 'health,prometheus'
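
With the endpoint exposed, Prometheus still needs a scrape job that points at it. A sketch of such a job for prometheus.yml follows; the target host and port spring-boot-node:8080 are assumptions (8080 is the Spring Boot default port, and the host name simply reuses the application name above).

```yaml
scrape_configs:
  - job_name: 'spring-boot-node'
    # Actuator serves Prometheus metrics here instead of the default /metrics.
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['spring-boot-node:8080']  # assumed host and port
```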

4.3 Monitoring Third-Party Requests

OkHttpMetricsEventListener makes it straightforward to monitor the requests made by an OkHttp client.

Configuring the OkHttp client event listener

@Bean("okHttpClient")
public OkHttpClient okHttpClient(ConnectionPool connectionPool) {
    return new OkHttpClient().newBuilder().connectionPool(connectionPool)
            .connectTimeout(5, TimeUnit.SECONDS)
            .readTimeout(10, TimeUnit.SECONDS)
            .eventListener(eventListener())
            .build();
}

/**
* Event listener: OkHttpMetricsEventListener.
* metricsProperties.getWeb().getClient().getRequestsMetricName() returns 'http.client.requests' by default;
* this is the metric name.
* @return the listener that records OkHttp call metrics
*/
private EventListener eventListener(){
    return OkHttpMetricsEventListener.builder(
    meterRegistry, metricsProperties.getWeb().getClient().getRequestsMetricName())
    .build();
}

How it works: OkHttpMetricsEventListener.java

public class OkHttpMetricsEventListener extends EventListener {

    /**
     * Header name for URI patterns which will be used for tag values.
     */
    public static final String URI_PATTERN = "URI_PATTERN";

    @Override
    public void callFailed(Call call, IOException e) {
        CallState state = callState.remove(call);
        if (state != null) {
            state.exception = e;
            // When the request finishes, record the metric
            time(state);
        }
    }

    @Override
    public void responseHeadersEnd(Call call, Response response) {
        CallState state = callState.remove(call);
        if (state != null) {
            state.response = response;
            // When the request finishes, record the metric
            time(state);
        }
    }

    private void time(CallState state) {
        String uri = state.response == null ? "UNKNOWN" :
            (state.response.code() == 404 || state.response.code() == 301 ? "NOT_FOUND" : urlMapper.apply(state.request));

        // Define tags, which can be used as variables in Prometheus and Grafana
        Iterable<Tag> tags = Tags.concat(extraTags, Tags.of(
            "method", state.request != null ? state.request.method() : "UNKNOWN",
            "uri", uri,
            "status", getStatusMessage(state.response, state.exception),
            "host", state.request != null ? state.request.url().host() : "UNKNOWN"
        ));

        // Record the timing; Prometheus can then pull it from the /actuator/prometheus endpoint exposed by Spring Boot Actuator
        Timer.builder(this.requestsMetricName)
            .tags(tags)
            .description("Timer of OkHttp operation")
            .register(registry)
            .record(registry.config().clock().monotonicTime() - state.startTime, TimeUnit.NANOSECONDS);
    }

}
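
The time(state) method above resolves a timer by metric name plus tags and records one duration into it. A simplified model of that idea, using only the JDK, is sketched below. It is not Micrometer's implementation; TinyTimerRegistry and its methods are hypothetical names. The point it illustrates is that each distinct tag combination becomes its own series with a count and a total time, which is where the _count and _sum values used in the PromQL queries of section 3.2 come from.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class TinyTimerRegistry {
    // Accumulated state of one time series.
    static final class TimerStats {
        final LongAdder count = new LongAdder();
        final LongAdder totalNanos = new LongAdder();
    }

    private final Map<String, TimerStats> series = new ConcurrentHashMap<>();

    // Build a stable key such as http.client.requests{method=GET,status=200}.
    static String key(String name, Map<String, String> tags) {
        return name + new TreeMap<>(tags).toString().replace(", ", ",");
    }

    // Look up (or create) the series for name + tags, then add one observation.
    void record(String name, Map<String, String> tags, long amount, TimeUnit unit) {
        TimerStats stats = series.computeIfAbsent(key(name, tags), k -> new TimerStats());
        stats.count.increment();
        stats.totalNanos.add(unit.toNanos(amount));
    }

    long count(String name, Map<String, String> tags) {
        TimerStats stats = series.get(key(name, tags));
        return stats == null ? 0 : stats.count.sum();
    }

    double totalSeconds(String name, Map<String, String> tags) {
        TimerStats stats = series.get(key(name, tags));
        return stats == null ? 0.0 : stats.totalNanos.sum() / 1e9;
    }
}
```

Recording 200 ms and 300 ms under the same name and tags yields a count of 2 and a total of 0.5 seconds for that series, while a different tag value (for example another uri) starts a separate series.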

4.4 Spring Boot Integration Example

Spring Boot Node


Published by

风君子
