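Prometheus is an open-source monitoring and alerting system with a built-in time series database. Grafana is commonly used alongside it to visualize the data.
1. Monitoring system architecture
1.1 Core components
Prometheus Server: scrapes and stores time series data, and also provides querying and alert rule configuration management.
exporters: data collectors, for example node_exporter for machine metrics and MongoDB exporter for MongoDB metrics.
alertmanager: manages alert notifications.
Grafana: presents monitoring data as charts.
2. Installing the base components
Since this setup is for learning and experimentation, the environment is installed quickly with Docker.
2.1 Installing Node Exporter
docker-compose-node-export.yml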
version: '3'
services:
  node-exporter:
    image: prom/node-exporter
    container_name: node-exporter
    hostname: node-exporter
    restart: always
    ports:
      - "9100:9100"
2.2 Installing Alertmanager
docker-compose-alertmanager.yml
version: '3'
services:
  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    hostname: alertmanager
    restart: always
    volumes:
      - /data/docker_file/monitor/conf/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"
alertmanager.yml
global:
  smtp_smarthost: 'smtp.qq.com:25'          # QQ Mail SMTP server
  smtp_from: '793272861@qq.com'             # sender address
  smtp_auth_username: '793272861@qq.com'    # SMTP username, i.e. the sender's mailbox
  smtp_auth_password: '****************'    # SMTP password for the sender's mailbox
  smtp_require_tls: false                   # do not require TLS
 
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10m
  receiver: live-monitoring
receivers:
- name: 'live-monitoring'
  email_configs:
  - to: '793272861@qq.com'                  # recipient address
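In the route above, group_by: ['alertname'] tells Alertmanager to bundle alerts that share the same alertname label into one notification, while group_wait, group_interval and repeat_interval control when that notification is sent and resent. The following is only a minimal sketch of the grouping idea in plain Java; the class name, method name and sample alerts are hypothetical and are not part of Alertmanager.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration of label-based grouping; not Alertmanager code.
final class AlertGroupingSketch {

    // Groups alerts (each represented as a label map) by the value of one label.
    // Alerts without that label fall into the group keyed by the empty string.
    static Map<String, List<Map<String, String>>> groupBy(
            String label, List<Map<String, String>> alerts) {
        return alerts.stream()
                .collect(Collectors.groupingBy(alert -> alert.getOrDefault(label, "")));
    }

    public static void main(String[] args) {
        List<Map<String, String>> alerts = List.of(
                Map.of("alertname", "HighCpu", "instance", "node-a"),
                Map.of("alertname", "HighCpu", "instance", "node-b"),
                Map.of("alertname", "DiskFull", "instance", "node-a"));
        // Each entry of the result would correspond to one notification.
        groupBy("alertname", alerts)
                .forEach((name, group) -> System.out.println(name + ": " + group.size()));
    }
}
```

With this configuration, two HighCpu alerts from different instances would arrive in a single email rather than two.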
2.3 Installing Prometheus
docker-compose-prometheus.yml
version: '3'
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    hostname: prometheus
    restart: always
    volumes:
      - /data/docker_file/prometheus/data:/prometheus
      - /data/docker_file/prometheus/conf/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
prometheus.yml
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
 
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['alertmanager:9093']
 
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
 
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
# Scrape jobs: Prometheus polls these targets periodically to pull monitoring data
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']
  - job_name: 'node-exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['node-exporter:9100']
2.4 Installing Grafana
docker-compose-grafana.yml
version: '3'
services:
  grafana:
    image: grafana/grafana
    container_name: grafana
    hostname: grafana
    restart: always
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - /data/docker_file/grafana/data:/var/lib/grafana
      - /data/docker_file/grafana/log:/var/log/grafana
    ports:
      - "3000:3000"
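Adding the data source (Prometheus)
Open http://localhost:3000/ (the port mapped in the compose file above) and log in with the default username admin and password admin.
2.5 Docker Compose script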
version: '3'
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    hostname: prometheus
    restart: always
    volumes:
      - /data/docker_file/prometheus/data:/prometheus
      - /data/docker_file/prometheus/conf/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    networks:
      - monitor
  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    hostname: alertmanager
    restart: always
    volumes:
      - /data/docker_file/monitor/conf/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"
    networks:
      - monitor
  grafana:
    image: grafana/grafana
    container_name: grafana
    hostname: grafana
    restart: always
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - /data/docker_file/grafana/data:/var/lib/grafana
      - /data/docker_file/grafana/log:/var/log/grafana
    ports:
      - "3000:3000"
    networks:
      - monitor
  node-exporter:
    image: prom/node-exporter
    container_name: node-exporter
    hostname: node-exporter
    restart: always
    ports:
      - "9100:9100"
    networks:
      - monitor
networks:
  monitor:
    driver: bridge
 
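3. Configuring Grafana dashboards
Grafana pulls data from Prometheus using PromQL queries and renders it in panels; a Grafana dashboard is made up of individual panels.
3.1 Downloading Grafana dashboard files
Ready-made Grafana dashboard files can be downloaded from the official site and imported into our Grafana instance for immediate use.
Recommended Grafana dashboards
JVM (Micrometer)
Spring Boot 2.1 Statistics
主机基础监控(cpu,内存,磁盘,网络) (basic host monitoring: CPU, memory, disk, network)
Node Exporter for Prometheus Dashboard CN
Druid Connection Pool Dashboard
Importing a Grafana dashboard
3.2 Adding and modifying Grafana panels (extension)
The official Spring Boot 2.1 Statistics dashboard has no panels for outbound requests to third-party services. Taking it as an example, we add a Client Request Count panel and a Client Response Time panel for those requests.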
Client Request Count
irate(http_client_requests_seconds_count{instance="$instance", application="$application", uri!~".*actuator.*"}[5m])
Note: the Meter in the application must be named http.client.requests.
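The name matters because Micrometer's Prometheus registry exposes a timer named http.client.requests as series such as http_client_requests_seconds_count and http_client_requests_seconds_sum, which are the names the panel queries rely on. The sketch below reproduces only a simplified version of that mapping (dots become underscores, the base unit seconds is appended, then a suffix); the class and method names are hypothetical, and the real naming convention covers more cases than this.

```java
// Simplified illustration of how a dotted timer name maps to Prometheus series names.
// This is a hypothetical helper, not Micrometer's actual implementation.
final class PrometheusTimerNames {

    // Replaces dots with underscores and appends the timer's base unit.
    static String seriesPrefix(String meterName) {
        return meterName.replace('.', '_') + "_seconds";
    }

    // Series holding the number of recorded events.
    static String countSeries(String meterName) {
        return seriesPrefix(meterName) + "_count";
    }

    // Series holding the total recorded time in seconds.
    static String sumSeries(String meterName) {
        return seriesPrefix(meterName) + "_sum";
    }

    public static void main(String[] args) {
        System.out.println(countSeries("http.client.requests"));
        System.out.println(sumSeries("http.client.requests"));
    }
}
```

If the meter were registered under a different name, the series names would change accordingly and the panel queries would return no data.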
Client Response Time
irate(http_client_requests_seconds_sum{instance="$instance", application="$application",uri!~".*actuator.*"}[5m]) / irate(http_client_requests_seconds_count{instance="$instance", application="$application",uri!~".*actuator.*"}[5m])
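To make the two queries concrete: irate looks at the last two samples inside the range window and divides the counter increase by the elapsed time, so dividing the rate of the _sum series by the rate of the _count series yields the average duration per request over that interval. The following is a minimal sketch of that arithmetic in plain Java; the class name and the sample values are made up for illustration, and it is not how Prometheus itself is implemented.

```java
// Hypothetical worked example of the arithmetic behind the two panel queries.
final class IrateSketch {

    // Per-second rate computed from the last two samples of a counter.
    // Timestamps are in seconds. If the counter went backwards it was reset,
    // in which case the latest value is taken as the increase.
    static double irate(double prevValue, double prevTs, double lastValue, double lastTs) {
        double increase = lastValue >= prevValue ? lastValue - prevValue : lastValue;
        return increase / (lastTs - prevTs);
    }

    public static void main(String[] args) {
        // Assumed samples taken 5 seconds apart (matching the 5s scrape interval).
        double requestRate = irate(100, 0, 110, 5);   // requests per second
        double timeRate = irate(12.0, 0, 13.0, 5);    // seconds spent per second
        double avgResponseSeconds = timeRate / requestRate;
        System.out.println("requests/s: " + requestRate);
        System.out.println("avg response time (s): " + avgResponseSeconds);
    }
}
```

In this example ten requests took one second in total, so the average response time is a tenth of a second regardless of how far apart the two samples are.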
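4. Integrating Micrometer with Spring Boot
Metrics
Micrometer provides a vendor-neutral interface for timers, gauges, counters, distribution summaries, and long task timers. It has a dimensional data model: when paired with a dimensional monitoring system, it allows efficient access to a particular named metric and lets you drill down across its dimensions.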
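The dimensional model means a meter is identified by its name together with a set of tags, so one name such as http.client.requests can be broken down by method, status, and so on. Below is a minimal sketch of that idea using only the Java standard library; DimensionalCounters and its methods are hypothetical and do not reflect Micrometer's API.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical miniature registry: one counter per (name, tags) combination.
final class DimensionalCounters {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Builds a stable identifier from the meter name and its tags, sorted by tag key
    // so that the same tags in a different order map to the same counter.
    static String id(String name, Map<String, String> tags) {
        return name + new TreeMap<>(tags);
    }

    void increment(String name, Map<String, String> tags) {
        counters.computeIfAbsent(id(name, tags), key -> new LongAdder()).increment();
    }

    long count(String name, Map<String, String> tags) {
        LongAdder adder = counters.get(id(name, tags));
        return adder == null ? 0 : adder.sum();
    }

    public static void main(String[] args) {
        DimensionalCounters registry = new DimensionalCounters();
        Map<String, String> ok = Map.of("method", "GET", "status", "200");
        Map<String, String> failed = Map.of("method", "GET", "status", "500");
        registry.increment("http.client.requests", ok);
        registry.increment("http.client.requests", ok);
        registry.increment("http.client.requests", failed);
        System.out.println("200: " + registry.count("http.client.requests", ok));
        System.out.println("500: " + registry.count("http.client.requests", failed));
    }
}
```

Each distinct tag combination becomes its own time series in Prometheus, which is what lets a Grafana panel filter by labels such as application or uri.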
4.1 Adding dependencies
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>${micrometer.version}</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
4.2 Enabling the Prometheus endpoint
spring:
  application:
    name: spring-boot-node
management:
  metrics:
    # 1. Add global tags; they can later be used as variables when querying data
    tags:
      application: ${spring.application.name}
  endpoints:
    web:
      exposure:
        # 2. Expose the prometheus endpoint
        include: 'health,prometheus'
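With the endpoint exposed, /actuator/prometheus serves metrics as plain text in the Prometheus exposition format, where each sample line carries a name, optional labels, and a value; the global application tag configured above appears as a label on every series. The sketch below parses one such line in plain Java to show the structure. The class name and the sample line are illustrative assumptions (not captured output), and the parser deliberately ignores comment lines, timestamps, escape sequences, and special values such as +Inf.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified parser for a single exposition-format sample line.
final class ExpositionLine {
    // name, optional {labels}, whitespace, value
    private static final Pattern LINE =
            Pattern.compile("^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\\{(.*)\\})?\\s+(\\S+)$");
    // label="value" pairs; escaped characters are matched but not unescaped
    private static final Pattern LABEL =
            Pattern.compile("([a-zA-Z_][a-zA-Z0-9_]*)=\"((?:[^\"\\\\]|\\\\.)*)\"");

    final String name;
    final Map<String, String> labels;
    final double value;

    private ExpositionLine(String name, Map<String, String> labels, double value) {
        this.name = name;
        this.labels = labels;
        this.value = value;
    }

    static ExpositionLine parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a sample line: " + line);
        }
        Map<String, String> labels = new LinkedHashMap<>();
        if (m.group(2) != null) {
            Matcher lm = LABEL.matcher(m.group(2));
            while (lm.find()) {
                labels.put(lm.group(1), lm.group(2));
            }
        }
        return new ExpositionLine(m.group(1), labels, Double.parseDouble(m.group(3)));
    }

    public static void main(String[] args) {
        // Illustrative line; real output depends on the application.
        ExpositionLine sample = parse(
                "jvm_memory_used_bytes{application=\"spring-boot-node\",area=\"heap\"} 1024.0");
        System.out.println(sample.name + " " + sample.labels + " " + sample.value);
    }
}
```

The application label is what the Grafana dashboards use for their $application variable.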
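4.3 Monitoring outbound third-party requests
OkHttpMetricsEventListener makes it straightforward to monitor requests made through an OkHttp client.
Configuring the OkHttp client event listener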
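@Bean("okHttpClient")
public OkHttpClient okHttpClient(ConnectionPool connectionPool) {
    return new OkHttpClient.Builder()
            .connectionPool(connectionPool)
            .connectTimeout(5, TimeUnit.SECONDS)
            .readTimeout(10, TimeUnit.SECONDS)
            // Register the metrics listener so that every call is timed
            .eventListener(eventListener())
            .build();
}
/**
 * Event listener: OkHttpMetricsEventListener.
 * metricsProperties.getWeb().getClient().getRequestsMetricName() resolves to
 * 'http.client.requests' by default, which is used as the meter name.
 * meterRegistry and metricsProperties are assumed to be injected fields of this
 * configuration class; their declarations are not shown here.
 *
 * @return the listener that records OkHttp call metrics
 */
private EventListener eventListener() {
    return OkHttpMetricsEventListener.builder(
                    meterRegistry, metricsProperties.getWeb().getClient().getRequestsMetricName())
            .build();
}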
How it works: OkHttpMetricsEventListener.java
public class OkHttpMetricsEventListener extends EventListener {
    /**
     * Header name for URI patterns which will be used for tag values.
     */
    public static final String URI_PATTERN = "URI_PATTERN";
    @Override
    public void callFailed(Call call, IOException e) {
        CallState state = callState.remove(call);
        if (state != null) {
            state.exception = e;
            // The call has ended (with a failure): record the metric
            time(state);
        }
    }
    @Override
    public void responseHeadersEnd(Call call, Response response) {
        CallState state = callState.remove(call);
        if (state != null) {
            state.response = response;
            // The call has completed: record the metric
            time(state);
        }
    }
    private void time(CallState state) {
        String uri = state.response == null ? "UNKNOWN" :
            (state.response.code() == 404 || state.response.code() == 301 ? "NOT_FOUND" : urlMapper.apply(state.request));
        // Define the tags; they become labels usable as variables in Prometheus and Grafana
        Iterable<Tag> tags = Tags.concat(extraTags, Tags.of(
            "method", state.request != null ? state.request.method() : "UNKNOWN",
            "uri", uri,
            "status", getStatusMessage(state.response, state.exception),
            "host", state.request != null ? state.request.url().host() : "UNKNOWN"
        ));
        // Register the timer and record the duration; Prometheus can then pull the data from the /actuator/prometheus endpoint provided by Spring Boot Actuator
        Timer.builder(this.requestsMetricName)
            .tags(tags)
            .description("Timer of OkHttp operation")
            .register(registry)
            .record(registry.config().clock().monotonicTime() - state.startTime, TimeUnit.NANOSECONDS);
    }
}
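The excerpt follows a common pattern: keep per-call state while a request is in flight, then remove it and record the elapsed time exactly once when the call ends. The excerpt does not show where that state is created, so the sketch below assumes it is stored when the call starts. It uses only the Java standard library; CallTimingSketch, its methods, and the injectable clock are hypothetical stand-ins, not OkHttp or Micrometer types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical sketch of the "store start time, remove and record on completion" pattern.
final class CallTimingSketch {
    private final Map<Object, Long> startTimes = new ConcurrentHashMap<>();
    private final List<Long> recordedNanos = Collections.synchronizedList(new ArrayList<>());
    private final LongSupplier clock;

    // The clock is injected so the sketch can be exercised with a fake time source.
    CallTimingSketch(LongSupplier clock) {
        this.clock = clock;
    }

    // Assumed counterpart of the excerpt: remember when the call started.
    void callStart(Object call) {
        startTimes.put(call, clock.getAsLong());
    }

    // Mirrors callFailed/responseHeadersEnd: remove the state, then record the duration.
    // Removing first guarantees a call is recorded at most once.
    void callEnd(Object call) {
        Long start = startTimes.remove(call);
        if (start != null) {
            recordedNanos.add(clock.getAsLong() - start);
        }
    }

    List<Long> recorded() {
        return new ArrayList<>(recordedNanos);
    }

    public static void main(String[] args) throws InterruptedException {
        CallTimingSketch timing = new CallTimingSketch(System::nanoTime);
        Object call = new Object();
        timing.callStart(call);
        Thread.sleep(10);
        timing.callEnd(call);
        timing.callEnd(call); // ignored: state was already removed
        System.out.println("recorded durations: " + timing.recorded().size());
    }
}
```

In the real listener the recorded duration goes into a Micrometer Timer tagged with method, uri, status and host instead of a plain list.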









