System environment

  • Operating system: CentOS 7.6
  • Docker version: 19.03.5
  • Prometheus version: 2.36.0
  • Kubernetes version: 1.20.0
  • RabbitMQ version: 3.8.27

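Monitoring approaches

RabbitMQ's built-in Prometheus integration (the rabbitmq_prometheus plugin, shipped with RabbitMQ 3.8.0 and later)

A standalone program that collects the metrics (rabbitmq_exporter)

This works with any RabbitMQ version, but a separate exporter process has to be run. rabbitmq_exporter: https://github.com/kbudde/rabbitmq_exporter

The RabbitMQ cluster in this article is deployed inside the k8s cluster and runs version 3.8.27; see the blog post on deploying a RabbitMQ cluster on k8s for details. If your RabbitMQ cluster runs outside the k8s cluster, proxy it into the k8s cluster through Endpoints first and then follow the steps below. This article covers both ways of collecting metrics.

Monitoring with rabbitmq-exporter

Deploying rabbitmq-exporter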

[root@k8s01 rabbitmq]# vim rabbitmq-exporter.yaml 
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq-exporter
  namespace: monitoring
  labels:
    k8s-app: rabbitmq-exporter
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: api
    port: 9099
    protocol: TCP
  selector:
    k8s-app: rabbitmq-exporter
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rabbitmq-exporter
  namespace: monitoring
  labels:
    k8s-app: rabbitmq-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: rabbitmq-exporter
  template:
    metadata:
      labels:
        k8s-app: rabbitmq-exporter
    spec:
      containers:
      - name: rabbitmq-exporter
        image: kbudde/rabbitmq-exporter:v0.29.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 9099
        env:
        - name: PUBLISH_PORT
          value: "9099"
        - name: RABBIT_CAPABILITIES
          value: "bert,no_sort"
        - name: RABBIT_USER
          value: "super"
        - name: RABBIT_PASSWORD
          value: "super"
        - name: RABBIT_URL
          value: "http://rabbitmq-cluster.tools-env:15672"
           
[root@k8s01 rabbitmq]# kubectl apply -f rabbitmq-exporter.yaml 
service/rabbitmq-exporter created
deployment.apps/rabbitmq-exporter created         

Once the deployment is up, run the following commands to check that metrics are being collected:

[root@k8s01 rabbitmq]# kubectl get -n monitoring po -owide|grep rabbitmq-exporter
rabbitmq-exporter-549b5fddfc-qnjj6   1/1     Running   0          11h     10.244.1.233    k8s05.xxx.local   <none>           <none>

[root@k8s01 rabbitmq]# curl http://10.244.1.233:9099/metrics
...
# HELP rabbitmq_version_info A metric with a constant '1' value labeled by rabbitmq version, erlang version, node, cluster.
# TYPE rabbitmq_version_info gauge
...
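
The same check can be scripted. The following is a minimal Python sketch and not part of the original setup: the function name metric_families and the SAMPLE text are illustrative, and the sketch only reads the "# TYPE" comment lines of the Prometheus text format, such as the one for rabbitmq_version_info shown above.

```python
def metric_families(exposition_text):
    """Map metric family name -> type, using the '# TYPE <name> <type>' lines."""
    families = {}
    for line in exposition_text.splitlines():
        parts = line.split()
        # A TYPE line looks like: "# TYPE rabbitmq_version_info gauge"
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            families[parts[2]] = parts[3]
    return families


# Excerpt of the exporter output shown above
SAMPLE = """\
# HELP rabbitmq_version_info A metric with a constant '1' value labeled by rabbitmq version, erlang version, node, cluster.
# TYPE rabbitmq_version_info gauge
"""

families = metric_families(SAMPLE)
print(families)
```

Feeding it the full body returned by the curl command and checking that "rabbitmq_version_info" is among the keys is one way to confirm that the exporter can reach RabbitMQ.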

Configuring Prometheus to monitor RabbitMQ

Prometheus was deployed on k8s earlier, with its configuration file stored in a Kubernetes ConfigMap, so the ConfigMap has to be edited to add the RabbitMQ scrape configuration.

[root@k8s01 prometheus]# cat prometheus-config.yaml 
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval:     15s
      evaluation_interval: 15s
      external_labels:
        cluster: "kubernetes"

    alerting:
      alertmanagers:
      - static_configs:
        - targets: ["alertmanager:9093"]

    rule_files:
    - /etc/prometheus/*-rule.yml
        
    scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets: ['127.0.0.1:9090']
        labels:
          instance: prometheus 
          
    # Add the following job
    - job_name: 'rabbitmq_nodes'
      static_configs:
      - targets: ['rabbitmq-exporter:9099'] # address of the rabbitmq-exporter Service
        labels:
          instance: rabbitmq-cluster
...

[root@k8s01 prometheus]# kubectl apply -f prometheus-config.yaml 
configmap/prometheus-config configured

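Reload the Prometheus configuration (the /-/reload endpoint is only available when Prometheus is started with --web.enable-lifecycle)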

[root@k8s01 prometheus]# curl -XPOST http://10.x.x.x:30089/-/reload
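
For automation, the reload call can also be issued from a script. This is a small sketch using only the Python standard library; the function names and the example URL are hypothetical, and the real base URL is whatever address exposes Prometheus in your cluster.

```python
import urllib.request


def build_reload_request(base_url):
    """Build the POST request for the Prometheus /-/reload management endpoint."""
    return urllib.request.Request(base_url.rstrip("/") + "/-/reload", method="POST")


def reload_prometheus(base_url, timeout=10):
    """Send the reload request; urlopen raises an HTTPError on a non-2xx answer."""
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status == 200


# Hypothetical address, used here only to show the request that would be sent
request = build_reload_request("http://prometheus.example:9090/")
print(request.get_method(), request.full_url)
```

Calling reload_prometheus with the NodePort address used in the curl command above would perform the actual reload.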

Check the Prometheus UI

[Screenshot: prom-rabbit-1]
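
The target status shown in the UI is also available from the Prometheus HTTP API at /api/v1/targets. The sketch below assumes the JSON body has already been fetched and decoded; the function name unhealthy_targets and the SAMPLE payload are made up for illustration, with job names taken from the configuration above.

```python
def unhealthy_targets(payload, job):
    """Return (scrapeUrl, lastError) pairs for targets of `job` that are not 'up'."""
    return [
        (t.get("scrapeUrl", ""), t.get("lastError", ""))
        for t in payload["data"]["activeTargets"]
        if t.get("labels", {}).get("job") == job and t.get("health") != "up"
    ]


# Trimmed, made-up payload in the shape of a /api/v1/targets response
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "prometheus", "instance": "prometheus"},
             "scrapeUrl": "http://127.0.0.1:9090/metrics",
             "lastError": "", "health": "up"},
            {"labels": {"job": "rabbitmq_nodes", "instance": "rabbitmq-cluster"},
             "scrapeUrl": "http://rabbitmq-exporter:9099/metrics",
             "lastError": "connection refused", "health": "down"},
        ]
    },
}

print(unhealthy_targets(SAMPLE, "rabbitmq_nodes"))
```

An empty list for the rabbitmq_nodes job means every target of that job is being scraped successfully.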

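Importing the Grafana dashboard

Log in to Grafana, open the left-hand menu and choose Manage, then click the Import button in the upper-right corner. Enter 2121 as the dashboard ID to import the rabbitmq-exporter dashboard and click Load. On the next screen select Prometheus as the data source, then click Import to open the dashboard:
[Screenshot: prom-rabbit-2]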

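Monitoring with RabbitMQ's built-in Prometheus plugin

The RabbitMQ cluster is already deployed, so its setup is not repeated here. If your RabbitMQ cluster runs outside the k8s cluster, proxy it into the k8s cluster through Endpoints first.

Enabling the rabbitmq_prometheus plugin

Skip this step if the plugin is already enabled; otherwise enable it with: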

rabbitmq-plugins enable rabbitmq_prometheus

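Verifying the metrics

Once rabbitmq_prometheus is enabled, use curl to check that the node exposes metrics in the format Prometheus expects. The plugin listens on port 15692 by default, so run the command where that port is reachable as localhost, for example inside a RabbitMQ pod: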

[root@k8s01 prometheus]# curl -s localhost:15692/metrics | head -n 6
# TYPE erlang_mnesia_held_locks gauge
# HELP erlang_mnesia_held_locks Number of held locks.
erlang_mnesia_held_locks 0
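
To read values rather than just eyeball them, the sample lines can be parsed with a few lines of Python. This is a deliberately simple sketch, not a full parser for the exposition format: parse_samples is a hypothetical helper, label sets are kept as raw strings, and the rabbitmq_build_info line in SAMPLE is an illustrative example rather than captured output.

```python
def parse_samples(exposition_text):
    """Return (name, raw_labels, value) tuples for every sample line."""
    samples = []
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            name, rest = line.split("{", 1)
            # The value (and optional timestamp) follow the last closing brace
            labels, tail = rest.rsplit("}", 1)
        else:
            name, tail = line.split(None, 1)
            labels = ""
        samples.append((name, labels, float(tail.split()[0])))
    return samples


SAMPLE = """\
# TYPE erlang_mnesia_held_locks gauge
# HELP erlang_mnesia_held_locks Number of held locks.
erlang_mnesia_held_locks 0
rabbitmq_build_info{rabbitmq_version="3.8.27"} 1
"""

samples = parse_samples(SAMPLE)
print(samples)
```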

Configuring Prometheus to monitor RabbitMQ

As with the exporter setup, the Prometheus configuration lives in the prometheus-config ConfigMap, so edit that ConfigMap and add a scrape job for the RabbitMQ nodes:

[root@k8s01 prometheus]# cat prometheus-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval:     15s
      evaluation_interval: 15s
      external_labels:
        cluster: "kubernetes"

    alerting:
      alertmanagers:
      - static_configs:
        - targets: ["alertmanager:9093"]

    rule_files:
    - /etc/prometheus/*-rule.yml
        
    scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets: ['127.0.0.1:9090']
        labels:
          instance: prometheus 

    - job_name: 'rabbitmq_cluster'
      static_configs:
      - targets: # adjust to match your own deployment
        - 'rabbitmq-0.rabbitmq.tools-env.svc.cluster.local:15692'
        - 'rabbitmq-1.rabbitmq.tools-env.svc.cluster.local:15692'
        - 'rabbitmq-2.rabbitmq.tools-env.svc.cluster.local:15692'
        - 'rabbitmq-3.rabbitmq.tools-env.svc.cluster.local:15692'
        - 'rabbitmq-4.rabbitmq.tools-env.svc.cluster.local:15692'
        labels:
          # Every target gets the same instance value, so metrics without a per-node label cannot be told apart by node
          instance: rabbitmq-cluster
...
[root@k8s01 prometheus]# kubectl apply -f prometheus-config.yaml 
configmap/prometheus-config configured
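
The target addresses in the rabbitmq_cluster job follow a regular pattern, so they can be generated instead of typed by hand. The sketch below assumes the nodes are StatefulSet pods behind a headless Service, which is what the hostnames above suggest; statefulset_targets and its parameters are hypothetical names introduced for this example.

```python
def statefulset_targets(name, service, namespace, replicas,
                        port=15692, cluster_domain="cluster.local"):
    """Build '<pod>.<service>.<namespace>.svc.<domain>:<port>' for each replica."""
    return [
        f"{name}-{i}.{service}.{namespace}.svc.{cluster_domain}:{port}"
        for i in range(replicas)
    ]


# Values taken from the scrape job above: five pods named rabbitmq-0 .. rabbitmq-4
targets = statefulset_targets("rabbitmq", "rabbitmq", "tools-env", replicas=5)
for target in targets:
    print(target)
```

When the cluster is scaled, only the replicas argument changes and the list can be pasted back into the targets entry.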

Reload the Prometheus configuration

[root@k8s01 prometheus]# curl -XPOST http://10.x.x.x:30089/-/reload

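Check the Prometheus UI

[Screenshot: prom-rabbit-3]

Importing the Grafana dashboard

Log in to Grafana, open the left-hand menu and choose Manage, then click the Import button in the upper-right corner. Enter 10991 as the dashboard ID to import the rabbitmq_prometheus dashboard and click Load. On the next screen select Prometheus as the data source, then click Import to open the dashboard:
[Screenshot: prom-rabbit-4]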

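RabbitMQ alerting rules

The RabbitMQ cluster is now being monitored, but dashboards are rarely watched in day-to-day operation, so alerting rules are needed to catch problems. Because Prometheus is already deployed and its configuration is managed through a ConfigMap, adding the rules to that ConfigMap is enough.
Note: the rules below are written against rabbitmq_prometheus metrics and do not work with rabbitmq-exporter.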

[root@k8s01 prometheus]# vim prometheus-config.yaml 
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
...
  rabbitmq-rule.yml: |
    groups:
    - name: RabbitMQ node down
      rules:
      - alert: RabbitmqNodeDown
        # Adjust the threshold to the number of nodes in your cluster
        expr: sum(rabbitmq_build_info) < 3
        for: 0m
        labels:
          severity: error
        annotations:
          summary: "Rabbitmq node down (instance {{ $labels.instance }})"
          description: "Fewer than 3 nodes are running in the RabbitMQ cluster\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

    - name: RabbitMQ node not distributed
      rules:
      - alert: RabbitmqNodeNotDistributed
        expr: erlang_vm_dist_node_state < 3
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "Rabbitmq node not distributed (instance {{ $labels.instance }})"
          description: "The distribution link state of the RabbitMQ cluster is not up\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
                
    - name: RabbitMQ memory above 90%
      rules:
      - alert: RabbitmqMemoryHigh
        expr: rabbitmq_process_resident_memory_bytes / rabbitmq_resident_memory_limit_bytes * 100 > 90
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Rabbitmq memory high (instance {{ $labels.instance }})"
          description: "A RabbitMQ node is using more than 90% of its allocated RAM\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
                
    - name: RabbitMQ too many unacknowledged messages
      rules:
      - alert: RabbitmqTooManyUnackMessages
        expr: sum(rabbitmq_queue_messages_unacked) BY (queue) > 1000
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Rabbitmq too many unack messages (instance {{ $labels.instance }})"
          description: "A RabbitMQ queue has more than 1000 unacknowledged messages\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
                  
    - name: RabbitMQ too many connections
      rules:
      - alert: RabbitmqTooManyConnections
        expr: rabbitmq_connections > 1000
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Rabbitmq too many connections (instance {{ $labels.instance }})"
          description: "A RabbitMQ node has more than 1000 connections in total\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
                  
    - name: RabbitMQ queue without consumers
      rules:
      - alert: RabbitmqNoQueueConsumer
        expr: rabbitmq_queue_consumers < 1
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Rabbitmq no queue consumer (instance {{ $labels.instance }})"
          description: "A RabbitMQ queue has fewer than 1 consumer\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
        
[root@k8s01 prometheus]# kubectl apply -f prometheus-config.yaml 
configmap/prometheus-config configured        
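
To make the interplay of expr and for more concrete, here is a simplified Python model of how the RabbitmqMemoryHigh rule moves from pending to firing. It is a sketch of the general behaviour, not Prometheus code: the function names are hypothetical, the memory figures are invented, and the 15 second step mirrors the evaluation_interval in the configuration shown earlier.

```python
def memory_percent(resident_bytes, limit_bytes):
    """Same arithmetic as the rule: resident memory / memory limit * 100."""
    return resident_bytes / limit_bytes * 100


def alert_states(values, threshold, for_seconds, interval_seconds=15):
    """Return the alert state after each evaluation of 'value > threshold'."""
    states = []
    active_since = None
    for i, value in enumerate(values):
        now = i * interval_seconds
        if value > threshold:
            if active_since is None:
                active_since = now  # condition became true at this evaluation
            held = now - active_since
            states.append("firing" if held >= for_seconds else "pending")
        else:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
    return states


GIB = 1024 ** 3
# Invented example: a node sitting at 3.7 GiB of a 4 GiB limit for nine evaluations
usage = [memory_percent(3.7 * GIB, 4 * GIB)] * 9
states = alert_states(usage, threshold=90, for_seconds=120)
print(states)
```

With for: 2m the alert stays pending for the first eight evaluations and fires on the ninth, while a rule with for: 0m, such as RabbitmqNodeDown, fires on the first evaluation that matches.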

Reload the Prometheus configuration

[root@k8s01 prometheus]# curl -XPOST http://10.105.x.x:30089/-/reload

Check the Prometheus UI

Open the Prometheus UI and click Alerts to check that the rules have been loaded:
[Screenshot: prom-rabbit-5]

Author: 鲜花的主人
Copyright notice: Unless otherwise stated, all articles on this site are licensed under CC BY-NC-SA 4.0. Please credit 爱吃可爱多 when republishing.