Black-Box Monitoring of k8s Services with a Prometheus Exporter
System environment:
- OS: CentOS 7.6
- Docker version: 20.10.8
- Prometheus version: 2.36.0
- Kubernetes version: 1.20.0
- BlackBox Exporter version: 0.21.0
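BlackBox Exporter
BlackBox Exporter is the official Prometheus solution for black-box monitoring. It probes endpoints over HTTP, HTTPS, DNS, TCP, and ICMP, and is typically used to check whether a service is up and responding normally.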
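White-box and black-box monitoring
Two terms come up constantly in monitoring: white-box monitoring and black-box monitoring. A brief explanation of each follows.
Black-box monitoring
Black-box monitoring tests a service's state from the user's point of view. Common techniques include HTTP probes, TCP probes, DNS probes, and ICMP. It is typically used to check the availability and connectivity of sites and services, as well as their response time.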
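To make the idea of a probe concrete, here is a minimal sketch of a TCP connectivity check in Python. It is not how BlackBox Exporter is implemented; the function name tcp_probe and its return shape are assumptions chosen for illustration. It only shows the principle: connect from the outside, then record success and elapsed time.

```python
import socket
import time
from typing import Tuple


def tcp_probe(host: str, port: int, timeout: float = 5.0) -> Tuple[bool, float]:
    """Try one TCP connection and return (success, duration_in_seconds)."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            success = True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        success = False
    return success, time.monotonic() - start
```

A call such as tcp_probe("kube-dns.kube-system", 53) would only succeed from a machine that can resolve and reach that address, for example a Pod inside the cluster.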
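White-box monitoring
White-box monitoring generally refers to day-to-day monitoring of server state: server resource usage, container status, middleware stability, and similar directly observable data. These are the infrastructure that keeps business applications running. White-box monitoring shows what is actually happening inside the system, and by observing and analyzing the metrics we can anticipate problems a server may run into and correct them in time, before they cause losses that are hard to estimate.
Differences between white-box and black-box monitoring
The two differ considerably. Black-box monitoring is failure-driven: when a monitored service fails, it raises an alert quickly. White-box monitoring is more proactive and predictive, anticipating failures that may happen.
A complete monitoring system needs both working together: white-box monitoring anticipates potential problems, while black-box monitoring quickly detects problems that have already occurred.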
Deploying BlackBox Exporter on k8s
Creating the BlackBox ConfigMap
To make the BlackBox Exporter configuration easy to change, its config file is stored in a ConfigMap. The manifest, blackbox-exporter-config.yaml, looks like this:
[root@k8s01 blackbox]# vim blackbox-exporter-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: blackbox-exporter
  namespace: monitoring
  labels:
    app: blackbox-exporter
data:
  blackbox.yml: |-
    modules:
      ## ----------- DNS probe module -----------
      dns_tcp:
        prober: dns
        dns:
          transport_protocol: "tcp"
          preferred_ip_protocol: "ip4"
          query_name: "kubernetes.default.svc.cluster.local" # domain name queried to check that DNS is working
          query_type: "A"
      ## ----------- TCP probe module -----------
      tcp_connect:
        prober: tcp
        timeout: 5s
      ## ----------- ICMP probe module -----------
      ping:
        prober: icmp
        timeout: 5s
        icmp:
          preferred_ip_protocol: "ip4"
      ## ----------- HTTP GET 2xx probe module -----------
      http_get_2xx:
        prober: http
        timeout: 10s
        http:
          method: GET
          preferred_ip_protocol: "ip4"
          valid_http_versions: ["HTTP/1.1","HTTP/2.0"] # HTTP/2 responses are reported as "HTTP/2.0"
          valid_status_codes: [200]  # accepted HTTP status codes; defaults to 2xx
          no_follow_redirects: false # whether to NOT follow redirects (false = follow them)
      ## ----------- HTTP GET 3xx probe module -----------
      http_get_3xx:
        prober: http
        timeout: 10s
        http:
          method: GET
          preferred_ip_protocol: "ip4"
          valid_http_versions: ["HTTP/1.1","HTTP/2.0"]
          valid_status_codes: [301,302,304,305,306,307]  # accepted HTTP status codes; defaults to 2xx
          no_follow_redirects: true  # do not follow redirects, so the 3xx response itself is what gets checked
      ## ----------- HTTP POST probe module -----------
      http_post_2xx:
        prober: http
        timeout: 10s
        http:
          method: POST
          preferred_ip_protocol: "ip4"
          valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
          #headers:                         # HTTP request headers
          #  Content-Type: application/json
          #body: '{}'                       # request body
[root@k8s01 blackbox]# kubectl apply -f blackbox-exporter-config.yaml
configmap/blackbox-exporter created
For reference, see the example configuration file in the BlackBox Exporter GitHub repository.
Creating the BlackBox Exporter Deployment
[root@k8s01 blackbox]# vim blackbox-exporter-deploy.yaml
apiVersion: v1
kind: Service
metadata:
  name: blackbox-exporter
  namespace: monitoring
  labels:
    k8s-app: blackbox-exporter
spec:
  type: ClusterIP
  ports:
  - name: http
    port: 9115
    targetPort: 9115
  selector:
    k8s-app: blackbox-exporter
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: blackbox-exporter
  namespace: monitoring
  labels:
    k8s-app: blackbox-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: blackbox-exporter
  template:
    metadata:
      labels:
        k8s-app: blackbox-exporter
    spec:
      containers:
      - name: blackbox-exporter
        image: prom/blackbox-exporter:v0.21.0
        imagePullPolicy: IfNotPresent
        args:
        - --config.file=/etc/blackbox_exporter/blackbox.yml
        - --web.listen-address=:9115
        - --log.level=info
        ports:
        - name: http
          containerPort: 9115
        resources:
          limits:
            cpu: 200m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 50Mi
        livenessProbe:
          tcpSocket:
            port: 9115
          initialDelaySeconds: 5
          timeoutSeconds: 5
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
        readinessProbe:
          tcpSocket:
            port: 9115
          initialDelaySeconds: 5
          timeoutSeconds: 5
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
        volumeMounts:
        - name: config
          mountPath: /etc/blackbox_exporter
      volumes:
      - name: config
        configMap:
          name: blackbox-exporter
          defaultMode: 420
[root@k8s01 blackbox]# kubectl apply -f blackbox-exporter-deploy.yaml
service/blackbox-exporter created
deployment.apps/blackbox-exporter created
Checking the BlackBox Exporter status
[root@k8s01 blackbox]# kubectl get -n monitoring po|grep blackbox
blackbox-exporter-5bbdb6b595-fhpxq 1/1 Running 0 63s
[root@k8s01 blackbox]# kubectl get -n monitoring svc|grep blackbox
blackbox-exporter ClusterIP 10.96.0.25 <none> 9115/TCP
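Once the Pod is running, a probe can be triggered by hand before Prometheus is involved. The exporter runs a probe when its /probe endpoint is requested with a module and a target as query parameters. The sketch below builds such a URL in Python; the helper names build_probe_url and run_probe are made up for this example, and the Service address assumes the blackbox-exporter Service in the monitoring namespace created above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_probe_url(exporter: str, module: str, target: str) -> str:
    """Build the /probe URL that asks the exporter to run `module` against `target`."""
    query = urlencode({"module": module, "target": target})
    return f"http://{exporter}/probe?{query}"


def run_probe(exporter: str, module: str, target: str, timeout: float = 15.0) -> str:
    """Call the exporter and return the raw metrics text.

    This needs network access to the exporter, e.g. from a Pod in the cluster.
    """
    with urlopen(build_probe_url(exporter, module, target), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; run_probe(...) must be called from somewhere
    # that can actually reach the Service.
    print(build_probe_url("blackbox-exporter.monitoring:9115",
                          "tcp_connect", "kube-dns.kube-system:53"))
```

The module name must be one of those defined in blackbox.yml (here tcp_connect); the response is plain-text metrics, including probe_success.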
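Adding probe configuration to Prometheus
Creating the DNS probe configuration
Add a Prometheus scrape job that uses BlackBox Exporter to probe the health of the specified DNS servers. Prometheus here is already deployed on k8s, with its configuration stored in a ConfigMap that is mounted into the Pod, so editing the ConfigMap changes the Prometheus configuration.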
[root@k8s01 prometheus]# vim prometheus-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
      external_labels:
        cluster: "kubernetes"
    scrape_configs:
    ...
    ################################## Kubernetes BlackBox DNS ###################################
    - job_name: "kubernetes-dns"
      metrics_path: /probe
      params:
        module: [dns_tcp]
      static_configs:
      - targets:
        - kube-dns.kube-system:53
        - 8.8.4.4:53
        - 8.8.8.8:53
        - 223.5.5.5
      relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter.monitoring:9115
Explanation of the parameters above:
################ DNS server monitoring ###################
- job_name: "kubernetes-dns"
  metrics_path: /probe
  params:
    ## Module to use; the name must match one defined in the blackbox exporter configuration
    ## The DNS module is used here
    module: [dns_tcp]
  static_configs:
    ## Addresses to probe
    - targets:
      - kube-dns.kube-system:53
      - 8.8.4.4:53
      - 8.8.8.8:53
      - 223.5.5.5
  relabel_configs:
    ## Copy each static DNS server address above into the temporary label "__param_target",
    ## which Prometheus sends to the exporter as the "target" URL parameter
    - source_labels: [__address__]
      target_label: __param_target
    ## Use the value of "__param_target" as the instance label
    - source_labels: [__param_target]
      target_label: instance
    ## Address of the BlackBox Exporter Service
    - target_label: __address__
      replacement: blackbox-exporter.monitoring:9115
Pay particular attention to the BlackBox Exporter Service address: adjust it to match the namespace where blackbox is deployed.
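The three relabel rules in the DNS job are easier to follow when traced on one target. The following Python sketch mimics their effect on a plain dictionary of labels. It is a simplified model for illustration, not Prometheus code, and the function name relabel_for_blackbox is an assumption.

```python
from typing import Dict


def relabel_for_blackbox(target: str, exporter: str) -> Dict[str, str]:
    """Trace the three relabel rules of the DNS job for a single static target."""
    # Prometheus starts with __address__ set to the configured target.
    labels = {"__address__": target}
    # Rule 1: copy __address__ into __param_target (becomes ?target=... on the scrape URL).
    labels["__param_target"] = labels["__address__"]
    # Rule 2: copy __param_target into instance, so the series is named after the probed host.
    labels["instance"] = labels["__param_target"]
    # Rule 3: overwrite __address__ so the scrape itself goes to the exporter.
    labels["__address__"] = exporter
    return labels


if __name__ == "__main__":
    print(relabel_for_blackbox("8.8.8.8:53", "blackbox-exporter.monitoring:9115"))
```

The net effect is that Prometheus scrapes the exporter, while the original address survives both as the probe target and as the instance label.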
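Creating the Service probe configuration
Add a job that probes Kubernetes services: it checks the health of every Kubernetes Service carrying the annotation prometheus.io/http-probe: "true". As before, Prometheus is deployed on k8s with its configuration stored in a ConfigMap mounted into the Pod, so editing the ConfigMap changes the Prometheus configuration.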
[root@k8s01 prometheus]# vim prometheus-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
      external_labels:
        cluster: "kubernetes"
    scrape_configs:
    ...
    ################################## Kubernetes BlackBox Services ###################################
    - job_name: 'kubernetes-services'
      metrics_path: /probe
      params:
        module:
        - "http_get_2xx"
        - "http_get_3xx"
      kubernetes_sd_configs:
      - role: service
      relabel_configs:
      - action: keep
        source_labels: [__meta_kubernetes_service_annotation_prometheus_io_http_probe]
        regex: "true"
      - action: replace
        source_labels:
        - "__meta_kubernetes_service_name"
        - "__meta_kubernetes_namespace"
        - "__meta_kubernetes_service_annotation_prometheus_io_http_probe_port"
        - "__meta_kubernetes_service_annotation_prometheus_io_http_probe_path"
        target_label: __param_target
        regex: (.+);(.+);(.+);(.+)
        replacement: $1.$2:$3$4
      - target_label: __address__
        replacement: blackbox-exporter.monitoring:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_name
Explanation of the parameters above:
- job_name: "kubernetes-services"
  metrics_path: /probe
  ## Use the http_get_2xx and http_get_3xx modules
  ## Note: BlackBox Exporter appears to read only one module value per probe request,
  ## so the second entry is likely ignored; one job per module is the safer layout
  params:
    module:
    - "http_get_2xx"
    - "http_get_3xx"
  ## Use Kubernetes service discovery with the service role
  kubernetes_sd_configs:
  - role: service
  relabel_configs:
    ## Only probe Services whose annotations include prometheus.io/http-probe: "true"
    - action: keep
      source_labels: [__meta_kubernetes_service_annotation_prometheus_io_http_probe]
      regex: "true"
    ## Build the probe target <service>.<namespace>:<port><path> from the Service name,
    ## its namespace, and the http-probe-port / http-probe-path annotations
    - action: replace
      source_labels:
      - "__meta_kubernetes_service_name"
      - "__meta_kubernetes_namespace"
      - "__meta_kubernetes_service_annotation_prometheus_io_http_probe_port"
      - "__meta_kubernetes_service_annotation_prometheus_io_http_probe_path"
      target_label: __param_target
      regex: (.+);(.+);(.+);(.+)
      replacement: $1.$2:$3$4
    ## Address of the BlackBox Exporter Service
    - target_label: __address__
      replacement: blackbox-exporter.monitoring:9115
    ## Use the probe target as the instance label
    - source_labels: [__param_target]
      target_label: instance
    ## Copy the Service's own labels onto the resulting series
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_service_name]
      target_label: kubernetes_name
Pay particular attention to the BlackBox Exporter Service address: adjust it to match the namespace where blackbox is deployed.
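The replace rule that builds __param_target is the least obvious part of the Service job. Prometheus joins the source label values with ";" and matches the regex against the whole joined string before substituting $1 to $4. The Python sketch below imitates that behaviour in simplified form; relabel_replace is a made-up helper, and it only handles the $N form of group references.

```python
import re
from typing import Dict, List


def relabel_replace(labels: Dict[str, str], source_labels: List[str], regex: str,
                    replacement: str, target_label: str,
                    separator: str = ";") -> Dict[str, str]:
    """Simplified model of a Prometheus `replace` relabel action."""
    # Missing labels count as empty strings; values are joined with the separator.
    joined = separator.join(labels.get(name, "") for name in source_labels)
    # The regex has to match the entire joined string, not just a part of it.
    match = re.fullmatch(regex, joined)
    result = dict(labels)
    if match is None:
        return result  # no match: the labels are left unchanged
    # Substitute $1, $2, ... with the corresponding capture groups.
    result[target_label] = re.sub(
        r"\$(\d+)", lambda m: match.group(int(m.group(1))), replacement)
    return result


SOURCES = [
    "__meta_kubernetes_service_name",
    "__meta_kubernetes_namespace",
    "__meta_kubernetes_service_annotation_prometheus_io_http_probe_port",
    "__meta_kubernetes_service_annotation_prometheus_io_http_probe_path",
]

if __name__ == "__main__":
    discovered = dict(zip(SOURCES, ["nginx", "default", "80", "/"]))
    out = relabel_replace(discovered, SOURCES, r"(.+);(.+);(.+);(.+)",
                          "$1.$2:$3$4", "__param_target")
    print(out["__param_target"])
```

For a Service named nginx in the default namespace with port "80" and path "/", this yields nginx.default:80/. Because each group is (.+), a Service missing either the port or the path annotation does not match, and no __param_target is produced by this rule.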
Updating the Prometheus ConfigMap
[root@k8s01 prometheus]# kubectl apply -f prometheus-config.yaml
configmap/prometheus-config configured
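Reloading the Prometheus configuration
The reload endpoint only responds if Prometheus was started with the --web.enable-lifecycle flag.
[root@k8s01 prometheus]# curl -XPOST http://10.105.x.x:30089/-/reload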
Probing a k8s application with Prometheus
This article deploys an nginx application on k8s as the test target. The contents of nginx-deploy.yaml are as follows:
[root@k8s01 ~]# vim nginx-deploy.yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    k8s-app: nginx
  annotations:
    prometheus.io/http-probe: "true"       ### enable HTTP probing for this Service
    prometheus.io/http-probe-port: "80"    ### port to probe
    prometheus.io/http-probe-path: "/"     ### path to probe
spec:
  type: ClusterIP
  ports:
  - name: http
    port: 80
    targetPort: 80
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.19
        ports:
        - containerPort: 80
[root@k8s01 ~]# kubectl apply -f nginx-deploy.yaml
service/nginx created
deployment.apps/nginx created
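Viewing the Prometheus UI
Open the Prometheus UI and go to the Targets page under Status:
Prometheus is now periodically probing the health of the configured DNS servers. In addition, it periodically probes the health of every Service in the k8s cluster that carries the probe annotations.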
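What Prometheus scrapes from each probe is plain-text metrics, and the key series is probe_success (1 for success, 0 for failure). As a rough illustration of reading that output outside the UI, the Python sketch below parses such text into a dictionary. The helper names are hypothetical, and the parser is deliberately naive: it assumes one "name value" pair per line with no trailing timestamp.

```python
from typing import Dict


def parse_probe_metrics(text: str) -> Dict[str, float]:
    """Parse simple 'name value' metric lines, skipping comments and blank lines."""
    metrics: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE comments and empty lines carry no samples
        # The value is the last space-separated field; the rest is the series name,
        # including any {label="..."} part.
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics


def probe_succeeded(text: str) -> bool:
    """Return True only if the output contains probe_success with value 1."""
    return parse_probe_metrics(text).get("probe_success") == 1.0


if __name__ == "__main__":
    sample = "# TYPE probe_success gauge\nprobe_success 1\nprobe_duration_seconds 0.25\n"
    print(probe_succeeded(sample))
```

In day-to-day use the same information is usually queried in Prometheus itself, for example by filtering on probe_success == 0 to list failing targets.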
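Importing the BlackBox Exporter dashboard into Grafana
Log in to Grafana, open the Manage menu from the left sidebar, and click the Import button in the top-right corner. Enter 9965 as the dashboard ID to import the BlackBox Exporter template, click Load to go to the data source settings, select the Prometheus data source, and then click Import to open the dashboard: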