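Prometheus Blackbox Exporter Endpoint Monitoring Tutorial
Blackbox Exporter is the official Prometheus tool for black-box probing. It actively probes external endpoints over HTTP, HTTPS, DNS, TCP, and ICMP, which makes it a good fit for monitoring website availability, API response times, and SSL certificate expiry. This tutorial deploys Blackbox Exporter on a BandwagonHost (搬瓦工) VPS and integrates it into a Prometheus monitoring setup.
1. Installing Blackbox Exporter
1.1 Binary installation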
useradd --no-create-home --shell /bin/false blackbox_exporter
cd /tmp
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.25.0/blackbox_exporter-0.25.0.linux-amd64.tar.gz
tar xzf blackbox_exporter-0.25.0.linux-amd64.tar.gz
cp blackbox_exporter-0.25.0.linux-amd64/blackbox_exporter /usr/local/bin/
chown blackbox_exporter:blackbox_exporter /usr/local/bin/blackbox_exporter
mkdir -p /etc/blackbox_exporter
1.2 Docker installation
docker run -d \
  --name blackbox-exporter \
  --restart unless-stopped \
  -p 9115:9115 \
  -v /opt/blackbox/blackbox.yml:/config/blackbox.yml \
  prom/blackbox-exporter:latest \
  --config.file=/config/blackbox.yml
2. Configuring probe modules
cat > /etc/blackbox_exporter/blackbox.yml <<'EOF'
modules:
  http_2xx:
    prober: http
    timeout: 10s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: [200, 301, 302]
      method: GET
      follow_redirects: true
      preferred_ip_protocol: "ip4"
  http_post_2xx:
    prober: http
    timeout: 10s
    http:
      method: POST
      valid_status_codes: [200, 201]
  tcp_connect:
    prober: tcp
    timeout: 10s
  icmp_ping:
    prober: icmp
    timeout: 5s
    icmp:
      preferred_ip_protocol: "ip4"
  dns_resolve:
    prober: dns
    timeout: 5s
    dns:
      query_name: "example.com"
      query_type: "A"
      valid_rcodes:
        - NOERROR
  http_ssl:
    prober: http
    timeout: 10s
    http:
      valid_status_codes: [200, 301, 302]
      method: GET
      fail_if_ssl: false
      fail_if_not_ssl: true
      tls_config:
        insecure_skip_verify: false
EOF
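A YAML indentation mistake in the module file is easy to make, so it can help to validate the file before starting the service. The sketch below wraps the exporter's `--config.check` flag; the function name `check_blackbox_config` is a hypothetical helper, and the default path assumes the binary install layout used in this tutorial.

```shell
# Hypothetical helper: validate a Blackbox Exporter config file without
# starting the service. Returns non-zero if the binary or file is missing,
# or if the exporter rejects the config.
check_blackbox_config() {
  cfg=${1:-/etc/blackbox_exporter/blackbox.yml}
  if ! command -v blackbox_exporter >/dev/null 2>&1; then
    echo "blackbox_exporter not found in PATH" >&2
    return 127
  fi
  if [ ! -r "$cfg" ]; then
    echo "config not readable: $cfg" >&2
    return 1
  fi
  # --config.check parses the file and exits instead of serving probes.
  blackbox_exporter --config.check --config.file="$cfg"
}
```

Running `check_blackbox_config` with no argument checks the default path; pass another path to check a different file.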
3. Creating the systemd service
cat > /etc/systemd/system/blackbox_exporter.service <<'EOF'
[Unit]
Description=Blackbox Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=blackbox_exporter
Group=blackbox_exporter
Type=simple
ExecStart=/usr/local/bin/blackbox_exporter \
  --config.file=/etc/blackbox_exporter/blackbox.yml \
  --web.listen-address=:9115
Restart=on-failure
RestartSec=5s
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl start blackbox_exporter
systemctl enable blackbox_exporter
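ICMP probes need an extra capability. Grant it to the binary, then restart the service so the running process picks it up:
setcap cap_net_raw+ep /usr/local/bin/blackbox_exporter
systemctl restart blackbox_exporter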
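To confirm the capability was actually applied, the binary can be inspected with `getcap`. This is a minimal sketch; `has_net_raw` is a hypothetical helper name, and it assumes the `getcap` utility (usually shipped with the libcap tools) is installed.

```shell
# Hypothetical helper: succeed only if the given binary carries cap_net_raw.
# getcap prints the file's capabilities; the exact output format varies
# between libcap versions, so we only look for the capability name.
has_net_raw() {
  getcap "$1" 2>/dev/null | grep -q cap_net_raw
}
```

For example, `has_net_raw /usr/local/bin/blackbox_exporter && echo "ICMP capability present"`.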
4. Prometheus integration
Edit /etc/prometheus/prometheus.yml and add the Blackbox scrape jobs:
4.1 HTTP endpoint monitoring
scrape_configs:
  - job_name: 'blackbox-http'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://your-website.com
          - https://api.your-service.com
          - https://blog.your-domain.com
    relabel_configs:
      # Pass the listed URL to the exporter as the ?target= parameter.
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      # Scrape the Blackbox Exporter itself, not the probed endpoint.
      - target_label: __address__
        replacement: localhost:9115
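The relabeling above can be hard to picture, so here is a small sketch of the request it produces: Prometheus ends up scraping the exporter at `/probe` with the module and the original target as query parameters. The helper names `urlencode` and `probe_url` are hypothetical and exist only for illustration.

```shell
# Percent-encode a string so it is safe as a URL query value (ASCII input).
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}        # everything after the first character
    c=${s%"$rest"}     # the first character
    s=$rest
    case $c in
      [a-zA-Z0-9.~_-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
)

# Hypothetical helper: build the URL scraped for one target after relabeling.
# $1 = exporter address, $2 = module name, $3 = probed target.
probe_url() {
  printf 'http://%s/probe?module=%s&target=%s\n' "$1" "$2" "$(urlencode "$3")"
}

probe_url localhost:9115 http_2xx https://your-website.com
# prints http://localhost:9115/probe?module=http_2xx&target=https%3A%2F%2Fyour-website.com
```

The same shape applies to the TCP and ICMP jobs; only the module name and the target change.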
4.2 TCP port monitoring
  - job_name: 'blackbox-tcp'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
          - your-server:22
          - your-server:3306
          - your-server:6379
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
4.3 ICMP ping monitoring
  - job_name: 'blackbox-icmp'
    metrics_path: /probe
    params:
      module: [icmp_ping]
    static_configs:
      - targets:
          - 8.8.8.8
          - your-other-server-ip
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
5. SSL certificate monitoring
For HTTPS targets, Blackbox Exporter automatically collects SSL certificate information, including the expiry time. Commonly used alert rules:
groups:
  - name: ssl_alerts
    rules:
      - alert: SSLCertExpiringSoon
        expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "SSL certificate expiring soon: {{ $labels.instance }}"
          description: "The SSL certificate expires in {{ $value | humanizeDuration }}."
      - alert: SSLCertExpired
        expr: probe_ssl_earliest_cert_expiry - time() < 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "SSL certificate expired: {{ $labels.instance }}"
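The alert expression subtracts the current time from `probe_ssl_earliest_cert_expiry` (a Unix timestamp) and compares the result with 30 days in seconds. The same arithmetic can be reproduced on raw `/probe` output, as in the sketch below; `cert_days_left` is a hypothetical helper, and the sample metric values are made up for illustration.

```shell
# Hypothetical helper: read /probe output on stdin and print the whole days
# left until the earliest certificate expiry. $1 is the current Unix time
# (defaults to now). Exits non-zero if the metric is absent.
cert_days_left() {
  awk -v now="${1:-$(date +%s)}" '
    $1 == "probe_ssl_earliest_cert_expiry" {
      printf "%d\n", ($2 - now) / 86400
      found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}

# Sample input: an expiry exactly 30 days after the supplied "now".
printf 'probe_ssl_earliest_cert_expiry 1.702592e+09\nprobe_success 1\n' |
  cert_days_left 1700000000
# prints 30
```

Against a live exporter, the input would come from something like `curl -s "http://localhost:9115/probe?target=https://your-website.com&module=http_2xx" | cert_days_left`.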
6. Alert rules
cat > /etc/prometheus/rules/blackbox_alerts.yml <<'EOF'
groups:
  - name: blackbox_alerts
    rules:
      - alert: EndpointDown
        expr: probe_success == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Endpoint unreachable: {{ $labels.instance }}"
          description: "Probes to {{ $labels.instance }} have failed for 2 consecutive minutes."
      - alert: SlowResponse
        expr: probe_duration_seconds > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Slow response: {{ $labels.instance }}"
          description: "Probes to {{ $labels.instance }} have taken longer than 5 seconds."
      - alert: HttpStatusError
        expr: probe_http_status_code >= 400
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "HTTP error: {{ $labels.instance }}"
          description: "Returned status code {{ $value }}."
EOF
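Before Prometheus loads the rule file it is worth checking its syntax with `promtool`, which ships with Prometheus. The sketch below assumes `promtool` is on the PATH and that prometheus.yml lists this file under `rule_files` (the tutorial does not show that entry); `check_rules` is a hypothetical helper name.

```shell
# Hypothetical helper: validate an alerting rule file with promtool.
# Returns 127 if promtool is missing, otherwise promtool's own exit status.
check_rules() {
  rules=${1:-/etc/prometheus/rules/blackbox_alerts.yml}
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found in PATH" >&2
    return 127
  fi
  promtool check rules "$rules"
}
```

A typical flow would be `check_rules` followed by a Prometheus reload or restart once the check passes.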
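7. Grafana dashboards
Import a Blackbox Exporter dashboard into Grafana:
- Dashboard ID 7587: Prometheus Blackbox Exporter, showing availability and response time for all probe targets.
- Dashboard ID 13659: Blackbox Exporter (HTTP prober), focused on HTTP probe metrics.
Common PromQL queries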
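# Probe success rate over the last hour (percent)
avg_over_time(probe_success[1h]) * 100
# Total probe duration in seconds (a proxy for HTTP response time)
probe_duration_seconds
# Days remaining on the SSL certificate
(probe_ssl_earliest_cert_expiry - time()) / 86400
# HTTP status code
probe_http_status_code
# DNS lookup time in seconds
probe_dns_lookup_time_seconds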
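The success-rate query works because `probe_success` is always 0 or 1, so its average over a window is the fraction of successful probes. A minimal sketch of the same calculation on a plain list of samples follows; `success_rate` is a hypothetical helper and the four samples are invented.

```shell
# Hypothetical helper: average a column of probe_success samples
# (one 0 or 1 per line on stdin) and print it as a percentage.
success_rate() {
  awk '{ sum += $1; n++ }
       END { if (n == 0) exit 1; printf "%.2f\n", sum / n * 100 }'
}

# Three successes out of four probes.
printf '1\n1\n0\n1\n' | success_rate
# prints 75.00
```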
8. Testing probes manually
# HTTP probe
curl "http://localhost:9115/probe?target=https://your-website.com&module=http_2xx"
# TCP probe
curl "http://localhost:9115/probe?target=your-server:22&module=tcp_connect"
# Show only the probe metrics
curl "http://localhost:9115/probe?target=https://your-website.com&module=http_2xx" | grep probe_
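When scripting these manual checks, it is convenient to turn the `probe_success` line into an exit status. The following is a minimal sketch; `probe_ok` is a hypothetical helper that only parses the text it is given and makes no request itself.

```shell
# Hypothetical helper: read /probe output on stdin and succeed only if
# the probe_success metric is present and equal to 1.
probe_ok() {
  awk '$1 == "probe_success" { ok = ($2 == 1) }
       END { exit (ok ? 0 : 1) }'
}

# Example with canned output instead of a live exporter.
if printf 'probe_duration_seconds 0.12\nprobe_success 1\n' | probe_ok; then
  echo "probe succeeded"
fi
```

With a running exporter the input would be piped from `curl -s` using one of the URLs above. Appending `&debug=true` to a probe URL makes the exporter return its probe log, which helps when a probe fails unexpectedly.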
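9. Common problems
ICMP probes fail
The blackbox_exporter binary needs the CAP_NET_RAW capability, or the exporter must run as root.
DNS probes time out
Check that the VPS's DNS configuration works. With the dns prober, the probe target is the DNS server being queried, so try pointing the probe at a specific resolver such as 8.8.8.8.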
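Summary
Blackbox Exporter is the key component in the Prometheus ecosystem for external probing, and it is particularly well suited to monitoring website availability and SSL certificate status. Combined with Alertmanager, it can send automatic notifications when an endpoint fails.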