kube-prometheus-stack Interview Questions
- Category: Observability
- Subcategory: metrics
- Number of questions: 30
1 What is the core architecture of kube-prometheus-stack?
Answer:
kube-prometheus-stack is a Helm chart that bundles Prometheus Operator, Prometheus, Alertmanager, Grafana, Node Exporter, and kube-state-metrics into a complete monitoring solution for Kubernetes clusters.
Architecture:
```text
kube-prometheus-stack (Helm Chart)
│
├── Prometheus Operator    → CRD controller
├── Prometheus             → server instances
├── Alertmanager           → alert management cluster
├── Grafana                → visualization dashboards
├── Node Exporter          → node metrics
├── Kube State Metrics     → Kubernetes object state
├── Prometheus Adapter     → custom metrics for HPA (separate prometheus-adapter chart, installed alongside the stack)
└── Exporters              → other exporters
```
Core CRDs:
| CRD | API version | Purpose |
|---|---|---|
| Prometheus | monitoring.coreos.com/v1 | Prometheus instances |
| Alertmanager | monitoring.coreos.com/v1 | Alertmanager instances |
| ServiceMonitor | monitoring.coreos.com/v1 | Scraping metrics through Services |
| PodMonitor | monitoring.coreos.com/v1 | Scraping metrics from Pods |
| PrometheusRule | monitoring.coreos.com/v1 | Alerting and recording rules |
| Probe | monitoring.coreos.com/v1 | Blackbox probes |
| AlertmanagerConfig | monitoring.coreos.com/v1alpha1 | Alert routing configuration |
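2 Which metrics does kube-prometheus-stack collect by default?
Answer:
kube-prometheus-stack ships default scrape targets that cover the core Kubernetes components. The targets are generated from ServiceMonitor objects, so the real job names usually follow the Service names (for example `apiserver` and `kubelet`). The `kubernetes-*` names in the table are the ones used by annotation-based scrape configs such as those in the standalone Prometheus chart; annotation-based Pod scraping is not enabled by default in this chart.
Default jobs:
| Job name | Scrape target | What the metrics cover |
|---|---|---|
| kubernetes-apiservers | API Server | Request latency, error rate, resource versions |
| kubernetes-nodes | Nodes | Node CPU / memory / disk |
| kubernetes-cadvisor | kubelet cAdvisor | Container CPU / memory / network / disk |
| kubernetes-service-endpoints | Service endpoints | Application metrics |
| kubernetes-pods | Annotated Pods | Custom application metrics |
| kubelet | kubelet metrics | Pod status, container operations |
| kube-state-metrics | KSM service | State of Deployments, Pods, Nodes, and other objects |
| node-exporter | Nodes | Node hardware, OS, filesystems |
| prometheus-operator | The Operator itself | Operator health |
| prometheus | Prometheus itself | Scrape / storage / query performance |
| alertmanager | Alertmanager | Alert processing state |
| grafana | Grafana | Grafana health |
| windows-exporter | Windows nodes (optional) | Windows node metrics |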
3 How does a ServiceMonitor work?
Answer:
ServiceMonitor is a core Prometheus Operator CRD (installed by kube-prometheus-stack) that defines how to scrape metrics from the Pods behind a Service.
Workflow:
```mermaid
graph TD
SM["ServiceMonitor CRD"] -->|"selector matches Service"| SVC["Service (Selector)"]
SVC -->|"label matches Pod"| POD["Pod (metrics endpoint)"]
POD -->|"/metrics"| PROM["Prometheus Server"]
```
**ServiceMonitor definition:**
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-monitor
  namespace: monitoring
spec:
  # Select the Services to monitor
  selector:
    matchLabels:
      app: myapp
  # Scrape endpoints
  endpoints:
    - port: http-metrics # name of the Service port
      interval: 15s
      path: /metrics
      scheme: http
      scrapeTimeout: 10s # must not exceed interval
      # Filter and rewrite target labels
      relabelings:
        - sourceLabels: [__meta_kubernetes_pod_node_name]
          targetLabel: node
      # Drop unwanted metrics after the scrape
      metricRelabelings:
        - sourceLabels: [__name__]
          regex: "go_.*"
          action: drop
  # Namespaces in which to look for Services
  namespaceSelector:
    matchNames:
      - default
      - production
```
Matching conditions:
```text
Prometheus → ServiceMonitor
  Prometheus.spec.serviceMonitorSelector matches ServiceMonitor.metadata.labels
ServiceMonitor → Service
  ServiceMonitor.spec.selector matches Service.metadata.labels
  ServiceMonitor.spec.namespaceSelector matches the Service's namespace
Service → Pod
  Service.spec.selector matches Pod.metadata.labels
  ServiceMonitor.spec.endpoints[].port matches the name of a Service port
```
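The matching chain above can be traced by hand with a small model. The sketch below is only an illustration of `matchLabels` semantics, assuming simplified dictionaries in place of real Kubernetes objects; the helper names `labels_match` and `discover_targets` are hypothetical and are not part of the Prometheus Operator.

```python
def labels_match(selector, labels):
    """True when every key/value pair of a matchLabels selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def discover_targets(monitor, services, pods):
    """Follow ServiceMonitor -> Service -> Pod and return (pod name, port name) pairs."""
    targets = []
    for svc in services:
        if svc["namespace"] not in monitor["namespaces"]:
            continue  # namespaceSelector.matchNames did not match
        if not labels_match(monitor["selector"], svc["labels"]):
            continue  # spec.selector did not match the Service labels
        if monitor["port"] not in svc["ports"]:
            continue  # the Service exposes no port with that name
        for pod in pods:
            same_ns = pod["namespace"] == svc["namespace"]
            if same_ns and labels_match(svc["selector"], pod["labels"]):
                targets.append((pod["name"], monitor["port"]))
    return targets


# Hypothetical objects mirroring the myapp-monitor example
monitor = {"selector": {"app": "myapp"},
           "namespaces": ["default", "production"],
           "port": "http-metrics"}
services = [
    {"namespace": "default", "labels": {"app": "myapp"},
     "selector": {"app": "myapp"}, "ports": ["http-metrics"]},
    {"namespace": "default", "labels": {"app": "other"},
     "selector": {"app": "other"}, "ports": ["http-metrics"]},
]
pods = [
    {"name": "myapp-1", "namespace": "default",
     "labels": {"app": "myapp", "pod-template-hash": "abc"}},
    {"name": "other-1", "namespace": "default", "labels": {"app": "other"}},
]

print(discover_targets(monitor, services, pods))
```

Extra labels on the Pod do not prevent a match; only the labels named in the selector have to agree.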
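4 What is the difference between PodMonitor and ServiceMonitor?
Answer:
A PodMonitor discovers Pods directly, without a Service in between; a ServiceMonitor discovers the backend Pods through a Service's endpoints. In both cases Prometheus scrapes each Pod individually; the Service's load balancing is not involved.
Comparison:
| Dimension | ServiceMonitor | PodMonitor |
|---|---|---|
| Discovery entry point | Service endpoints | Pods directly |
| Load balancing | Not used; every endpoint Pod is scraped | Not used; every Pod is scraped |
| Target selection | Service labels | Pod labels |
| Typical use | Workloads that already have a Service | Pods without a Service (for example some DaemonSets or Jobs) |
| Port discovery | Service port name | Pod container port name |
| Network policy | Prometheus must reach the Pod IPs | Prometheus must reach the Pod IPs |
PodMonitor example:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: myapp-podmonitor
spec:
  selector:
    matchLabels:
      app: myapp-daemon
  podMetricsEndpoints:
    - port: metrics
      interval: 10s
      path: /metrics
  namespaceSelector:
    any: true
```
Selection guidance:
```text
ServiceMonitor:
- Deployment or ReplicaSet (multiple replicas) exposed through a Service
- The Service has a clearly named metrics port
- Service labels are wanted on the scraped series
PodMonitor:
- DaemonSet (one Pod per node)
- Pods that have no Service in front of them
- StatefulSet (each Pod scraped on its own)
- Selection by Pod labels is needed
```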
5 How does PrometheusRule manage alerting rules in kube-prometheus-stack?
Answer:
The PrometheusRule CRD manages Prometheus alerting and recording rules as Kubernetes resources and supports dynamic updates.
PrometheusRule structure:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-alerts
  labels:
    role: alert-rules
    prometheus: k8s
spec:
  groups:
    # Alerting rule group
    - name: node-alerts
      interval: 30s
      rules:
        - alert: NodeCPUUsageHigh
          expr: (100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)) > 80
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Node {{ $labels.instance }} CPU usage > 80%"
        - alert: NodeMemoryUsageHigh
          expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.instance }} memory > 90%"
    # Recording rule group
    - name: node-recording
      interval: 1m
      rules:
        - record: node:node_cpu_utilization:ratio
          expr: (1 - avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))
```
Associating rules with Prometheus:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
spec:
  ruleSelector:
    matchLabels:
      role: alert-rules
      prometheus: k8s
```
Rule hot reload:
```text
Rule update flow:
1. Create or update a PrometheusRule CR
2. Prometheus Operator detects the change and regenerates the rule files
3. The config-reloader sidecar triggers a Prometheus reload
4. The Prometheus server does not need to restart
```
Verify that rules are loaded:
```bash
kubectl get prometheusrule -n monitoring
kubectl exec -n monitoring prometheus-k8s-0 -- wget -qO- http://localhost:9090/api/v1/rules
```
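The arithmetic behind the `NodeCPUUsageHigh` expression can be checked by hand. The following is a minimal sketch, assuming the per-core idle rates have already been computed by `rate(...[5m])`; the function names and sample values are hypothetical, and the `for:` clause is approximated as "every evaluation in the window exceeded the threshold".

```python
def cpu_usage_percent(idle_rates):
    """idle_rates: per-core rate of node_cpu_seconds_total{mode="idle"}, each in 0..1."""
    avg_idle = sum(idle_rates) / len(idle_rates)  # avg by(instance)
    return 100 - avg_idle * 100


def should_fire(usage_samples, threshold=80.0):
    """Approximates `for: 5m`: the condition held at every evaluation in the window."""
    return len(usage_samples) > 0 and all(u > threshold for u in usage_samples)


# Two cores idle 25% and 12.5% of the time -> 81.25% busy
usage = cpu_usage_percent([0.25, 0.125])
print(usage)
print(should_fire([usage, 85.0, 90.0]))   # condition held throughout
print(should_fire([usage, 79.0, 90.0]))   # one dip below 80 resets the alert
```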
6 What is the purpose of the AlertmanagerConfig CRD?
Answer:
The AlertmanagerConfig CRD lets you manage Alertmanager routes, receivers, and inhibition rules declaratively in Kubernetes.
AlertmanagerConfig definition:
```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: myapp-alerts
  namespace: monitoring
  labels:
    alertmanagerConfig: myapp
spec:
  route:
    groupBy: ['namespace']
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 12h
    receiver: 'slack-notifications'
    routes:
      - matchers:
          - name: severity
            value: critical
        receiver: 'pagerduty-critical'
  receivers:
    - name: 'slack-notifications'
      slackConfigs:
        - apiURL:
            name: slack-webhook
            key: url
          channel: '#alerts'
          title: '{{ template "slack.title" . }}'
          text: '{{ template "slack.text" . }}'
    - name: 'pagerduty-critical'
      pagerDutyConfigs:
        - routingKey:
            name: pagerduty-key
            key: routing_key
          severity: critical
  inhibitRules:
    - sourceMatch:
        - name: severity
          value: critical
      targetMatch:
        - name: severity
          value: warning
      equal: ['namespace']
```
Associating with Alertmanager:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: main
spec:
  alertmanagerConfigSelector:
    matchLabels:
      alertmanagerConfig: myapp
  # This is a label selector over Namespace objects, not a list of names
  alertmanagerConfigNamespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: monitoring
```
Supported notification types:
- Slack
- PagerDuty
- Email (SMTP)
- OpsGenie
- VictorOps
- WeChat
- Telegram
- Discord
- Webhook
- Pushover
- SNS
7 How are Grafana dashboards managed in kube-prometheus-stack?
Answer:
kube-prometheus-stack ships a large set of preconfigured Grafana dashboards and also supports adding custom dashboards through ConfigMaps.
Built-in dashboards:
| Dashboard | Purpose |
|---|---|
| Kubernetes / API Server | API Server performance |
| Kubernetes / Nodes | Node resources |
| Kubernetes / Pods | Pod status and resources |
| Kubernetes / Deployments | Deployment resources |
| Kubernetes / StatefulSets | StatefulSet monitoring |
| Kubernetes / Kubelet | Kubelet health |
| Kubernetes / Networking | Network traffic and policies |
| Kubernetes / Persistent Volumes | Storage volumes |
| Node Exporter / Nodes | Full Node Exporter metrics |
| Node Exporter / USE Method | USE-method dashboard |
| Prometheus / Overview | Prometheus self-monitoring |
| Prometheus / Remote Write | Remote Write status |
| Alertmanager / Overview | Alert processing state |
Custom dashboard:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1" # picked up by the dashboard sidecar
data:
  custom-dashboard.json: |
    {
      "title": "Custom App Dashboard",
      "panels": [...]
    }
```
Dashboard sidecar configuration:
```yaml
grafana:
  sidecar:
    dashboards:
      enabled: true
      label: grafana_dashboard
      labelValue: "1"
      searchNamespace: ALL
      folderAnnotation: grafana_folder
```
8 What is the role of Prometheus Adapter alongside kube-prometheus-stack?
Answer:
Prometheus Adapter exposes Prometheus metrics through the Kubernetes Custom Metrics API so that the HPA can scale on application metrics. It is not bundled with the kube-prometheus-stack chart; it is normally installed from the separate prometheus-adapter chart and pointed at the stack's Prometheus. The Vertical Pod Autoscaler does not consume the Custom Metrics API, so it is not part of this path.
Architecture:
```mermaid
graph TD
PD["Pod / Deployment"] -->|"/metrics"| PROM["Prometheus"]
PROM -->|"PromQL query"| ADAPTER["Prometheus Adapter"]
ADAPTER -->|"Custom Metrics API"| API["K8s API Server"]
API --> HPA["HPA (Horizontal Pod Autoscaler)"]
```
**Configuration example:**
```yaml
# Values for the separate prometheus-adapter chart
prometheus:
  url: http://prometheus-operated
  port: 9090
rules:
  default: false
  custom:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```
HPA usage:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
```
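Two pieces of the adapter-to-HPA path are plain arithmetic and can be sketched directly: the `name.matches` / `name.as` rewrite, and the replica calculation the HPA documents as `ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. The sketch below assumes those two rules only; it ignores the HPA's tolerance band and stabilization windows, and the function names are hypothetical.

```python
import math
import re


def adapter_metric_name(series_name):
    """Apply the rule `matches: "^(.*)_total$"`, `as: "${1}_per_second"`."""
    m = re.match(r"^(.*)_total$", series_name)
    return f"{m.group(1)}_per_second" if m else None


def desired_replicas(current_replicas, current_avg, target_avg,
                     min_replicas, max_replicas):
    """Basic HPA formula, clamped to the minReplicas/maxReplicas bounds."""
    desired = math.ceil(current_replicas * current_avg / target_avg)
    return max(min_replicas, min(max_replicas, desired))


print(adapter_metric_name("http_requests_total"))
# 4 Pods averaging 150 req/s against a target of 100 req/s
print(desired_replicas(4, 150, 100, min_replicas=2, max_replicas=10))
```

With the example HPA (`minReplicas: 2`, `maxReplicas: 10`), a very high load is capped at 10 replicas and a very low load never drops below 2.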
9 How does kube-prometheus-stack manage the resource configuration of Prometheus instances?
Answer:
kube-prometheus-stack controls the resources and storage of each Prometheus instance through the `spec` fields of the Prometheus CRD.
Prometheus CRD resource configuration:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
spec:
  # Number of replicas
  replicas: 2
  # Container resources
  resources:
    requests:
      memory: 4Gi
      cpu: 2
    limits:
      memory: 8Gi
  # Storage
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: fast-ssd
        accessModes: [ReadWriteOnce]
        resources:
          requests:
            storage: 100Gi
  # TSDB settings
  retention: 15d
  retentionSize: 80GB
  walCompression: true
  # Query settings
  query:
    maxConcurrency: 50
    maxSamples: 50000000
    timeout: 2m
  # Scrape settings
  scrapeInterval: 30s
  scrapeTimeout: 10s
  evaluationInterval: 30s
  # Other settings
  externalLabels:
    cluster: production-us-east
  externalUrl: https://prometheus.example.com
  # Rule selection
  ruleSelector:
    matchLabels:
      role: alert-rules
  # Label selector over Namespace objects
  ruleNamespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: monitoring
  # ServiceMonitor selection
  serviceMonitorSelector:
    matchLabels:
      app: monitored
  serviceMonitorNamespaceSelector: {}
```
Rough sizing guide (actual needs depend mostly on the number of active series and the scrape interval):
| Cluster size | Nodes | Pods | Prometheus resources | Storage |
|---|---|---|---|---|
| Small | < 10 | < 500 | 2 cores / 4 GB | 50 GB |
| Medium | 10-50 | 500-2000 | 4 cores / 8 GB | 200 GB |
| Large | 50-200 | 2000-10000 | 8 cores / 16 GB | 1 TB |
| Very large | > 200 | > 10000 | 16 cores / 32 GB | 2 TB+ |
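The storage column can be sanity-checked with the rule of thumb from the Prometheus storage documentation: needed disk space ≈ retention time in seconds × ingested samples per second × bytes per sample, where a sample typically compresses to about 1 to 2 bytes. The sketch below assumes that formula; the ingestion rate used is a made-up input, not a measurement.

```python
def estimate_disk_bytes(retention_days, samples_per_second, bytes_per_sample=2.0):
    """Rule-of-thumb TSDB size: retention (s) x ingestion rate x bytes per sample."""
    return retention_days * 86400 * samples_per_second * bytes_per_sample


def to_gb(num_bytes):
    """Decimal gigabytes, matching the GB unit used by retentionSize."""
    return num_bytes / 1e9


# Hypothetical cluster ingesting 100,000 samples/s with `retention: 15d`
size = estimate_disk_bytes(15, 100_000)
print(f"{to_gb(size):.1f} GB")
```

The ingestion rate of a running server can be read from `rate(prometheus_tsdb_head_samples_appended_total[5m])`; leaving headroom above the estimate is advisable because the WAL and compaction need temporary space.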
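10 How is NetworkPolicy configured for kube-prometheus-stack?
Answer:
NetworkPolicy controls network access between the monitoring components and the workloads they scrape, which keeps the monitoring plane isolated.
Example policy:
```yaml
# Allow Prometheus to reach the metrics endpoints of Pods in this namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scraping
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
          podSelector:
            matchLabels:
              app.kubernetes.io/name: prometheus
      ports:
        - port: 9090
        - port: 10250 # kubelet
        - port: 10255 # kubelet read-only
```
A NetworkPolicy is namespaced, so this policy has to be created in every namespace whose Pods should be scraped, and the listed ports must match the targets' actual metrics ports. The kubelet listens on the node's host network, which Pod-level NetworkPolicy generally does not govern.
Communication between components:
```text
Prometheus → ServiceMonitor targets (all namespaces)
Prometheus → Alertmanager (monitoring)
Grafana → Prometheus (monitoring)
Grafana → Alertmanager (monitoring)
Alertmanager → Webhook/Slack (external)
```
Recommended policy:
```text
1. Allow all traffic inside the monitoring namespace
2. Allow Prometheus egress only to the target namespaces
3. Allow ingress to the monitoring components only from Grafana
4. Block external access to the Prometheus API
```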
11 How does kube-prometheus-stack integrate kube-state-metrics?
Answer:
kube-prometheus-stack installs kube-state-metrics (KSM) by default as a subchart; KSM exposes metrics about the state of Kubernetes objects.
Metrics exposed by KSM:
| Resource | Core metrics | Purpose |
|---|---|---|
| Node | kube_node_status_*, kube_node_status_condition | Node health, resource capacity |
| Pod | kube_pod_status_*, kube_pod_container_* | Pod phase, restart counts |
| Deployment | kube_deployment_* | Desired and available replicas |
| StatefulSet | kube_statefulset_* | Ready replicas |
| DaemonSet | kube_daemonset_* | Desired / ready / scheduled Pods |
| Service | kube_service_* | Service information |
| Namespace | kube_namespace_* | Namespace status |
| PersistentVolume | kube_persistentvolume_*, kube_persistentvolumeclaim_* | Volume status, capacity |
| Endpoint | kube_endpoint_* | Endpoint status |
| HorizontalPodAutoscaler | kube_horizontalpodautoscaler_* | Current / desired HPA replicas |
Configuration:
```yaml
# Toggle in the parent chart
kubeStateMetrics:
  enabled: true
# Values passed to the kube-state-metrics subchart
kube-state-metrics:
  collectors:
    - deployments
    - pods
    - nodes
    - statefulsets
    - daemonsets
    - persistentvolumeclaims
    - persistentvolumes
  metricLabelsAllowlist:
    - pods=[*]
    - nodes=[*]
    - deployments=[app,version]
  namespaceOverride: monitoring
  resources:
    limits:
      memory: 512Mi
      cpu: 200m
```
Key alerting rules:
```yaml
# Alerts based on KSM metrics
- alert: KubeDeploymentReplicasMismatch
  expr: (kube_deployment_spec_replicas - kube_deployment_status_replicas_available) > 0
  for: 10m
  labels:
    severity: warning
- alert: KubePodCrashLooping
  expr: rate(kube_pod_container_status_restarts_total[5m]) > 0
  for: 5m
  labels:
    severity: critical
```
12 What are the multi-cluster monitoring options for kube-prometheus-stack?
Answer:
kube-prometheus-stack can be combined with several multi-cluster architectures, from Prometheus federation to a global aggregation layer.
Option 1: Prometheus federation
```mermaid
graph TD
A["Cluster A: Prometheus-A"] -->|"/federate"| GLOBAL["Global Prometheus"]
B["Cluster B: Prometheus-B"] -->|"/federate"| GLOBAL
GLOBAL --> GRAFANA["Grafana (Global)"]
```
**Option 2: Global aggregation with Thanos or VictoriaMetrics**
```mermaid
graph TD
A["Cluster A: kube-prometheus-stack"] -->|"Remote Write"| GLOBAL["Global Thanos / VictoriaMetrics"]
B["Cluster B: kube-prometheus-stack"] -->|"Remote Write"| GLOBAL
GLOBAL --> GRAFANA["Grafana (Global)"]
```
**Multi-cluster configuration:**
```yaml
# Cluster A (values.yaml)
prometheus:
  prometheusSpec:
    externalLabels:
      cluster: cluster-a
      environment: production
    remoteWrite:
      - url: http://thanos-receiver:19291/api/v1/receive
```
```yaml
# Cluster B (values.yaml)
prometheus:
  prometheusSpec:
    externalLabels:
      cluster: cluster-b
      environment: production
    remoteWrite:
      - url: http://thanos-receiver:19291/api/v1/receive
```
Unified multi-cluster view in Grafana:
```yaml
grafana:
  datasources:
    datasources.yaml:
      apiVersion: 1
      datasources:
        - name: Thanos
          type: prometheus
          url: http://thanos-query:9090
          access: proxy
          isDefault: true
```
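The reason each cluster needs its own `externalLabels` is that otherwise identical series from different clusters collapse into one identity at the global layer. The following sketch models a series as a set of label pairs and assumes the remote-write behaviour that an external label is only added when the series does not already carry that label name; the helper names and sample series are hypothetical.

```python
def with_external_labels(series_labels, external_labels):
    """Add external labels; labels already present on the series take precedence."""
    merged = dict(external_labels)
    merged.update(series_labels)
    return merged


def identity(labels):
    """A series is identified by its full, order-independent label set."""
    return frozenset(labels.items())


# The same kubelet target name happens to exist in both clusters
series = {"__name__": "up", "job": "kubelet", "instance": "10.0.0.1:10250"}

a = with_external_labels(series, {"cluster": "cluster-a", "environment": "production"})
b = with_external_labels(series, {"cluster": "cluster-b", "environment": "production"})

print(identity(series) == identity(series))  # without external labels: collision
print(identity(a) == identity(b))            # with external labels: distinct series
```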
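13 Which metrics describe the performance of the monitoring stack itself?
Answer:
kube-prometheus-stack scrapes Prometheus itself; the key metrics fall into scraping, storage, and query performance. Metric names vary somewhat between Prometheus versions, so verify them against the `/metrics` output of the running server.
Scrape performance:
```promql
# Actual interval between scrapes
prometheus_target_interval_length_seconds
# Number of targets whose last scrape failed
count(up == 0)
# Scrape duration per target
scrape_duration_seconds
# Scrapes rejected for exceeding the sample limit
rate(prometheus_target_scrapes_exceeded_sample_limit_total[5m])
```
Storage performance:
```promql
# Series in the TSDB head
prometheus_tsdb_head_series
# Number of loaded blocks
prometheus_tsdb_blocks_loaded
# WAL write rate
rate(prometheus_tsdb_wal_written_bytes_total[5m])
# Size of persisted blocks
prometheus_tsdb_storage_blocks_bytes
```
Query performance:
```promql
# Query latency
prometheus_engine_query_duration_seconds
# Configured maximum number of concurrent queries
prometheus_engine_queries_concurrent_max
# Queries currently executing or waiting
prometheus_engine_queries
# Query queue
prometheus_engine_query_queue_length
# Failed queries
rate(prometheus_engine_queries_failed_total[5m])
```
Operator performance:
```promql
# Reconcile errors
prometheus_operator_reconcile_errors_total
# Reconcile latency
prometheus_operator_reconcile_duration_seconds
# Number of reconcile operations
prometheus_operator_reconcile_operations_total
```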
14 How are Grafana data sources configured in kube-prometheus-stack?
Answer:
kube-prometheus-stack provisions Grafana data sources through Helm values, ConfigMaps, or Secrets.
Default data sources:
```yaml
grafana:
  datasources:
    datasources.yaml:
      apiVersion: 1
      datasources:
        - name: Prometheus
          type: prometheus
          url: http://prometheus-operated:9090
          access: proxy
          isDefault: true
          editable: false
        - name: Alertmanager
          type: alertmanager
          url: http://alertmanager-operated:9093
          access: proxy
          isDefault: false
          editable: false
```
Adding extra data sources:
```yaml
grafana:
  additionalDataSources:
    - name: Loki
      type: loki
      url: http://loki:3100
      access: proxy
    - name: Tempo
      type: tempo
      url: http://tempo:3200
      access: proxy
    - name: Jaeger
      type: jaeger
      url: http://jaeger-query:16686
      access: proxy
```
Managing sensitive values through a Secret:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: grafana-datasources
  namespace: monitoring
  labels:
    grafana_datasource: "1" # lets the datasource sidecar pick up this Secret
stringData:
  datasources.yaml: |
    apiVersion: 1
    datasources:
      - name: CloudWatch
        type: cloudwatch
        jsonData:
          authType: keys
          defaultRegion: us-east-1
        secureJsonData:
          accessKey: <access-key>
          secretKey: <secret-key>
```
15 How do you add extra scrape configs to Prometheus in kube-prometheus-stack?
Answer:
For scrape scenarios that the Prometheus Operator CRDs cannot express, kube-prometheus-stack accepts additional raw scrape configs.
Inline configuration:
```yaml
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: 'kube-controller-manager'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            action: keep
            regex: "http-metrics"
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      - job_name: 'etcd'
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          # These paths must be mounted into the Prometheus container
          ca_file: /etc/kubernetes/pki/etcd/ca.crt
          cert_file: /etc/kubernetes/pki/etcd/server.crt
          key_file: /etc/kubernetes/pki/etcd/server.key
        relabel_configs:
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            action: keep
            regex: "etcd"
```
Managing the config through a Secret:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: additional-scrape-configs
  namespace: monitoring
stringData:
  prometheus-additional.yaml: |
    - job_name: 'external-service'
      static_configs:
        - targets: ['external-service.example.com:9090']
      basic_auth:
        username: 'admin'
        password: 'password'
```
```yaml
prometheus:
  prometheusSpec:
    additionalScrapeConfigsSecret:
      enabled: true
      name: additional-scrape-configs
      key: prometheus-additional.yaml
```
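The `action: keep` steps in the scrape configs above filter discovered targets by a regular expression that Prometheus anchors at both ends, so `http-metrics` does not match `http-metrics-extra`. A minimal sketch of that behaviour, assuming targets are plain dictionaries of discovery labels; the `keep` helper and the sample targets are hypothetical.

```python
import re


def keep(targets, source_label, regex):
    """Emulate `action: keep`: retain targets whose label fully matches the regex."""
    pattern = re.compile(f"^(?:{regex})$")  # relabel regexes are fully anchored
    return [t for t in targets if pattern.match(t.get(source_label, ""))]


PORT = "__meta_kubernetes_endpoint_port_name"
discovered = [
    {PORT: "http-metrics", "__address__": "10.0.0.5:10257"},
    {PORT: "http-metrics-extra", "__address__": "10.0.0.6:9999"},
    {PORT: "https", "__address__": "10.0.0.7:443"},
]

kept = keep(discovered, PORT, "http-metrics")
print([t["__address__"] for t in kept])
```

An alternation such as `etcd|etcd-metrics|metrics` works the same way: each alternative must match the whole label value.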
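16 What is the upgrade strategy for kube-prometheus-stack, and what should you watch out for?
Answer:
An upgrade involves CRD changes, configuration migration, and component version bumps, so it should follow a fixed procedure.
Upgrade procedure:
```bash
# 1. Back up the current values
helm get values prometheus-stack -n monitoring > backup.yaml
# 2. Update the Helm repo
helm repo update
# 3. Check for CRD changes
kubectl get crd | grep monitoring.coreos.com
# 4. Upgrade the CRDs (some versions require this manual step)
kubectl apply --server-side -f https://...
# 5. Upgrade the chart
helm upgrade prometheus-stack prometheus-community/kube-prometheus-stack -n monitoring -f values.yaml
# 6. Verify the upgrade
kubectl get pods -n monitoring
kubectl get prometheus -n monitoring
```
Upgrade notes:
```text
CRD compatibility:
- Check for API version changes (v1alpha1 → v1)
- Some CRDs must be applied manually
- Helm does not upgrade CRDs automatically
Grafana version:
- Watch for Grafana major-version upgrades
- Plugin compatibility
- Dashboard schema changes
Prometheus version:
- PromQL syntax changes
- Remote Write protocol version
- Alerting rule compatibility
Data retention:
- An upgrade does not delete existing data
- Taking a snapshot backup before upgrading is still recommended
- Check that the storage configuration is correct
```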
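17 How does kube-prometheus-stack fit with Pod Security Policy and its successors?
Answer:
The components of kube-prometheus-stack need different security contexts, so clusters that enforce PSP or Pod Security Admission (PSA) require specific configuration. PodSecurityPolicy was removed in Kubernetes 1.25; Pod Security Admission is its replacement.
Component security requirements:
| Component | Security context | Notes |
|---|---|---|
| Prometheus | runAsUser: 1000 | Writes to the storage volume |
| Alertmanager | runAsUser: 1000 | Stores data |
| Grafana | runAsUser: 472 | Dashboard persistence |
| Node Exporter | hostPID, hostNetwork | Node metric collection |
| Kube State Metrics | non-root | Read-only API access |
| Prometheus Operator | non-root | Creates Pods |
Pod Security Admission configuration:
```yaml
# Labels on the monitoring namespace
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
  labels:
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: baseline
```
```yaml
# Or use SecurityContextConstraints (OpenShift)
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: prometheus-scc
allowHostPID: true
allowHostNetwork: true
```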
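18 How can Kubernetes events be monitored with kube-prometheus-stack?
Answer:
kube-prometheus-stack covers events indirectly through kube-state-metrics and can be complemented by an event exporter. kube-state-metrics does not export Event objects themselves; it exposes the object state that usually accompanies them, such as restart counts and conditions.
Event collection options:
| Option | Tool | Type | Notes |
|---|---|---|---|
| Events to metrics | kube-state-metrics | Metrics | Object state behind the events (restarts, conditions) |
| Events to logs | Event exporter | Logs | Full event details |
| Events to alerts | PrometheusRule | Alerts | Alerts on event-related metrics |
Event-related alerting rules:
```yaml
- alert: KubePodCrashLooping
  expr: rate(kube_pod_container_status_restarts_total[10m]) > 0
  for: 5m
  labels:
    severity: critical
- alert: KubeNodeNotReady
  expr: kube_node_status_condition{condition="Ready",status="true"} == 0
  for: 5m
  labels:
    severity: critical
- alert: KubePersistentVolumeFillingUp
  expr: (kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes) < 0.1
  for: 5m
  labels:
    severity: critical
```
Event exporter deployment (illustrative; an event exporter is a separate component rather than a kube-prometheus-stack value, and the exact schema depends on the exporter chosen):
```yaml
events-exporter:
  enabled: true
  config:
    exporters:
      - type: prometheus
    routes:
      - match:
          - severity: Warning
            type: BackOff
          - severity: Warning
            type: Failed
```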
19 How are alert notification templates configured in kube-prometheus-stack?
Answer:
kube-prometheus-stack customizes notification formatting through the Alertmanager configuration and its template system.
Referencing templates:
```yaml
alertmanager:
  config:
    global:
      slack_api_url: '<webhook>'
    route:
      receiver: 'default'
    receivers:
      - name: 'default'
        slack_configs:
          - channel: '#alerts'
            title: '{{ template "slack.title" . }}'
            text: '{{ template "slack.text" . }}'
    templates:
      - '/etc/alertmanager/config/template_*.tmpl'
```
Custom templates:
```yaml
alertmanager:
  templateFiles:
    custom.tmpl: |
      {{ define "slack.title" }}
      [{{ .Status | toUpper }}] {{ .GroupLabels.alertname }}
      {{ end }}
      {{ define "slack.text" }}
      *Alert details*
      > Alertmanager: {{ .ExternalURL }}
      > Alert name: {{ .GroupLabels.alertname }}
      > Severity: {{ .CommonLabels.severity }}
      > Alerts:
      {{ range .Alerts }}
      > Started at: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
      > {{ .Annotations.summary }}
      > {{ .Annotations.description }}
      {{ end }}
      {{ end }}
      {{ define "email.subject" }}
      [{{ .Status | toUpper }}] {{ .GroupLabels.alertname }} - {{ .GroupLabels.severity }}
      {{ end }}
```
Template variables (`.StartsAt`, `.EndsAt`, `.Annotations`, and `.Labels` belong to a single alert, so they are only available inside `range .Alerts`):
| Variable | Description |
|---|---|
| {{ .Status }} | firing / resolved |
| {{ .Alerts }} | List of alerts |
| {{ .GroupLabels }} | Grouping labels |
| {{ .CommonLabels }} | Labels common to all alerts |
| {{ .ExternalURL }} | External URL of Alertmanager |
| {{ .StartsAt }} | Start time (per alert) |
| {{ .EndsAt }} | End time (per alert) |
| {{ .Annotations }} | Annotations (per alert) |
| {{ .Labels }} | Labels (per alert) |
20 How is persistent storage configured in kube-prometheus-stack?
Answer:
Each component uses a storage configuration that fits the kind of data it holds.
Storage needs per component:
| Component | Needs persistence | Storage type | Notes |
|---|---|---|---|
| Prometheus | Yes | PVC (SSD) | TSDB data, IOPS-sensitive |
| Alertmanager | Yes | PVC | Silences and notification state |
| Grafana | Recommended | PVC | Dashboard persistence |
| Node Exporter | No | - | Stateless |
| KSM | No | - | Stateless |
Prometheus storage:
```yaml
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: premium-rwo
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 200Gi
    # Retention limits (keep retentionSize below the volume size)
    retention: 15d
    retentionSize: 170GB
    # WAL settings
    walCompression: true
```
Grafana storage:
```yaml
grafana:
  persistence:
    enabled: true
    storageClassName: standard-rwo
    accessModes: ["ReadWriteOnce"]
    size: 10Gi
  sidecar:
    dashboards:
      enabled: true
      label: grafana_dashboard
```
Alertmanager storage:
```yaml
alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: standard-rwo
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
```
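21 How do you customize Prometheus startup parameters in kube-prometheus-stack?
Answer:
kube-prometheus-stack maps the fields under `prometheusSpec` to native Prometheus flags and configuration; the Operator generates the actual command line.
Configuration:
```yaml
prometheus:
  prometheusSpec:
    # Basic parameters
    retention: 30d
    retentionSize: 100GB
    scrapeInterval: 30s
    evaluationInterval: 30s
    # External labels
    externalLabels:
      cluster: production
      region: us-east-1
    # Remote write
    remoteWrite:
      - url: http://victoriametrics:8428/api/v1/write
        queueConfig:
          capacity: 10000
          maxSamplesPerSend: 1000
          batchSendDeadline: 5s
    # Remote read
    remoteRead:
      - url: http://victoriametrics:8428/api/v1/read
    # Query parameters
    query:
      maxConcurrency: 50
      maxSamples: 50000000
      timeout: 5m
    # Feature flags (head snapshot on shutdown for faster restarts, exemplar storage)
    enableFeatures:
      - memory-snapshot-on-shutdown
      - exemplar-storage
    # TSDB parameters
    tsdb:
      outOfOrderTimeWindow: 30s
    # In-memory exemplar limit
    exemplars:
      maxSize: 100000
```
Passing flags through additionalArgs:
```yaml
prometheus:
  prometheusSpec:
    # Flags the Operator already manages are expected to be rejected here;
    # prefer the dedicated fields (retentionSize, externalUrl) where they exist.
    additionalArgs:
      - name: storage.tsdb.retention.size
        value: "100GB"
      - name: web.enable-lifecycle
        value: "true"
      - name: web.external-url
        value: "https://prometheus.example.com"
```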
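22 How does kube-prometheus-stack handle persistence of the Prometheus WAL?
Answer:
When Prometheus uses a persistent volume, the write-ahead log (WAL) is persisted with it, so recent data that has not yet been compacted into blocks survives a restart.
WAL configuration:
```yaml
prometheus:
  prometheusSpec:
    # WAL compression (enabled by default in current Prometheus versions)
    walCompression: true
    # Data volume; the WAL lives in its wal/ subdirectory
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: ssd
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi
    # The Prometheus CRD has no dedicated WAL volume field, so size the
    # data volume (and choose its storage class) for blocks and WAL together.
```
WAL monitoring:
```promql
# WAL write rate
rate(prometheus_tsdb_wal_written_bytes_total[5m])
# Index of the WAL segment currently being written
prometheus_tsdb_wal_segment_current
# WAL truncation duration
prometheus_tsdb_wal_truncate_duration_seconds
# WAL corruption events
prometheus_tsdb_wal_corruptions_total
```
WAL recovery scenario:
```text
Unclean shutdown → Prometheus restart:
1. Scan the WAL directory
2. Replay data not yet compacted into blocks into the head
3. Discard corrupted WAL segments
4. Rebuild the in-memory time series
5. Resume scraping
Recovery performance:
Replaying about 1 hour of WAL takes on the order of minutes
The time depends on WAL size and series count
```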
23 How do you monitor etcd with kube-prometheus-stack?
Answer:
kube-prometheus-stack can scrape etcd through an additional scrape config; when the TLS client port is used, etcd client certificates are required.
etcd metrics ports:
```text
Client port (TLS, serves /metrics): 2379
Dedicated metrics listener (--listen-metrics-urls, plain HTTP; kubeadm uses 127.0.0.1:2381): 2381
```
Scrape configuration:
```yaml
prometheus:
  prometheusSpec:
    # Mounts the Secret at /etc/prometheus/secrets/etcd-certs
    secrets:
      - etcd-certs
    additionalScrapeConfigs:
      - job_name: 'etcd'
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          ca_file: /etc/prometheus/secrets/etcd-certs/ca.crt
          cert_file: /etc/prometheus/secrets/etcd-certs/server.crt
          key_file: /etc/prometheus/secrets/etcd-certs/server.key
          insecure_skip_verify: true # skips server verification; avoid where possible
        relabel_configs:
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            action: keep
            regex: "etcd|etcd-metrics|metrics"
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod
```
```yaml
# Certificate Secret
apiVersion: v1
kind: Secret
metadata:
  name: etcd-certs
  namespace: monitoring
type: Opaque
stringData:
  ca.crt: <etcd-ca-cert>
  server.crt: <etcd-server-cert>
  server.key: <etcd-server-key>
```
etcd alerting rules:
```yaml
- alert: EtcdLeaderChanges
  expr: rate(etcd_server_leader_changes_seen_total[5m]) > 0
  for: 5m
  labels:
    severity: critical
- alert: EtcdHighFsyncDurations
  expr: histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) > 1
  for: 5m
  labels:
    severity: critical
- alert: EtcdDbSizeHigh
  # Actual database size in MiB (the quota metric only reports the configured limit)
  expr: etcd_mvcc_db_total_size_in_bytes / 1024 / 1024 > 1024
  for: 5m
  labels:
    severity: warning
```
24 How do you customize the Grafana configuration file in kube-prometheus-stack?
Answer:
kube-prometheus-stack customizes Grafana through Helm values and ConfigMaps.
grafana.ini configuration:
```yaml
grafana:
  grafana.ini:
    server:
      root_url: https://grafana.example.com
      domain: grafana.example.com
    auth:
      disable_login_form: false
    auth.ldap:
      enabled: true
      config_file: /etc/grafana/ldap.toml
    auth.generic_oauth:
      enabled: true
      client_id: grafana
      client_secret: <secret>
      auth_url: https://auth.example.com/oauth/authorize
      token_url: https://auth.example.com/oauth/token
      api_url: https://auth.example.com/api/userinfo
    security:
      admin_user: admin
      admin_password: <strong-password>
    smtp:
      enabled: true
      host: smtp.example.com:587
      user: grafana@example.com
      password: <password>
      from_address: grafana@example.com
    log:
      mode: console
      level: info
    analytics:
      reporting_enabled: false
```
LDAP configuration:
```yaml
grafana:
  ldap:
    enabled: true
    config: |
      [[servers]]
      host = "ldap.example.com"
      port = 389
      use_ssl = false
      start_tls = true
      bind_dn = "cn=admin,dc=example,dc=com"
      bind_password = "<password>"
      search_filter = "(sAMAccountName=%s)"
      search_base_dns = ["dc=example,dc=com"]
```
Plugin configuration:
```yaml
grafana:
  plugins:
    - grafana-piechart-panel
    - grafana-worldmap-panel
    - grafana-clock-panel
    - natel-discrete-panel
```
25 How do you configure an HPA based on Prometheus metrics with kube-prometheus-stack?
Answer:
Prometheus Adapter, installed alongside kube-prometheus-stack from its own chart, exposes Prometheus metrics through the Kubernetes Custom Metrics API for the HPA to consume.
End-to-end path:
```text
Pod → Prometheus → Prometheus Adapter → K8s API Server → HPA
```
Prometheus Adapter configuration:
```yaml
# Values for the separate prometheus-adapter chart
prometheus:
  url: http://prometheus-operated
  port: 9090
rules:
  default: false
  custom:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
    - seriesQuery: 'redis_up{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        as: "redis_up"
      metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```
HPA configuration:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```
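26 How do you achieve zero-downtime upgrades with kube-prometheus-stack?
Answer:
kube-prometheus-stack achieves zero-downtime upgrades through multiple replicas, rolling updates, and persistent data. Prometheus replicas scrape independently, so the replica being restarted has a short gap in its own data while the other one keeps serving.
Prerequisites:
| Condition | Setting | Notes |
|---|---|---|
| Multiple Prometheus replicas | replicas: 2+ | One upgrades while the other keeps serving |
| Persistent data | storageSpec | Data survives a restart |
| Multiple Grafana replicas | replicas: 2+ | Requires shared storage or an external database |
| Alertmanager cluster | replicas: 3 | Alert deduplication and state sync |
Prometheus rolling update:
```yaml
prometheus:
  prometheusSpec:
    replicas: 2
    # Illustrative Pod annotation; changing Pod metadata triggers a rolling restart
    podMetadata:
      annotations:
        prometheus.io/should_be_updated: "true"
    # Anti-affinity: spread replicas across nodes
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                  - key: app.kubernetes.io/name
                    operator: In
                    values:
                      - prometheus
              topologyKey: kubernetes.io/hostname
```
During the upgrade:
```text
1. prometheus-0 upgrades
   - prometheus-1 serves normally
   - Scraping pauses on prometheus-0
   - After restart it recovers from persistent storage
2. prometheus-0 is back
   - prometheus-1 starts upgrading
   - prometheus-0 serves normally
3. Verify
   - Check scrape target status
   - Confirm alerting rules are loaded
   - Verify data continuity
```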
27 How do you monitor cluster certificate expiry with kube-prometheus-stack?
Answer:
kube-prometheus-stack monitors certificate validity through blackbox-exporter probes and certificate-related metrics.
Certificate expiry detection options:
| Option | Collector | Metric |
|---|---|---|
| TLS endpoints | blackbox exporter | probe_ssl_earliest_cert_expiry |
| Kubernetes client certificates | API server | apiserver_client_certificate_expiration_seconds |
| Custom | textfile collector | Collected by a custom script |
Blackbox certificate monitoring:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: tls-certificates
spec:
  # tls_connect is assumed to be a TCP module with TLS enabled, so targets are host:port
  module: tls_connect
  prober:
    url: blackbox-exporter:9115
  targets:
    staticConfig:
      static:
        - api.example.com:6443
        - grafana.example.com:443
        - prometheus.example.com:9090
```
Certificate alerting rules:
```yaml
- alert: KubernetesCertificateExpirySoon
  expr: min by (instance) (probe_ssl_earliest_cert_expiry - time()) < 30 * 86400
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "Certificate expires in {{ $value | humanizeDuration }}"
- alert: KubernetesCertificateExpiring
  expr: min by (instance) (probe_ssl_earliest_cert_expiry - time()) < 7 * 86400
  for: 1h
  labels:
    severity: critical
  annotations:
    summary: "Certificate expires in {{ $value | humanizeDuration }}"
```
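Both rules compare the remaining lifetime, `probe_ssl_earliest_cert_expiry - time()`, against a threshold expressed in seconds (`30 * 86400` and `7 * 86400`). The sketch below reproduces that arithmetic with fixed Unix timestamps so the result does not depend on the current time; the function name and the timestamps are hypothetical.

```python
DAY = 86400  # seconds per day, as used in the alert expressions


def cert_alert_severity(not_after_ts, now_ts):
    """Return the severity the two rules would assign, or None if neither matches."""
    remaining = not_after_ts - now_ts  # probe_ssl_earliest_cert_expiry - time()
    if remaining < 7 * DAY:
        return "critical"
    if remaining < 30 * DAY:
        return "warning"
    return None


now = 1_700_000_000  # fixed "current time" for a reproducible example
print(cert_alert_severity(now + 45 * DAY, now))  # more than 30 days left
print(cert_alert_severity(now + 20 * DAY, now))  # inside the 30-day window
print(cert_alert_severity(now + 3 * DAY, now))   # inside the 7-day window
```

In Prometheus both rules fire together inside the 7-day window; an inhibition rule is the usual way to mute the warning while the critical alert is active.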
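28 How do you configure alert inhibition and silences in kube-prometheus-stack?
Answer:
kube-prometheus-stack reduces alert storms through Alertmanager inhibition rules and silences.
Inhibition rules:
```yaml
alertmanager:
  config:
    inhibit_rules:
      # A node-level critical alert inhibits warnings in the same cluster
      - source_match:
          severity: "critical"
          alertname: "KubeNodeNotReady"
        target_match:
          severity: "warning"
        equal: ["cluster"]
      # A node failure inhibits Pod-level alerts on that node
      - source_match:
          alertname: "KubeNodeNotReady"
        target_match:
          alertname: "KubePodNotReady"
        equal: ["node"]
      # Higher severity inhibits lower severity
      - source_match:
          severity: "critical"
        target_match:
          severity: "info"
        equal: ["namespace", "cluster"]
  # Alternatively, configure through the AlertmanagerConfig CRD
  alertmanagerSpec:
    alertmanagerConfigSelector:
      matchLabels:
        app: myapp
```
Inhibition logic:
If an alert matching source_match is firing,
and an alert matching target_match has the same values for every label listed in equal,
then the target alert is inhibited (no notification is sent).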
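The three conditions above translate almost line for line into code. This is a minimal sketch of the described logic, assuming alerts are plain label dictionaries and matchers are exact equality only; `is_inhibited` is a hypothetical helper, not Alertmanager's implementation.

```python
def matches(alert, matcher):
    """True when the alert carries every label/value pair of the matcher."""
    return all(alert.get(k) == v for k, v in matcher.items())


def is_inhibited(target, firing_alerts, rule):
    """Apply one inhibition rule to a target alert given the currently firing alerts."""
    if not matches(target, rule["target_match"]):
        return False
    for source in firing_alerts:
        if source is target or not matches(source, rule["source_match"]):
            continue
        # Every label listed in `equal` must have the same value on both alerts
        if all(source.get(label) == target.get(label) for label in rule["equal"]):
            return True
    return False


rule = {"source_match": {"alertname": "KubeNodeNotReady"},
        "target_match": {"alertname": "KubePodNotReady"},
        "equal": ["node"]}

node_down = {"alertname": "KubeNodeNotReady", "node": "node-1"}
pod_on_node_1 = {"alertname": "KubePodNotReady", "node": "node-1", "pod": "web-0"}
pod_on_node_2 = {"alertname": "KubePodNotReady", "node": "node-2", "pod": "web-1"}
firing = [node_down, pod_on_node_1, pod_on_node_2]

print(is_inhibited(pod_on_node_1, firing, rule))  # same node as the source alert
print(is_inhibited(pod_on_node_2, firing, rule))  # different node, still notifies
```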
Silence management:
```bash
# Create a silence (2 hours)
amtool silence add \
  --alertmanager.url=http://alertmanager:9093 \
  --author="admin" \
  --comment="Maintenance window" \
  --duration=2h \
  alertname="NodeCPUUsageHigh"
# The silence is removed automatically once it expires
# List active silences
amtool silence query --alertmanager.url=http://alertmanager:9093
```
29 How do you implement RBAC-based access isolation with kube-prometheus-stack?
Answer:
kube-prometheus-stack relies on Kubernetes RBAC to control which teams can access monitoring data. Prometheus has no built-in authorization, so rules on nonResourceURLs only take effect when the Prometheus API is fronted by a proxy that checks them against the Kubernetes API (for example kube-rbac-proxy).
RBAC model:
```text
ClusterRole: prometheus-viewer
- GET /api/v1/query
- GET /api/v1/targets
- Read-only access
ClusterRole: prometheus-admin
- POST /api/v1/admin/tsdb/snapshot
- POST /api/v1/admin/tsdb/delete_series
- Administrative access
```
Read-only role:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-viewer
rules:
  - nonResourceURLs:
      - /api/v1/query
      - /api/v1/query_range
      - /api/v1/labels
      - /api/v1/series
      - /api/v1/targets
      - /api/v1/rules
      - /api/v1/alerts
    verbs:
      - get # nonResourceURLs use HTTP verbs
```
Grafana access isolation:
```yaml
grafana:
  grafana.ini:
    auth.proxy:
      enabled: true
      header_name: X-WEBAUTH-USER
      header_property: username
      sync_ttl: 60
    auth:
      oauth_auto_login: true
  # LDAP group to organization role mapping
  ldap:
    config: |
      [[servers]]
      ...
      [[servers.group_mappings]]
      group_dn = "cn=devops,ou=groups,dc=example,dc=com"
      org_role = "Admin"
      [[servers.group_mappings]]
      group_dn = "cn=viewer,ou=groups,dc=example,dc=com"
      org_role = "Viewer"
```
Namespace-level access:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: namespace-monitor
  namespace: myapp
rules:
  - apiGroups: [""]
    resources: ["services", "pods", "endpoints"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["monitoring.coreos.com"]
    resources: ["servicemonitors", "podmonitors"]
    verbs: ["get", "list"]
```
30 What are the common troubleshooting methods for kube-prometheus-stack?
Answer:
Troubleshooting kube-prometheus-stack proceeds along four dimensions: Operator state, scrape targets, storage, and queries.
Troubleshooting flow:
```bash
# 1. Check the Operator
kubectl get pods -n monitoring | grep operator
kubectl logs -n monitoring prometheus-operator-xxx
# 2. Check the Prometheus instance
kubectl get prometheus -n monitoring
kubectl describe prometheus k8s -n monitoring
# 3. Check scrape targets
kubectl port-forward -n monitoring prometheus-k8s-0 9090
# Open /targets to see scrape status
# 4. Check rule loading
# Open /rules
# Open /api/v1/rules
# 5. Check Alertmanager
kubectl get alertmanager -n monitoring
kubectl port-forward -n monitoring alertmanager-main-0 9093
# Open /#/status
```
Common problems:
| Symptom | Likely cause | How to check |
|---|---|---|
| Target missing or `up == 0` | ServiceMonitor labels do not match, or the scrape fails | kubectl describe servicemonitor; /targets |
| Metrics missing | A relabel rule dropped them | Check metricRelabelings |
| Alert not firing | Rule not loaded | Check /api/v1/rules |
| No data in Grafana | Data source misconfigured | Check the Grafana data source |
| Prometheus OOM | High cardinality | Check prometheus_tsdb_head_series |
| Adapter not responding | PromQL query timeout | Check the adapter logs |
Diagnostic commands:
```bash
# List all CRD instances
kubectl get prometheus,alertmanager,servicemonitor,podmonitor,prometheusrule,probe -A
# Inspect a ServiceMonitor
kubectl describe servicemonitor -n monitoring myapp
# Check rule loading
kubectl exec -n monitoring prometheus-k8s-0 -- wget -qO- http://localhost:9090/api/v1/rules
# Check alert state
kubectl exec -n monitoring alertmanager-main-0 -- amtool alert --alertmanager.url=http://localhost:9093
```
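Step 3 of the flow (checking scrape targets) can also be scripted against the JSON returned by `/api/v1/targets`. The sketch below works on an already parsed response and assumes the documented fields `activeTargets`, `labels`, `scrapeUrl`, `health`, and `lastError`; the sample payload is made up for illustration and the helper name is hypothetical.

```python
def unhealthy_targets(response):
    """Return (job, scrape URL, last error) for every active target that is not up."""
    result = []
    for target in response["data"]["activeTargets"]:
        if target["health"] != "up":
            result.append((target["labels"].get("job"),
                           target["scrapeUrl"],
                           target.get("lastError", "")))
    return result


# Hypothetical excerpt of a /api/v1/targets response
sample_response = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "node-exporter", "instance": "10.0.0.1:9100"},
             "scrapeUrl": "http://10.0.0.1:9100/metrics",
             "health": "up", "lastError": ""},
            {"labels": {"job": "myapp", "instance": "10.0.1.7:8080"},
             "scrapeUrl": "http://10.0.1.7:8080/metrics",
             "health": "down", "lastError": "connection refused"},
        ]
    },
}

for job, url, error in unhealthy_targets(sample_response):
    print(f"{job}: {url} ({error})")
```

A target that is absent from the response altogether points to a selector or label mismatch rather than a failing scrape.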