Elasticsearch: Distributed Search Engine

22 questions

Category: Middleware

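1 Core architecture of Elasticsearch (Cluster / Node / Index / Shard / Replica)

Answer:

Elasticsearch is a distributed full-text search engine. Its architecture is built from five concepts (Cluster, Node, Index, Shard, Replica), and its primary/replica shard model provides horizontal scaling and high availability.

[Detailed breakdown]

Cluster: one or more Nodes that share the same cluster.name. Together they hold the complete data set and provide federated indexing and search. The Master node is elected through Zen Discovery (6.x and earlier) or the cluster coordination subsystem (7.0 and later).
Node: a single Elasticsearch instance in the cluster that stores data, handles requests, or coordinates routing. Nodes take roles such as Master-eligible, Data, Ingest, Coordinating, ML, and Transform.
Index: a logical namespace for a collection of JSON documents, backed by one or more physical Shards. Each Index has its own Mapping (field types and analysis rules) and Settings (number of shards and replicas).
Shard (primary shard): the physical storage unit of an Index, holding a subset of its data. Each Shard is an independent Lucene index that builds the inverted index and executes queries. The number of primary shards is fixed when the Index is created.
Replica (replica shard): a full copy of a primary shard that provides redundancy and spreads query load. A replica is never allocated to the same node as its primary.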

# Create an index with explicit shard and replica counts
PUT /logs-2026.05
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 2
  },
  "mappings": {
    "properties": {
      "timestamp": { "type": "date" },
      "message":    { "type": "text" },
      "level":      { "type": "keyword" }
    }
  }
}

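Concept	Responsibility	Count constraint	How to change
Primary Shard	Write entry point; stores one partition of the data	Fixed after index creation	Reindex (or the Split / Shrink API, which creates a new index)
Replica Shard	Redundancy and query load balancing	Can be changed dynamically	`PUT /index/_settings`, takes effect immediately
Index	Logical data container	No hard limit	Bounded by cluster capacity
Node	Compute and storage host	Bounded by cluster size	Can join or leave dynamically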

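2 Shard allocation and the routing algorithm in Elasticsearch

Answer:

Elasticsearch maps each document to a primary shard with the routing formula hash(_routing) % num_primary_shards. The Master node then places the shards on Data Nodes according to the allocation policies.

[Detailed breakdown]

Routing formula: shard_num = hash(_routing) % num_primary_shards (a simplified form; the hash is Murmur3). By default _routing is the document _id. A custom routing value (such as user_id) keeps all of one user's documents on the same shard.
Routing immutability: num_primary_shards is the divisor of the modulo operation, so the number of primary shards cannot be changed after the index is created without sending existing documents to the wrong shard.
Shard allocation policies: the Master node is responsible for cluster-wide shard allocation and applies Disk Threshold, Shard Allocation Awareness, Forced Awareness, and Filtering rules.
Shard balancing: the cluster.routing.allocation.balance.* settings weight the total number of shards per node against the number of shards of each index per node, and Rebalance is triggered automatically to even out the distribution.
Custom routing: a high-cardinality routing field (such as user_id) tends to spread data evenly; a low-cardinality field can cause data skew.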
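The routing formula above can be sketched in a few lines of Python. This is a minimal illustration, not the Elasticsearch implementation: `shard_for` is a hypothetical helper, and `zlib.crc32` stands in for the Murmur3 hash that Elasticsearch uses. It shows why changing the divisor moves most existing documents to a different shard.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Map a routing value to a primary shard number.

    crc32 is only a stand-in hash for this sketch; Elasticsearch uses Murmur3.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# The same routing value always lands on the same shard.
keys = [f"user_{i}" for i in range(1000)]
before = {k: shard_for(k, 5) for k in keys}
after = {k: shard_for(k, 6) for k in keys}

# Changing the shard count changes the divisor, so many keys move.
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys map to a different shard after 5 -> 6")
```

Because lookups recompute the same formula, any key that moves would no longer be found on the shard where it was originally written.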

// Custom routing: pass the routing parameter when indexing
POST /orders/_doc?routing=user_42
{
  "order_id": "ORD-2026-0001",
  "user_id": "user_42",
  "amount": 299.99
}

// Pass the same routing value when searching; otherwise every shard is queried
GET /orders/_search?routing=user_42
{
  "query": { "term": { "user_id": "user_42" } }
}

Policy	Purpose	Typical configuration
Disk Threshold	Prevents disks from filling up	`cluster.routing.allocation.disk.watermark.low: 85%`
Shard Awareness	Rack / availability-zone awareness	`cluster.routing.allocation.awareness.attributes: zone`
Forced Awareness	Forces distribution across zones	`cluster.routing.allocation.awareness.force.zone.values: zone-a,zone-b`
Filtering	Restricts allocation by node attribute	`index.routing.allocation.include.rack: rack1`

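3 How the inverted index works in Elasticsearch

Answer:

The inverted index is the core data structure behind Elasticsearch full-text search. It maps every term that occurs in the documents to the list of documents containing it, together with position information, so that lookups go directly from term to document.

[Detailed breakdown]

Forward index vs inverted index: a forward index organizes field values by document ID (document -> terms) and suits exact key-value lookups; an inverted index organizes document IDs by term (term -> documents) and suits keyword search.
Inverted index structure: three layers, namely the Term Index (an FST-compressed prefix structure), the Term Dictionary, and the Posting List (document IDs, term frequency TF, positions, offsets, payloads).
FST (Finite State Transducer): the data structure behind the Term Index. It maps term prefixes to Block positions in the Term Dictionary, uses far less memory than a HashMap, and supports efficient prefix and range lookups.
Segment immutability: Lucene stores the inverted index in Segments, and a Segment is never modified once written. New documents go into new Segments; deletions are logical and are recorded in a per-segment live-docs file (.liv; .del in old Lucene versions).
Segment merging: background Merge threads keep merging small Segments into larger ones, which reduces the number of open file handles and the per-search traversal cost.
Skip List: Posting Lists carry skip data so that intersections and unions across several terms can jump over non-matching documents.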

Inverted index construction example:

Document 1: "Elasticsearch is a search engine"
Document 2: "Apache Lucene powers Elasticsearch"
Document 3: "search engine based on Lucene"

Inverted index (excerpt; positions are 0-based token offsets):
Term            | (DocID, TF, Positions)
-----------------------------------------
Elasticsearch   | (1,1,[0]), (2,1,[3])
Apache          | (2,1,[0])
Lucene          | (2,1,[1]), (3,1,[4])
search          | (1,1,[3]), (3,1,[0])
engine          | (1,1,[4]), (3,1,[1])
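
The construction can be reproduced with a small Python sketch. It is a simplification under stated assumptions: tokens are produced by lowercasing and splitting on whitespace (a real analyzer does more), and `build_inverted_index` and `search_all` are hypothetical helper names, not Lucene APIs.

```python
from collections import defaultdict

docs = {
    1: "Elasticsearch is a search engine",
    2: "Apache Lucene powers Elasticsearch",
    3: "search engine based on Lucene",
}


def build_inverted_index(docs):
    """Return {term: [(doc_id, term_frequency, [positions]), ...]}."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        positions = defaultdict(list)
        # Naive analysis: lowercase and split on whitespace.
        for pos, token in enumerate(docs[doc_id].lower().split()):
            positions[token].append(pos)
        for term, pos_list in positions.items():
            index[term].append((doc_id, len(pos_list), pos_list))
    return dict(index)


def search_all(index, *terms):
    """AND query: intersect the posting lists of all terms."""
    postings = [{doc_id for doc_id, _, _ in index.get(t.lower(), [])} for t in terms]
    return sorted(set.intersection(*postings)) if postings else []


index = build_inverted_index(docs)
print(index["elasticsearch"])
print(search_all(index, "search", "engine"))
```

A query never scans the documents themselves: it reads one posting list per term and intersects them.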

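Component	Role	Storage
Term Dictionary	Sorted list of all terms	Disk
Term Index	Term prefix -> Block offset	Memory (FST)
Posting List	Term -> document IDs	Disk (accelerated by Skip List)
Doc Values	Document -> field value (columnar)	Disk (accelerates sorting / aggregations)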

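4 The Elasticsearch write path (Refresh / Flush / Translog)

Answer:

A write goes through five stages: Ingest Pipeline preprocessing, routing to the target shard, writing to the Memory Buffer and Translog, Refresh (searchable), and Flush (durable). The Translog is what makes writes reliable.

[Detailed breakdown]

Stage 1, routing the write request: if an Ingest Pipeline applies, the document is first processed on an ingest node. The Coordinating Node then computes the target Shard from _routing and forwards the write to the node that holds the Primary Shard.
Stage 2, primary and replica writes: the Primary Shard validates the document against the Mapping, writes it to the Memory Buffer and the Translog (WAL), and then forwards the request in parallel to all In-Sync Replicas. The write is acknowledged once they have responded.
Stage 3, Refresh (near-real-time visibility): by default every 1 second (refresh_interval) the documents in the Memory Buffer are written to a new Segment, which is opened for search. This is the core of the Near Real-Time (NRT) mechanism.
Stage 4, Flush (durability): when the Translog reaches flush_threshold_size (512 MB by default; older releases also flushed on a 30-minute timer), a full Lucene commit is performed: everything in the Memory Buffer is written to Segments, the Segments are fsynced to disk, and the Translog is truncated.
Stage 5, Translog recovery: when a node restarts, the operations recorded in the Translog since the last Flush are replayed, so writes that were not yet committed are not lost.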

Write path sequence:
Client -> Coordinating Node -> [Ingest Pipeline, on an ingest node]
                            -> Primary Shard -> Memory Buffer + Translog
                                             -> Parallel Replicate to Replicas
                                             -> (Refresh) New Segment (1s)
                                             -> (Flush) fsync + Clear Translog
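
The interplay of buffer, Translog, Refresh, and Flush can be modelled with a toy class. This is a conceptual sketch only: `ToyShard` and its methods are hypothetical, documents are plain strings, and the Translog is assumed to be fsynced on every request.

```python
class ToyShard:
    """Toy model of one shard's write path (not real Elasticsearch code)."""

    def __init__(self):
        self.buffer = []     # in-memory indexing buffer, not searchable
        self.translog = []   # write-ahead log, assumed durable
        self.segments = []   # searchable segments
        self.committed = 0   # number of segments made durable by the last flush

    def index(self, doc):
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Turn the buffer into a new searchable segment (page cache only).
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        # Commit: all segments become durable and the translog is truncated.
        self.refresh()
        self.committed = len(self.segments)
        self.translog.clear()

    def search(self):
        return [doc for segment in self.segments for doc in segment]

    def crash_and_recover(self):
        # Uncommitted segments and the buffer are lost; the translog is replayed.
        self.segments = self.segments[: self.committed]
        self.buffer = list(self.translog)
        self.refresh()


shard = ToyShard()
shard.index("a")
print(shard.search())   # not visible before refresh
shard.refresh()
print(shard.search())   # visible after refresh
```

In this model a crash after a Refresh but before a Flush loses the uncommitted segment, and replaying the Translog restores it.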

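# Index settings that affect write performance
PUT /my-index/_settings
{
  "index": {
    "refresh_interval": "30s",          # refresh less often to raise indexing throughput
    "translog.durability": "async",     # fsync asynchronously: faster, but recent writes can be lost
    "translog.sync_interval": "5s",     # Translog fsync interval in async mode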
    "translog.flush_threshold_size": "1gb"
  }
}

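Operation	Trigger	Effect on visibility	Durability
Refresh	Every 1 second (default)	Data becomes searchable	Page Cache only
Flush	Translog reaches 512 MB (30 min timer in older releases)	None	fsync to disk
Translog Sync	Every request (request mode)	None	fsync to disk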

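5 The Elasticsearch read path (Query / Fetch / Scroll / Search After)

Answer:

A search runs as Query Then Fetch: the Query phase collects document IDs and scores, and the Fetch phase retrieves the full documents. For deep pagination Elasticsearch offers Scroll and Search After.

[Detailed breakdown]

Query phase: the Coordinating Node sends the request to one copy (Primary or Replica) of every Shard of the Index. Each Shard runs the query locally and returns the IDs and sort values of its top from + size documents. The Coordinating Node merges and sorts these to determine the global top-N list of document IDs.
Fetch phase: the Coordinating Node asks the Shards that hold the selected documents for their full _source (a multi-get style request) and returns the result to the client.
Scroll: creates a snapshot of the search context; every subsequent scroll request runs against that snapshot and is unaffected by later index changes. Suitable for full exports and batch processing, but it holds Search Context resources and must be cleared after use.
Search After: a cursor based on the sort values of the last hit. It avoids the context overhead of Scroll and supports deep pagination over live data. The sort must end in a unique tiebreaker (for example _id, or _shard_doc when combined with a PIT).
Sliced Scroll: splits a Scroll into several Slices that can be consumed in parallel, which speeds up large exports.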
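
The two phases, and the reason deep from/size pagination gets expensive, can be sketched as follows. This is a simplified model: shards are plain lists of dicts, the sort key is a hypothetical `price` field with the document `id` as tiebreaker, and `query_phase` / `search` are illustrative names.

```python
import heapq


def query_phase(shard_docs, frm, size):
    """Each shard returns only (sort_value, doc_id) for its local top from+size."""
    return heapq.nsmallest(frm + size, ((d["price"], d["id"]) for d in shard_docs))


def search(shards, frm, size):
    # Query phase: gather the candidates of every shard on the coordinator.
    candidates = [hit for docs in shards for hit in query_phase(docs, frm, size)]
    page = sorted(candidates)[frm:frm + size]
    # Fetch phase: load full documents only for the hits on the requested page.
    by_id = {d["id"]: d for docs in shards for d in docs}
    return [by_id[doc_id] for _, doc_id in page], len(candidates)


shards = [
    [{"id": "a", "price": 5}, {"id": "b", "price": 1}, {"id": "c", "price": 9}],
    [{"id": "d", "price": 3}, {"id": "e", "price": 7}, {"id": "f", "price": 2}],
]
hits, merged = search(shards, frm=2, size=2)
print([h["id"] for h in hits], "candidates merged:", merged)
```

Every shard has to return from + size entries, so the coordinator merges roughly (from + size) × shard-count candidates per request. That growth is what Search After avoids.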

// Query phase: the search request that is executed on every shard
GET /orders/_search
{
  "query": { "match": { "status": "shipped" } },
  "sort": [{ "order_date": "desc" }, { "_id": "asc" }],
  "size": 100
}

// Search After: deep pagination
GET /orders/_search
{
  "size": 100,
  "query": { "match": { "status": "shipped" } },
  "sort": [
    { "order_date": "desc" },
    { "_id": "asc" }
  ],
  "search_after": ["2026-01-15T10:30:00", "order_12345"]
}

// Scroll: full export
POST /orders/_search?scroll=5m
{
  "size": 1000,
  "query": { "match_all": {} }
}

// Sliced Scroll: parallel export
GET /orders/_search?scroll=5m
{
  "slice": { "id": 0, "max": 4 },
  "size": 1000,
  "query": { "match_all": {} }
}

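Pagination method	Mechanism	Use case	Limitations
from/size	Merge, sort, then slice	Shallow paging (< 10000)	`max_result_window` defaults to 10000
Scroll	Snapshot context	Full export, batch reindexing	Consumes heap; results are not real-time
Search After	Sort-value cursor	Deep pagination, live search	Needs a unique sort tiebreaker; cannot jump to a page
PIT + Search After	Point in Time snapshot + cursor	Recommended from ES 7.10	The PIT context must be closed explicitly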

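6 Master election and Zen Discovery in ES

Answer:

An Elasticsearch cluster elects a single Master node through Zen Discovery (6.x and earlier) or the cluster coordination subsystem (7.0 and later). The Master handles cluster state changes, shard allocation, and index creation, and the election relies on a quorum to avoid split brain.

[Detailed breakdown]

Election with Zen Discovery (6.x and earlier): a variant of the Bully algorithm. All Master-eligible nodes take part, and roughly speaking the candidate with the lowest node ID wins. The election depends on discovery.zen.minimum_master_nodes, which must be set to the quorum (master_eligible_nodes / 2 + 1).
Election with cluster coordination (7.0 and later): a Raft-like consensus implementation replaces Zen Discovery. minimum_master_nodes no longer has to be configured; the system maintains the voting configuration and quorum itself. A brand-new cluster is bootstrapped once with cluster.initial_master_nodes.
Split brain: a network partition leaves the cluster with more than one Master, which leads to conflicting writes. Zen Discovery prevents it only if minimum_master_nodes is set correctly; the 7.0+ coordination layer prevents it automatically through term-based quorum voting.
Voting-only nodes: since 7.3 a Master-eligible node can be voting-only. It votes in elections but never becomes Master, which helps as a tiebreaker at low cost.
Failover: when the Master fails, the remaining Master-eligible nodes start a new election after fault detection times out (discovery.zen.fd.ping_timeout in 6.x; the cluster.fault_detection.leader_check.* settings in 7.0+).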
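
The quorum rule is simple arithmetic, and a small sketch shows why it prevents split brain. The helper names `quorum` and `can_elect` are hypothetical.

```python
def quorum(master_eligible: int) -> int:
    """Smallest strict majority of the master-eligible nodes: N // 2 + 1."""
    return master_eligible // 2 + 1


def can_elect(partition_size: int, master_eligible: int) -> bool:
    """A network partition may elect a Master only if it holds a quorum."""
    return partition_size >= quorum(master_eligible)


# 3 master-eligible nodes split 2 / 1: only the larger side can elect.
print(can_elect(2, 3), can_elect(1, 3))
# 4 master-eligible nodes split 2 / 2: neither side can elect.
print(can_elect(2, 4), can_elect(2, 4))
```

Two disjoint partitions can never both hold a strict majority, so at most one Master exists at a time. The 2 / 2 case also shows why an odd number of master-eligible nodes is preferred.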

# ES 6.x and earlier: Zen Discovery configuration
discovery.zen.minimum_master_nodes: 2    # quorum for 3 master-eligible nodes
discovery.zen.ping.unicast.hosts:
  - es-master-0.es-master-headless
  - es-master-1.es-master-headless
  - es-master-2.es-master-headless

# ES 7.0 and later: minimum_master_nodes is not needed
# ECK configures the voting nodes automatically
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
spec:
  version: 8.17.0
  nodeSets:
    - name: master
      count: 3
      config:
        node.roles: ["master"]

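Version	Election protocol	Quorum configuration	Split-brain protection
6.x and earlier	Zen Discovery (Bully variant)	Manual `minimum_master_nodes`	No election unless a quorum of master-eligible nodes is visible
7.0 and later	Cluster coordination (Raft-like)	Maintained automatically	Term-based voting handles it automatically
General rule	-	`N/2 + 1`	3 master-eligible nodes recommended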

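7 Cluster Health in ES (Green / Yellow / Red)

Answer:

Cluster Health is the status indicator of an Elasticsearch cluster. It has three levels (Green, Yellow, Red) and is derived from the allocation state of the Primary and Replica Shards of all Indices.

[Detailed breakdown]

Green: every Primary Shard and every Replica Shard of every Index is allocated to a healthy node. The cluster is fully operational.
Yellow: all Primary Shards are allocated, but at least one Replica Shard is not. No data is missing and reads and writes work, but the affected shards have reduced redundancy, so losing another node could lose data. Common in single-node clusters, while Replicas are initializing, or during a scale-down before shard migration has finished.
Red: at least one Primary Shard is unassigned. Part of the data is unavailable and operations that touch it fail. Common when a node goes down and the Primary Shard has no Replica, or the Replica is unassigned as well.
Health API: GET /_cluster/health returns fields such as status, number_of_nodes, active_primary_shards, and unassigned_shards, and supports drill-down with level=indices / level=shards.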
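
The three rules reduce to a short function. This is a sketch of the decision logic only: shards are modelled as hypothetical (is_primary, state) tuples, and STARTED and RELOCATING are treated as active states.

```python
ACTIVE_STATES = {"STARTED", "RELOCATING"}


def cluster_status(shards):
    """Derive green / yellow / red from (is_primary, state) tuples."""
    status = "green"
    for is_primary, state in shards:
        if state in ACTIVE_STATES:
            continue
        if is_primary:
            return "red"      # any inactive primary makes the cluster red
        status = "yellow"     # an inactive replica degrades it to yellow
    return status


print(cluster_status([(True, "STARTED"), (False, "STARTED")]))
print(cluster_status([(True, "STARTED"), (False, "UNASSIGNED")]))
print(cluster_status([(True, "UNASSIGNED"), (False, "UNASSIGNED")]))
```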

# Check cluster health
GET /_cluster/health?level=shards

# Example response (abridged)
{
  "cluster_name": "production",
  "status": "yellow",
  "timed_out": false,
  "number_of_nodes": 6,
  "active_primary_shards": 128,
  "active_shards": 250,
  "unassigned_shards": 6,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0
}

# Find out why a shard is Red (unassigned)
GET /_cluster/allocation/explain
{
  "index": "logs-2026.05.15",
  "shard": 2,
  "primary": true
}

# Example response (abridged)
{
  "index": "logs-2026.05.15",
  "shard": 2,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "NODE_LEFT",
    "details": "node [node-3] left the cluster"
  },
  "can_allocate": "no",
  "allocate_explanation": "no valid shard copies exist"
}

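Status	Primary Shard	Replica Shard	Data completeness	Read/write impact
Green	All allocated	All allocated	Complete	Normal
Yellow	All allocated	Some unassigned	Complete	Normal (writes wait only if wait_for_active_shards requires more copies than are active)
Red	At least one unassigned	-	Partially unavailable	Reads and writes on the affected indices fail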

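8 Shard Allocation Awareness / Rack Awareness in ES

Answer:

With the cluster.routing.allocation.awareness settings, Elasticsearch takes the physical location of nodes (availability zone, rack, data center) into account when allocating Primary and Replica Shards, so that copies of the same shard end up in different failure domains.

[Detailed breakdown]

Node attributes: the physical location of a node is declared with attributes such as node.attr.zone, node.attr.rack, and node.attr.datacenter.
Awareness: with cluster.routing.allocation.awareness.attributes: zone, ES tries to keep the Primary and the Replicas of the same shard in different zones (best effort, given enough nodes per zone).
Forced Awareness: with cluster.routing.allocation.awareness.force.zone.values: zone-a,zone-b, ES will not pile all copies of a shard into the surviving zones when a zone is lost. The copies meant for the missing zone stay unassigned until it returns, which protects the remaining zones from overload.
AWS / Azure / GCP: zone information can be supplied by the cloud discovery plugins (for example, discovery-ec2 can set an aws_availability_zone node attribute) or set explicitly through node.attr.
Kubernetes integration: ECK can copy the node's topology.kubernetes.io/zone label onto the Pod (via the downward-node-labels annotation), from which node.attr.zone can be set.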

# ECK: shard allocation awareness configuration
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
spec:
  version: 8.17.0
  nodeSets:
    - name: data
      count: 6
      config:
        cluster.routing.allocation.awareness.attributes: zone
        cluster.routing.allocation.awareness.force.zone.values: ap-southeast-1a,ap-southeast-1b,ap-southeast-1c
      podTemplate:
        spec:
          topologySpreadConstraints:
            - maxSkew: 1
              topologyKey: topology.kubernetes.io/zone
              whenUnsatisfiable: DoNotSchedule
              labelSelector:
                matchLabels:
                  elasticsearch.k8s.elastic.co/cluster-name: production

# Verify the shard distribution
GET /_cat/shards?v&h=index,shard,prirep,state,node,unassigned.reason

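Configuration level	Effect	Use case
Awareness	Primary and Replica kept in different zones (best effort)	Single region, multiple AZs
Forced Awareness	Copies are never concentrated in the remaining zones	Strict disaster-recovery requirements
Filtering (include/exclude)	Excludes nodes from, or restricts allocation to, certain nodes	Node maintenance / decommissioning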

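9 Snapshot and restore in ES (Snapshot to S3 / GCS)

Answer:

Elasticsearch uses the Snapshot / Restore API to back up index data to object storage (S3, GCS, Azure Blob, HDFS) or a shared file system. Snapshots are incremental and deduplicated at the file level, which keeps backup storage costs down.

[Detailed breakdown]

Snapshot Repository: a Repository (the storage target) must be registered first. Five backends are supported: S3 (repository-s3), GCS (repository-gcs), Azure (repository-azure), HDFS, and FS. In 8.x the S3, GCS, and Azure types ship with Elasticsearch; earlier versions install them as plugins.
Incremental snapshots: each Snapshot copies only the Segment files that changed since the previous Snapshot. Unchanged data is referenced from files already in the Repository.
Snapshot consistency: each shard is captured as a point-in-time view taken when the snapshot of that shard starts. Writes that arrive later are not included.
Partial restore: restore works per Index, and rename_pattern / rename_replacement rename the restored indices.
Searchable Snapshot: from ES 7.10 a Snapshot can be mounted directly as a searchable index. Data is loaded from object storage on demand, which saves local disk space.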
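
Incremental snapshots work because Segment files are immutable: a file that is already in the Repository never needs to be uploaded again. The sketch below models that idea with sets of file names. `take_snapshot` and the file names are hypothetical, and real repositories track files by content metadata rather than by name alone.

```python
def take_snapshot(repo_files: set, shard_segments: set) -> dict:
    """Upload only segment files the repository does not hold yet."""
    to_upload = shard_segments - repo_files
    repo_files |= to_upload  # the repository now contains the new files
    return {
        "files": frozenset(shard_segments),  # everything this snapshot references
        "uploaded": sorted(to_upload),       # what actually had to be copied
    }


repo = set()
snap1 = take_snapshot(repo, {"_0.cfs", "_1.cfs"})
# A merge or new writes produced _2.cfs; _1.cfs is unchanged.
snap2 = take_snapshot(repo, {"_1.cfs", "_2.cfs"})
print(snap1["uploaded"], snap2["uploaded"])
```

The second snapshot still references `_1.cfs` but uploads only `_2.cfs`. This is also why deleting an old snapshot may only remove files that no remaining snapshot references.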

// Register an S3 Repository
PUT _snapshot/s3-backup
{
  "type": "s3",
  "settings": {
    "bucket": "es-snapshots-production",
    "region": "ap-southeast-1",
    "base_path": "production-cluster",
    "max_snapshot_bytes_per_sec": "100mb",
    "max_restore_bytes_per_sec": "100mb"
  }
}

// Create a snapshot
PUT _snapshot/s3-backup/snapshot-2026-05-26
{
  "indices": "logs-*,metrics-*",
  "ignore_unavailable": true,
  "include_global_state": false,
  "partial": false
}

// Restore a snapshot
POST _snapshot/s3-backup/snapshot-2026-05-26/_restore
{
  "indices": "logs-2026.05.*",
  "ignore_unavailable": true,
  "rename_pattern": "logs-(.+)",
  "rename_replacement": "restored-logs-$1"
}

# ECK: credentials for an S3 Snapshot Repository
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: production
spec:
  version: 8.17.0
  secureSettings:
    - secretName: s3-credentials
# ECK loads every key of this Secret into the Elasticsearch keystore, so the
# Secret keys must be named after the keystore settings, for example:
#   s3.client.default.access_key
#   s3.client.default.secret_key
# Register the Repository afterwards through the _snapshot API (from Kibana Dev Tools or Curator).

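Backend	Plugin / module	Characteristics
S3	`repository-s3`	IAM Role / Access Key authentication, SSE encryption
GCS	`repository-gcs`	Service Account authentication
Azure Blob	`repository-azure`	Shared Key / SAS Token
FS	Built in (`type: fs`)	NFS / shared file system (must be mounted on every node and listed in `path.repo`)
Searchable Snapshot	`x-pack`	Mounted remotely, loaded on demand, tiering for cold data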

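10 X-Pack security in ES (TLS / RBAC / Audit / FIPS)

Answer:

X-Pack Security is the security feature set of the Elastic Stack. It provides transport encryption (TLS), role-based access control (RBAC), audit logging, document- and field-level security, and FIPS 140-2 compliance. Core features such as TLS and RBAC are free under the Basic license; audit logging, DLS/FLS, and FIPS mode require a paid subscription.

[Detailed breakdown]

Transport encryption (TLS): TLS is supported on the HTTP layer (9200) and mutual TLS on the Transport layer (9300). ECK issues and rotates the cluster-internal certificates automatically and can integrate with cert-manager to use an external CA.
RBAC: built on a Role + User + Privilege model. A Role combines Cluster Privileges (cluster-level permissions such as manage_index_templates) and Index Privileges (index-level permissions such as read, write, create_index). Document-level Security (DLS) and Field-level Security (FLS) are supported.
Audit log: records authentication, authorization, and access-denied events in a local log file (the index output was removed in 7.0).
API Key / Service Account: Service Account Tokens authenticate stack components against ES (Kibana -> ES, APM -> ES); API Keys give applications password-less authentication.
FIPS 140-2: a FIPS 140-2 mode is available in which only FIPS-approved cryptographic algorithms and libraries are used.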

# ECK: RBAC Role definition
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: production
spec:
  version: 8.17.0
  auth:
    roles:
      - secretName: es-roles
---
# Secret content: custom Roles (ECK reads them from the roles.yml key)
apiVersion: v1
kind: Secret
metadata:
  name: es-roles
stringData:
  roles.yml: |
    log_reader:
      cluster: ["monitor"]
      indices:
        - names: ["logs-*"]
          privileges: ["read", "view_index_metadata"]
        - names: ["metrics-*"]
          privileges: ["read"]
      applications:
        - application: "kibana-.kibana"
          privileges: ["feature_discover.read"]
          resources: ["space:default"]
    admin:
      cluster: ["all"]
      indices:
        - names: ["*"]
          privileges: ["all"]
      applications:
        - application: "kibana-*"
          privileges: ["*"]
          resources: ["*"]

// Create a user through the API and assign a role
POST /_security/user/logstash_writer
{
  "password": "secure_password",
  "roles": ["log_writer"],
  "full_name": "Logstash Writer"
}

// Document-Level Security: filter documents per role
POST /_security/role/user_read_own
{
  "indices": [
    {
      "names": ["orders-*"],
      "privileges": ["read"],
      "query": {
        "template": {
          "source": "{\"term\": {\"user_id\": \"{{_user.username}}\"}}"
        }
      }
    }
  ]
}

// Field-Level Security: restrict fields per role ("except" must be a subset of "grant")
POST /_security/role/log_reader_no_pii
{
  "indices": [
    {
      "names": ["orders-*"],
      "privileges": ["read"],
      "field_security": {
        "grant": ["*"],
        "except": ["credit_card", "ssn"]
      }
    }
  ]
}

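Security layer	Configured through	Granularity
Transport encryption	`spec.transport.tls` / `spec.http.tls`	Cluster
User authentication	Native Realm / LDAP / SAML / OIDC	User
Role authorization	Role + Privilege	Resource (Cluster / Index / Application)
Data-level security	DLS / FLS	Document / field
Audit trail	Audit Log	Operation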

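11 Search performance tuning in ES (query caches / field mapping / index design)

Answer:

Search tuning follows four lines: scan less data, use the caches, optimize the Mapping, and adjust query patterns. The main tools are careful Mapping design, cache configuration, shard planning, and query rewriting.

[Detailed breakdown]

Mapping optimization:
- Trim _source where possible (disabling it globally is not recommended; exclude individual fields instead).
- Use keyword instead of text for fields that do not need full-text search, which avoids analysis overhead.
- Set doc_values: false on fields that are never sorted or aggregated, which saves columnar storage.
- Use index: false to disable indexing of fields that are never searched.
- Choose an appropriate index_options level (docs / freqs / positions / offsets).
Query caches:
- Node Query Cache: caches the results of Filter clauses (LRU), sized by indices.queries.cache.size (10% of the JVM heap by default).
- Shard Request Cache: caches the results of size=0 requests such as aggregations, sized by indices.requests.cache.size (1% of the heap by default).
- Field Data Cache: an in-memory document -> terms structure used for sorting and aggregating on text fields (requires fielddata: true; best left off in production).
Shard planning: aim for 10-50 GB per shard. Too many shards increase coordination overhead; too few limit parallelism.
Query optimization:
- Prefer filter context (cacheable, no scoring) over query context.
- Use search_as_you_type instead of match_phrase_prefix for faster prefix search.
- Use Routing to reduce the number of shards searched.
- Avoid leading-wildcard queries (*term).
- Use the preference parameter to send repeated requests to the same shard copy and improve cache hits.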

// Mapping optimization example
PUT /optimized_orders
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1,
    "refresh_interval": "30s"
  },
  "mappings": {
    "_source": {
      "excludes": ["large_raw_payload"]
    },
    "properties": {
      "order_id": {
        "type": "keyword",
        "doc_values": true
      },
      "status": {
        "type": "keyword"
      },
      "description": {
        "type": "text",
        "index_options": "freqs",
        "norms": false
      },
      "amount": {
        "type": "scaled_float",
        "scaling_factor": 100
      },
      "tags": {
        "type": "keyword",
        "index": false
      }
    }
  }
}

// Query optimization: prefer filter context
GET /orders/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "status": "shipped" } },
        { "range": { "amount": { "gte": 100, "lte": 500 } } }
      ],
      "must": [
        { "match": { "description": "express delivery" } }
      ]
    }
  }
}

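Technique	Applies to	Expected effect	Caveats
Filter Context	Query	Cache hits, no scoring	Filter clauses do not contribute to relevance ranking
Keyword instead of Text	Mapping	No analysis, exact matching	Cannot be used for full-text search
doc_values: false	Mapping	Less disk space and indexing overhead	The field can no longer be sorted or aggregated
Routing	Query	Fewer shards scanned	The routing value must distribute evenly
Force Merge	Index	Fewer Segments	Consumes I/O while merging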

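12 Write performance tuning in ES (Bulk API / batch size / refresh interval)

Answer:

Write tuning follows four lines: fewer I/O operations, larger batches, less frequent Refresh, and a cheaper Translog policy. The main tools are the Bulk API, the Refresh Interval, and the Translog durability settings.

[Detailed breakdown]

Bulk API: packs many Index / Update / Delete operations into one HTTP request, which cuts network round trips (RTT). Elasticsearch groups the operations by shard and processes them in parallel.
Batch size: 5-15 MB or 500-1500 documents per Bulk request is a common starting point. Batches that are too large pile up in the request queue and add GC pressure; batches that are too small waste round trips.
Refresh Interval: by default a Refresh writes the Memory Buffer into a new Segment every second. For write-heavy workloads raise it to 30s or set it to -1 (disabled) to gain throughput.
Translog settings: translog.durability: async makes fsync asynchronous, and translog.sync_interval controls how often it happens. Risk: operations that were not yet fsynced are lost if the node crashes.
Index design: reduce the number of Replicas during the load and raise it afterwards; use auto-generated _id values (this skips the lookup for an existing _id); exclude _source fields that are not needed (and disable _all on pre-7.0 clusters, where it still exists).
Hardware and node planning: use SSD / NVMe storage, give data nodes enough CPU and memory, and move Ingest Pipeline processing to dedicated Ingest nodes.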
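
The batch-size guidance can be applied on the client by building the newline-delimited Bulk body and cutting it by bytes. This is a minimal sketch without any HTTP client: `bulk_bodies` is a hypothetical helper, only `index` actions are generated, and the 5 MB default mirrors the lower end of the range above.

```python
import json


def bulk_bodies(docs, index, max_bytes=5 * 1024 * 1024):
    """Yield NDJSON bodies for POST _bulk, each at most max_bytes where possible."""
    lines, size = [], 0
    for doc in docs:
        # One action line plus one source line; every line ends with "\n".
        pair = json.dumps({"index": {"_index": index}}) + "\n" + json.dumps(doc) + "\n"
        pair_size = len(pair.encode("utf-8"))
        if lines and size + pair_size > max_bytes:
            yield "".join(lines)
            lines, size = [], 0
        lines.append(pair)
        size += pair_size
    if lines:
        yield "".join(lines)


docs = [{"n": i} for i in range(10)]
bodies = list(bulk_bodies(docs, "orders", max_bytes=120))
print(len(bodies), "requests")
print(bodies[0], end="")
```

Each yielded string can be sent as the body of one `POST _bulk` request with the `application/x-ndjson` content type. A single document larger than `max_bytes` is still sent on its own.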

// Writing with the Bulk API
POST _bulk
{ "index": { "_index": "orders", "_id": "1" } }
{ "user_id": "42", "amount": 99.99, "status": "pending" }
{ "index": { "_index": "orders", "_id": "2" } }
{ "user_id": "43", "amount": 199.99, "status": "shipped" }
{ "update": { "_index": "orders", "_id": "1" } }
{ "doc": { "status": "cancelled" } }
{ "delete": { "_index": "orders", "_id": "2" } }

// Index settings tuned for writes
PUT /high-throughput-logs/_settings
{
  "index": {
    "refresh_interval": "30s",
    "number_of_replicas": 0,               # no replicas during the bulk load
    "translog.durability": "async",
    "translog.sync_interval": "10s",
    "translog.flush_threshold_size": "2gb"
  }
}

// Add Replicas back once the load has finished
PUT /high-throughput-logs/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}

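// Suggested client-side Bulk parameters
// Node.js Elasticsearch client example; assumes `client` is an initialized Client
// and `docs` is an array or async generator of documents
const result = await client.helpers.bulk({
  datasource: docs,          // source of documents
  onDocument(doc) {
    return { index: { _index: 'orders' } };
  },
  refreshOnCompletion: false, // do not trigger a Refresh when the helper finishes
  concurrency: 4,             // number of bulk requests in flight
  flushBytes: 5 * 1024 * 1024 // send a request once 5 MB have been buffered
});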

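Setting	Default	High-throughput recommendation	Risk
refresh_interval	1s	30s / -1	Longer delay before data is searchable
translog.durability	request	async	A node crash can lose the last 5-10 s of data
number_of_replicas	1	0 (restore after the load)	No redundancy during the load
bulk size	-	5-15 MB	Too large: GC pressure; too small: RTT overhead
indices.memory.index_buffer_size	10% heap	20-30% heap	Fewer buffer-triggered Refreshes, but more GC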

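13 Garbage collection and heap tuning in ES

Answer:

Elasticsearch runs on the JVM. The core rule of GC tuning is to give the JVM heap at most 50% of physical memory and to keep it below about 32 GB so that compressed OOPs stay enabled. The remaining memory is left to the OS Page Cache, which Lucene relies on.

[Detailed breakdown]

Heap size: set -Xms and -Xmx to the same value to avoid resizing. Stay at or below 31 GB because of compressed OOPs. On machines with more than 64 GB of RAM a larger heap is possible, but above about 32 GB the JVM disables compressed OOPs, pointers take more space, and much of the gain is lost.
GC algorithm: G1GC is the default in 8.x (and in recent 7.x releases with the bundled JDK) and suits large heaps with low pause targets. Older releases used CMS.
GC monitoring: watch the Young GC frequency (as a rule of thumb, every few seconds is normal) and the Old GC frequency (a few times per hour is normal; investigate if it is frequent).
Memory breakdown:
- Segment Memory: the inverted index structures, managed by Lucene; in recent versions they live mostly outside the heap.
- Field Data Cache: terms loaded onto the heap for sorting and aggregation; avoid enabling fielddata on text fields in production.
- Node Query Cache: cached Filter results.
- Indexing Buffer: the write buffer, 10% of the heap by default.
- Remaining memory: OS Page Cache, which caches Segment files and reduces disk I/O.
Circuit Breaker: built-in circuit breakers limit memory use per category and throw CircuitBreakingException before the node runs out of memory.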
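
The sizing rule is easy to encode. The sketch below is a rule-of-thumb calculator, not an official formula: `recommended_heap_gib` and `jvm_heap_flags` are hypothetical helpers, and the 31 GiB cap is the conservative value used in the text above.

```python
def recommended_heap_gib(physical_ram_gib: int, cap_gib: int = 31) -> int:
    """Half of physical RAM, capped so that compressed OOPs stay enabled."""
    return max(1, min(physical_ram_gib // 2, cap_gib))


def jvm_heap_flags(physical_ram_gib: int) -> list:
    """Return matching -Xms / -Xmx flags (same value to avoid heap resizing)."""
    heap = recommended_heap_gib(physical_ram_gib)
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


for ram in (16, 32, 64, 128):
    print(ram, "GiB RAM ->", " ".join(jvm_heap_flags(ram)))
```

On a 128 GiB machine this still yields a 31 GiB heap; the other 97 GiB are not wasted because they serve as page cache for Segment files.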

# JVM heap settings (jvm.options)
-Xms8g
-Xmx8g

# Enable G1GC
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:G1HeapRegionSize=4m
-XX:InitiatingHeapOccupancyPercent=30

# GC logging
-Xlog:gc*,gc+age=trace:file=/var/log/elasticsearch/gc.log:time,uptime,level,tags:filecount=10,filesize=50m

# ECK: JVM heap configuration
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
spec:
  version: 8.17.0
  nodeSets:
    - name: data-hot
      count: 6
      config:
        # The heap size is set through the ES_JAVA_OPTS environment variable below
      podTemplate:
        spec:
          containers:
            - name: elasticsearch
              env:
                - name: ES_JAVA_OPTS
                  value: "-Xms16g -Xmx16g"
              resources:
                requests:
                  memory: "32Gi"
                  cpu: "8"
                limits:
                  memory: "32Gi"
                  cpu: "8"

// Monitor Circuit Breaker state
GET /_nodes/stats/breaker

// Example response (abridged, 16 GB heap)
{
  "nodes": {
    "node-1": {
      "breakers": {
        "parent": {
          "limit_size_in_bytes": 16320875724,    // 95% of heap (default with the real-memory breaker)
          "limit_size": "15.2gb",
          "estimated_size_in_bytes": 2147483648,  // 2 GB
          "estimated_size": "2gb",
          "tripped": 0
        },
        "fielddata": {
          "limit_size_in_bytes": 6871947673,     // 40% of heap
          "tripped": 0
        },
        "request": {
          "limit_size_in_bytes": 10307921510,    // 60% of heap
          "tripped": 0
        }
      }
    }
  }
}

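Memory area	Location	Purpose	Tuning advice
JVM Heap	On heap	Field Data / Query Cache / Index Buffer	At most 31 GB (compressed OOPs)
Segment Memory	Off heap (OS Page Cache)	Inverted index / Doc Values	Leave at least 50% of physical memory to it
Translog	Disk	WAL durability	Use a separate disk to avoid contention
OS Page Cache	Off heap	Segment file cache	Ensure `vm.max_map_count >= 262144`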

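14 Index Lifecycle Management in ES (ILM: Hot / Warm / Cold / Delete)

Answer:

Index Lifecycle Management automates the life of an index from creation to deletion. An ILM Policy defines the phases, and built-in Actions such as Rollover, Allocate, Shrink, and Delete carry out the operational work.

[Detailed breakdown]

ILM Policy structure: five phases (Hot, Warm, Cold, Frozen, Delete). min_age decides when an index enters a phase, and Actions define what is done inside it.
Core Actions:
- Rollover: when the current write index meets a condition (max_size / max_age / max_docs), a new index is created and the write Alias is switched to it.
- Shrink: reduces the number of Primary Shards. The target count must be a factor of the source count, and a copy of every shard must first be moved onto a single node (ILM handles this).
- Force Merge: merges Segments down to a given number (for example 1) and reclaims the space held by deleted documents.
- Allocate: moves shards according to node attributes (such as data: warm).
- Delete: deletes expired indices.
- Searchable Snapshot: converts the index into a Searchable Snapshot whose data lives mainly in object storage.
Index Template binding: an ILM Policy is attached through an Index Template, so new indices pick it up automatically.
Lifecycle Execution: ILM checks policies every 10 minutes by default, controlled by indices.lifecycle.poll_interval.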
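
A Rollover fires as soon as any one of its max_* conditions is met. The sketch below models only that decision: `should_rollover` is a hypothetical helper, and ages and sizes are assumed to be normalized to seconds and bytes beforehand.

```python
GIB = 1024 ** 3


def should_rollover(stats: dict, conditions: dict) -> bool:
    """True when any configured max_* threshold has been reached.

    A condition named "max_docs" is compared with stats["docs"], and so on.
    """
    return any(
        stats[name[len("max_"):]] >= limit for name, limit in conditions.items()
    )


conditions = {
    "max_primary_shard_size_bytes": 50 * GIB,
    "max_age_seconds": 24 * 3600,
    "max_docs": 100_000_000,
}
young = {"primary_shard_size_bytes": 10 * GIB, "age_seconds": 3600, "docs": 1_000_000}
old = dict(young, age_seconds=25 * 3600)

print(should_rollover(young, conditions), should_rollover(old, conditions))
```

ILM evaluates such conditions only on its poll interval, so an index can overshoot a threshold slightly before it is rolled over.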

// A complete ILM Policy
PUT _ilm/policy/hot-warm-cold-delete
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "1d",
            "max_docs": 100000000
          },
          "set_priority": { "priority": 100 }
        }
      },
      "warm": {
        "min_age": "3d",
        "actions": {
          "allocate": {
            "number_of_replicas": 1,
            "require": { "data": "warm" }
          },
          "forcemerge": {
            "max_num_segments": 1
          },
          "shrink": {
            "number_of_shards": 1
          },
          "set_priority": { "priority": 50 }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": {
            "number_of_replicas": 0,
            "require": { "data": "cold" }
          },
          "searchable_snapshot": {
            "snapshot_repository": "s3-snapshots"
          },
          "set_priority": { "priority": 0 }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

// Attach the policy through an Index Template
PUT _index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "hot-warm-cold-delete",
      "index.lifecycle.rollover_alias": "logs"
    }
  }
}

// Create the initial index and bind the Rollover Alias
PUT logs-000001
{
  "aliases": {
    "logs": {
      "is_write_index": true
    }
  }
}

Phase	Typical min_age	Key Action	Storage
Hot	0ms	Rollover	NVMe / SSD
Warm	3d	Shrink + Force Merge	SSD / HDD
Cold	30d	Searchable Snapshot	HDD + object storage
Frozen	90d+	Searchable Snapshot	Object storage
Delete	180d+	Delete	Permanently removed

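15 Cross-Cluster Replication (CCR) in ES

Answer:

CCR (Cross-Cluster Replication) is a Platinum-level subscription feature. A Follower Index pulls changes from a Leader Index, which provides index-level disaster recovery across regions and clusters.

[Detailed breakdown]

Architecture: the Follower Cluster pulls individual write operations (not Segment files) from the Leader Cluster; Segment files are copied only when a Follower Index is first bootstrapped. The Follower Index is read-only, the Leader Index is read-write.
Synchronization: the Global Checkpoint of the Leader Index determines how far replication may read. CCR reads the operation history that Soft Deletes retain on the Leader and replays it on the Follower.
Auto-Follow: with an auto_follow_pattern, new Leader indices that match the pattern automatically get a Follower Index and start replicating.
Pause and resume: pause / resume control replication. A paused Follower Index stays read-only; to make it writable in a failover, pause it, close it, unfollow, and reopen it as a regular index.
Deployment scenarios: cross-region disaster recovery within one cloud, hybrid cloud (On-Prem -> Cloud), and data centralization (many clusters feeding a central one).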
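
The pull model can be illustrated with two toy classes. This is a conceptual sketch, not the CCR protocol: `Leader` and `Follower` are hypothetical, the sequence number of an operation is simply its list index, and every write is treated as globally checkpointed.

```python
class Leader:
    def __init__(self):
        self.ops = []  # operation history; seq_no == list index

    def write(self, op):
        self.ops.append(op)

    def read_ops(self, from_seq_no, max_count):
        """Serve a batch of operations starting at from_seq_no."""
        return self.ops[from_seq_no:from_seq_no + max_count]


class Follower:
    def __init__(self, leader):
        self.leader = leader
        self.ops = []        # replayed operations
        self.paused = False

    def poll(self, max_count=2):
        """Pull the next batch from the leader and replay it locally."""
        if self.paused:
            return 0
        batch = self.leader.read_ops(len(self.ops), max_count)
        self.ops.extend(batch)
        return len(batch)


leader = Leader()
for i in range(5):
    leader.write(f"op-{i}")
follower = Follower(leader)
while follower.poll():
    pass
print(follower.ops == leader.ops)
```

The follower tracks its own position, so after a pause it resumes from where it stopped, as long as the leader still retains the required history.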

// On the Follower cluster: create the Remote Cluster connection
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "leader-cluster": {
          "seeds": ["10.0.1.10:9300", "10.0.1.11:9300"]
        }
      }
    }
  }
}

// Create a Follower Index
PUT /orders/_ccr/follow
{
  "remote_cluster": "leader-cluster",
  "leader_index": "orders",
  "max_read_request_operation_count": 5000,
  "max_outstanding_read_requests": 12,
  "max_read_request_size": "32mb",
  "max_write_request_operation_count": 5000,
  "max_write_request_size": "9223372036854775807b",
  "max_outstanding_write_requests": 9,
  "read_poll_timeout": "1m"
}

// Auto-follow configuration
PUT /_ccr/auto_follow/logs-auto-follow
{
  "remote_cluster": "leader-cluster",
  "leader_index_patterns": ["logs-*", "metrics-*"],
  "follow_index_pattern": "{{leader_index}}-follower",
  "max_outstanding_read_requests": 12,
  "max_outstanding_write_requests": 9,
  "max_write_request_operation_count": 5000
}

# ECK: multi-cluster CCR deployment
# Cluster A (Asia): Leader, takes the writes
# Cluster B (Europe): Follower, read-only disaster recovery

# Cluster B configures the Remote Cluster connection
# ECK establishes the trust relationship between the clusters
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: cluster-b
spec:
  version: 8.17.0
  remoteClusters:
    - name: cluster-a
      elasticsearchRef:
        name: cluster-a
        namespace: elastic-stack-asia

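Feature	CCR	Snapshot Restore
Replication lag	Seconds	Minutes to hours
RPO	A few seconds	Everything since the last snapshot
Follower readable	Yes	Only after restore
Bidirectional sync	Not for a single index (each index replicates one way)	Not supported
License	Platinum / Enterprise	Basic+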

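16 Cross-Cluster Search (CCS) in ES

Answer:

With CCS (Cross-Cluster Search) one ES cluster acts as coordinator: it fans a search request out to several remote clusters in parallel, merges their results, and returns them to the client, so a multi-cluster query looks like a single search to the user.

[Detailed breakdown]

Architecture: the Local Cluster coordinates. It receives the client request, forwards it to the configured Remote Clusters, each Remote Cluster runs the query locally and returns its results, and the Local Cluster merges them (Merge / Reduce).
Connection modes:
- Sniff Mode (default): the local cluster connects to the seed nodes, discovers the remote cluster's gateway nodes through sniffing, and connects to a few of them directly (3 by default). The remote nodes must be reachable from the local cluster.
- Proxy Mode: all connections go to a single proxy address (for example a load balancer), so the remote nodes need not be individually reachable.
Security and authentication: the clusters must trust each other. ECK 2.x+ can configure the TLS trust automatically through remoteClusters.
Network latency: queries across regions or data centers are slow. The ccs_minimize_roundtrips request parameter (true by default) reduces the number of round trips.
Query syntax: remote indices are addressed as remote_cluster:index_pattern, and local and remote indices can be mixed in one query.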
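
The `remote_cluster:index_pattern` syntax can be split per cluster with a few lines of Python. This is a simplified sketch: `split_targets` is a hypothetical helper, the `(local)` key is an arbitrary label for indices without a cluster prefix, and date-math or other special index expressions are not handled.

```python
def split_targets(expression: str) -> dict:
    """Group a comma-separated CCS index expression by cluster alias."""
    groups = {}
    for target in expression.split(","):
        cluster, sep, index = target.partition(":")
        if not sep:
            # No "cluster:" prefix means the index is on the local cluster.
            cluster, index = "(local)", target
        groups.setdefault(cluster, []).append(index)
    return groups


print(split_targets("asia-cluster:logs-*,europe-cluster:logs-*,local-logs"))
```

The coordinating cluster performs an equivalent grouping before it sends one sub-request per remote cluster.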

// Configure the Remote Cluster connections
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "asia-cluster": {
          "mode": "sniff",
          "seeds": ["10.0.1.10:9300", "10.0.1.11:9300"],
          "transport.compress": true,
          "skip_unavailable": true
        },
        "europe-cluster": {
          "mode": "proxy",
          "proxy_address": "europe-gateway:9300",
          "skip_unavailable": false
        }
      }
    }
  }
}

// Cross-cluster search (ccs_minimize_roundtrips is a URL parameter, not a body field)
GET /asia-cluster:logs-*,europe-cluster:logs-*,local-logs/_search?ccs_minimize_roundtrips=true
{
  "query": {
    "bool": {
      "must": [
        { "range": { "@timestamp": { "gte": "now-1h" } } },
        { "match": { "level": "ERROR" } }
      ]
    }
  },
  "size": 100
}

// Creating a cross-cluster Index Pattern in Kibana
// Management -> Stack Management -> Kibana -> Index Patterns
// Index Pattern: asia-cluster:logs-*,europe-cluster:logs-*,local-logs

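Connection mode	Connections	Latency	Use case
Sniff	Direct connections to a few gateway nodes of the remote cluster (3 by default)	Low	Same region, internal network
Proxy	Connections to a single proxy address only	Medium	Cross-region / many remote clusters
Minimize Roundtrips	Fewer shard-level round trips	Better	High-latency networks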

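17 Rolling upgrades and zero-downtime configuration changes in ES

Answer:

ECK orchestrates rolling upgrades of ES nodes itself: it manages the underlying StatefulSets with the OnDelete strategy and replaces Pods one at a time within a change budget, restricting shard allocation around each restart, so the cluster stays available throughout.

[Detailed breakdown]

Upgrade flow: when ECK detects a change to .spec.version it restricts shard allocation (cluster.routing.allocation.enable: primaries) and deletes the Pod. The new-version Pod starts on the same PersistentVolume, rejoins the cluster, allocation is re-enabled, and ECK waits for the shards to recover before moving on to the next node.
Upgrade Strategy: spec.updateStrategy.changeBudget (maxUnavailable / maxSurge) controls how many Pods may be replaced at the same time; by default nodes are replaced one by one.
Node removal: when a node is removed (scale-down), ECK excludes it from allocation through /_cluster/settings (an exclude on _name), waits until its shards have migrated, and only then deletes the Pod.
Downgrade restriction: ES does not support downgrades; the version number cannot be lowered.
Configuration changes: changing count on a nodeSet scales it up or down; changing config is rolled out node by node; PVC expansion depends on allowVolumeExpansion in the underlying StorageClass.
Constraint: never shrink the number of Master-eligible nodes below the Quorum (N/2 + 1).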

# ECK: rolling upgrade configuration
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: production
spec:
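  version: 8.17.0              # changing this field triggers a rolling upgrade
  updateStrategy:
    changeBudget:
      maxUnavailable: 1        # maximum number of Pods unavailable at the same time
      maxSurge: 0              # no extra Pods beyond the target count during the upgrade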
  nodeSets:
    - name: data
      count: 6
      config:
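        # Settings listed here go into elasticsearch.yml and are applied by a rolling restart;
        # use the cluster settings API to change them without a restart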
        cluster.routing.allocation.disk.watermark.low: 85%
        cluster.routing.allocation.disk.watermark.high: 90%

# Manual upgrade procedure (without ECK)
# 1. Restrict shard allocation
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}

# 2. Stop Node-1, upgrade it, start it again

# 3. Re-enable shard allocation
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "all"
  }
}

# 4. Wait for Green
GET _cluster/health?wait_for_status=green&timeout=30m

# 5. Repeat for nodes 2 to N

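Change type	Rolling restart	Impact on workload	Recovery time
Version upgrade	Yes	One node at a time is unavailable (seconds)	Depends on data volume
config change	Yes (for some settings)	Rolled node by node	Minutes
count change	No (scale-up) / data migration (scale-down)	None (scale-up) / minor (scale-down)	Immediate / minutes
PVC expansion	No	None	Immediate
X-Pack plugin change	Yes	Nodes restarted one by one	Minutes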

18 Monitoring ES (Stack Monitoring / Prometheus Exporter)

Answer:

Elasticsearch can be monitored with the official Stack Monitoring (built into Kibana) or with a Prometheus Exporter. Both cover cluster health, node metrics, index performance, and JVM and OS metrics.

[Detailed breakdown]

Stack Monitoring: Metricbeat or Elastic Agent (or, in legacy mode, the built-in collectors enabled with xpack.monitoring.collection.enabled) write ES metrics to the .monitoring-es-* indices, and the Stack Monitoring page in Kibana shows the cluster overview and node details.
Prometheus Exporter: elasticsearch_exporter (community) or the Elastic Agent Prometheus Integration exposes ES metrics in Prometheus format, to be displayed with Grafana Dashboards (ID: 2322 / 14191).
Key metrics:
- Cluster: status, active_primary_shards, active_shards, unassigned_shards, pending_tasks
- Node: jvm.heap_used_percent, cpu.percent, disk.total / disk.available, thread_pool.rejected
- Index: indexing.index_total / indexing.index_time, search.query_total / search.query_time
- GC: jvm.gc.collectors.young.collection_count, jvm.gc.collectors.old.collection_time
Alert rules: Cluster Health Yellow/Red, Unassigned Shards > 0, Heap Used > 85%, CPU Throttling, Disk Watermark reached, Thread Pool Rejections.

# elasticsearch_exporter Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: es-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: es-exporter
  template:
    metadata:
      labels:
        app: es-exporter
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9114"
    spec:
      containers:
        - name: exporter
          image: quay.io/prometheuscommunity/elasticsearch-exporter:v1.8.0
          args:
            - "--es.uri=https://production-es-http:9200"
            - "--es.ssl-skip-verify"
            - "--es.all"
            - "--es.indices"
            - "--es.indices_settings"
            - "--es.shards"
            - "--es.slm"
            - "--es.data_stream"
          env:
            - name: ES_USERNAME
              valueFrom:
                secretKeyRef:
                  name: es-monitoring-user
                  key: username
            - name: ES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: es-monitoring-user
                  key: password
          ports:
            - containerPort: 9114
              name: metrics

# Example Prometheus alert rules
groups:
  - name: elasticsearch
    rules:
      - alert: ESClusterRed
        expr: elasticsearch_cluster_health_status{color="red"} == 1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "ES cluster {{ $labels.cluster }} is RED"

      - alert: ESHeapUsageHigh
        expr: elasticsearch_jvm_memory_used_bytes{area="heap"} / elasticsearch_jvm_memory_max_bytes{area="heap"} > 0.85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "ES node {{ $labels.name }} heap usage > 85%"

      - alert: ESUnassignedShards
        expr: elasticsearch_cluster_health_unassigned_shards > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "ES cluster has {{ $value }} unassigned shards"

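Monitoring approach	Data store	Visualization	Alerting	Best suited for
Stack Monitoring	ES itself	Kibana (built in)	Watcher	Pure Elastic Stack environments
Prometheus Exporter	Prometheus / Mimir	Grafana	Alertmanager	Mixed monitoring stacks
Elastic Agent	ES Data Stream	Kibana	Alerting Rules	Recommended for Elastic 8.x
Datadog / New Relic	SaaS	SaaS platform	SaaS platform	Environments already using SaaS monitoring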

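19 ES Access Control (Role-Based Access Control)

Answer:

ES RBAC implements fine-grained access control with a Role + User model. A Role bundles Cluster Privileges and Index Privileges, and combines with Document-Level Security and Field-Level Security to control access at the row and column level.

[Detailed breakdown]

User: the authenticated identity. Supported realms include Native (built in), File, LDAP, Active Directory, SAML, OIDC, and PKI. A User can be assigned multiple Roles.
Role: a set of permissions defining what is allowed on the Cluster, on Indices, and on Applications. Built-in roles include superuser, kibana_admin, remote_monitoring_agent, and ingest_admin.
Cluster Privilege: cluster-level permissions such as all, monitor, manage, manage_index_templates, manage_ilm, and manage_security.
Index Privilege: index-level permissions such as all, read, write, create_index, delete_index, manage, and monitor.
DLS (Document-Level Security): a query embedded in the Role filters documents, giving row-level permissions (for example, a user sees only the orders of their own department).
FLS (Field-Level Security): field_security.grant / field_security.except control which fields are visible (for example, hiding customer PII fields from analysts).
API Key: fine-grained authentication based on an API Key, which can be restricted with role descriptors and an expiration time. It suits application access and cross-cluster calls.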

// Create a custom Role: log reader
POST /_security/role/log_reader
{
  "cluster": ["monitor", "read_ilm"],
  "indices": [
    {
      "names": ["logs-*"],
      "privileges": ["read", "view_index_metadata"],
      "allow_restricted_indices": false
    }
  ],
  "applications": [
    {
      "application": "kibana-.kibana",
      "privileges": ["feature_discover.read"],
      "resources": ["space:default"]
    }
  ]
}

// Create a custom Role with DLS + FLS
POST /_security/role/order_viewer_dept
{
  "cluster": [],
  "indices": [
    {
      "names": ["orders-*"],
      "privileges": ["read"],
      "query": {
        "template": {
          "source": "{\"term\": {\"department\": \"{{_user.metadata.department}}\"}}"
        }
      },
      "field_security": {
        "grant": ["order_id", "amount", "status", "created_at", "department"]
      }
    }
  ]
}

// Create a user and assign roles
POST /_security/user/alice
{
  "password": "secure_password",
  "roles": ["log_reader", "order_viewer_dept"],
  "full_name": "Alice Smith",
  "email": "alice@example.com",
  "metadata": {
    "department": "engineering"
  }
}
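
To make the combined effect of DLS and FLS more concrete, here is a minimal Python sketch that imitates what a role like order_viewer_dept would let a user see. It is only an in-memory simulation of the semantics (a term filter on the user's department, then a field allow-list), not how Elasticsearch evaluates roles internally. The function name `visible_documents` and the sample orders are hypothetical.

```python
# Field allow-list, mirroring the "grant" list of the role definition
GRANTED_FIELDS = {"order_id", "amount", "status", "created_at", "department"}


def visible_documents(docs, user_metadata, granted_fields=GRANTED_FIELDS):
    """Simulate DLS (row filter) followed by FLS (column filter)."""
    department = user_metadata.get("department")
    result = []
    for doc in docs:
        # DLS: equivalent of {"term": {"department": "<user's department>"}}
        if doc.get("department") != department:
            continue
        # FLS: keep only granted fields
        result.append({k: v for k, v in doc.items() if k in granted_fields})
    return result


# Made-up sample data for illustration only
orders = [
    {"order_id": "ORD-1", "amount": 10.0, "department": "engineering",
     "customer_email": "x@example.com"},
    {"order_id": "ORD-2", "amount": 20.0, "department": "sales",
     "customer_email": "y@example.com"},
]

alice_view = visible_documents(orders, {"department": "engineering"})
print(alice_view)
```

The sales order is dropped by the row filter, and customer_email is dropped by the field allow-list because it is not granted.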

// Create an API Key (application access)
POST /_security/api_key
{
  "name": "order-service-key",
  "role_descriptors": {
    "order_writer": {
      "cluster": ["monitor"],
      "indices": [
        {
          "names": ["orders-*"],
          "privileges": ["create_index", "index", "read"]
        }
      ]
    }
  },
  "expiration": "90d"
}

Permission level	Controls	Configured in	Example
Cluster Level	Cluster operations (snapshots, settings, templates)	Role -> Cluster	`monitor`, `manage_ilm`
Index Level	Index read / write / delete / manage	Role -> Indices	`read`, `write`, `delete_index`
Document Level	Row-level filtering	Role -> Indices -> query	DLS Query DSL
Field Level	Column-level visibility	Role -> Indices -> field_security	FLS grant/except
Application Level	Kibana Feature	Role -> Applications	Discover / Dashboard access

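20 ES Snapshot Lifecycle Management (SLM)

Answer:

SLM (Snapshot Lifecycle Management) is the built-in ES feature for scheduling snapshots and managing their retention. An SLM Policy creates snapshots automatically on a cron schedule and deletes expired snapshots according to its retention rules.

[Detailed breakdown]

SLM Policy structure: schedule (cron expression; ES cron syntax starts with a seconds field, e.g. 0 30 1 * * ?), name (snapshot name template, supports date math), repository (target Repository), config (snapshot parameters), and retention (retention rules).
Retention:
- expire_after: how long a snapshot is kept (e.g. 30d).
- min_count: minimum number of snapshots to keep.
- max_count: maximum number of snapshots to keep.
Snapshot execution: the SLM scheduler triggers snapshots according to the Policy's schedule; a run can also be triggered manually with POST _slm/policy/<name>/_execute. Retention cleanup runs as a separate task, on the cluster-level slm.retention_schedule.
Policy status: each Policy records status fields such as last_success, last_failure, and next_execution.
Health API integration: in 8.x the health report API (GET _health_report/slm) exposes an slm indicator that can drive alerts when SLM snapshots fail. _cluster/health does not report SLM status.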

// SLM Policy: daily snapshot at 01:00, kept for 30 days
PUT _slm/policy/daily-snapshots
{
  "schedule": "0 0 1 * * ?",
  "name": "<daily-snap-{now/d}>",
  "repository": "s3-backup",
  "config": {
    "indices": ["logs-*", "metrics-*", ".kibana*"],
    "ignore_unavailable": true,
    "include_global_state": false,
    "partial": false
  },
  "retention": {
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 60
  }
}

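// SLM Policy: longer retention (90 days). SLM has no built-in GFS tiers; a GFS scheme needs several policies with different schedules and retention.
PUT _slm/policy/layered-retention
{
  "schedule": "0 0 2 * * ?",
  "name": "<layered-{now/d}>",
  "repository": "s3-backup",
  "config": {
    "indices": ["*", "-.monitoring-*"],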
    "include_global_state": true
  },
  "retention": {
    "expire_after": "90d",
    "min_count": 7,
    "max_count": 100
  }
}

// Check SLM statistics
GET _slm/stats

// Example response
{
  "retention_runs": 120,
  "retention_failed": 0,
  "retention_timed_out": 0,
  "retention_deletion_time_millis": 45000,
  "policy_stats": [{
    "policy": "daily-snapshots",
    "snapshots_taken": 30,
    "snapshots_failed": 0,
    "snapshots_deleted": 25,
    "snapshot_deletion_failures": 0
  }]
}

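Retention setting	Parameter	Effect
expire_after	Time	Snapshots older than this are deleted automatically
min_count	Count	Keep at least N snapshots, even if they have expired
max_count	Count	Keep at most N snapshots; beyond that the oldest are deleted
Combined	expire_after + min_count + max_count	The total kept never exceeds max_count and never drops below min_count (even when expire_after has passed)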
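
The interaction of the three retention settings can be sketched in Python as follows. This is a simplified model under stated assumptions: it ignores failed or in-progress snapshots and treats every snapshot as a (name, start time) pair. The function name `snapshots_to_delete` and the sample data are hypothetical, and the real SLM logic has more cases.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(snapshots, now, expire_after=None, min_count=None, max_count=None):
    """Return names of snapshots a retention run would remove (simplified model).

    snapshots: list of (name, start_time) tuples
    """
    ordered = sorted(snapshots, key=lambda s: s[1])  # oldest first
    total = len(ordered)

    # min_count: the newest N snapshots are always kept
    protected = {name for name, _ in ordered[-min_count:]} if min_count else set()

    doomed = []
    for position, (name, started) in enumerate(ordered):
        if name in protected:
            continue
        # max_count: the oldest (total - max_count) snapshots exceed the cap
        over_max = max_count is not None and position < total - max_count
        # expire_after: the snapshot is older than the allowed age
        expired = expire_after is not None and (now - started) > expire_after
        if over_max or expired:
            doomed.append(name)
    return doomed


# Made-up sample: ten daily snapshots, 21 May to 30 May, evaluated on 31 May
now = datetime(2026, 5, 31)
snapshots = [(f"snap-{day:02d}", datetime(2026, 5, day)) for day in range(21, 31)]

print(snapshots_to_delete(snapshots, now, expire_after=timedelta(days=5),
                          min_count=3, max_count=6))
```

Raising min_count shrinks the deletion list even when snapshots have expired, which is the "never drops below min_count" behavior described in the table.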

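21 ES Index Templates (Index Template / Component Template / Data Stream)

Answer:

An Index Template defines the Settings, Mappings, and Aliases of newly created indices. Component Templates are reusable template building blocks. A Data Stream abstracts time-series writes as a single stream backed by multiple indices.

[Detailed breakdown]

Component Template (ES 7.8+): the smallest reusable template unit, containing one or more of Settings, Mappings, or Aliases, and referenced by Index Templates. Example: a logs-settings component defines shard and ILM configuration, and a logs-mappings component defines field types.
Composable Index Template (ES 7.8+): combines several Component Templates and defines the index_patterns matching rule. Component Templates are merged in the order listed in composed_of (later entries override earlier ones), and the Index Template's own template block is applied last. When several Index Templates match a new index, only the one with the highest priority is used. It replaces the legacy _template API.
Data Stream (ES 7.9+): an abstraction for time-series data that wraps multiple Backing Indices behind a single name used for both writes and reads. Each Data Stream needs a matching Index Template that contains a data_stream: {} object, and every document must carry an @timestamp field. Setting index.mode: "time_series" is optional and turns it into a time series data stream (ES 8.x).
Old vs new: before 7.8, Legacy Index Templates (_template API) were used, with no component reuse; 7.8+ introduced Composable Index Templates + Component Templates; Data Streams became GA in 7.9+.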
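
The following Python sketch illustrates how a composable template could be resolved for a new index: pick the matching Index Template with the highest priority, merge its Component Templates in composed_of order, then apply the template's own block last. It is an in-memory approximation, not Elasticsearch code. The helper names (`deep_merge`, `resolve_template`) and the sample templates are hypothetical, and settings are modeled as plain nested dictionaries.

```python
from fnmatch import fnmatchcase


def deep_merge(base, override):
    """Recursively merge two dicts; values from `override` win."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_template(index_name, index_templates, component_templates):
    """Return (template_name, resolved_body) for a new index, or (None, {})."""
    matching = [
        (name, tpl) for name, tpl in index_templates.items()
        if any(fnmatchcase(index_name, pattern) for pattern in tpl["index_patterns"])
    ]
    if not matching:
        return None, {}
    # Only the highest-priority matching template is applied
    name, winner = max(matching, key=lambda item: item[1].get("priority", 0))

    resolved = {}
    for component in winner.get("composed_of", []):  # later components override earlier ones
        resolved = deep_merge(resolved, component_templates[component])
    resolved = deep_merge(resolved, winner.get("template", {}))  # own block applied last
    return name, resolved


# Hypothetical sample templates
components = {
    "base-settings": {"settings": {"number_of_shards": 3, "number_of_replicas": 1}},
    "base-mappings": {"mappings": {"properties": {"level": {"type": "keyword"}}}},
}
templates = {
    "app-template": {
        "priority": 200,
        "index_patterns": ["app-*"],
        "composed_of": ["base-settings", "base-mappings"],
        "template": {"settings": {"number_of_replicas": 2}},
    },
    "catch-all": {
        "priority": 1,
        "index_patterns": ["*"],
        "composed_of": [],
        "template": {"settings": {"number_of_shards": 1}},
    },
}

print(resolve_template("app-web", templates, components))
```

Both templates match app-web, but only app-template contributes, because priority selects a single winner rather than merging templates.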

// Component Template: reusable building blocks
PUT _component_template/logs-settings
{
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "refresh_interval": "30s",
      "index.lifecycle.name": "logs-ilm-policy"
    }
  }
}

PUT _component_template/logs-mappings
{
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "message": { "type": "text" },
        "level": { "type": "keyword" },
        "host.name": { "type": "keyword" },
        "service.name": { "type": "keyword" },
        "trace.id": { "type": "keyword" }
      }
    }
  }
}

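// Composable Index Template: composes the Component Templates above.
// Priority 500 avoids a clash with the built-in logs-*-* template (priority 100).
PUT _index_template/logs-template
{
  "priority": 500,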
  "index_patterns": ["logs-*"],
  "composed_of": ["logs-settings", "logs-mappings"],
  "data_stream": {},
  "_meta": {
    "description": "Template for application logs data stream"
  }
}

// Create the Data Stream
PUT _data_stream/logs-app
// Write to the Data Stream
POST /logs-app/_doc
{
  "@timestamp": "2026-05-26T10:00:00Z",
  "message": "User login successful",
  "level": "INFO",
  "host.name": "web-01",
  "service.name": "auth-service"
}

// Put a Data Stream into Time Series mode (ES 8.x TSDB)
PUT _index_template/metrics-template
{
  "priority": 200,
  "index_patterns": ["metrics-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.mode": "time_series",
      "index.routing_path": ["host.name"]
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "host.name": {
          "type": "keyword",
          "time_series_dimension": true
        },
        "cpu.usage": {
          "type": "double",
          "time_series_metric": "gauge"
        }
      }
    }
  }
}

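Feature	Legacy Template	Composable Template + Component	Data Stream
Version	< 7.8	>= 7.8	>= 7.9
Component reuse	Not supported	Reuse through Component Templates	Built on Composable Templates
Index naming	User controlled	User controlled	Auto-generated `.ds-<name>-<yyyy.MM.dd>-<generation>`
Write model	Write directly to the index	Write directly to the index	Write through the Data Stream name
ILM/Rollover	Bound manually	Bound manually	Built-in Rollover support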

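22 Common ES Failures and Troubleshooting (Unassigned Shards / Split Brain / Circuit Breaker)

Answer:

Common ES failures fall into three groups: Unassigned Shards, Split Brain, and tripped Circuit Breakers. Each has a clear diagnostic path and recovery procedure.

[Detailed breakdown]

Unassigned Shards

Symptom: Cluster Health is Yellow or Red, and /_cat/shards shows shards in the UNASSIGNED state.

Causes:

A node went down and all copies of a shard are unavailable (Red).
A shard has more copies than there are data nodes (number_of_replicas + 1 > number of Data Nodes); copies of the same shard cannot share a node.
A Disk Watermark was reached, so the Master refuses to allocate new shards to the affected nodes.
Allocation Filter rules do not match the attributes of the current nodes.
Replica shards cannot be allocated on a single-node cluster (Yellow, expected behavior).

Diagnosis: GET /_cluster/allocation/explain returns the reason a shard is unassigned and the can_allocate decision.

Recovery:

Add Data nodes.
Adjust number_of_replicas.
Free disk space or adjust the Disk Watermark.
Retry allocation with POST /_cluster/reroute?retry_failed=true; as a last resort, use allocate_stale_primary to recover a Red index (data may be lost).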
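
The node-count cause can be checked with simple arithmetic. The Python sketch below assumes the only constraint is that copies of the same shard may not share a node; disk watermarks, allocation filters, and per-node shard limits are ignored. The function names are hypothetical.

```python
def unassigned_copies(primaries, replicas, data_nodes):
    """Number of shard copies that cannot be placed, given the same-shard rule only."""
    copies_per_shard = 1 + replicas              # one primary plus its replicas
    placeable = min(copies_per_shard, data_nodes)  # at most one copy per node
    return primaries * (copies_per_shard - placeable)


def expected_health(primaries, replicas, data_nodes):
    """Rough cluster color under the same simplifying assumption."""
    if data_nodes == 0:
        return "red"     # not even the primaries can be placed
    if unassigned_copies(primaries, replicas, data_nodes) > 0:
        return "yellow"  # primaries are placed, some replicas are not
    return "green"


# 5 primaries with 2 replicas each on only 2 data nodes:
print(unassigned_copies(5, 2, 2), expected_health(5, 2, 2))
```

With 5 primaries, 2 replicas, and 2 data nodes, one replica of every shard has nowhere to go, so adding a third data node or lowering number_of_replicas to 1 would clear the Yellow state in this simplified model.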

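Split Brain

Symptom: the cluster ends up with two Masters, node logs show multiple-master or master_not_discovered errors, and writes become inconsistent.

Cause: a network partition cuts communication between Master-eligible nodes, and discovery.zen.minimum_master_nodes (6.x and earlier) is set below the quorum of N/2 + 1.

Recovery:

Restore network connectivity.
On 6.x and earlier, set minimum_master_nodes to N/2 + 1 (integer division), where N is the number of Master-eligible nodes.
Upgrade to 7.0 or later, where the new cluster coordination subsystem (quorum-based, similar in spirit to Raft) manages the voting configuration automatically and minimum_master_nodes is no longer used.
Restore inconsistent indices from a Snapshot.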
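
The quorum rule can be verified with a few lines of Python. The sketch models only the election threshold: a side of a partition can elect a Master when it sees at least minimum_master_nodes Master-eligible nodes. The function names are hypothetical.

```python
def quorum(master_eligible):
    """Smallest majority of the master-eligible nodes: N/2 + 1 with integer division."""
    return master_eligible // 2 + 1


def sides_electing_master(partition_sizes, minimum_master_nodes):
    """Sizes of the partition sides that are allowed to elect a master."""
    return [size for size in partition_sizes if size >= minimum_master_nodes]


def has_split_brain_risk(master_eligible, minimum_master_nodes):
    """True when two disjoint sides could both reach the configured threshold."""
    return minimum_master_nodes < quorum(master_eligible)


# 3 master-eligible nodes split into sides of 2 and 1
print(sides_electing_master([2, 1], minimum_master_nodes=1))  # both sides elect: split brain
print(sides_electing_master([2, 1], minimum_master_nodes=2))  # only the majority side elects
```

With an even number of Master-eligible nodes, a 2/2 split leaves neither side with a quorum, which is one reason an odd count such as 3 is commonly recommended.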

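Circuit Breaker

Symptom: a query or write throws CircuitBreakingException, with an error message containing [parent] Data too large or [fielddata] Data too large.

Cause: the estimated memory for one category of operations (or the parent total) exceeds the limit configured for the corresponding breaker.

Diagnosis: GET /_nodes/stats/breaker shows which breakers have tripped, together with their current usage and limits.

Handling:

Increase the Heap size or adjust the breaker limits.
Optimize queries: put non-scoring conditions in filter context so they skip scoring and can be cached, reduce the cardinality and bucket count of aggregations, and paginate instead of fetching large result sets.
Keep fielddata disabled on Text fields and use keyword fields for sorting and aggregations.
Limit the per-shard size of terms aggregations (shard_size).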
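
When the breaker statistics are large, a small script can pull out the breakers that have tripped. The Python sketch below works on an already-parsed response of GET /_nodes/stats/breaker, assuming the usual nodes -> breakers -> limit_size_in_bytes / estimated_size_in_bytes / tripped layout. The function name `tripped_breakers` and the sample numbers are made up for illustration.

```python
def tripped_breakers(stats):
    """Return sorted (node_name, breaker, tripped_count, usage_ratio) tuples."""
    found = []
    for node_id, node in stats.get("nodes", {}).items():
        for breaker, info in node.get("breakers", {}).items():
            tripped = info.get("tripped", 0)
            if tripped <= 0:
                continue
            limit = info.get("limit_size_in_bytes", 0)
            used = info.get("estimated_size_in_bytes", 0)
            ratio = round(used / limit, 2) if limit else 0.0
            found.append((node.get("name", node_id), breaker, tripped, ratio))
    return sorted(found)


# Made-up sample response fragment (byte values are not realistic)
sample_stats = {
    "nodes": {
        "abc123": {
            "name": "data-node-1",
            "breakers": {
                "fielddata": {"limit_size_in_bytes": 400,
                              "estimated_size_in_bytes": 100, "tripped": 3},
                "parent": {"limit_size_in_bytes": 950,
                           "estimated_size_in_bytes": 500, "tripped": 0},
            },
        }
    }
}

for row in tripped_breakers(sample_stats):
    print(row)
```

The tripped counter is cumulative since node start, so comparing two readings taken some minutes apart shows whether a breaker is still tripping.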

# Diagnose Unassigned Shards
GET _cluster/allocation/explain
{
  "index": "logs-2026.05.15",
  "shard": 2,
  "primary": true
}

# Manual allocation (allocate_stale_primary: data may be lost)
POST _cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "logs-2026.05.15",
        "shard": 2,
        "node": "data-node-3",
        "accept_data_loss": true
      }
    }
  ]
}

# Check disk usage against the Disk Watermark
GET _cat/allocation?v&h=node,disk.percent,disk.total,disk.used,disk.avail

# Raise the Disk Watermark (emergency only; revert once disk space is freed)
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%"
  }
}

// List breakers that have tripped
GET _nodes/stats/breaker
// In the response, look for breakers with tripped > 0

// Adjust circuit breaker limits (fielddata, request, parent total)
PUT _cluster/settings
{
  "transient": {
    "indices.breaker.fielddata.limit": "50%",
    "indices.breaker.request.limit": "70%",
    "indices.breaker.total.limit": "95%"
  }
}

// Keep fielddata disabled on the Text field in the Mapping; use the keyword field for sorting/aggregations
PUT /my-index/_mapping
{
  "properties": {
    "description": {
      "type": "text",
      "fielddata": false
    },
    "description_keyword": {
      "type": "keyword"
    }
  }
}

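Failure type	Key diagnostic API	Emergency recovery	Root-cause fix
Unassigned Shards (Yellow)	`_cluster/allocation/explain`	Add nodes or adjust the replica count	Keep node count and replica policy aligned
Unassigned Shards (Red)	`_cluster/allocation/explain`	`allocate_stale_primary` (data loss)	Replicas for every Primary + Snapshots
Split Brain	Node logs	Restore the network, fix `minimum_master_nodes` (6.x and earlier)	Upgrade to 7.x+ cluster coordination
Circuit Breaker	`_nodes/stats/breaker`	Relax limits or reduce concurrency	Optimize queries + size the Heap properly
Disk Full	`_cat/allocation`	Free disk space / adjust the Watermark	Disk capacity planning + automatic cleanup via ILM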

References: Elasticsearch official guide (elastic.co/guide/en/elasticsearch/reference), official CCR/CCS documentation