
Published: 2021-07-10 10:32:03  Author: chen
Source: 亿速云 (Yisu Cloud)

# How to Use Elasticsearch

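## Contents
- [1. Elasticsearch Overview](#1-elasticsearch-overview)
  - [1.1 What Is Elasticsearch](#11-what-is-elasticsearch)
  - [1.2 Core Features](#12-core-features)
  - [1.3 Typical Use Cases](#13-typical-use-cases)
- [2. Installation and Configuration](#2-installation-and-configuration)
  - [2.1 System Requirements](#21-system-requirements)
  - [2.2 Single-Node Installation](#22-single-node-installation)
  - [2.3 Cluster Configuration](#23-cluster-configuration)
  - [2.4 Security Configuration](#24-security-configuration)
- [3. Core Concepts](#3-core-concepts)
  - [3.1 Index](#31-index)
  - [3.2 Document](#32-document)
  - [3.3 Type](#33-type)
  - [3.4 Shard](#34-shard)
  - [3.5 Replica](#35-replica)
- [4. Data Operations (CRUD)](#4-data-operations-crud)
  - [4.1 Creating an Index](#41-creating-an-index)
  - [4.2 Document CRUD](#42-document-crud)
  - [4.3 Bulk Operations](#43-bulk-operations)
  - [4.4 Version Control](#44-version-control)
- [5. Search Queries in Depth](#5-search-queries-in-depth)
  - [5.1 Query DSL Basics](#51-query-dsl-basics)
  - [5.2 Full-Text Search](#52-full-text-search)
  - [5.3 Compound Queries](#53-compound-queries)
  - [5.4 Aggregations](#54-aggregations)
  - [5.5 Highlighting and Suggestions](#55-highlighting-and-suggestions)
- [6. Advanced Features](#6-advanced-features)
  - [6.1 Mappings and Analysis](#61-mappings-and-analysis)
  - [6.2 Index Aliases](#62-index-aliases)
  - [6.3 Index Templates](#63-index-templates)
  - [6.4 Ingest Pipelines](#64-ingest-pipelines)
- [7. Performance Optimization](#7-performance-optimization)
  - [7.1 Hardware Configuration](#71-hardware-configuration)
  - [7.2 Index Design](#72-index-design)
  - [7.3 Query Optimization](#73-query-optimization)
  - [7.4 JVM Tuning](#74-jvm-tuning)
- [8. Practical Examples](#8-practical-examples)
  - [8.1 E-commerce Product Search](#81-e-commerce-product-search)
  - [8.2 Log Analysis System](#82-log-analysis-system)
  - [8.3 Geo-Location Search](#83-geo-location-search)
- [9. Troubleshooting](#9-troubleshooting)
  - [9.1 Performance Problems](#91-performance-problems)
  - [9.2 Data Inconsistency](#92-data-inconsistency)
  - [9.3 Abnormal Cluster State](#93-abnormal-cluster-state)
- [10. Future Trends](#10-future-trends)
  - [10.1 Vector Search](#101-vector-search)
  - [10.2 Machine Learning Integration](#102-machine-learning-integration)
  - [10.3 Cloud-Native Support](#103-cloud-native-support)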

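## 1. Elasticsearch Overview

### 1.1 What Is Elasticsearch

Elasticsearch is an open-source, distributed, RESTful search engine built on Apache Lucene. It provides distributed, multi-tenant full-text search and can store, search, and analyze large volumes of data in near real time.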

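**Core architectural characteristics**:
- Distributed design: data is automatically split into shards, and the load is balanced across the cluster
- Near-real-time (NRT) search: changes typically become searchable within about one second (the default refresh interval is 1s)
- High availability: replica shards keep data available when a node fails
- RESTful API: every operation is exposed through an HTTP interface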

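### 1.2 Core Features

1. **Full-text search**:
   - Analyzers for many languages
   - A relevance scoring mechanism
   - Fuzzy search, synonyms, and other advanced features

2. **Data analysis** (for example, the average of the `price` field):
   ```json
   {
     "aggs": {
       "avg_price": {
         "avg": { "field": "price" }
       }
     }
   }
   ```

3. **Scalability**:
   - Horizontal scaling: add nodes to handle larger data volumes
   - Vertical scaling: add hardware resources to existing nodes to improve performance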

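### 1.3 Typical Use Cases

| Scenario | Example | Elasticsearch strength |
|---|---|---|
| Enterprise search | Document management systems, wiki search | Powerful full-text search |
| Log analysis | Log storage and analysis in the ELK Stack | Efficient aggregation queries |
| E-commerce | Product search, recommendation systems | Multi-criteria filtering and relevance ranking |
| Monitoring | Storing application performance monitoring (APM) data | Near-real-time data processing |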

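## 2. Installation and Configuration

### 2.1 System Requirements

Minimum requirements:
- Java: Elasticsearch 7.0 and later bundles its own OpenJDK, so no separate JDK installation is required (in recent versions, set `ES_JAVA_HOME` only if you want to use a different JDK)
- 2 GB RAM (8 GB or more recommended for production)
- 10 GB of free disk space

Supported operating systems:
- Linux/Unix
- Windows
- macOS (for development)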

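### 2.2 Single-Node Installation

Installation example (Linux):

```bash
# Download the archive
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.4.1-linux-x86_64.tar.gz

# Extract it
tar -xzf elasticsearch-8.4.1-linux-x86_64.tar.gz
cd elasticsearch-8.4.1/

# Start a single node (8.x configures security automatically on first start
# and prints the generated password for the elastic user to the terminal)
./bin/elasticsearch

# Verify that the node is running (8.x enables TLS on the HTTP layer by default)
curl --cacert config/certs/http_ca.crt -u elastic "https://localhost:9200/?pretty"
```

Key configuration files:
- `config/elasticsearch.yml`: main configuration file
- `config/jvm.options`: JVM settings
- `config/log4j2.properties`: logging configuration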

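### 2.3 Cluster Configuration

Example configuration for a three-node cluster:

```yaml
# elasticsearch.yml on node-1
cluster.name: my-application
node.name: node-1
network.host: 192.168.1.101
discovery.seed_hosts: ["192.168.1.101", "192.168.1.102", "192.168.1.103"]
# Needed only to bootstrap a brand-new cluster; remove it once the cluster has formed
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]

# Configure the other nodes the same way, changing only node.name and network.host
```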
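Three master-eligible nodes are the usual choice because electing a master requires a majority of the voting nodes. The sketch below is a simplified model of that majority rule (it ignores how Elasticsearch adjusts its voting configuration automatically, for example with only two nodes); it shows why three nodes survive the loss of one while two cannot reliably tolerate any failure:

```python
def quorum(master_eligible: int) -> int:
    """Votes needed for a strict majority of the master-eligible (voting) nodes."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a majority can still be formed."""
    return master_eligible - quorum(master_eligible)


for n in (1, 2, 3, 5):
    print(f"{n} master-eligible node(s): quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```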

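### 2.4 Security Configuration

Basic security setup:

1. Enable TLS encryption. The first command creates a certificate authority (`elastic-stack-ca.p12`); the second issues a node certificate signed by it (written to `elastic-certificates.p12` by default), which you copy into each node's `config/` directory:

   ```bash
   ./bin/elasticsearch-certutil ca
   ./bin/elasticsearch-certutil cert --ca elastic-stack-ca.p12
   ```

2. Configure basic authentication and transport-layer TLS in `elasticsearch.yml`. If the certificates were generated without host names, also set `xpack.security.transport.ssl.verification_mode: certificate`:

   ```yaml
   xpack.security.enabled: true
   xpack.security.transport.ssl.enabled: true
   xpack.security.transport.ssl.keystore.path: elastic-certificates.p12
   xpack.security.transport.ssl.truststore.path: elastic-certificates.p12
   ```

3. Set passwords for the built-in users (this tool is deprecated in 8.x; use `./bin/elasticsearch-reset-password -u elastic` there instead):

   ```bash
   ./bin/elasticsearch-setup-passwords auto
   ```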

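## 3. Core Concepts

### 3.1 Index

An index is the highest-level unit of data organization in Elasticsearch. It is often compared to a "database" in a relational system, although since mapping types were removed a table is the closer analogy.

Index characteristics:
- Holds many documents with similar characteristics
- Referenced by name (names must be lowercase)
- Can be configured with its own number of shards and replicas

```json
PUT /my_index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  }
}
```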
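Lowercase is not the only naming rule. The helper below is a client-side pre-check written from the documented index naming rules; it is a sketch, the server remains the authority, and the function name is hypothetical:

```python
from typing import Optional

# Characters Elasticsearch does not allow in index names (the colon is rejected since 7.0)
FORBIDDEN_CHARS = set('\\/*?"<>| ,#:')


def invalid_index_name_reason(name: str) -> Optional[str]:
    """Return why `name` breaks the documented index naming rules, or None if it passes."""
    if not name:
        return "cannot be empty"
    if name in (".", ".."):
        return "cannot be '.' or '..'"
    if name != name.lower():
        return "must be lowercase"
    if name[0] in "-_+":
        return "cannot start with '-', '_' or '+'"
    bad = sorted(FORBIDDEN_CHARS.intersection(name))
    if bad:
        return f"contains forbidden characters: {bad}"
    if len(name.encode("utf-8")) > 255:
        return "longer than 255 bytes"
    return None


for candidate in ("my_index", "My_Index", "_private", "logs 2023"):
    print(candidate, "->", invalid_index_name_reason(candidate) or "ok")
```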

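### 3.2 Document

A document is the basic unit of data in Elasticsearch, represented as JSON.

Document metadata:
- `_index`: the index the document belongs to
- `_type`: the document type (always `_doc` in 7.x; removed in 8.x)
- `_id`: unique identifier
- `_version`: version number
- `_source`: the original JSON body

### 3.3 Type

Mapping types were deprecated in Elasticsearch 7.x and removed completely in 8.x. In 7.x every document has the single default type `_doc`; in 8.x `_doc` survives only as the endpoint name in document APIs such as `/products/_doc/1`.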

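### 3.4 Shard

Sharding:
- Primary shard: the main storage unit for data; each document lives in exactly one primary shard
- Replica shard: a copy of a primary shard that provides high availability

Shard sizing guidelines:
- Aim for shards of roughly 10 to 50 GB each
- The number of primary shards cannot be changed in place once the index is created; changing it requires a reindex or the `_split`/`_shrink` APIs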
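The primary shard count is fixed because of routing: by default a document goes to shard `hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document `_id`. The sketch below uses CRC32 in place of Elasticsearch's Murmur3 hash and omits the `routing_num_shards` refinement of newer versions, so its shard numbers are illustrative only. What carries over is that changing the divisor relocates most documents. It also turns the 50 GB upper bound above into a starting shard count (the helper names are hypothetical):

```python
import math
import zlib


def primary_shards_for(expected_size_gb: float, max_shard_gb: float = 50.0) -> int:
    """Smallest primary-shard count that keeps every shard at or below max_shard_gb."""
    return max(1, math.ceil(expected_size_gb / max_shard_gb))


def route(routing: str, number_of_primary_shards: int) -> int:
    """Simplified routing formula: hash(_routing) % number_of_primary_shards.
    CRC32 stands in for the Murmur3 hash that Elasticsearch actually uses."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


print(primary_shards_for(120))  # 120 GB at up to 50 GB per shard: prints 3
ids = [str(i) for i in range(1000)]
moved = sum(route(doc_id, 3) != route(doc_id, 4) for doc_id in ids)
print(f"{moved} of {len(ids)} documents would land on a different shard with 4 primaries instead of 3")
```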

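### 3.5 Replica

Replica shards provide:
1. Failover: when a primary shard becomes unavailable, a replica is promoted to primary
2. Read scaling: a search request can be served by the primary or any of its replicas, so adding replicas (on additional nodes) can increase search throughput

Unlike the primary shard count, the number of replicas can be changed at any time on a live index:

```json
PUT /my_index/_settings
{
  "number_of_replicas": 1
}
```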
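With the `my_index` settings from section 3.1 (3 primaries, 2 replicas), the shard arithmetic works out as below. Elasticsearch never places two copies of the same shard on one node, so a replica count of `r` needs at least `r + 1` data nodes before every replica can be assigned and the cluster turns green; on a single node the replicas stay unassigned and health is yellow. A minimal sketch (function names are hypothetical):

```python
def total_shard_copies(primaries: int, replicas: int) -> int:
    """Every primary has `replicas` extra copies: primaries * (1 + replicas) shards in total."""
    return primaries * (1 + replicas)


def min_nodes_for_green(replicas: int) -> int:
    """Copies of the same shard must sit on different nodes, so each copy needs its own node."""
    return replicas + 1


print(total_shard_copies(3, 2), min_nodes_for_green(2))  # prints: 9 3
print(total_shard_copies(3, 1), min_nodes_for_green(1))  # prints: 6 2
```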

## 4. Data Operations (CRUD)

### 4.1 Creating an Index

Creating an index with explicit settings and mappings:

```json
PUT /products
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "price": { "type": "double" },
      "created_at": { "type": "date" }
    }
  }
}
```

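### 4.2 Document CRUD

Create a document (`description` and `tags` are not in the mapping above, so dynamic mapping adds them as `text` fields with a `.keyword` sub-field):

```json
POST /products/_doc/1
{
  "name": "智能手机",
  "price": 2999.00,
  "description": "最新款旗舰智能手机",
  "tags": ["电子", "通讯", "数码"]
}
```

Retrieve a document:

```json
GET /products/_doc/1
```

Update a document (a partial update that changes only `price`):

```json
POST /products/_update/1
{
  "doc": {
    "price": 2799.00
  }
}
```

Delete a document:

```json
DELETE /products/_doc/1
```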

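### 4.3 Bulk Operations

Bulk API example. The body is newline-delimited JSON (NDJSON): each action line is followed by its document on the next line (except `delete`, which has no document), and the body must end with a newline:

```json
POST _bulk
{ "index" : { "_index" : "products", "_id" : "2" } }
{ "name": "平板电脑", "price": 1999.00 }
{ "create" : { "_index" : "products", "_id" : "3" } }
{ "name": "智能手表", "price": 899.00 }
{ "delete" : { "_index" : "products", "_id" : "1" } }
```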
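Because the bulk body is line-oriented, it is easy to get wrong by hand. A minimal Python sketch (standard library only, hypothetical helper name) that serializes the same three actions; sending it is left to your HTTP client, with `Content-Type: application/x-ndjson`:

```python
import json


def bulk_body(actions):
    """Serialize (action, metadata, source) tuples into an NDJSON body for POST _bulk.
    `source` is ignored for "delete", which has no document line."""
    lines = []
    for action, metadata, source in actions:
        lines.append(json.dumps({action: metadata}, ensure_ascii=False))
        if action != "delete":
            lines.append(json.dumps(source, ensure_ascii=False))
    # The Bulk API requires the final line to end with a newline
    return "\n".join(lines) + "\n"


body = bulk_body([
    ("index", {"_index": "products", "_id": "2"}, {"name": "平板电脑", "price": 1999.00}),
    ("create", {"_index": "products", "_id": "3"}, {"name": "智能手表", "price": 899.00}),
    ("delete", {"_index": "products", "_id": "1"}, None),
])
print(body, end="")
```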

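### 4.4 Version Control

Optimistic concurrency control. In current versions, a write can be made conditional on the `_seq_no` and `_primary_term` returned by an earlier read; if the document has changed since then, the request fails with `409 Conflict`:

```json
# Take _seq_no and _primary_term from a prior GET /products/_doc/2 (the values here are illustrative)
PUT /products/_doc/2?if_seq_no=1&if_primary_term=1
{
  "name": "更新后的平板电脑",
  "price": 1899.00
}
```

With external versioning (`version_type=external`), a write succeeds only if the supplied `version` is greater than the stored version, so `?version=1&version_type=external` is rejected for a document whose stored version is already 1. Pass a version number maintained by your own system instead, such as `?version=2&version_type=external`.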
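The `if_seq_no` check behaves like a compare-and-set. The toy in-memory store below (a hypothetical class, not the Elasticsearch client) shows why two clients that read the same version cannot both write: the second write sees a newer sequence number and is rejected, as Elasticsearch would reject it with `409 Conflict`. It tracks only `_seq_no` and ignores `_primary_term`:

```python
class VersionConflict(Exception):
    """Raised when a conditional write sees a stale _seq_no (Elasticsearch answers 409 Conflict)."""


class ToyStore:
    """Hypothetical single-shard store that mimics if_seq_no compare-and-set writes."""

    def __init__(self):
        self.docs = {}        # doc id -> (seq_no, source)
        self.next_seq_no = 0  # sequence numbers increase with every write

    def get(self, doc_id):
        return self.docs[doc_id]

    def index(self, doc_id, source, if_seq_no=None):
        """Replace the whole document, optionally only if its current seq_no equals if_seq_no."""
        if if_seq_no is not None:
            current = self.docs.get(doc_id)
            current_seq_no = current[0] if current else None
            if current_seq_no != if_seq_no:
                raise VersionConflict(f"expected seq_no {if_seq_no}, found {current_seq_no}")
        seq_no = self.next_seq_no
        self.next_seq_no += 1
        self.docs[doc_id] = (seq_no, source)
        return seq_no


store = ToyStore()
store.index("2", {"name": "平板电脑", "price": 1999.00})
seq_a, _ = store.get("2")  # client A reads the document
seq_b, _ = store.get("2")  # client B reads the same version
store.index("2", {"name": "平板电脑", "price": 1899.00}, if_seq_no=seq_a)  # A's write succeeds
try:
    store.index("2", {"name": "平板电脑", "price": 1799.00}, if_seq_no=seq_b)  # B's read is stale
except VersionConflict as err:
    print("rejected:", err)
```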

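## 5. Search Queries in Depth

### 5.1 Query DSL Basics

Structure of a search request: `must` clauses are scored, `filter` clauses only include or exclude documents, and `sort`, `from`, and `size` control ordering and pagination:

```json
GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "手机" } }
      ],
      "filter": [
        { "range": { "price": { "gte": 1000, "lte": 3000 } } }
      ]
    }
  },
  "sort": [
    { "price": { "order": "desc" } }
  ],
  "from": 0,
  "size": 10
}
```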

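### 5.2 Full-Text Search

Multi-field match query (`name^2` gives matches in `name` twice the weight of matches in `description`):

```json
GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "智能",
      "fields": ["name^2", "description"]
    }
  }
}
```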
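`multi_match` defaults to the `best_fields` type, which scores a document by its single best-matching field after boosts, plus `tie_breaker` (default 0) times the scores of the other matching fields. A sketch of just that combination step; the per-field scores are invented stand-ins for the BM25 scores Elasticsearch would compute, and the function name is hypothetical:

```python
def best_fields_score(field_scores, boosts, tie_breaker=0.0):
    """Combine per-field scores like multi_match's best_fields type:
    best boosted field score + tie_breaker * (sum of the other boosted field scores)."""
    boosted = [score * boosts.get(field, 1.0) for field, score in field_scores.items()]
    best = max(boosted)
    return best + tie_breaker * (sum(boosted) - best)


# Made-up per-field relevance scores for one document
scores = {"name": 1.5, "description": 2.0}
boosts = {"name": 2.0}  # from "name^2"
print(best_fields_score(scores, boosts))  # prints 3.0
print(best_fields_score(scores, boosts, tie_breaker=0.3))
```

With the default `tie_breaker` only the best field counts, so a strong `description` match can still outrank a weaker `name` match despite the boost.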

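### 5.3 Compound Queries

Combining clauses with a `bool` query: `must` clauses are required and contribute to the score, `should` clauses add to the score (here at least one of them must match because of `minimum_should_match`), and `must_not` clauses exclude documents without affecting the score:

```json
GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "category": "electronics" } }
      ],
      "should": [
        { "match": { "name": "phone" } },
        { "match": { "description": "smart" } }
      ],
      "minimum_should_match": 1,
      "must_not": [
        { "range": { "price": { "gte": 1000 } } }
      ]
    }
  }
}
```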
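The matching rules of a `bool` query can be mimicked with plain predicates. This sketch only decides *whether* a document matches (it ignores scoring); the lambdas are simplified stand-ins for the `term`, `match`, and `range` clauses above, and the product list is hypothetical. It also encodes the default for `minimum_should_match`: 1 when there are `should` clauses but no `must` or `filter` clauses, otherwise 0:

```python
from typing import Callable, Optional, Sequence

Clause = Callable[[dict], bool]


def bool_matches(doc: dict, must: Sequence[Clause] = (), should: Sequence[Clause] = (),
                 must_not: Sequence[Clause] = (), filters: Sequence[Clause] = (),
                 minimum_should_match: Optional[int] = None) -> bool:
    """Decide whether doc matches, following bool query semantics (scores are ignored)."""
    if minimum_should_match is None:
        minimum_should_match = 1 if should and not must and not filters else 0
    if not all(c(doc) for c in must) or not all(c(doc) for c in filters):
        return False
    if any(c(doc) for c in must_not):
        return False
    return sum(1 for c in should if c(doc)) >= minimum_should_match


products = [
    {"category": "electronics", "name": "phone case", "description": "", "price": 50},
    {"category": "electronics", "name": "laptop", "description": "smart and light", "price": 800},
    {"category": "electronics", "name": "tv", "description": "", "price": 500},
    {"category": "electronics", "name": "phone", "description": "smart", "price": 1200},
    {"category": "books", "name": "phone book", "description": "", "price": 20},
]
hits = [
    p["name"] for p in products
    if bool_matches(
        p,
        must=[lambda d: d["category"] == "electronics"],
        should=[lambda d: "phone" in d["name"].split(), lambda d: "smart" in d["description"].split()],
        must_not=[lambda d: d["price"] >= 1000],
        minimum_should_match=1,
    )
]
print(hits)  # prints ['phone case', 'laptop']
```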

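### 5.4 Aggregations

Multi-level aggregation example: `"size": 0` skips the hits, `price_stats` computes count, min, max, avg, and sum over `price`, and `tags_agg` buckets documents by tag with an average price per bucket:

```json
GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_stats": {
      "stats": { "field": "price" }
    },
    "tags_agg": {
      "terms": { "field": "tags.keyword", "size": 5 },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}
```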
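Each `terms` bucket runs its own copy of the `avg_price` sub-aggregation over just the documents in that bucket. The in-memory sketch below mirrors that nesting on a few hypothetical documents, with a simplified response shape. On a real multi-shard index, terms bucket counts can be approximate, because each shard returns only its own top terms:

```python
from collections import defaultdict


def terms_with_avg(docs, terms_field, metric_field, size=5):
    """In-memory analogue of a terms aggregation with an avg sub-aggregation named avg_price.
    A multi-valued field puts the document in every matching bucket."""
    buckets = defaultdict(list)
    for doc in docs:
        for term in doc.get(terms_field, []):
            buckets[term].append(doc[metric_field])
    # Largest buckets first; ties broken by key to keep the output deterministic
    ordered = sorted(buckets.items(), key=lambda item: (-len(item[1]), item[0]))[:size]
    return [
        {"key": term, "doc_count": len(values), "avg_price": {"value": sum(values) / len(values)}}
        for term, values in ordered
    ]


# Hypothetical sample documents
docs = [
    {"tags": ["电子", "通讯", "数码"], "price": 2999.00},
    {"tags": ["电子", "数码"], "price": 1999.00},
    {"tags": ["电子"], "price": 899.00},
]
for bucket in terms_with_avg(docs, "tags", "price"):
    print(bucket)
```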

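### 5.5 Highlighting and Suggestions

Highlighting search results (matched terms in `description` are wrapped in `<em>` tags):

```json
GET /products/_search
{
  "query": {
    "match": { "description": "智能" }
  },
  "highlight": {
    "fields": {
      "description": {
        "pre_tags": ["<em>"],
        "post_tags": ["</em>"]
      }
    }
  }
}
```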
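The example above covers highlighting only. For the suggestion half of this section, Elasticsearch's `term` suggester proposes corrections for a misspelled term from terms already in the index, keeping candidates within `max_edits` edits (default 2) that share the first `prefix_length` characters (default 1). The sketch below reproduces only that candidate filtering, using plain Levenshtein distance over a hypothetical vocabulary; Elasticsearch's own distance measure and candidate ranking differ:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def suggest(term, vocabulary, max_edits=2, prefix_length=1):
    """Vocabulary terms within max_edits of term that share its first prefix_length characters."""
    candidates = [
        (edit_distance(term, word), word)
        for word in vocabulary
        if word != term and word[:prefix_length] == term[:prefix_length]
    ]
    return [word for distance, word in sorted(candidates) if distance <= max_edits]


print(suggest("phome", ["phone", "phones", "home", "laptop"]))  # prints ['phone', 'phones']
```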

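## 6. Advanced Features

### 6.1 Mappings and Analysis

Custom analyzer. The `ik_max_word` tokenizer is provided by the third-party IK Analysis plugin, which must be installed before this index can be created:

```json
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "ik_max_word",
          "filter": ["lowercase", "my_filter"]
        }
      },
      "filter": {
        "my_filter": {
          "type": "stop",
          "stopwords": ["的", "了", "是"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
```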
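An analyzer runs its tokenizer first and then applies token filters in the order listed, so `lowercase` runs before `my_filter`. The sketch below reproduces that pipeline in Python. Because the IK plugin is not available outside Elasticsearch, a crude hypothetical tokenizer stands in for `ik_max_word`. To see the real tokens, call the `_analyze` API on the index, for example `GET /my_index/_analyze` with `"analyzer": "my_analyzer"`:

```python
import re

STOPWORDS = {"的", "了", "是"}  # same list as my_filter above


def toy_tokenizer(text: str) -> list:
    """Stand-in for ik_max_word: ASCII words stay whole, each CJK character becomes its own token.
    (IK would instead emit dictionary words such as 搜索 and 引擎.)"""
    return re.findall(r"[A-Za-z0-9]+|[\u4e00-\u9fff]", text)


def analyze(text: str) -> list:
    tokens = toy_tokenizer(text)                      # 1. tokenizer
    tokens = [t.lower() for t in tokens]              # 2. lowercase filter
    return [t for t in tokens if t not in STOPWORDS]  # 3. stop filter (my_filter)


print(analyze("Elasticsearch是搜索的引擎"))
```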

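### 6.2 Index Aliases

Managing aliases. All actions in a single `_aliases` request are applied atomically, so `current_products` moves from `products_2022` to `products_2023` without a moment where it points to neither:

```json
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "products_2023",
        "alias": "current_products"
      }
    },
    {
      "remove": {
        "index": "products_2022",
        "alias": "current_products"
      }
    }
  ]
}
```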

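### 6.3 Index Templates

Defining a composable index template, which is applied automatically when a new index whose name matches `logs-*` is created:

```json
PUT _index_template/logs_template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 2,
      "number_of_replicas": 1
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "message": { "type": "text" }
      }
    }
  }
}
```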
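With composable templates, only one template applies to a new index: among all templates whose `index_patterns` match the index name, the one with the highest `priority` wins (`priority` defaults to 0). A sketch of that selection step, using Python's `fnmatchcase` as an approximation of Elasticsearch's `*` wildcard matching; the second template, `other_template`, is hypothetical:

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional, Tuple

# template name -> (index_patterns, priority)
Templates = Dict[str, Tuple[List[str], int]]


def pick_template(index_name: str, templates: Templates) -> Optional[str]:
    """Name of the highest-priority template whose patterns match index_name, or None."""
    matching = [
        (priority, name)
        for name, (patterns, priority) in templates.items()
        if any(fnmatchcase(index_name, pattern) for pattern in patterns)
    ]
    return max(matching)[1] if matching else None


templates = {
    "logs_template": (["logs-*"], 0),          # the template defined above (no explicit priority)
    "other_template": (["logs-*-prod"], 100),  # hypothetical, more specific template
}
for name in ("logs-2023.01.01", "logs-app-prod", "mylogs-1"):
    print(name, "->", pick_template(name, templates))
```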

### 6.4 Ingest Pipelines

Ingest pipeline example: the `set` processor stamps each document with the ingest time, and the `grok` processor parses `message` into `log_time`, `level`, and `content`:

```json
PUT _ingest/pipeline/timestamp_pipeline
{
  "description": "Add timestamp to documents",
  "processors": [
    {
      "set": {
        "field": "@timestamp",
        "value": "{{_ingest.timestamp}}"
      }
    },
    {
      "grok": {
        "field": "message",
        "patterns": ["%{TIMESTAMP_ISO8601:log_time} %{LOGLEVEL:level} %{GREEDYDATA:content}"]
      }
    }
  ]
}

# Index a document through the pipeline
POST my_index/_doc?pipeline=timestamp_pipeline
{
  "message": "2023-01-01T12:00:00Z INFO System started"
}
```
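The grok expression above can be approximated with a Python regular expression to preview which fields a message yields. This is a sketch: the real `TIMESTAMP_ISO8601` and `LOGLEVEL` patterns accept many more formats than the simplified groups used here, and `parse` is a hypothetical helper. To test the actual pipeline, use the `_ingest/pipeline/timestamp_pipeline/_simulate` API:

```python
import re

# Simplified stand-ins for %{TIMESTAMP_ISO8601}, %{LOGLEVEL} and %{GREEDYDATA}
GROK_LIKE = re.compile(
    r"(?P<log_time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?)"
    r" (?P<level>TRACE|DEBUG|INFO|WARN|WARNING|ERROR|FATAL)"
    r" (?P<content>.*)"
)


def parse(message: str) -> dict:
    """Extract log_time, level and content, or raise if the message does not match."""
    m = GROK_LIKE.match(message)
    if m is None:
        # By default the grok processor also fails the document when no pattern matches
        raise ValueError(f"no pattern matched: {message!r}")
    return m.groupdict()


print(parse("2023-01-01T12:00:00Z INFO System started"))
```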

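## 7. Performance Optimization

### 7.1 Hardware Configuration

Production recommendations:
- Memory: give the Elasticsearch heap no more than 50% of RAM and keep it below roughly 32 GB (so the JVM can use compressed object pointers); the remaining memory is used by the operating system's file system cache, which Lucene relies on heavily
- Disk: prefer SSDs; use RAID 0 or a single disk (replicas already provide redundancy)
- CPU: modern multi-core processors (16+ cores)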
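Applying the 50% rule and the compressed-oops ceiling to a few machine sizes gives the heap settings below. The 31 GB cap is a conservative assumption (the exact threshold depends on the JVM), and Elasticsearch 8.x sizes the heap automatically by default, so explicit `-Xms`/`-Xmx` values are optional. The function name is hypothetical:

```python
def recommended_heap_gb(total_ram_gb: int, compressed_oops_cap_gb: int = 31) -> int:
    """Half of the machine's RAM, capped below the compressed-oops threshold.
    The 31 GB default cap is a conservative assumption; the exact limit depends on the JVM."""
    return min(total_ram_gb // 2, compressed_oops_cap_gb)


for ram in (8, 32, 64, 128):
    heap = recommended_heap_gb(ram)
    print(f"{ram} GB RAM -> -Xms{heap}g -Xmx{heap}g")
```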

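### 7.2 Index Design

Optimization strategies:
1. Hot/cold data separation: use ILM (Index Lifecycle Management)
2. Sensible shard counts: avoid large numbers of small shards
3. Field mapping optimization:
   - Set `"index": false` on fields that never need to be searched
   - Prefer the most compact numeric type that fits the data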
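For integer fields, "most compact" means the smallest of `byte`, `short`, `integer`, and `long` whose range covers the data. Elasticsearch's documentation notes that on-disk storage depends on the actual values stored, so the gain is mainly in indexing and search efficiency. A small helper for picking a type from known value bounds (a sketch covering only these four signed types; the name is hypothetical):

```python
# Value ranges of Elasticsearch's signed integer field types
INTEGER_TYPES = [
    ("byte", -2**7, 2**7 - 1),
    ("short", -2**15, 2**15 - 1),
    ("integer", -2**31, 2**31 - 1),
    ("long", -2**63, 2**63 - 1),
]


def smallest_integer_type(min_value: int, max_value: int) -> str:
    """Most compact integer field type whose range covers [min_value, max_value]."""
    for name, low, high in INTEGER_TYPES:
        if low <= min_value and max_value <= high:
            return name
    raise ValueError("range does not fit in a long")


print(smallest_integer_type(0, 100))     # e.g. a rating or percentage
print(smallest_integer_type(0, 40000))   # too large for short
print(smallest_integer_type(0, 10**12))  # needs long
```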

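### 7.3 Query Optimization

Common techniques:
- Use `filter` instead of `query` clauses for conditions that do not need scoring
- Avoid deep pagination; use `search_after` instead (by default, `from + size` cannot exceed 10,000)
- Take advantage of caching; clauses in a `filter` context can be cached by the node query cache:

  ```json
  {
    "query": {
      "bool": {
        "filter": [
          { "term": { "category": "electronics" } }
        ]
      }
    }
  }
  ```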
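With `search_after`, each page is requested with the sort values of the last hit from the previous page instead of a growing `from` offset. The sort should end with a unique field so that documents with equal prices keep a stable order. A minimal sketch of building the follow-up request body (the helper, the `product_id` field, and the hit values are hypothetical; for consistent pages while data changes, Elasticsearch also recommends combining this with a point in time):

```python
import copy
from typing import Optional


def next_page_request(request: dict, hits: list) -> Optional[dict]:
    """Return the body for the next page: the same query and sort, with search_after set to
    the sort values of the last hit on the current page. Returns None after an empty page."""
    if not hits:
        return None
    following = copy.deepcopy(request)
    following["search_after"] = hits[-1]["sort"]
    following.pop("from", None)  # pages are positioned by sort values, not by an offset
    return following


first_page = {
    "size": 2,
    "query": {"bool": {"filter": [{"term": {"category": "electronics"}}]}},
    # "product_id" is a hypothetical unique keyword field used as a tie-breaker
    "sort": [{"price": "desc"}, {"product_id": "asc"}],
}
# Illustrative hits, shaped like the "hits.hits" entries of a sorted search response
hits = [
    {"_id": "1", "sort": [2999.0, "p-001"]},
    {"_id": "4", "sort": [1999.0, "p-004"]},
]
print(next_page_request(first_page, hits)["search_after"])  # prints [1999.0, 'p-004']
```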

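### 7.4 JVM Tuning

Key settings:

```
# config/jvm.options (in 7.7+ and 8.x, prefer a custom file such as config/jvm.options.d/heap.options)
# Use the same value for the minimum and maximum heap size
-Xms16g
-Xmx16g
# G1 is already the default collector with the bundled JDK
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
```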

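## 8. Practical Examples

### 8.1 E-commerce Product Search

A complete search request: `function_score` multiplies the relevance score of the `name` match by a popularity factor derived from `sales`, the filters keep only in-stock products priced up to 5000, and two aggregations provide brand and price-range facets:

```json
GET /products/_search
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            { "match": { "name": "手机" } }
          ],
          "filter": [
            { "term": { "in_stock": true } },
            { "range": { "price": { "lte": 5000 } } }
          ]
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "sales",
            "factor": 1.2,
            "modifier": "log1p"
          }
        }
      ],
      "boost_mode": "multiply"
    }
  },
  "aggs": {
    "brands": {
      "terms": { "field": "brand.keyword" }
    },
    "price_histogram": {
      "histogram": {
        "field": "price",
        "interval": 1000
      }
    }
  }
}
```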
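With `"modifier": "log1p"`, `field_value_factor` computes `log10(1 + factor * sales)` (Elasticsearch's `log1p` uses the base-10 logarithm), and `"boost_mode": "multiply"` multiplies that by the query score. The sketch below reproduces this arithmetic and the histogram bucket key `floor(price / interval) * interval`; the function names are hypothetical. It exposes one consequence worth checking: a product with `sales` of 0 gets a factor of 0 and therefore a final score of 0. The `log2p` modifier (`log10(2 + factor * sales)`) avoids that:

```python
import math


def field_value_factor_log1p(sales: float, factor: float = 1.2) -> float:
    """field_value_factor with modifier "log1p": log10(1 + factor * value)."""
    return math.log10(1 + factor * sales)


def final_score(query_score: float, sales: float) -> float:
    """boost_mode "multiply": query score times the function score."""
    return query_score * field_value_factor_log1p(sales)


def histogram_key(value: float, interval: float, offset: float = 0.0) -> float:
    """Bucket key of the histogram aggregation: floor((value - offset) / interval) * interval + offset."""
    return math.floor((value - offset) / interval) * interval + offset


print(final_score(2.0, 0))          # prints 0.0: zero sales wipes out the relevance score
print(final_score(2.0, 7.5))        # 1 + 1.2 * 7.5 = 10, so about 2.0 * log10(10) = 2.0
print(histogram_key(2999.0, 1000))  # prints 2000.0
```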

### 8.2 Log Analysis System

Log analysis query:

```json
GET /