How to Safely Remove an Elasticsearch Node

Published: 2021-12-23 11:28:03  Author: iii
Source: 亿速云 (Yisu Cloud)


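There are several reasons to remove a node, for example:

The physical machine hosting the node is being decommissioned.
The node instance needs to be upgraded or restarted.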


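Safely removing an Elasticsearch node

To remove an ES node safely, without changing the number of shards and with no risk of data loss, first make sure that all of the node's data has been taken over by the other nodes. Only then stop the node's instance.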

1. Data nodes

Step 1: Exclude the node from shard allocation

curl -XPUT 'http://0.0.0.0:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d '{"transient":{"cluster.routing.allocation.exclude._ip":"10.10.10.11"}}'

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "10.1.22.129"
  }
}

Result: a successful request returns "acknowledged" : true, along with the transient setting that was applied.

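The Step 1 request can be wrapped in a small shell helper. The sketch below rests on a few assumptions: ES_URL (defaulting to http://localhost:9200) and the function names exclude_body and exclude_ips are illustrative only and are not part of Elasticsearch. It sends the Content-Type header because Elasticsearch 6.0 and later reject JSON request bodies that do not have one.

```shell
#!/usr/bin/env bash
# Sketch only: ES_URL, exclude_body and exclude_ips are hypothetical names.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the settings body that excludes all given IPs from shard allocation.
# Multiple IPs are joined with commas, e.g. "10.10.10.11,10.10.10.12".
exclude_body() {
  local ips
  ips=$(IFS=,; printf '%s' "$*")
  printf '{"transient":{"cluster.routing.allocation.exclude._ip":"%s"}}' "$ips"
}

# Send the exclusion to the cluster. Elasticsearch 6.0+ requires the JSON
# Content-Type header on requests that carry a body.
exclude_ips() {
  curl -s -XPUT "$ES_URL/_cluster/settings?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(exclude_body "$@")"
}
```

For example, exclude_ips 10.10.10.11 10.10.10.12 excludes two nodes in a single request.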

Step 2: Wait until every shard on the node has been relocated

curl http://0.0.0.0:9200/_cluster/health?pretty
curl http://0.0.0.0:9200/_cluster/pending_tasks?pretty
curl http://0.0.0.0:9200/_cluster/allocation/explain?pretty
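You can also script the wait instead of rerunning these checks by hand. The sketch below identifies the excluded node by its IP and polls the _cat/shards API, using h=ip so that only the IP column is printed, until no shard copy remains on that IP. ES_URL, the 10-second interval and the function names are assumptions. The script counts shards on the IP instead of relying on relocating_shards alone, because relocating_shards also falls to 0 when an allocation rule prevents a shard from leaving the node.

```shell
#!/usr/bin/env bash
# Sketch only: ES_URL, shards_on_ip and wait_until_drained are hypothetical names.
ES_URL="${ES_URL:-http://localhost:9200}"

# Read `_cat/shards?h=ip` output on stdin and print how many shard copies sit
# on the given IP. Unassigned shards have an empty ip column and are not counted.
shards_on_ip() {
  awk -v ip="$1" '$1 == ip { n++ } END { print n + 0 }'
}

# Poll every 10 seconds until no shard copy is left on the excluded IP.
wait_until_drained() {
  local ip="$1" out left
  while :; do
    # -f makes curl fail on HTTP errors, so a failed request is never
    # mistaken for "0 shards left".
    if ! out=$(curl -sf "$ES_URL/_cat/shards?h=ip"); then
      echo "request to $ES_URL failed, retrying" >&2
      sleep 10
      continue
    fi
    left=$(printf '%s\n' "$out" | shards_on_ip "$ip")
    if [ "$left" -eq 0 ]; then
      return 0
    fi
    echo "$left shard(s) still on $ip, waiting" >&2
    sleep 10
  done
}
```

Running wait_until_drained 10.10.10.11 blocks until the node at that IP holds no shards. At that point it is safe to continue with Step 3.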

1. Check cluster health
http://10.1.34.146:9200/_cluster/health?pretty
{
  "cluster_name" : "my-es6-test",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 4,
  "active_primary_shards" : 150,
  "active_shards" : 272,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100
}

2. If there are pending tasks with a priority of HIGH or higher, the cluster may be unable to create new indices:
http://10.1.34.146:9200/_cluster/pending_tasks?pretty
{
  "tasks": []
}

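3. If the cluster has UNASSIGNED shards, find out why. In particular, check whether an allocation rule is preventing the shards from being relocated: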
http://10.1.22.129:9200/_cluster/allocation/explain?pretty

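4. Check that the node's data has been moved off. Query the node's stats by its node name or ID (node-6 in this example). Once relocation is complete, indices.docs.count and indices.store.size_in_bytes for that node are both 0:
http://10.1.34.146:9200/_nodes/node-6/stats/indices?pretty

Do not wrap the name in braces. %7B and %7D are the URL-encoded forms of { and }, so a path such as _nodes/%7Bnode-6%7D/stats/indices matches no node at all. It returns an empty response like the one below, which means "no matching node", not "no data left":
{
  "_nodes" : {
    "total" : 0,
    "successful" : 0,
    "failed" : 0
  },
  "cluster_name" : "my-es6-test",
  "nodes" : { }
}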
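Checks 1 to 3 above can be scripted as well. The sketch below makes the following assumptions: ES_URL and all function names are hypothetical, and sed/grep are used in place of a JSON parser. That shortcut is adequate for these flat responses but not for JSON in general. Pending-task priorities in Elasticsearch run, from highest to lowest, IMMEDIATE, URGENT, HIGH, NORMAL, LOW and LANGUID, so "HIGH or higher" means the first three. Called without a body, the allocation explain API explains the first unassigned shard it finds. To ask why a specific shard copy is still on the excluded node, name it with the index, shard and primary body fields. For an assigned shard, the response then includes a can_remain_on_current_node decision.

```shell
#!/usr/bin/env bash
# Sketch only: ES_URL and all function names below are hypothetical.
ES_URL="${ES_URL:-http://localhost:9200}"

# Check 1: succeed only if the _cluster/health?pretty document on stdin reports
# a green cluster with no relocating, initializing or unassigned shards.
cluster_settled() {
  local body status field value
  body=$(cat)
  status=$(printf '%s\n' "$body" |
    sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p' | head -n 1)
  [ "$status" = green ] || return 1
  for field in relocating_shards initializing_shards unassigned_shards; do
    value=$(printf '%s\n' "$body" |
      sed -n "s/.*\"$field\" *: *\([0-9][0-9]*\).*/\1/p" | head -n 1)
    [ "$value" = 0 ] || return 1
  done
  return 0
}

# Check 2: print how many tasks in a _cluster/pending_tasks response on stdin
# have priority IMMEDIATE, URGENT or HIGH (the priorities above NORMAL).
high_priority_tasks() {
  grep -Eo '"priority" *: *"(IMMEDIATE|URGENT|HIGH)"' | wc -l | tr -d ' '
}

# Check 3: build an allocation-explain body naming one shard copy
# (index name, shard number, primary flag) and send it.
explain_body() {
  printf '{"index":"%s","shard":%d,"primary":%s}' "$1" "$2" "$3"
}

explain_shard() {
  curl -s -XGET "$ES_URL/_cluster/allocation/explain?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(explain_body "$1" "$2" "$3")"
}
```

For example, curl -s 'http://localhost:9200/_cluster/health?pretty' | cluster_settled && echo settled. For a hypothetical index my-index, explain_shard my-index 0 true explains its primary shard 0.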

Step 3: Take the node offline

kill {pid}
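kill with no signal option sends SIGTERM, which starts the orderly shutdown described later in this article, but kill itself returns immediately. The sketch below also waits for the process to exit before you move on to Step 4. The helper name and the 60-second default timeout are assumptions. Run it as the same user as Elasticsearch (or as root) so that the PID can be signalled.

```shell
#!/usr/bin/env bash
# Sketch only: stop_and_wait and its 60-second default are hypothetical choices.

# Send SIGTERM to PID $1, then wait up to $2 seconds (default 60) for it to exit.
# Returns 0 once the process is gone, 1 if it is still running at the timeout,
# and 2 if the PID cannot be signalled (no such process, or no permission).
stop_and_wait() {
  local pid="$1" timeout="${2:-60}" waited=0
  if ! kill -0 "$pid" 2>/dev/null; then
    echo "process $pid not found or not signallable" >&2
    return 2
  fi
  kill -TERM "$pid"
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

A return value of 1 means the node has not finished shutting down within the timeout. In that case, inspect its log before escalating to a harder signal.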

Step 4: Remove the allocation exclusion

curl -XPUT 'http://0.0.0.0:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d '{"transient":{"cluster.routing.allocation.exclude._ip": null}}'

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": null
  }
}

To exclude two nodes at once, list their IPs separated by commas:
curl -XPUT 'http://0.0.0.0:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d '{"transient":{"cluster.routing.allocation.exclude._ip":"10.10.10.11,10.10.10.12"}}'
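After Step 4, you can read the settings back to confirm that the exclusion is gone. With flat_settings=true, each setting is printed as a single dotted key. Once the setting has been reset to null, cluster.routing.allocation.exclude._ip no longer appears in the response. The sketch assumes the hypothetical names ES_URL, has_ip_exclusion and report_exclusion. It only inspects the response text, so it catches the setting in either the transient or the persistent section.

```shell
#!/usr/bin/env bash
# Sketch only: ES_URL, has_ip_exclusion and report_exclusion are hypothetical names.
ES_URL="${ES_URL:-http://localhost:9200}"

# Succeed if a `_cluster/settings?flat_settings=true` response on stdin still
# contains an IP exclusion (in its transient or persistent section).
has_ip_exclusion() {
  grep -q '"cluster\.routing\.allocation\.exclude\._ip"'
}

# Fetch the current settings and report whether an IP exclusion is still set.
# Returns 2 if the request itself fails, so a failure is not read as "cleared".
report_exclusion() {
  local out
  out=$(curl -sf "$ES_URL/_cluster/settings?flat_settings=true") || {
    echo "request to $ES_URL failed" >&2
    return 2
  }
  if printf '%s\n' "$out" | has_ip_exclusion; then
    echo "cluster.routing.allocation.exclude._ip is still set"
  else
    echo "no IP exclusion found"
  fi
}
```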

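2. Master nodes

# Taking a master-eligible node that is not the elected master offline
#start
# Step 1: Stop the ES instance at the specified IP
## Note: the cluster configuration requires 2 master-eligible nodes to elect a master (discovery.zen.minimum_master_nodes: 2 in ES 6.x), so the master node group should keep at least 3 nodes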
#end
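The note above follows from the quorum formula that the Elasticsearch 6.x documentation gives for discovery.zen.minimum_master_nodes: (number of master-eligible nodes / 2) + 1, using integer division. The sketch below uses hypothetical function names. It checks whether one master-eligible node can be stopped while the remaining nodes still meet the configured minimum. With 3 master-eligible nodes and a minimum of 2, two nodes remain and an election is still possible. With only 2 nodes, one would remain and no master could be elected.

```shell
#!/usr/bin/env bash
# Sketch only: quorum and can_stop_one are hypothetical helper names.

# Quorum for N master-eligible nodes: N / 2 + 1 with integer division,
# the value the 6.x docs recommend for discovery.zen.minimum_master_nodes.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Succeed if one of N master-eligible nodes can be stopped while the
# remaining N - 1 nodes still satisfy minimum_master_nodes = M.
can_stop_one() {
  [ $(( $1 - 1 )) -ge "$2" ]
}
```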


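# Taking the elected master node offline
#start
# Step 1: Stop the ES instance at the specified IP
## Note: master election waits 3s by default (discovery.zen.ping_timeout in ES 6.x), and the configuration file may raise this to 30s. While no master is elected, the cluster rejects writes and cluster-level operations such as index creation. With the default discovery.zen.no_master_block: write, searches can still be served.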
#end
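Which of the two cases applies depends on whether the node is currently the elected master. The _cat/master API reports the elected master's node ID, host, IP and node name. The sketch below requests only the IP column (h=ip) and compares it with the node you plan to stop. ES_URL and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: ES_URL, is_master_ip and check_master are hypothetical names.
ES_URL="${ES_URL:-http://localhost:9200}"

# Succeed if the `_cat/master?h=ip` output on stdin equals the given IP.
is_master_ip() {
  [ "$(tr -d ' \n')" = "$1" ]
}

# Tell whether the node at IP $1 is the elected master before stopping it.
check_master() {
  local out
  out=$(curl -sf "$ES_URL/_cat/master?h=ip") || {
    echo "request to $ES_URL failed" >&2
    return 2
  }
  if printf '%s\n' "$out" | is_master_ip "$1"; then
    echo "$1 is the elected master: expect a re-election when it stops"
  else
    echo "$1 is not the elected master"
  fi
}
```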

3. Client (coordinating-only) nodes

# Not verified
#start
## Step 1: Remove the node's IP from the load balancer (LB)
## Step 2: Stop the ES instance at that IP
#end

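Shutting down Elasticsearch safely

An orderly shutdown gives Elasticsearch a chance to clean up and close outstanding resources. For example, a node that shuts down in an orderly fashion removes itself from the cluster, syncs its transaction log (translog) to disk, and performs other related cleanup. You can help ensure an orderly shutdown by stopping Elasticsearch properly.

If Elasticsearch is running as a service, stop it through the service management facility of your installation.

If you run Elasticsearch directly in a console, stop it by pressing Ctrl+C. On POSIX systems you can also send SIGTERM to the Elasticsearch process. Tools such as ps or jps can find the PID to signal: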

[localhost~]$ ps aux | grep Elasticsearch
[localhost~]$ ps -ef | grep Elasticsearch

Or read the PID from the startup log:

[2016-07-07 12:26:18,908][INFO ][node] [I8hydUG] version[5.0.0-alpha4], pid[15399], build[3f5b994/2016-06-27T16:23:46.861Z], OS[Mac OS X/10.11.5/x86_64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_92/25.92-b14]

Or start Elasticsearch with a PID file (the -p option) and read the PID from that file:

$ ./bin/elasticsearch -p /tmp/elasticsearch-pid -d
$ cat /tmp/elasticsearch-pid && echo
15516
$ kill -SIGTERM 15516

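1. Shutdown on fatal errors

While the Elasticsearch JVM is running, certain fatal errors can leave the JVM in a questionable state. Such fatal errors include JVM internal errors, out-of-memory errors and serious I/O errors.

When Elasticsearch detects that the JVM has hit such a fatal error, it tries to log the error and then halts the JVM. A shutdown initiated this way does not go through the orderly shutdown described above. The Elasticsearch process also exits with a specific status code that identifies the kind of error.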

Cause	Exit code
JVM internal error	128
Out of memory error	127
Stack overflow error	126
Unknown virtual machine error	125
Serious I/O error	124
Unknown fatal error	1
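A wrapper or supervisor script can translate these exit codes back into readable causes. The sketch below uses a hypothetical name, describe_es_exit, and maps exactly the codes listed in the table above.

```shell
#!/usr/bin/env bash
# Sketch only: describe_es_exit is a hypothetical helper name.

# Map an Elasticsearch exit status to the fatal-error cause from the table.
describe_es_exit() {
  case "$1" in
    128) echo "JVM internal error" ;;
    127) echo "out of memory error" ;;
    126) echo "stack overflow error" ;;
    125) echo "unknown virtual machine error" ;;
    124) echo "serious I/O error" ;;
    1)   echo "unknown fatal error" ;;
    *)   echo "exit status $1 (not a documented fatal-error code)" ;;
  esac
}
```

For example, ./bin/elasticsearch; describe_es_exit $? prints the cause after the process ends. When Elasticsearch is launched through a shell, keep in mind that 126 and 127 are also the statuses the shell itself returns when a command cannot be executed or cannot be found. Check the log before trusting the mapping.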

