Elasticsearch集群40亿级优化

发布时间:2020-07-05 04:49:36 作者:archivelog
来源:网络 阅读:3314

目前架构:

n台filebeat客户端来将每台应用上的日志传到kafka,3台kafka做集群用于日志队列,四台ES做集群,前两台存放近两天热数据日志,后两台存放两天前的历史日志,数据保存一个月,目前总数据量44亿,大小为6T。logstash与kibana与ES在一台机器上,kibana域名指向后端三个kibana做轮询。


出现性能问题:

1、集群中只有第一台负载很高,其他节点负载一直都很低,偶尔同为hot数据节点的第二台负载也会稍微有点升高。

2、队列经常堵塞,kafka中uat,pet,prd三个环境的topic同在一个默认的logstash消费组。只要其中一个环境的列队积压,其他环境的队列就无法消费了。

3、Kibana登陆后首页打开,需要至少半分钟,日志查询也很慢,至少几分钟才会出结果。

4、有时候ES常因负载高而脱离集群,导致集群节点数据重新分配,集群状态颜色为RED,同时kibana页面打开时显示Red报错。kibana页面间断无法打开的情况约持续一两周。




目前ELK中发现有些索引查询有点慢,于是打开ES索引查询日志来记录慢查询,进而对慢查询日志进行分析,定位问题。慢日志内容如下:

[2017-08-28T11:21:02,377][WARN ][index.search.slowlog.query] [node-3] [logstash-nginx-2017.08.01][4] took[15s], took_millis[15029], types[], stats[], search
_type[QUERY_THEN_FETCH], total_shards[140], source[{"size":0,"query":{"bool":{"filter":[{"match_none":{"boost":1.0}},{"query_string":{"query":"NOT status:200  OR  NOT
status:304","fields":[],"use_dis_max":true,"tie_breaker":0.0,"default_operator":"or","auto_generate_phrase_queries":false,"max_determined_states":10000,"enable_position
_increment":true,"fuzziness":"AUTO","fuzzy_prefix_length":0,"fuzzy_max_expansions":50,"phrase_slop":0,"analyze_wildcard":true,"escape":false,"split_on_whitespace":true,
"boost":1.0}}],"disable_coord":false,"adjust_pure_negative":true,"boost":1.0}},"aggregations":{"3":{"terms":{"field":"status","size":5,"min_doc_count":0,"shard_min_doc_
count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_term":"asc"}]},"aggregations":{"2":{"date_histogram":{"field":"@timestamp","format":"epoch_mill
is","interval":"20m","offset":0,"order":{"_key":"asc"},"keyed":false,"min_doc_count":0,"extended_bounds":{"min":"1503886846372","max":"1503890446372"}}}}}}}],
[2017-08-28T11:21:02,377][WARN ][index.search.slowlog.query] [node-3] [logstash-nginx-2017.08.01][2] took[15.7s], took_millis[15787], types[], stats[], sear
ch_type[QUERY_THEN_FETCH], total_shards[140], source[{"size":0,"query":{"bool":{"filter":[{"match_none":{"boost":1.0}},{"query_string":{"query":"NOT status:200  OR  NOT
  status:304","fields":[],"use_dis_max":true,"tie_breaker":0.0,"default_operator":"or","auto_generate_phrase_queries":false,"max_determined_states":10000,"enable_positi
on_increment":true,"fuzziness":"AUTO","fuzzy_prefix_length":0,"fuzzy_max_expansions":50,"phrase_slop":0,"analyze_wildcard":true,"escape":false,"split_on_whitespace":tru
e,"boost":1.0}}],"disable_coord":false,"adjust_pure_negative":true,"boost":1.0}},"aggregations":{"3":{"terms":{"field":"status","size":5,"min_doc_count":0,"shard_min_do
c_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_term":"asc"}]},"aggregations":{"2":{"date_histogram":{"field":"@timestamp","format":"epoch_mi
llis","interval":"20m","offset":0,"order":{"_key":"asc"},"keyed":false,"min_doc_count":0,"extended_bounds":{"min":"1503886846372","max":"1503890446372"}}}}}}}],

下面进行分析:

待续

推荐阅读:
  1. ElasticSearch集群搭建
  2. elasticsearch 集群部署

免责声明:本站发布的内容(图片、视频和文字)以原创、转载和分享为主,文章观点不代表本网站立场,如果涉及侵权请联系站长邮箱:is@yisu.com进行举报,并提供相关证据,一经查实,将立刻删除涉嫌侵权内容。

slowlog elasticsearch st

上一篇:[完结篇] - 理解异步之美 --- promise与async await (三)

下一篇:前后端分离实践有感

相关阅读

您好,登录后才能下订单哦!

密码登录
登录注册
其他方式登录
点击 登录注册 即表示同意《亿速云用户服务条款》