In-Depth Tuning of MongoDB Memory Overflow Errors

Published: 2020-10-20 05:38:19, Author: UltraSQL
Source: web

The MongoDB memory overflow error

exception: getMore runner error: Overflow sort stage buffered data usage of 33638076 bytes exceeds internal limit of 33554432 bytes
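The two numbers in the message are easy to check. The limit of 33554432 bytes is exactly 32 MiB (32 × 1024 × 1024), and the sort overshot it by only 83,644 bytes. The quick check below is plain Node.js with no MongoDB involved. The helper name is illustrative.

```javascript
// Limit reported in the error message: 32 MiB expressed in bytes.
const SORT_LIMIT_BYTES = 32 * 1024 * 1024;

// Amount of data the sort stage had buffered, also taken from the message.
const bufferedBytes = 33638076;

// How far a blocking sort went over the limit (0 if it stayed within it).
function overflowBytes(buffered, limit) {
  return Math.max(0, buffered - limit);
}

console.log(SORT_LIMIT_BYTES); // 33554432
console.log(overflowBytes(bufferedBytes, SORT_LIMIT_BYTES)); // 83644
```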

Limits on in-memory sorting in MongoDB, and how to work around them

The following is quoted from: https://docs.mongodb.com/manual/reference/method/cursor.sort/#cursor.sort

When unable to obtain the sort order from an index, MongoDB will sort the results in memory, which requires that the result set being sorted is less than 32 megabytes.

When the sort operation consumes more than 32 megabytes, MongoDB returns an error. To avoid this error, either create an index supporting the sort operation (see Sort and Index Use) or use sort() in conjunction with limit() (see Limit Results).
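Why does adding limit() help? With a limit of k, a sort only has to keep the best k documents seen so far, not the whole result set. The plain JavaScript sketch below illustrates this as a simple top-k selection. It is an assumption-level model of the idea, not MongoDB's sort stage, and the data and names are hypothetical.

```javascript
// Top-k selection: scan the input once and keep only the k best documents so far.
// Simplified illustration of why sort() + limit() bounds memory; not MongoDB's code.
function topK(docs, k, compare) {
  const buffer = []; // kept in sorted order, never longer than k
  let peak = 0;      // largest buffer size observed during the scan
  for (const doc of docs) {
    // Find the insertion position by walking back from the end.
    let i = buffer.length;
    while (i > 0 && compare(doc, buffer[i - 1]) < 0) i--;
    if (i < k) {
      buffer.splice(i, 0, doc);
      if (buffer.length > k) buffer.pop(); // evict the current worst document
    }
    peak = Math.max(peak, buffer.length);
  }
  return { top: buffer, peak };
}

// 10,000 hypothetical documents, stored in reverse name order.
const people = Array.from({ length: 10000 }, (_, n) => ({
  _id: n,
  name: "user" + String(9999 - n).padStart(4, "0"),
}));
const byNameAsc = (a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0);

const { top, peak } = topK(people, 5, byNameAsc);
console.log(top.map((d) => d.name).join(",")); // user0000,user0001,user0002,user0003,user0004
console.log(peak); // 5
```

The peak buffer size stays at k no matter how many documents are scanned. A full in-memory sort, by contrast, has to hold all 10,000.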

MongoDB query methods and their execution order

The following is quoted from: https://docs.mongodb.com/manual/tutorial/query-documents/#query-method

Query Method

MongoDB provides the db.collection.find() method to read documents from a collection. The db.collection.find() method returns a cursor to the matching documents.

db.collection.find( <query filter>, <projection> )

For the db.collection.find() method, you can specify the following optional fields:

a query filter to specify which documents to return.
a query projection to specify which fields from the matching documents to return. The projection limits the amount of data that MongoDB returns to the client over the network.

You can optionally add a cursor modifier to impose limits, skips, and sort orders. The order of documents returned by a query is not defined unless you specify a sort().
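To make the split between query filter and projection concrete, here is a tiny in-memory imitation of find(filter, projection) in plain JavaScript. It is only a model built on several assumptions: filters are equality-only, projections are inclusion-only, and JSON length stands in roughly for BSON size. The function name findLike is hypothetical.

```javascript
// Hypothetical in-memory stand-in for db.collection.find(filter, projection).
// Simplifying assumptions: equality-only filters and inclusion-only projections;
// _id is always returned, as MongoDB does by default when it is not excluded.
function findLike(docs, filter = {}, projection = null) {
  const matches = docs.filter((doc) =>
    Object.entries(filter).every(([field, value]) => doc[field] === value)
  );
  if (!projection) return matches; // no projection: whole documents
  return matches.map((doc) => {
    const out = { _id: doc._id };
    for (const [field, include] of Object.entries(projection)) {
      if (include && field in doc) out[field] = doc[field];
    }
    return out;
  });
}

// JSON length as a rough proxy for the bytes sent back to the client.
const payloadSize = (docs) => JSON.stringify(docs).length;

const orders = Array.from({ length: 1000 }, (_, i) => ({
  _id: i,
  status: i % 2 === 0 ? "A" : "D",
  item: "item" + i,
  notes: "x".repeat(500), // large field the client does not need
}));

const fullDocs = findLike(orders, { status: "A" });
const slimDocs = findLike(orders, { status: "A" }, { item: 1 });
console.log(fullDocs.length, slimDocs.length); // 500 500
console.log(payloadSize(slimDocs) < payloadSize(fullDocs)); // true
```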

The following is quoted from: https://docs.mongodb.com/manual/reference/method/db.collection.find/#combine-cursor-methods

Combine Cursor Methods

The following statements chain cursor methods limit() and sort():

db.bios.find().sort( { name: 1 } ).limit( 5 )
db.bios.find().limit( 5 ).sort( { name: 1 } )

The two statements are equivalent; i.e. the order in which you chain the limit() and the sort() methods is not significant. Both statements return the first five documents, as determined by the ascending sort order on ‘name’.
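One way to see why the chaining order does not matter is that a cursor only records the modifiers attached to it. It then applies them in a fixed order (sort, then skip, then limit) when results are requested. The class below is a hypothetical model of that behavior in plain JavaScript, not the mongo shell's actual cursor implementation.

```javascript
// Hypothetical model of a lazily evaluated cursor: modifiers are only recorded,
// then applied in a fixed order (sort -> skip -> limit) when toArray() runs.
class ModelCursor {
  constructor(docs) {
    this.docs = docs;
    this.sortSpec = null;
    this.skipN = 0;
    this.limitN = 0; // 0 means "no limit", as in MongoDB
  }
  sort(spec) { this.sortSpec = spec; return this; }
  skip(n) { this.skipN = n; return this; }
  limit(n) { this.limitN = n; return this; }
  toArray() {
    let out = [...this.docs];
    if (this.sortSpec) {
      const keys = Object.entries(this.sortSpec); // e.g. [["name", 1]]
      out.sort((a, b) => {
        for (const [field, dir] of keys) {
          if (a[field] < b[field]) return -dir; // dir 1 = ascending, -1 = descending
          if (a[field] > b[field]) return dir;
        }
        return 0;
      });
    }
    out = out.slice(this.skipN);
    return this.limitN > 0 ? out.slice(0, this.limitN) : out;
  }
}

const bios = ["Kay", "Bob", "Ada", "Zed", "Eve", "Joe", "Max"].map((name, i) => ({ _id: i, name }));
const sortThenLimit = new ModelCursor(bios).sort({ name: 1 }).limit(5).toArray();
const limitThenSort = new ModelCursor(bios).limit(5).sort({ name: 1 }).toArray();
console.log(sortThenLimit.map((d) => d.name).join(",")); // Ada,Bob,Eve,Joe,Kay
console.log(JSON.stringify(sortThenLimit) === JSON.stringify(limitThenSort)); // true
```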

For comparison: the execution order of SQL Server statements

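The book Inside Microsoft SQL Server 2005: T-SQL Querying opens with this topic in Chapter 1, Section 1. The author wants readers to understand, before anything else, the order in which a statement is processed.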

Logical processing order of a query:

(1) FROM <left_table>
(3) <join_type> JOIN <right_table> (2) ON <join_condition>
(4) WHERE <where_condition>
(5) GROUP BY <group_by_list>
(6) WITH {CUBE | ROLLUP}
(7) HAVING <having_condition>
(8) SELECT (9) DISTINCT (11) <top_specification> <select_list>
(10) ORDER BY <order_by_list>

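The standard SQL evaluation order is:
(1) The FROM clause assembles data from the different data sources.
(2) The WHERE clause filters rows based on the specified conditions.
(3) The GROUP BY clause divides the data into groups.
(4) Aggregate functions are computed.
(5) The HAVING clause filters the groups.
(6) All expressions are evaluated.
(7) ORDER BY sorts the result set.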

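Execution order:
1. FROM: a Cartesian product of the first two tables in the FROM clause is formed, producing virtual table vt1.
2. ON: the ON filter is applied to vt1. Only rows for which <join_condition> is TRUE are inserted into vt2.
3. OUTER (JOIN): if an OUTER JOIN is specified, rows from the preserved table for which no match was found are added to vt2 as outer rows, producing vt3. If the FROM clause contains more than two tables, steps 1 through 3 are repeated between the result of the previous join and the next table until all tables have been processed.
4. WHERE: the WHERE filter is applied to vt3. Only rows for which <where_condition> is TRUE are inserted into vt4.
5. GROUP BY: the rows in vt4 are grouped by the column list in the GROUP BY clause, producing vt5.
6. CUBE | ROLLUP: supergroups are added to the rows of vt5, producing vt6.
7. HAVING: the HAVING filter is applied to vt6. Only groups for which <having_condition> is TRUE are inserted into vt7.
8. SELECT: the SELECT list is processed, producing vt8.
9. DISTINCT: duplicate rows are removed from vt8, producing vt9.
10. ORDER BY: the rows of vt9 are sorted by the column list in the ORDER BY clause, producing cursor vc10.
11. TOP: the specified number or percentage of rows is taken from the beginning of vc10, producing table vt11, which is returned to the caller.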
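These logical phases can be mimicked on plain JavaScript arrays. Doing so makes the key point visible: ORDER BY works on the full output of SELECT/DISTINCT, and TOP only trims rows afterwards. The sketch covers a single table, so steps 1 to 3 (the joins) and CUBE/ROLLUP are left out. The table, data, and function name are hypothetical.

```javascript
// Single-table sketch of SQL's logical processing order (joins and CUBE/ROLLUP omitted):
// WHERE -> GROUP BY -> HAVING -> SELECT -> DISTINCT -> ORDER BY -> TOP.
function logicalQuery(rows, q) {
  const vt4 = rows.filter(q.where);                       // 4. WHERE
  const groups = new Map();                               // 5. GROUP BY
  for (const row of vt4) {
    const key = row[q.groupBy];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  const vt7 = [...groups].filter(([, g]) => q.having(g)); // 7. HAVING
  let vt8 = vt7.map(([key, g]) => q.select(key, g));      // 8. SELECT
  if (q.distinct) {                                       // 9. DISTINCT
    const seen = new Set();
    vt8 = vt8.filter((r) => {
      const k = JSON.stringify(r);
      if (seen.has(k)) return false;
      seen.add(k);
      return true;
    });
  }
  const vc10 = [...vt8].sort(q.orderBy);                  // 10. ORDER BY: sorts the whole set
  return vc10.slice(0, q.top);                            // 11. TOP: only now are rows dropped
}

// Hypothetical table; the call below corresponds to:
// SELECT TOP (2) customer, SUM(amount) AS total FROM orders WHERE amount > 0
// GROUP BY customer HAVING COUNT(*) >= 2 ORDER BY total DESC
const orderRows = [
  { customer: "A", amount: 10 }, { customer: "A", amount: 20 },
  { customer: "B", amount: 5 }, { customer: "B", amount: 50 },
  { customer: "C", amount: 100 },
  { customer: "D", amount: -1 }, { customer: "D", amount: 7 }, { customer: "D", amount: 8 },
];

const topCustomers = logicalQuery(orderRows, {
  where: (r) => r.amount > 0,
  groupBy: "customer",
  having: (g) => g.length >= 2,
  select: (customer, g) => ({ customer, total: g.reduce((s, r) => s + r.amount, 0) }),
  distinct: false,
  orderBy: (x, y) => y.total - x.total,
  top: 2,
});
console.log(JSON.stringify(topCustomers)); // [{"customer":"B","total":55},{"customer":"A","total":30}]
```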

Comparison summary

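Logically, MongoDB and SQL Server behave the same way here. The rows for the SELECT list are produced first. They are then sorted, in memory when no index supplies the order, and only then are the first N rows taken.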

Fixing the memory overflow

For the principles of MongoDB query optimization, see:

Optimize Query Performance
https://docs.mongodb.com/manual/tutorial/optimize-query-performance-with-indexes-and-projections/

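Some developers simply pull the data out and sort it in application code. This is not recommended. It uses just as much memory, only now on the application side, and does not solve the underlying problem.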

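Three approaches are recommended instead:
1. Optimize the query and its indexes, ideally so that an index supports the sort.
2. Reduce the output. Return fewer fields (limit the number of output fields) or fewer rows (for example with limit(), or by limiting the number of _id values queried).
3. Split the query into two steps. Step 1 returns only _id; step 2 fetches the full documents by _id.
Any of these can resolve the in-memory sort overflow.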
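Approach 3 can be illustrated without a server. In the mongo shell, the two steps would roughly be a find() projecting only _id and the sort field, followed by a find() filtering on _id with $in. The plain JavaScript sketch below only models that flow. The collection, field, and helper names are hypothetical, and JSON length is a rough proxy for what the sort has to buffer.

```javascript
// Hypothetical collection: 2,000 documents, each with a large "detail" field.
const articles = Array.from({ length: 2000 }, (_, i) => ({
  _id: i,
  score: (i * 37) % 2000, // 37 and 2000 are coprime, so every score is unique
  detail: "y".repeat(2000),
}));

// JSON length as a rough proxy for how much data a sort must hold in memory.
const approxBytes = (docs) => JSON.stringify(docs).length;

// Step 1: sort only {_id, score} pairs and keep the _id values of the top n.
function step1TopIds(docs, n) {
  const slim = docs.map(({ _id, score }) => ({ _id, score }));
  slim.sort((a, b) => b.score - a.score); // descending by score
  return { ids: slim.slice(0, n).map((d) => d._id), sortedBytes: approxBytes(slim) };
}

// Step 2: fetch full documents for those _id values (like an $in lookup), then
// restore the step-1 order, since a lookup by _id does not preserve it by itself.
function step2FetchByIds(docs, ids) {
  const wanted = new Set(ids);
  const byId = new Map(docs.filter((d) => wanted.has(d._id)).map((d) => [d._id, d]));
  return ids.map((id) => byId.get(id));
}

const { ids, sortedBytes } = step1TopIds(articles, 10);
const page = step2FetchByIds(articles, ids);
console.log(page.map((d) => d.score).join(",")); // 1999,1998,1997,1996,1995,1994,1993,1992,1991,1990
console.log(sortedBytes < approxBytes(articles)); // true
```

The re-ordering in step 2 matters because, as the quoted documentation notes, result order is undefined unless a sort is specified.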

Tuning system parameters (version 3.0 and later)

Starting with version 3.0, the in-memory sort size limit can be raised by changing the value of the internalQueryExecMaxBlockingSortBytes parameter.

First, check the parameter's current value:

use admin
db.runCommand( { getParameter : 1, "internalQueryExecMaxBlockingSortBytes" : 1 } )

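Then set a new value. A value set this way applies to the running mongod only and is not kept after a restart: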

db.adminCommand({ setParameter: 1, internalQueryExecMaxBlockingSortBytes: 67108864 })  // example value: 64 MB = 64 * 1024 * 1024 bytes
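The parameter takes a raw byte count, so it is easy to get the unit wrong. One way to avoid that is a small helper that converts megabytes to bytes and builds the command document. The helper name and the 64 MB figure are illustrative assumptions, not recommended values. In the mongo shell, the resulting object would be passed to db.adminCommand().

```javascript
// Hypothetical helper: build the setParameter document for a sort limit given in MB.
function buildSortLimitCommand(megabytes) {
  if (!Number.isInteger(megabytes) || megabytes <= 0) {
    throw new RangeError("megabytes must be a positive integer");
  }
  return {
    setParameter: 1,
    internalQueryExecMaxBlockingSortBytes: megabytes * 1024 * 1024, // value in bytes
  };
}

const cmd = buildSortLimitCommand(64);
console.log(JSON.stringify(cmd)); // {"setParameter":1,"internalQueryExecMaxBlockingSortBytes":67108864}
// In the mongo shell: db.adminCommand(cmd)
```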
