1 Automatic rebalancing when load and data distribution across shards become uneven
2 Simple, convenient addition and removal of nodes
3 Automatic failover
4 Scales to thousands of nodes
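How to add a shard node was already covered in the earlier shard cluster setup. When a new node is added to a sharded cluster, MongoDB migrates data chunks from the other nodes onto the new one so that the data is spread evenly, which is also a form of load balancing.
Before adding: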
mongos> db.printShardingStatus()
--- Sharding Status ---
sharding version: { "_id" : 1, "version" : 3 }
shards:
{ "_id" : "shard0000", "host" : "10.250.7.225:27018" }
{ "_id" : "shard0001", "host" : "10.250.7.249:27019" }
{ "_id" : "shard0002", "host" : "10.250.7.241:27020" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : true, "primary" : "shard0000" }
test.momo chunks:
shard0000 30
shard0001 26
shard0002 24
too many chunks to print, use verbose if you want to force print
........omitted.......
Note: when there are too many chunks and the output only shows "too many chunks to print, use verbose if you want to force print", the full list can be printed as follows:
printShardingStatus(db.getSisterDB("config"),1);
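The per-shard totals in that listing are counts over the chunk documents kept in the config database. As a rough sketch, assuming chunk documents shaped like the ones the Balancer prints in its log (`ns`, `min`, `max`, `shard`), the tally could be reproduced in plain JavaScript. The function name `countChunksByShard` and the sample array are hypothetical and only for illustration; in a real shell session the documents would come from the config database instead.

```javascript
// Count how many chunks of one namespace live on each shard.
// `chunks` is an array of chunk documents with at least `ns` and `shard`.
function countChunksByShard(chunks, ns) {
  const totals = {};
  for (const chunk of chunks) {
    if (chunk.ns !== ns) continue; // skip other collections
    totals[chunk.shard] = (totals[chunk.shard] || 0) + 1;
  }
  return totals;
}

// Made-up sample data (not a dump of the real cluster).
const sampleChunks = [
  { ns: "test.momo", min: { id: 930108 }, max: { id: 948575 }, shard: "shard0002" },
  { ns: "test.momo", min: { id: 948575 }, max: { id: 957995 }, shard: "shard0002" },
  { ns: "test.momo", min: { id: 100 }, max: { id: 200 }, shard: "shard0000" },
  { ns: "test.yql", min: { _id: 1 }, max: { _id: 2 }, shard: "shard0001" },
];

console.log(countChunksByShard(sampleChunks, "test.momo"));
```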
Run the following against the admin database:
mongos> use admin
switched to db admin
mongos> db.runCommand({addshard:"10.250.7.225:27019"})
{ "shardAdded" : "shard0003", "ok" : 1 }
The command returns almost immediately, but in the background it takes some time to migrate data chunks from the other shard nodes to the new node.
mongos> db.runCommand({ listShards : 1});
{
"shards" : [
{
"_id" : "shard0000",
"host" : "10.250.7.225:27018"
},
{
"_id" : "shard0001",
"host" : "10.250.7.249:27019"
},
{
"_id" : "shard0002",
"host" : "10.250.7.241:27020"
},
{
"_id" : "shard0003",
"host" : "10.250.7.225:27019"
}
],
"ok" : 1
}
Checking again after a while, the data has been spread roughly evenly:
mongos> printShardingStatus(db.getSisterDB("config"),1);
--- Sharding Status ---
sharding version: { "_id" : 1, "version" : 3 }
shards:
{ "_id" : "shard0000", "host" : "10.250.7.225:27018" }
{ "_id" : "shard0001", "host" : "10.250.7.249:27019" }
{ "_id" : "shard0002", "host" : "10.250.7.241:27020" }
{ "_id" : "shard0003", "host" : "10.250.7.225:27019" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : true, "primary" : "shard0000" }
test.momo chunks:
shard0003 16
shard0001 21
shard0000 21
shard0002 23
{ "id" : { $minKey : 1 } } -->> { "id" : 0 } on : shard0003 { "t" : 28000, "i" : 0 }
{ "id" : 0 } -->> { "id" : 5236 } on : shard0003 { "t" : 33000, "i" : 0 }
{ "id" : 5236 } -->> { "id" : 11595 } on : shard0003 { "t" : 35000, "i" : 0 }
{ "id" : 11595 } -->> { "id" : 17346 } on : shard0003 { "t" : 37000, "i" : 0 }
{ "id" : 17346 } -->> { "id" : 23191 } on : shard0003 { "t" : 40000, "i" : 0 }
{ "id" : 23191 } -->> { "id" : 31929 } on : shard0003 { "t" : 43000, "i" : 0 }
.....partially omitted....
{ "id" : 930108 } -->> { "id" : 948575 } on : shard0002 { "t" : 21000, "i" : 7 }
{ "id" : 948575 } -->> { "id" : 957995 } on : shard0002 { "t" : 27000, "i" : 42 }
{ "id" : 957995 } -->> { "id" : 969212 } on : shard0002 { "t" : 27000, "i" : 43 }
{ "id" : 969212 } -->> { "id" : 983794 } on : shard0002 { "t" : 25000, "i" : 6 }
{ "id" : 983794 } -->> { "id" : 999997 } on : shard0002 { "t" : 25000, "i" : 7 }
{ "id" : 999997 } -->> { "id" : { $maxKey : 1 } } on : shard0002 { "t" : 11000, "i" : 3 }
test.yql chunks:
shard0003 1
shard0000 1
shard0002 1
shard0001 1
{ "_id" : { $minKey : 1 } } -->> { "_id" : ObjectId("4eb298b3adbd9673afee95e3") } on : shard0003 { "t" : 5000, "i" : 0 }
{ "_id" : ObjectId("4eb298b3adbd9673afee95e3") } -->> { "_id" : ObjectId("4eb2a64640643e5bb60072f7") } on : shard0000 { "t" : 4000, "i" : 1 }
{ "_id" : ObjectId("4eb2a64640643e5bb60072f7") } -->> { "_id" : ObjectId("4eb2a65340643e5bb600e084") } on : shard0002 { "t" : 3000, "i" : 1 }
{ "_id" : ObjectId("4eb2a65340643e5bb600e084") } -->> { "_id" : { $maxKey : 1 } } on : shard0001 { "t" : 5000, "i" : 1 }
{ "_id" : "mongos", "partitioned" : false, "primary" : "shard0000" }
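The shift in chunk counts between the two status outputs can be pictured with a toy model. The sketch below is not MongoDB's actual balancer: it ignores chunk sizes, key ranges and splits, and simply moves one chunk at a time from the fullest shard to the emptiest one until the spread falls below a threshold. The function name `rebalance` and the threshold of 8 are assumptions made for illustration. The starting counts are the test.momo figures from the "before adding" output, plus the empty new shard.

```javascript
// Toy model only: evens out chunk counts, nothing more.
function rebalance(chunkCounts, threshold) {
  if (threshold < 2) throw new Error("threshold must be at least 2");
  const counts = { ...chunkCounts }; // do not modify the caller's object
  const moves = [];
  const shards = Object.keys(counts);
  while (shards.length > 0) {
    // Find the shard holding the most chunks and the one holding the fewest.
    let most = shards[0];
    let least = shards[0];
    for (const s of shards) {
      if (counts[s] > counts[most]) most = s;
      if (counts[s] < counts[least]) least = s;
    }
    // Stop once the spread drops below the (assumed) threshold.
    if (counts[most] - counts[least] < threshold) break;
    counts[most] -= 1;
    counts[least] += 1;
    moves.push({ from: most, to: least });
  }
  return { counts, moves };
}

// test.momo chunk counts before the add, plus the new empty shard.
const before = { shard0000: 30, shard0001: 26, shard0002: 24, shard0003: 0 };
const result = rebalance(before, 8); // 8 is an assumed threshold
console.log(result.counts, "moves:", result.moves.length);
```

The real cluster ended up with 16, 21, 21 and 23 chunks, so the toy model is only meant to show the direction of the migration, not to predict the exact figures.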
The log records are attached:
## Startup information
Sat Nov 5 17:41:23 [initandlisten] MongoDB starting : pid=11807 port=27019 dbpath=/opt/mongodata/r2 64-bit host=rac1
Sat Nov 5 17:41:23 [initandlisten] db version v2.0.1, pdfile version 4.5
Sat Nov 5 17:41:23 [initandlisten] git version: 3a5cf0e2134a830d38d2d1aae7e88cac31bdd684
Sat Nov 5 17:41:23 [initandlisten] build info: Linux bs-linux64.10gen.cc 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41
Sat Nov 5 17:41:23 [initandlisten] options: { dbpath: "/opt/mongodata/r2", logappend: true, logpath: "/opt/mongodata/r1/27019.log", port: 27019, shardsvr: true }
Sat Nov 5 17:41:23 [initandlisten] journal dir=/opt/mongodata/r2/journal
Sat Nov 5 17:41:23 [initandlisten] recover : no journal files present, no recovery needed
Sat Nov 5 17:41:23 [initandlisten] waiting for connections on port 27019
Sat Nov 5 17:41:23 [websvr] admin web console waiting for connections on port 28019
### Connecting to the other nodes and copying data
Sat Nov 5 17:41:53 [initandlisten] connection accepted from 10.250.7.220:46807 #1
Sat Nov 5 17:42:03 [initandlisten] connection accepted from 10.250.7.225:57578 #2
Sat Nov 5 17:42:03 [FileAllocator] allocating new datafile /opt/mongodata/r2/test.ns, filling with zeroes...
Sat Nov 5 17:42:03 [FileAllocator] creating directory /opt/mongodata/r2/_tmp
Sat Nov 5 17:42:03 [FileAllocator] done allocating datafile /opt/mongodata/r2/test.ns, size: 16MB, took 0.1 secs
Sat Nov 5 17:42:03 [FileAllocator] allocating new datafile /opt/mongodata/r2/test.0, filling with zeroes...
Sat Nov 5 17:42:06 [FileAllocator] done allocating datafile /opt/mongodata/r2/test.0, size: 64MB, took 3.143 secs
Sat Nov 5 17:42:06 [migrateThread] build index test.momo { _id: 1 }
Sat Nov 5 17:42:06 [migrateThread] build index done 0 records 0 secs
Sat Nov 5 17:42:06 [migrateThread] info: creating collection test.momo on add index
Sat Nov 5 17:42:06 [migrateThread] build index test.momo { id: 1.0 }
Sat Nov 5 17:42:06 [migrateThread] build index done 0 records 0 secs
Sat Nov 5 17:42:06 [FileAllocator] allocating new datafile /opt/mongodata/r2/test.1, filling with zeroes...
Sat Nov 5 17:42:07 [migrateThread] migrate commit succeeded flushing to secondaries for 'test.momo' { id: MinKey } -> { id: 0.0 }
Sat Nov 5 17:42:07 [migrateThread] migrate commit flushed to journal for 'test.momo' { id: MinKey } -> { id: 0.0 }
Sat Nov 5 17:42:07 [migrateThread] migrate commit succeeded flushing to secondaries for 'test.momo' { id: MinKey } -> { id: 0.0 }
Sat Nov 5 17:42:07 [migrateThread] migrate commit flushed to journal for 'test.momo' { id: MinKey } -> { id: 0.0 }
Sat Nov 5 17:42:07 [migrateThread] about to log metadata event: { _id: "rac1-2011-11-05T09:42:07-0", server: "rac1", clientAddr: "", time: new Date(1320486127651), what: "moveChunk.to", ns: "test.momo", details: { min: { id: MinKey }, max: { id: 0.0 }, step1: 3271, step2: 217, step3: 0, step4: 0, step5: 520 } }
Sat Nov 5 17:42:07 [migrateThread] SyncClusterConnection connecting to [rac1:28001]
Sat Nov 5 17:42:07 [migrateThread] SyncClusterConnection connecting to [rac2:28002]
Sat Nov 5 17:42:07 [migrateThread] SyncClusterConnection connecting to [rac3:28003]
Sat Nov 5 17:42:07 [FileAllocator] done allocating datafile /opt/mongodata/r2/test.1, size: 128MB, took 1.011 secs
Sat Nov 5 17:42:13 [initandlisten] connection accepted from 10.250.7.249:40392 #3
Sat Nov 5 17:42:13 [migrateThread] build index test.yql { _id: 1 }
Sat Nov 5 17:42:13 [migrateThread] build index done 0 records 0.001 secs
Sat Nov 5 17:42:13 [migrateThread] info: creating collection test.yql on add index
Sat Nov 5 17:42:13 [migrateThread] migrate commit succeeded flushing to secondaries for 'test.yql' { _id: MinKey } -> { _id: ObjectId('4eb298b3adbd9673afee95e3') }
Sat Nov 5 17:42:13 [migrateThread] migrate commit flushed to journal for 'test.yql' { _id: MinKey } -> { _id: ObjectId('4eb298b3adbd9673afee95e3') }
Sat Nov 5 17:42:14 [migrateThread] migrate commit succeeded flushing to secondaries for 'test.yql' { _id: MinKey } -> { _id: ObjectId('4eb298b3adbd9673afee95e3') }
Sat Nov 5 17:42:14 [migrateThread] migrate commit flushed to journal for 'test.yql' { _id: MinKey } -> { _id: ObjectId('4eb298b3adbd9673afee95e3') }
Sat Nov 5 17:42:14 [migrateThread] about to log metadata event: { _id: "rac1-2011-11-05T09:42:14-1", server: "rac1", clientAddr: "", time: new Date(1320486134775), what: "moveChunk.to", ns: "test.yql", details: { min: { _id: MinKey }, max: { _id: ObjectId('4eb298b3adbd9673afee95e3') }, step1: 5, step2: 0, step3: 0, step4: 0, step5: 1006 } }
Sat Nov 5 17:42:16 [migrateThread] migrate commit succeeded flushing to secondaries for 'test.momo' { id: 102100 } -> { id: 120602 }
Sat Nov 5 17:42:16 [migrateThread] migrate commit flushed to journal for 'test.momo' { id: 102100 } -> { id: 120602 }
Sat Nov 5 17:42:17 [migrateThread] migrate commit succeeded flushing to secondaries for 'test.momo' { id: 102100 } -> { id: 120602 }
Sat Nov 5 17:42:17 [migrateThread] migrate commit flushed to journal for 'test.momo' { id: 102100 } -> { id: 120602 }
Sat Nov 5 17:42:17 [migrateThread] about to log metadata event: { _id: "rac1-2011-11-05T09:42:17-2", server: "rac1", clientAddr: "", time: new Date(1320486137351), what: "moveChunk.to", ns: "test.momo", details: { min: { id: 102100 }, max: { id: 120602 }, step1: 0, step2: 0, step3: 1573, step4: 0, step5: 479 } }
Sat Nov 5 17:42:20 [conn2] end connection 10.250.7.225:57578
Sat Nov 5 17:42:21 [initandlisten] connection accepted from 10.250.7.220:46814 #4
Sat Nov 5 17:42:21 [conn4] warning: bad serverID set in setShardVersion and none in info: EOO
Sat Nov 5 18:06:47 [initandlisten] connection accepted from 10.250.7.225:13612 #6
Sat Nov 5 18:06:47 [migrateThread] Socket say send() errno:32 Broken pipe 10.250.7.225:27018
Sat Nov 5 18:06:47 [migrateThread] about to log metadata event: { _id: "rac1-2011-11-05T10:06:47-3", server: "rac1", clientAddr: "", time: new Date(1320487607530), what: "moveChunk.to", ns: "test.momo", details: { min: { id: 120602 }, max: { id: 132858 }, note: "aborted" } }
Sat Nov 5 18:06:47 [migrateThread] not logging config change: rac1-2011-11-05T10:06:47-3 SyncClusterConnection::insert prepare failed: 9001 socket exception [2] server [127.0.0.1:28001] rac1:28001:{}
Sat Nov 5 18:07:00 [migrateThread] migrate commit succeeded flushing to secondaries for 'test.momo' { id: 120602 } -> { id: 132858 }
Sat Nov 5 18:07:00 [migrateThread] migrate commit flushed to journal for 'test.momo' { id: 120602 } -> { id: 132858 }
Sat Nov 5 18:07:01 [migrateThread] migrate commit succeeded flushing to secondaries for 'test.momo' { id: 120602 } -> { id: 132858 }
Sat Nov 5 18:07:01 [migrateThread] migrate commit flushed to journal for 'test.momo' { id: 120602 } -> { id: 132858 }
Sat Nov 5 18:07:01 [migrateThread] about to log metadata event: { _id: "rac1-2011-11-05T10:07:01-4", server: "rac1", clientAddr: "", time: new Date(1320487621150), what: "moveChunk.to", ns: "test.momo", details: { min: { id: 120602 }, max: { id: 132858 }, step1: 0, step2: 0, step3: 1121, step4: 0, step5: 886 } }
Sat Nov 5 18:07:01 [migrateThread] SyncClusterConnection connecting to [rac1:28001]
Sat Nov 5 18:07:01 [migrateThread] SyncClusterConnection connecting to [rac2:28002]
Sat Nov 5 18:07:01 [migrateThread] SyncClusterConnection connecting to [rac3:28003]
Sat Nov 5 18:07:17 [migrateThread] migrate commit flushed to journal for 'test.momo' { id: 142178 } -> { id: 154425 }
Sat Nov 5 18:07:18 [migrateThread] migrate commit succeeded flushing to secondaries for 'test.momo' { id: 142178 } -> { id: 154425 }
Sat Nov 5 18:07:18 [migrateThread] migrate commit flushed to journal for 'test.momo' { id: 142178 } -> { id: 154425 }
Sat Nov 5 18:07:18 [migrateThread] about to log metadata event: { _id: "rac1-2011-11-05T10:07:18-6", server: "rac1", clientAddr: "", time: new Date(1320487638676), what: "moveChunk.to", ns: "test.momo", details: { min: { id: 142178 }, max: { id: 154425 }, step1: 0, step2: 0, step3: 1108, step4: 0, step5: 940 } }
.....partially omitted.....
Sat Nov 5 18:09:23 [clientcursormon] mem (MB) res:55 virt:413 mapped:80
Sat Nov 5 18:12:21 [conn1] command admin.$cmd command: { writebacklisten: ObjectId('4eb4e43618ed672581e26201') } ntoreturn:1 reslen:44 300012ms
Sat Nov 5 18:14:24 [clientcursormon] mem (MB) res:55 virt:413 mapped:80
Sat Nov 5 18:17:21 [conn1] command admin.$cmd command: { writebacklisten: ObjectId('4eb4e43618ed672581e26201') } ntoreturn:1 reslen:44 300012ms
Sat Nov 5 18:19:24 [clientcursormon] mem (MB) res:55 virt:413 mapped:80
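Each "moveChunk.to" metadata event in the log above carries step1 to step5 values for one migration, or a note such as "aborted". To compare migrations, those numbers can be pulled out of the log text. The following is a minimal sketch; the function name `parseMigrationSteps` is hypothetical, the values are presumably milliseconds (an assumption, since the log does not state the unit), and the sample line is the first test.momo event from the log with its wrapped halves joined.

```javascript
// Extract the stepN values from a "moveChunk.to" metadata event log line.
// Returns null for any other kind of log line.
function parseMigrationSteps(line) {
  if (!line.includes('what: "moveChunk.to"')) return null;
  const steps = {};
  const re = /\b(step\d+): (\d+)/g;
  let m;
  while ((m = re.exec(line)) !== null) {
    steps[m[1]] = Number(m[2]);
  }
  // Sum of all step values found (0 when the event has none).
  const total = Object.values(steps).reduce((a, b) => a + b, 0);
  const aborted = line.includes('note: "aborted"');
  return { steps, total, aborted };
}

const sampleLine = `Sat Nov 5 17:42:07 [migrateThread] about to log metadata event: { _id: "rac1-2011-11-05T09:42:07-0", server: "rac1", clientAddr: "", time: new Date(1320486127651), what: "moveChunk.to", ns: "test.momo", details: { min: { id: MinKey }, max: { id: 0.0 }, step1: 3271, step2: 217, step3: 0, step4: 0, step5: 520 } }`;

console.log(parseMigrationSteps(sampleLine));
```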
II Removing a node
When a node is removed, the cluster likewise migrates the data on that node to the other nodes.
db.runCommand({ listShards : 1});
mongos> db.runCommand({removeshard:"10.250.7.225:27018"})
{
"msg" : "draining started successfully",
"state" : "started",
"shard" : "shard0000",
"ok" : 1
}
mongos> db.runCommand({ listShards : 1});
{
"shards" : [
{
"_id" : "shard0001",
"host" : "10.250.7.249:27019"
},
{
"_id" : "shard0002",
"host" : "10.250.7.241:27020"
},
{
"_id" : "shard0003",
"host" : "10.250.7.225:27019"
},
{
"_id" : "shard0000",
"draining" : true, -- data is being migrated
"host" : "10.250.7.225:27018"
}
],
"ok" : 1
}
mongos>
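Since removeshard only starts the draining, the usual way to follow progress is to issue the same command again until it reports that draining has finished. The sketch below illustrates that polling loop under stated assumptions: `waitForDrain` and `fakeCluster` are hypothetical helpers, the command runner and the sleep function are passed in so the example runs without a cluster, and the "ongoing" and "completed" states are those described in MongoDB's documentation rather than shown in the transcript above.

```javascript
// Re-issue removeshard until the reply reports state "completed".
// runCommand: function taking a command document and returning the reply.
// sleep: function taking milliseconds (the mongo shell has a built-in sleep).
function waitForDrain(runCommand, shardHost, sleep, maxPolls) {
  for (let i = 0; i < maxPolls; i++) {
    const res = runCommand({ removeshard: shardHost });
    if (res.ok !== 1) {
      throw new Error("removeshard failed: " + JSON.stringify(res));
    }
    if (res.state === "completed") return i + 1; // number of calls made
    sleep(1000); // wait before asking again
  }
  throw new Error("shard still draining after " + maxPolls + " polls");
}

// Stand-in for a real connection: replays a canned sequence of replies.
function fakeCluster(replies) {
  let i = 0;
  return () => replies[Math.min(i++, replies.length - 1)];
}

const run = fakeCluster([
  { msg: "draining started successfully", state: "started", shard: "shard0000", ok: 1 },
  { state: "ongoing", ok: 1 },
  { state: "completed", shard: "shard0000", ok: 1 },
]);
console.log("calls needed:", waitForDrain(run, "10.250.7.225:27018", () => {}, 10));
```

Injecting `runCommand` keeps the loop itself free of connection details; in the shell it could be replaced by a function that forwards to db.runCommand on the admin database.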
After the removal (shard0000 no longer holds any chunks but is still marked as draining):
mongos> db.printShardingStatus()
--- Sharding Status ---
sharding version: { "_id" : 1, "version" : 3 }
shards:
{ "_id" : "shard0000", "draining" : true, "host" : "10.250.7.225:27018" }
{ "_id" : "shard0001", "host" : "10.250.7.249:27019" }
{ "_id" : "shard0002", "host" : "10.250.7.241:27020" }
{ "_id" : "shard0003", "host" : "10.250.7.225:27019" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : true, "primary" : "shard0000" }
test.momo chunks:
shard0003 27
shard0001 28
shard0002 27
too many chunks to print, use verbose if you want to force print
test.yql chunks:
shard0003 1
shard0001 2
shard0002 1
{ "_id" : { $minKey : 1 } } -->> { "_id" : ObjectId("4eb298b3adbd9673afee95e3") } on : shard0003 { "t" : 5000, "i" : 0 }
{ "_id" : ObjectId("4eb298b3adbd9673afee95e3") } -->> { "_id" : ObjectId("4eb2a64640643e5bb60072f7") } on : shard0001 { "t" : 6000, "i" : 0 }
{ "_id" : ObjectId("4eb2a64640643e5bb60072f7") } -->> { "_id" : ObjectId("4eb2a65340643e5bb600e084") } on : shard0002 { "t" : 3000, "i" : 1 }
{ "_id" : ObjectId("4eb2a65340643e5bb600e084") } -->> { "_id" : { $maxKey : 1 } } on : shard0001 { "t" : 5000, "i" : 1 }
{ "_id" : "mongos", "partitioned" : false, "primary" : "shard0000" }
mongos>
The related logs are attached:
## The Balancer copies the data on the node being removed to the other nodes.
Sat Nov 5 19:09:29 [Balancer] chose [shard0000] to [shard0001] { _id: "test.yql-_id_ObjectId('4eb298b3adbd9673afee95e3')", lastmod: Timestamp 4000|1, ns: "test.yql", min: { _id: ObjectId('4eb298b3adbd9673afee95e3') }, max: { _id: ObjectId('4eb2a64640643e5bb60072f7') }, shard: "shard0000" }
Sat Nov 5 19:09:29 [Balancer] chose [shard0000] to [shard0003] { _id: "test.momo-id_212402", lastmod: Timestamp 42000|1, ns: "test.momo", min: { id: 212402 }, max: { id: 236820 }, shard: "shard0000" }
Sat Nov 5 19:09:29 [Balancer] moving chunk ns: test.yql moving ( ns:test.yql at: shard0000:10.250.7.225:27018 lastmod: 4|1 min: { _id: ObjectId('4eb298b3adbd9673afee95e3') } max: { _id: ObjectId('4eb2a64640643e5bb60072f7') }) shard0000:10.250.7.225:27018 -> shard0001:10.250.7.249:27019
Sat Nov 5 19:09:33 [Balancer] created new distributed lock for test.yql on rac1:28001,rac2:28002,rac3:28003 ( lock timeout : 900000, ping interval : 30000, process : 0 )
Sat Nov 5 19:09:33 [Balancer] ChunkManager: time to load chunks for test.yql: 0ms sequenceNumber: 114 version: 6|0
Sat Nov 5 19:09:33 [Balancer] moving chunk ns: test.momo moving ( ns:test.momo at: shard0000:10.250.7.225:27018 lastmod: 42|1 min: { id: 212402 } max: { id: 236820 }) shard0000:10.250.7.225:27018 -> shard0003:10.250.7.225:27019
Sat Nov 5 19:09:34 [Balancer] moveChunk result: { chunkTooBig: true, estimatedChunkSize: 1462920, errmsg: "chunk too big to move", ok: 0.0 }
Sat Nov 5 19:09:34 [Balancer] balancer move failed: { chunkTooBig: true, estimatedChunkSize: 1462920, errmsg: "chunk too big to move", ok: 0.0 } from: shard0000 to: shard0003 chunk: { _id: "test.momo-id_212402", lastmod: Timestamp 42000|1, ns: "test.momo", min: { id: 212402 }, max: { id: 236820 }, shard: "shard0000" }
Sat Nov 5 19:09:34 [Balancer] forcing a split because migrate failed for size reasons
Sat Nov 5 19:09:34 [Balancer] created new distributed lock for test.momo on rac1:28001,rac2:28002,rac3:28003 ( lock timeout : 900000, ping interval : 30000, process : 0 )
Sat Nov 5 19:09:34 [Balancer] ChunkManager: time to load chunks for test.momo: 1ms sequenceNumber: 115 version: 43|5
Sat Nov 5 19:09:34 [Balancer] forced split results: { ok: 1.0 }
Sat Nov 5 19:09:34 [Balancer] distributed lock 'balancer/rac4:27017:1320477786:1804289383' unlocked.
Sat Nov 5 19:09:39 [Balancer] distributed lock 'balancer/rac4:27017:1320477786:1804289383' acquired, ts : 4eb5197318ed672581e267a7
Sat Nov 5 19:09:39 [Balancer] chose [shard0002] to [shard0003] { _id: "test.momo-id_682899", lastmod: Timestamp 43000|2, ns: "test.momo", min: { id: 682899 }, max: { id: 697740 }, shard: "shard0002" }
Sat Nov 5 19:09:39 [Balancer] moving chunk ns: test.momo moving ( ns:test.momo at: shard0002:10.250.7.241:27020 lastmod: 43|2 min: { id: 682899 } max: { id: 697740 }) shard0002:10.250.7.241:27020 -> shard0003:10.250.7.225:27019
Sat Nov 5 19:09:43 [Balancer] created new distributed lock for test.momo on rac1:28001,rac2:28002,rac3:28003 ( lock timeout : 900000, ping interval : 30000, process : 0 )
Sat Nov 5 19:09:43 [Balancer] ChunkManager: time to load chunks for test.momo: 1ms sequenceNumber: 116 version: 44|1
Sat Nov 5 19:09:43 [Balancer] distributed lock 'balancer/rac4:27017:1320477786:1804289383' unlocked.
Sat Nov 5 19:09:48 [Balancer] distributed lock 'balancer/rac4:27017:1320477786:1804289383' acquired, ts : 4eb5197c18ed672581e267a8
Sat Nov 5 19:09:48 [Balancer] chose [shard0000] to [shard0003] { _id: "test.momo-id_212402", lastmod: Timestamp 43000|4, ns: "test.momo", min: { id: 212402 }, max: { id: 224692 }, shard: "shard0000" }
Sat Nov 5 19:09:48 [Balancer] moving chunk ns: test.momo moving ( ns:test.momo at: shard0000:10.250.7.225:27018 lastmod: 43|4 min: { id: 212402 } max: { id: 224692 }) shard0000:10.250.7.225:27018 -> shard0003:10.250.7.225:27019
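The log above shows one round in which the move of test.momo chunk { id: 212402 } to { id: 236820 } fails with "chunk too big to move", the Balancer forces a split, and a later round picks the smaller chunk ending at { id: 224692 }. That move-or-split flow can be sketched as follows. Everything here is a simplified illustration: `balanceRound`, `LIMIT_BYTES`, the `bytes` field and the halving of sizes are assumptions, and the real split point depends on the data, not on a fixed rule.

```javascript
// One balancing round for a single chunk: try to move it, and if the
// move is rejected for size reasons, split it so a later round can retry.
function balanceRound(chunk, moveChunk, splitChunk) {
  const res = moveChunk(chunk);
  if (res.ok === 1) return { moved: true, chunks: [chunk] };
  if (res.chunkTooBig) {
    // Mirrors "forcing a split because migrate failed for size reasons".
    return { moved: false, chunks: splitChunk(chunk) };
  }
  throw new Error(res.errmsg || "move failed");
}

// Stubs standing in for the cluster (hypothetical limit and sizes).
const LIMIT_BYTES = 1048576; // assumed limit, not taken from the log
const moveChunk = (chunk) =>
  chunk.bytes > LIMIT_BYTES
    ? { ok: 0, chunkTooBig: true, estimatedChunkSize: chunk.bytes, errmsg: "chunk too big to move" }
    : { ok: 1 };
// Split at the key seen in the log; halving the size is a simplification.
const splitChunk = (chunk) => [
  { ...chunk, max: { id: 224692 }, bytes: chunk.bytes / 2 },
  { ...chunk, min: { id: 224692 }, bytes: chunk.bytes / 2 },
];

const big = { ns: "test.momo", min: { id: 212402 }, max: { id: 236820 }, bytes: 1462920 };
const first = balanceRound(big, moveChunk, splitChunk);
const second = balanceRound(first.chunks[0], moveChunk, splitChunk);
console.log(first.moved, second.moved);
```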