MHA就提供了这样一种优雅的方式,只会堵塞业务0.5~2s的时间,在这段时间内,业务无法读取和写入。
集群信息
角色 IP地址 ServerID 类型
Master 192.168.244.10 1 写入
Candicate master 192.168.244.20 2 读
Slave 192.168.244.30 3 读
Monitor host 192.168.244.40 监控集群组
MHA具体的搭建步骤和原理,可参考另外一篇博客:
MySQL高可用方案MHA的部署和原理
在线切换的步骤
1. 关闭MHA监控
# masterha_stop --conf=/etc/masterha/app1.cnf
2. 在线切换
# /usr/local/bin/masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.244.20 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
其中,
--orig_master_is_new_slave是将原master切换为新主的slave,默认情况下,是不添加的。
--running_updates_limit默认为1s,即如果主从延迟时间(Seconds_Behind_Master),或master show processlist中dml操作大于1s,则不会执行切换。
在线切换的输出
Tue Apr :: - [] MHA::MasterRotate version :: - [ :: - [ :: - [] * Phase :: - [ :: - [warning] Global configuration /etc/ :: - [] Reading application default configuration from /etc/masterha/ :: - [] Reading server configuration from /etc/masterha/ :: - [] GTID failover mode = :: - [] Current Alive Master: .(.: :: - [ :: - [] .(.:) Version=.--bin:enabledTue Apr :: - [] Replicating from .(.: :: - [] Primary candidate :: - [] .(.:) Version=.--bin:enabledTue Apr :: - [] Replicating from .(.:.(:)? (YES/no): yes Tue Apr :: - [] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take :: - [ :: - [ :: - [] Checking replication health on . :: - [ :: - [] Checking replication health on . :: - [ :: - [] . :: - [.(.:+--.(.:+--.(.:.(.:+--.(.:+--.(.:.(.:) to .(.:)? (yes/ :: - [] Checking whether .(.:) is ok :: - [ :: - [] .(.: :: - [] .(.: :: - [] ** Phase :: - [ :: - [] * Phase :: - [ :: - [] Executing master ip online change script to disable :: - [] /usr/local/bin/master_ip_online_change --command=stop --orig_master_host=. --=. --orig_master_port= --orig_master_user= --orig_master_password= --new_master_host=. --new_master_ip=. --new_master_port= --new_master_user= --new_master_password= --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slaveTue Apr :: :: Set read_only= :: :: . :: - [ :: - [ :: - [ :: - [ :: - [] Orig master binlog:pos is mysql-bin.: :: - [] Waiting to execute all relay logs on .(.: :: - [] master_pos_wait(mysql-bin.:) completed on .(.:). Executed :: - [] :: - [] Getting new masterTue Apr :: - [] mysql-bin.: :: - [=, MASTER_PORT=, MASTER_LOG_FILE=, MASTER_LOG_POS=, MASTER_USER=, MASTER_PASSWORD=;Tue Apr :: - [] Executing master ip online change script to allow :: - [] /usr/local/bin/master_ip_online_change --command=start --orig_master_host=. --=. --orig_master_port= --orig_master_user= --orig_master_password= --new_master_host=. --new_master_ip=. --new_master_port= --new_master_user= --new_master_password= --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slaveTue Apr :: Set read_only=. on the new master: . :: - [ :: - [ :: - [] * Switching slaves :: - [ :: - [] -- Slave switch on host .(.:) started, pid: :: - [ :: - [] Log messages from . :: - [ :: - [] Waiting to execute all relay logs on .(.: :: - [] master_pos_wait(mysql-bin.:) completed on .(.:). Executed :: - [] :: - [] Resetting slave .(.:) and starting replication from the new master .(.:)..Tue Apr :: - [ :: - [ :: - [] End of log messages from . :: - [ :: - [] -- Slave switch on host .(.: :: - [ :: - [ :: - [ :: - [ :: - [] Resetting slave .(.:) and starting replication from the new master .(.:)..Tue Apr :: - [ :: - [ :: - [ :: - [ :: - [] * Phase :: - [ :: - [] .: Resetting slave :: - [] Switching master to .(.:) completed successfully.
MHA在线切换的原理
1. 检查当前的配置信息及主从服务器的信息
包括读取MHA的配置文件/etc/masterha/app1.cnf及检查当前slave的健康状态
2. 阻止对当前master的更新
主要通过如下步骤:
1> 等待1.5s($time_until_kill_threads*100ms),等待当前连接断开。
2> 执行 read_only=1,阻止新的DML操作
3> 等待0.5s,等待当前DML操作完成。
4> kill掉所有连接。
5> FLUSH NO_WRITE_TO_BINLOG TABLES
6> FLUSH TABLES WITH READ LOCK
3. 等待新master执行完所有的relay log
Waiting to execute all relay logs on 192.168.244.20(192.168.244.20:3306)..
4. 将新master的read_only设置为off,并添加VIP
5. slave切换到新master上。
1> 等待slave(192.168.244.30)应用完原主从复制产生的relay log,然后执行change master操作切换到新master上。
2> 释放原master上加的锁。
3> 因masterha_master_switch命令行中带有--orig_master_is_new_slave参数,故原master也切换为新master的从。
6. 清理新master的相关信息。
主要是执行了reset slave all操作,清除之前的复制信息。
MHA在线切换需满足的条件
MHA在执行在线切换之前,会判断当前的主从复制信息,只有满足了以下条件,才能执行切换动作:
1. 所有SLAVE的IO线程和SQL线程都在运行。
2. 所有slave的Seconds_Behind_Master小于或等于running_updates_limit的值,该参数如果没有显示指定的话,则默认为1s
3. 在master上,通过show processlist输出,没有一个DML操作的时间大于running_updates_limit的值。
在线切换时,打开general log,各个服务器的操作信息
注:在执行masterha_master_switch命令时,会有两次确认操作
1. It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.244.10(192.168
.244.10:3306)? (YES/no):
2. Starting master switch from 192.168.244.10(192.168.244.10:3306) to 192.168.244.20(192.168.244.20:3306)? (yes/NO):
以下输出中间都有两次空白,其中第一次空白之前的输出对应第一次确认之前,第二次之前的输出对应第二次确认之前。
原master 192.168.244.10
170412 16:52:38 23 Connect monitor@node4 on 23 Query set autocommit=1 23 Query SELECT CONNECTION_ID() AS Value170412 16:52:39 24 Connect monitor@node4 on 24 Query set autocommit=1 24 Query SELECT CONNECTION_ID() AS Value 24 Query SET wait_timeout=86400 24 Query SELECT @@global.server_id As Value 24 Query SELECT VERSION() AS Value 24 Query SELECT @@global.gtid_mode As Value 24 Query SHOW GLOBAL VARIABLES LIKE 'log_bin' 24 Query SHOW MASTER STATUS 24 Query SELECT @@global.datadir AS Value 24 Query SELECT @@global.slave_parallel_workers AS Value 24 Query SHOW SLAVE STATUS 24 Query SELECT @@global.read_only As Value 24 Query SELECT @@global.relay_log_purge As Value 170412 16:54:06 24 Query FLUSH NO_WRITE_TO_BINLOG TABLES 24 Query SELECT GET_LOCK('MHA_Master_High_Availability_Monitor', '0') AS Value 24 Query SHOW PROCESSLIST 170412 16:55:51 24 Query SHOW SLAVE STATUS 24 Query CHANGE MASTER TO MASTER_HOST='dummy_host'170412 16:55:52 24 Query SHOW SLAVE STATUS 24 Query RESET SLAVE /*!50516 ALL */ 24 Query SELECT RELEASE_LOCK('MHA_Master_High_Availability_Monitor') As Value 24 Quit 25 Connect monitor@node4 on 25 Query set autocommit=1 25 Query SELECT CONNECTION_ID() AS Value 25 Query SET sql_log_bin=0 25 Query SHOW PROCESSLIST 25 Query SELECT @@global.read_only As Value 25 Query SET GLOBAL read_only=1 25 Query SELECT @@global.read_only As Value 25 Query SHOW PROCESSLIST 25 Query SET sql_log_bin=1 25 Quit 26 Connect monitor@node4 on 26 Query set autocommit=1 26 Query SELECT CONNECTION_ID() AS Value 26 Query SET wait_timeout=86400 26 Query FLUSH TABLES WITH READ LOCK 26 Query SHOW MASTER STATUS170412 16:55:53 26 Query UNLOCK TABLES 26 Query CHANGE MASTER TO MASTER_HOST = '192.168.244.20' MASTER_USER = 'repl' MASTER_PASSWORD = <secret> MASTE R_PORT = 3306 MASTER_LOG_FILE = 'mysql-bin.000010' MASTER_LOG_POS = 120 26 Query SET GLOBAL relay_log_purge=0 26 Query START SLAVE 27 Connect Out repl@192.168.244.20:3306 26 Query SHOW SLAVE STATUS 26 Query SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value 26 Quit
新master 192.168.244.20
170412 16:52:38 23 Connect monitor@node4 on 23 Query set autocommit=1 23 Query SELECT CONNECTION_ID() AS Value170412 16:52:39 24 Connect monitor@node4 on 24 Query set autocommit=1 24 Query SELECT CONNECTION_ID() AS Value 24 Query SET wait_timeout=86400 24 Query SELECT @@global.server_id As Value 24 Query SELECT VERSION() AS Value 24 Query SELECT @@global.gtid_mode As Value 24 Query SHOW GLOBAL VARIABLES LIKE 'log_bin' 24 Query SHOW MASTER STATUS 24 Query SELECT @@global.datadir AS Value 24 Query SELECT @@global.slave_parallel_workers AS Value 24 Query SHOW SLAVE STATUS 24 Query SELECT @@global.read_only As Value 24 Query SELECT @@global.relay_log_purge As Value 24 Query SELECT @@global.relay_log_info_repository AS Value 24 Query SELECT @@global.datadir AS Value 24 Query SELECT @@global.relay_log_info_file AS Value 24 Query SHOW SLAVE STATUS 24 Query SELECT Repl_slave_priv AS Value FROM mysql.user WHERE user = 'repl' 170412 16:54:06 24 Query SELECT GET_LOCK('MHA_Master_High_Availability_Failover', '0') AS Value 24 Query SHOW SLAVE STATUS 24 Query SHOW SLAVE STATUS 170412 16:55:52 24 Query SHOW PROCESSLIST 25 Connect monitor@node4 on 25 Query set autocommit=1 25 Query SELECT CONNECTION_ID() AS Value 25 Query SELECT @@global.read_only As Value 25 Query SELECT @@global.read_only As Value 25 Quit 24 Query SHOW SLAVE STATUS 24 Query SELECT MASTER_POS_WAIT('mysql-bin.000017','120',0) AS Result 24 Query STOP SLAVE SQL_THREAD 24 Query SHOW SLAVE STATUS 24 Query SHOW MASTER STATUS 26 Connect monitor@node4 on 26 Query set autocommit=1 26 Query SELECT CONNECTION_ID() AS Value 26 Query SET sql_log_bin=0 26 Query SELECT @@global.read_only As Value 26 Query SET GLOBAL read_only=0 26 Query SET sql_log_bin=1 26 Quit 24 Query SELECT @@global.read_only As Value 27 Connect repl@node3 on 27 Query SELECT UNIX_TIMESTAMP() 27 Query SHOW VARIABLES LIKE 'SERVER_ID' 27 Query SET @master_heartbeat_period= 1799999979520 27 Query SET @master_binlog_checksum= @@global.binlog_checksum 27 Query SELECT @master_binlog_checksum 27 Query SELECT @@GLOBAL.GTID_MODE 27 Query SHOW VARIABLES LIKE 'SERVER_UUID' 27 Query SET @slave_uuid= '8a1093c8-1d00-11e7-954f-000c299a5715' 27 Binlog Dump Log: 'mysql-bin.000010' Pos: 120170412 16:55:53 28 Connect repl@node1 on 28 Query SELECT UNIX_TIMESTAMP() 28 Query SHOW VARIABLES LIKE 'SERVER_ID' 28 Query SET @master_heartbeat_period= 1799999979520 28 Query SET @master_binlog_checksum= @@global.binlog_checksum 28 Query SELECT @master_binlog_checksum 28 Query SELECT @@GLOBAL.GTID_MODE 28 Query SHOW VARIABLES LIKE 'SERVER_UUID' 24 Query STOP SLAVE 28 Query SET @slave_uuid= '2a6365e0-1d05-11e7-956d-000c29c64704' 28 Binlog Dump Log: 'mysql-bin.000010' Pos: 120 24 Query SHOW SLAVE STATUS 24 Query RESET SLAVE /*!50516 ALL */ 24 Query SHOW SLAVE STATUS 24 Query SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value 24 Quit
slave 192.168.244.30
170412 16:52:37 16 Connect monitor@node4 on 16 Query set autocommit=1 16 Query SELECT CONNECTION_ID() AS Value170412 16:52:38 17 Connect monitor@node4 on 17 Query set autocommit=1 17 Query SELECT CONNECTION_ID() AS Value 17 Query SET wait_timeout=86400 17 Query SELECT @@global.server_id As Value 17 Query SELECT VERSION() AS Value 17 Query SELECT @@global.gtid_mode As Value 17 Query SHOW GLOBAL VARIABLES LIKE 'log_bin' 17 Query SHOW MASTER STATUS 17 Query SELECT @@global.datadir AS Value 17 Query SELECT @@global.slave_parallel_workers AS Value 17 Query SHOW SLAVE STATUS 17 Query SELECT @@global.read_only As Value 17 Query SELECT @@global.relay_log_purge As Value 17 Query SELECT @@global.relay_log_info_repository AS Value 17 Query SELECT @@global.datadir AS Value 17 Query SELECT @@global.relay_log_info_file AS Value 17 Query SHOW SLAVE STATUS 17 Query SELECT Repl_slave_priv AS Value FROM mysql.user WHERE user = 'repl' 170412 16:54:05 17 Query SELECT GET_LOCK('MHA_Master_High_Availability_Failover', '0') AS Value 17 Query SHOW SLAVE STATUS 17 Query SHOW SLAVE STATUS 170412 16:55:50 17 Query SHOW SLAVE STATUS170412 16:55:51 17 Query SHOW SLAVE STATUS 17 Query SELECT MASTER_POS_WAIT('mysql-bin.000017','120',0) AS Result 17 Query STOP SLAVE SQL_THREAD 17 Query SHOW SLAVE STATUS 17 Query STOP SLAVE 17 Query STOP SLAVE 17 Query SHOW SLAVE STATUS 17 Query RESET SLAVE 17 Query CHANGE MASTER TO MASTER_HOST = '192.168.244.20' MASTER_USER = 'repl' MASTER_PASSWORD = <secret> MASTE R_PORT = 3306 MASTER_LOG_FILE = 'mysql-bin.000010' MASTER_LOG_POS = 120 17 Query SET GLOBAL relay_log_purge=0 17 Query START SLAVE 18 Connect Out repl@192.168.244.20:3306 17 Query SHOW SLAVE STATUS170412 16:55:52 17 Query SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value 17 Quit