MariaDB (MySQL) Replication Auto Failover Using MHA

MHA Overview

When operating MariaDB or MySQL replication, recovering from a master failure is quite cumbersome.
MHA is an auto-failover solution that fails over the master with minimal downtime and automatically promotes a slave to be the new master.

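MHA Architecture

Before a Failure

Before a failure occurs, MHA monitors the replication topology.

When a Failure Occurs

When the master fails, MHA promotes one slave to master and issues CHANGE MASTER on the remaining slave so that it replicates from the promoted master.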
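
The promotion step described above can be modeled in a few lines. The following Python sketch is only a simplified illustration and not MHA's actual selection algorithm: the `Slave` class and both helper functions are hypothetical names, and the two flags mirror the candidate_master and no_master parameters that appear in the configuration file later in this post. The real tool also compares relay log positions, which this sketch ignores, and the generated statement omits MASTER_USER and MASTER_PASSWORD.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    """Hypothetical model of one slave entry (not an MHA data structure)."""
    host: str
    candidate_master: bool = False
    no_master: bool = False


def pick_new_master(slaves):
    """Pick a slave to promote: skip no_master hosts, prefer candidate_master hosts."""
    eligible = [s for s in slaves if not s.no_master]
    if not eligible:
        raise ValueError("no slave is eligible for promotion")
    preferred = [s for s in eligible if s.candidate_master]
    # Fall back to any eligible slave when none is flagged as a candidate.
    return (preferred or eligible)[0]


def change_master_statement(new_master_host, log_file, log_pos, port=3306):
    """Build the statement that repoints a remaining slave (credentials omitted)."""
    return (
        f"CHANGE MASTER TO MASTER_HOST='{new_master_host}', MASTER_PORT={port}, "
        f"MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={log_pos};"
    )


if __name__ == "__main__":
    slaves = [
        Slave("172.16.254.111", candidate_master=True),
        Slave("172.16.254.112", no_master=True),
    ]
    promoted = pick_new_master(slaves)
    print("promote:", promoted.host)
    # Binlog file and position are the values that appear in the failover log of this post.
    print(change_master_statement(promoted.host, "mysql-bin.000013", 991791))
```

With the test environment of this post, the sketch promotes 172.16.254.111, which matches what the failover log at the end shows.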

Test Environment

The MariaDB replication environment consists of one master and two slaves.
Installing MHA is simple: install mha4mysql-node on master, slave1, and slave2, and install both mha4mysql-manager and mha4mysql-node on the manager server. The IP addresses are as follows.

  • master : 172.16.254.110
  • slave1 : 172.16.254.111
  • slave2 : 172.16.254.112
  • MHA manager : 172.16.254.113

MHA Installation – Download

This post installs MHA from source. Download the source tarballs from the following link.
https://code.google.com/archive/p/mysql-master-ha/downloads

MHA Installation – Installing Dependency Packages

  • Run on each replication node
yum -y install perl-CPAN perl-DBD-MySQL perl-Module-Install
  • Run on the manager server
yum -y install perl-CPAN perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Module-Install

MHA Installation – Compiling the Source

Build and install with Perl.

  • Run on each replication node
    Node installation
tar xvzf mha4mysql-node-0.56.tar.gz
cd mha4mysql-node-0.56
perl Makefile.PL
make
make install
  • Run on the manager server
    Node installation
tar xvzf mha4mysql-node-0.56.tar.gz
cd mha4mysql-node-0.56
perl Makefile.PL
make
make install

Manager installation (on the manager server)

tar xvzf mha4mysql-manager-0.56.tar.gz
cd mha4mysql-manager-0.56
perl Makefile.PL
make
make install

MHA Installation – Creating the MHA Account

MariaDB [(none)]> GRANT ALL PRIVILEGES ON *.* TO 'mhauser'@'172.16.254.%' IDENTIFIED BY '12345';
MariaDB [(none)]> FLUSH PRIVILEGES;

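MHA Installation – Creating the Configuration File

Create the configuration file on the manager server. See the following link for a detailed description of the configuration file.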
https://code.google.com/p/mysql-master-ha/wiki/Configuration

vi /etc/app1.cnf
...
[server default]
# mysql user and password
user=mhauser
password=12345
# working directory on the manager
manager_workdir=/var/log/masterha/app1
# manager log file
manager_log=/var/log/masterha/app1/app1.log
# working directory on MySQL servers
remote_workdir=/var/log/masterha/app1
[server1]
hostname=172.16.254.110
master_binlog_dir=/usr/local/mariadb/data
candidate_master=1
[server2]
hostname=172.16.254.111
master_binlog_dir=/usr/local/mariadb/data
candidate_master=1
[server3]
hostname=172.16.254.112
master_binlog_dir=/usr/local/mariadb/data
no_master=1

The configuration parameters are described in detail at the following link.
https://code.google.com/p/mysql-master-ha/wiki/Parameters
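
Because the configuration file above is INI-like, Python's standard configparser module can read it for a quick sanity check before starting the manager. The sketch below is not part of MHA: `summarize_servers` is a hypothetical helper, the sample text is a shortened copy of the file above, and it assumes the file contains only plain key=value lines and `#` comments as in this post.

```python
import configparser

# Shortened copy of /etc/app1.cnf from this post (binlog and log paths omitted).
SAMPLE_CNF = """\
[server default]
# mysql user and password
user=mhauser
manager_workdir=/var/log/masterha/app1

[server1]
hostname=172.16.254.110
candidate_master=1

[server2]
hostname=172.16.254.111
candidate_master=1

[server3]
hostname=172.16.254.112
no_master=1
"""


def summarize_servers(text):
    """Return one dict per [serverN] section with its hostname and role flags."""
    # interpolation=None keeps a '%' inside a password from being treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    rows = []
    for name in parser.sections():
        if name == "server default":
            continue  # global defaults, not a server entry
        section = parser[name]
        rows.append({
            "section": name,
            "hostname": section.get("hostname"),
            "candidate_master": section.get("candidate_master", "0") == "1",
            "no_master": section.get("no_master", "0") == "1",
        })
    return rows


if __name__ == "__main__":
    for row in summarize_servers(SAMPLE_CNF):
        print(row)
```

A check like this makes it easy to spot a server that is accidentally marked with both candidate_master and no_master, or a section that is missing its hostname.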

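MHA Installation – SSH Setup

Internally, MHA connects to each node over ssh and transfers relay logs with scp. To automate these steps, set up the nodes so that they can ssh to one another without password authentication.
For details, see https://opentutorials.org/module/432/3742.

Run on each node and on the manager server

ssh-keygen -t rsa
(Press Enter three times to accept the defaults.)

Use ssh-copy-id to copy the key generated on each server to the other servers (run this on every server).
For example, on the master server copy the key to slave1, slave2, and the manager server; on the manager server copy it to master, slave1, and slave2.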

ssh-copy-id root@172.16.254.111
ssh-copy-id root@172.16.254.112
ssh-copy-id root@172.16.254.113
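
Since every server has to copy its key to every other server, it can help to list the required commands per host before running them. The following Python sketch only builds the command strings and does not execute anything; `copy_id_commands` is a hypothetical helper, and the host table and the root user are taken from the test environment of this post.

```python
# Hosts from the test environment of this post.
HOSTS = {
    "master": "172.16.254.110",
    "slave1": "172.16.254.111",
    "slave2": "172.16.254.112",
    "manager": "172.16.254.113",
}


def copy_id_commands(source, hosts=HOSTS, user="root"):
    """Return the ssh-copy-id commands to run on `source` so it can reach all other hosts."""
    if source not in hosts:
        raise KeyError(f"unknown host name: {source}")
    return [f"ssh-copy-id {user}@{ip}" for name, ip in hosts.items() if name != source]


if __name__ == "__main__":
    for name in HOSTS:
        print(f"# run on {name}")
        for command in copy_id_commands(name):
            print(command)
```

For the master server this yields exactly the three commands shown above; the other three servers each get their own list of three targets.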

Testing the SSH setup
The masterha_check_ssh command checks the ssh authentication setup between the nodes.

masterha_check_ssh --conf=/etc/app1.cnf
Tue Nov 29 15:31:58 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Nov 29 15:31:58 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Tue Nov 29 15:31:58 2016 - [info] Reading server configuration from /etc/app1.cnf..
Tue Nov 29 15:31:58 2016 - [info] Starting SSH connection tests..
Tue Nov 29 15:32:00 2016 - [debug]
Tue Nov 29 15:31:58 2016 - [debug]  Connecting via SSH from root@172.16.254.110(172.16.254.110:22) to root@172.16.254.111(172.16.254.111:22)..
Tue Nov 29 15:31:59 2016 - [debug]   ok.
Tue Nov 29 15:31:59 2016 - [debug]  Connecting via SSH from root@172.16.254.110(172.16.254.110:22) to root@172.16.254.112(172.16.254.112:22)..
Tue Nov 29 15:32:00 2016 - [debug]   ok.
Tue Nov 29 15:32:01 2016 - [debug]
Tue Nov 29 15:31:59 2016 - [debug]  Connecting via SSH from root@172.16.254.111(172.16.254.111:22) to root@172.16.254.110(172.16.254.110:22)..
Tue Nov 29 15:32:00 2016 - [debug]   ok.
Tue Nov 29 15:32:00 2016 - [debug]  Connecting via SSH from root@172.16.254.111(172.16.254.111:22) to root@172.16.254.112(172.16.254.112:22)..
Tue Nov 29 15:32:01 2016 - [debug]   ok.
Tue Nov 29 15:32:01 2016 - [debug]
Tue Nov 29 15:31:59 2016 - [debug]  Connecting via SSH from root@172.16.254.112(172.16.254.112:22) to root@172.16.254.110(172.16.254.110:22)..
Tue Nov 29 15:32:00 2016 - [debug]   ok.
Tue Nov 29 15:32:00 2016 - [debug]  Connecting via SSH from root@172.16.254.112(172.16.254.112:22) to root@172.16.254.111(172.16.254.111:22)..
Tue Nov 29 15:32:01 2016 - [debug]   ok.
Tue Nov 29 15:32:01 2016 - [info] All SSH connection tests passed successfully.

MHA Installation – Replication Check

The masterha_check_repl command checks the replication setup.

masterha_check_repl --conf=/etc/app1.cnf
Tue Nov 29 15:34:25 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Nov 29 15:34:25 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Tue Nov 29 15:34:25 2016 - [info] Reading server configuration from /etc/app1.cnf..
Tue Nov 29 15:34:25 2016 - [info] MHA::MasterMonitor version 0.56.
Tue Nov 29 15:34:25 2016 - [info] GTID failover mode = 0
Tue Nov 29 15:34:25 2016 - [info] Dead Servers:
Tue Nov 29 15:34:25 2016 - [info] Alive Servers:
Tue Nov 29 15:34:25 2016 - [info]   172.16.254.110(172.16.254.110:3306)
Tue Nov 29 15:34:25 2016 - [info]   172.16.254.111(172.16.254.111:3306)
Tue Nov 29 15:34:25 2016 - [info]   172.16.254.112(172.16.254.112:3306)
Tue Nov 29 15:34:25 2016 - [info] Alive Slaves:
Tue Nov 29 15:34:25 2016 - [info]   172.16.254.111(172.16.254.111:3306)  Version=10.1.19-MariaDB (oldest major version between slaves) log-bin:enabled
Tue Nov 29 15:34:25 2016 - [info]     Replicating from 172.16.254.110(172.16.254.110:3306)
Tue Nov 29 15:34:25 2016 - [info]   172.16.254.112(172.16.254.112:3306)  Version=10.1.19-MariaDB (oldest major version between slaves) log-bin:enabled
Tue Nov 29 15:34:25 2016 - [info]     Replicating from 172.16.254.110(172.16.254.110:3306)
Tue Nov 29 15:34:25 2016 - [info] Current Alive Master: 172.16.254.110(172.16.254.110:3306)
Tue Nov 29 15:34:25 2016 - [info] Checking slave configurations..
Tue Nov 29 15:34:25 2016 - [info]  read_only=1 is not set on slave 172.16.254.111(172.16.254.111:3306).
Tue Nov 29 15:34:25 2016 - [info] Checking replication filtering settings..
Tue Nov 29 15:34:25 2016 - [info]  binlog_do_db= , binlog_ignore_db=
Tue Nov 29 15:34:25 2016 - [info]  Replication filtering check ok.
Tue Nov 29 15:34:25 2016 - [info] GTID (with auto-pos) is not supported
Tue Nov 29 15:34:25 2016 - [info] Starting SSH connection tests..
Tue Nov 29 15:34:28 2016 - [info] All SSH connection tests passed successfully.
Tue Nov 29 15:34:28 2016 - [info] Checking MHA Node version..
Tue Nov 29 15:34:29 2016 - [info]  Version check ok.
Tue Nov 29 15:34:29 2016 - [info] Checking SSH publickey authentication settings on the current master..
Tue Nov 29 15:34:30 2016 - [info] HealthCheck: SSH to 172.16.254.110 is reachable.
Tue Nov 29 15:34:30 2016 - [info] Master MHA Node version is 0.56.
Tue Nov 29 15:34:30 2016 - [info] Checking recovery script configurations on 172.16.254.110(172.16.254.110:3306)..
Tue Nov 29 15:34:30 2016 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/usr/local/mariadb/data --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000009
Tue Nov 29 15:34:30 2016 - [info]   Connecting to root@172.16.254.110(172.16.254.110:22)..
  Creating /var/log/masterha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /usr/local/mariadb/data, up to mysql-bin.000009
Tue Nov 29 15:34:31 2016 - [info] Binlog setting check done.
Tue Nov 29 15:34:31 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Tue Nov 29 15:34:31 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=172.16.254.111 --slave_ip=172.16.254.111 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=10.1.19-MariaDB --manager_version=0.56 --relay_log_info=/usr/local/mariadb/data/relay-log.info  --relay_dir=/usr/local/mariadb/data/  --slave_pass=xxx
Tue Nov 29 15:34:31 2016 - [info]   Connecting to root@172.16.254.111(172.16.254.111:22)..
  Checking slave recovery environment settings..
    Opening /usr/local/mariadb/data/relay-log.info ... ok.
    Relay log found at /usr/local/mariadb/data, up to mariadb_S1-relay-bin.000005
    Temporary relay log file is /usr/local/mariadb/data/mariadb_S1-relay-bin.000005
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Tue Nov 29 15:34:31 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=172.16.254.112 --slave_ip=172.16.254.112 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=10.1.19-MariaDB --manager_version=0.56 --relay_log_info=/usr/local/mariadb/data/relay-log.info  --relay_dir=/usr/local/mariadb/data/  --slave_pass=xxx
Tue Nov 29 15:34:31 2016 - [info]   Connecting to root@172.16.254.112(172.16.254.112:22)..
  Checking slave recovery environment settings..
    Opening /usr/local/mariadb/data/relay-log.info ... ok.
    Relay log found at /usr/local/mariadb/data, up to mariadb_S2-relay-bin.000005
    Temporary relay log file is /usr/local/mariadb/data/mariadb_S2-relay-bin.000005
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Tue Nov 29 15:34:32 2016 - [info] Slaves settings check done.
Tue Nov 29 15:34:32 2016 - [info]
172.16.254.110(172.16.254.110:3306) (current master)
 +--172.16.254.111(172.16.254.111:3306)
 +--172.16.254.112(172.16.254.112:3306)
Tue Nov 29 15:34:32 2016 - [info] Checking replication health on 172.16.254.111..
Tue Nov 29 15:34:32 2016 - [info]  ok.
Tue Nov 29 15:34:32 2016 - [info] Checking replication health on 172.16.254.112..
Tue Nov 29 15:34:32 2016 - [info]  ok.
Tue Nov 29 15:34:32 2016 - [warning] master_ip_failover_script is not defined.
Tue Nov 29 15:34:32 2016 - [warning] shutdown_script is not defined.
Tue Nov 29 15:34:32 2016 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
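
Both check commands end their output with a recognizable success line, so a script can decide whether to proceed by looking for it. The Python sketch below works on already captured output text; `check_passed` is a hypothetical helper, and the two marker strings are copied from the outputs shown in this post. It makes no assumption about what the tools print on failure.

```python
# Success lines as they appear in the outputs shown in this post.
SSH_OK = "All SSH connection tests passed successfully."
REPL_OK = "MySQL Replication Health is OK."


def check_passed(output, marker):
    """Return True if any line of the captured output ends with the success marker."""
    return any(line.rstrip().endswith(marker) for line in output.splitlines())


if __name__ == "__main__":
    sample = (
        "Tue Nov 29 15:34:32 2016 - [info] Got exit code 0 (Not master dead).\n"
        "MySQL Replication Health is OK.\n"
    )
    print("replication check passed:", check_passed(sample, REPL_OK))
```

On a live system the text could be captured with subprocess.run(..., stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True), merging both streams so that the sketch does not depend on which stream the tool writes to.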

MHA Installation – Actual Failover Test

Start the manager with the masterha_manager command.

masterha_manager --conf=/etc/app1.cnf

Manager log at startup

tail -f /var/log/masterha/app1/app1.log
Tue Nov 29 15:39:27 2016 - [info] MHA::MasterMonitor version 0.56.
Tue Nov 29 15:39:28 2016 - [info] GTID failover mode = 0
Tue Nov 29 15:39:28 2016 - [info] Dead Servers:
Tue Nov 29 15:39:28 2016 - [info] Alive Servers:
Tue Nov 29 15:39:28 2016 - [info]   172.16.254.110(172.16.254.110:3306)
Tue Nov 29 15:39:28 2016 - [info]   172.16.254.111(172.16.254.111:3306)
Tue Nov 29 15:39:28 2016 - [info]   172.16.254.112(172.16.254.112:3306)
Tue Nov 29 15:39:28 2016 - [info] Alive Slaves:
Tue Nov 29 15:39:28 2016 - [info]   172.16.254.111(172.16.254.111:3306)  Version=10.1.19-MariaDB (oldest major version between slaves) log-bin:enabled
Tue Nov 29 15:39:28 2016 - [info]     Replicating from 172.16.254.110(172.16.254.110:3306)
Tue Nov 29 15:39:28 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Nov 29 15:39:28 2016 - [info]   172.16.254.112(172.16.254.112:3306)  Version=10.1.19-MariaDB (oldest major version between slaves) log-bin:enabled
Tue Nov 29 15:39:28 2016 - [info]     Replicating from 172.16.254.110(172.16.254.110:3306)
Tue Nov 29 15:39:28 2016 - [info]     Not candidate for the new Master (no_master is set)
Tue Nov 29 15:39:28 2016 - [info] Current Alive Master: 172.16.254.110(172.16.254.110:3306)
Tue Nov 29 15:39:28 2016 - [info] Checking slave configurations..
Tue Nov 29 15:39:28 2016 - [info]  read_only=1 is not set on slave 172.16.254.111(172.16.254.111:3306).
Tue Nov 29 15:39:28 2016 - [info] Checking replication filtering settings..
Tue Nov 29 15:39:28 2016 - [info]  binlog_do_db= , binlog_ignore_db=
Tue Nov 29 15:39:28 2016 - [info]  Replication filtering check ok.
Tue Nov 29 15:39:28 2016 - [info] GTID (with auto-pos) is not supported
Tue Nov 29 15:39:28 2016 - [info] Starting SSH connection tests..
Tue Nov 29 15:39:31 2016 - [info] All SSH connection tests passed successfully.
Tue Nov 29 15:39:31 2016 - [info] Checking MHA Node version..
Tue Nov 29 15:39:32 2016 - [info]  Version check ok.
Tue Nov 29 15:39:32 2016 - [info] Checking SSH publickey authentication settings on the current master..
Tue Nov 29 15:39:32 2016 - [info] HealthCheck: SSH to 172.16.254.110 is reachable.
Tue Nov 29 15:39:33 2016 - [info] Master MHA Node version is 0.56.
Tue Nov 29 15:39:33 2016 - [info] Checking recovery script configurations on 172.16.254.110(172.16.254.110:3306)..
Tue Nov 29 15:39:33 2016 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/usr/local/mariadb/data --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000009
Tue Nov 29 15:39:33 2016 - [info]   Connecting to root@172.16.254.110(172.16.254.110:22)..
  Creating /var/log/masterha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /usr/local/mariadb/data, up to mysql-bin.000009
Tue Nov 29 15:39:33 2016 - [info] Binlog setting check done.
Tue Nov 29 15:39:33 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Tue Nov 29 15:39:33 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=172.16.254.111 --slave_ip=172.16.254.111 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=10.1.19-MariaDB --manager_version=0.56 --relay_log_info=/usr/local/mariadb/data/relay-log.info  --relay_dir=/usr/local/mariadb/data/  --slave_pass=xxx
Tue Nov 29 15:39:33 2016 - [info]   Connecting to root@172.16.254.111(172.16.254.111:22)..
  Checking slave recovery environment settings..
    Opening /usr/local/mariadb/data/relay-log.info ... ok.
    Relay log found at /usr/local/mariadb/data, up to mariadb_S1-relay-bin.000005
    Temporary relay log file is /usr/local/mariadb/data/mariadb_S1-relay-bin.000005
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Tue Nov 29 15:39:34 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=172.16.254.112 --slave_ip=172.16.254.112 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=10.1.19-MariaDB --manager_version=0.56 --relay_log_info=/usr/local/mariadb/data/relay-log.info  --relay_dir=/usr/local/mariadb/data/  --slave_pass=xxx
Tue Nov 29 15:39:34 2016 - [info]   Connecting to root@172.16.254.112(172.16.254.112:22)..
  Checking slave recovery environment settings..
    Opening /usr/local/mariadb/data/relay-log.info ... ok.
    Relay log found at /usr/local/mariadb/data, up to mariadb_S2-relay-bin.000005
    Temporary relay log file is /usr/local/mariadb/data/mariadb_S2-relay-bin.000005
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Tue Nov 29 15:39:34 2016 - [info] Slaves settings check done.
Tue Nov 29 15:39:34 2016 - [info]
172.16.254.110(172.16.254.110:3306) (current master)
 +--172.16.254.111(172.16.254.111:3306)
 +--172.16.254.112(172.16.254.112:3306)
Tue Nov 29 15:39:34 2016 - [warning] master_ip_failover_script is not defined.
Tue Nov 29 15:39:34 2016 - [warning] shutdown_script is not defined.
Tue Nov 29 15:39:34 2016 - [info] Set master ping interval 3 seconds.
Tue Nov 29 15:39:35 2016 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Tue Nov 29 15:39:35 2016 - [info] Starting ping health check on 172.16.254.110(172.16.254.110:3306)..
Tue Nov 29 15:39:35 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

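The topology summary near the end of the log above (the "(current master)" line and the two "+--" lines below it) shows that 172.16.254.110 is currently the master and that 172.16.254.111 and 172.16.254.112 are slaves.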
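
The master and slave hosts can also be pulled out of the topology summary in the manager log programmatically. The following Python sketch assumes the summary has exactly the layout shown in the log above (a "(current master)" line followed by "+--" lines); `parse_topology` and the two regular expressions are hypothetical helpers, not part of MHA.

```python
import re

# "host(ip:port) (current master)" and " +--host(ip:port)" as printed in the log above.
MASTER_RE = re.compile(r"^([^\s(]+)\([^)]*\) \(current master\)$")
SLAVE_RE = re.compile(r"^\+--([^\s(]+)\([^)]*\)$")

SAMPLE_LOG = """\
Tue Nov 29 15:39:34 2016 - [info]
172.16.254.110(172.16.254.110:3306) (current master)
 +--172.16.254.111(172.16.254.111:3306)
 +--172.16.254.112(172.16.254.112:3306)
Tue Nov 29 15:39:34 2016 - [warning] master_ip_failover_script is not defined.
"""


def parse_topology(log_text):
    """Return (master, [slaves]) from the last topology summary found in the log text."""
    master = None
    slaves = []
    collecting = False  # True only while reading the "+--" lines right after a master line
    for raw in log_text.splitlines():
        line = raw.strip()
        master_match = MASTER_RE.match(line)
        if master_match:
            master, slaves, collecting = master_match.group(1), [], True
            continue
        slave_match = SLAVE_RE.match(line)
        if collecting and slave_match:
            slaves.append(slave_match.group(1))
        else:
            collecting = False
    return master, slaves


if __name__ == "__main__":
    print(parse_topology(SAMPLE_LOG))
```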

Reproducing a master failure
To reproduce a master failure, simply stop MariaDB on the master; the manager then writes the following log.

Tue Nov 29 15:46:26 2016 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Tue Nov 29 15:46:26 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/usr/local/mariadb/data --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=mysql-bin
Tue Nov 29 15:46:26 2016 - [info] HealthCheck: SSH to 172.16.254.110 is reachable.
Tue Nov 29 15:46:29 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Nov 29 15:46:29 2016 - [warning] Connection failed 2 time(s)..
Tue Nov 29 15:46:32 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Nov 29 15:46:32 2016 - [warning] Connection failed 3 time(s)..
Tue Nov 29 15:46:35 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Nov 29 15:46:35 2016 - [warning] Connection failed 4 time(s)..
Tue Nov 29 15:46:35 2016 - [warning] Master is not reachable from health checker!
Tue Nov 29 15:46:35 2016 - [warning] Master 172.16.254.110(172.16.254.110:3306) is not reachable!
Tue Nov 29 15:46:35 2016 - [warning] SSH is reachable.
Tue Nov 29 15:46:35 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Tue Nov 29 15:46:35 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Nov 29 15:46:35 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Tue Nov 29 15:46:35 2016 - [info] Reading server configuration from /etc/app1.cnf..
Tue Nov 29 15:46:35 2016 - [info] GTID failover mode = 0
Tue Nov 29 15:46:35 2016 - [info] Dead Servers:
Tue Nov 29 15:46:35 2016 - [info]   172.16.254.110(172.16.254.110:3306)
Tue Nov 29 15:46:35 2016 - [info] Alive Servers:
Tue Nov 29 15:46:35 2016 - [info]   172.16.254.111(172.16.254.111:3306)
Tue Nov 29 15:46:35 2016 - [info]   172.16.254.112(172.16.254.112:3306)
Tue Nov 29 15:46:35 2016 - [info] Alive Slaves:
Tue Nov 29 15:46:35 2016 - [info]   172.16.254.111(172.16.254.111:3306)  Version=10.1.19-MariaDB (oldest major version between slaves) log-bin:enabled
Tue Nov 29 15:46:35 2016 - [info]     Replicating from 172.16.254.110(172.16.254.110:3306)
Tue Nov 29 15:46:35 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Nov 29 15:46:35 2016 - [info]   172.16.254.112(172.16.254.112:3306)  Version=10.1.19-MariaDB (oldest major version between slaves) log-bin:enabled
Tue Nov 29 15:46:35 2016 - [info]     Replicating from 172.16.254.110(172.16.254.110:3306)
Tue Nov 29 15:46:35 2016 - [info]     Not candidate for the new Master (no_master is set)
Tue Nov 29 15:46:35 2016 - [info] Checking slave configurations..
Tue Nov 29 15:46:35 2016 - [info]  read_only=1 is not set on slave 172.16.254.111(172.16.254.111:3306).
Tue Nov 29 15:46:35 2016 - [info] Checking replication filtering settings..
Tue Nov 29 15:46:35 2016 - [info]  Replication filtering check ok.
Tue Nov 29 15:46:35 2016 - [info] Master is down!
Tue Nov 29 15:46:35 2016 - [info] Terminating monitoring script.
Tue Nov 29 15:46:35 2016 - [info] Got exit code 20 (Master dead).
Tue Nov 29 15:46:35 2016 - [info] MHA::MasterFailover version 0.56.
Tue Nov 29 15:46:35 2016 - [info] Starting master failover.
Tue Nov 29 15:46:35 2016 - [info]
Tue Nov 29 15:46:35 2016 - [info] * Phase 1: Configuration Check Phase..
Tue Nov 29 15:46:35 2016 - [info]
Tue Nov 29 15:46:35 2016 - [info] GTID failover mode = 0
Tue Nov 29 15:46:35 2016 - [info] Dead Servers:
Tue Nov 29 15:46:35 2016 - [info]   172.16.254.110(172.16.254.110:3306)
Tue Nov 29 15:46:35 2016 - [info] Checking master reachability via MySQL(double check)...
Tue Nov 29 15:46:35 2016 - [info]  ok.
Tue Nov 29 15:46:35 2016 - [info] Alive Servers:
Tue Nov 29 15:46:35 2016 - [info]   172.16.254.111(172.16.254.111:3306)
Tue Nov 29 15:46:35 2016 - [info]   172.16.254.112(172.16.254.112:3306)
Tue Nov 29 15:46:35 2016 - [info] Alive Slaves:
Tue Nov 29 15:46:35 2016 - [info]   172.16.254.111(172.16.254.111:3306)  Version=10.1.19-MariaDB (oldest major version between slaves) log-bin:enabled
Tue Nov 29 15:46:35 2016 - [info]     Replicating from 172.16.254.110(172.16.254.110:3306)
Tue Nov 29 15:46:35 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Nov 29 15:46:35 2016 - [info]   172.16.254.112(172.16.254.112:3306)  Version=10.1.19-MariaDB (oldest major version between slaves) log-bin:enabled
Tue Nov 29 15:46:35 2016 - [info]     Replicating from 172.16.254.110(172.16.254.110:3306)
Tue Nov 29 15:46:35 2016 - [info]     Not candidate for the new Master (no_master is set)
Tue Nov 29 15:46:35 2016 - [info] Starting Non-GTID based failover.
Tue Nov 29 15:46:35 2016 - [info]
Tue Nov 29 15:46:35 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Nov 29 15:46:35 2016 - [info]
Tue Nov 29 15:46:35 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Tue Nov 29 15:46:35 2016 - [info]
Tue Nov 29 15:46:35 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Tue Nov 29 15:46:35 2016 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.
Tue Nov 29 15:46:35 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Tue Nov 29 15:46:35 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Nov 29 15:46:35 2016 - [info]
Tue Nov 29 15:46:35 2016 - [info] * Phase 3: Master Recovery Phase..
Tue Nov 29 15:46:35 2016 - [info]
Tue Nov 29 15:46:35 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Nov 29 15:46:35 2016 - [info]
Tue Nov 29 15:46:35 2016 - [info] The latest binary log file/position on all slaves is mysql-bin.000009:23400456
Tue Nov 29 15:46:35 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Nov 29 15:46:35 2016 - [info]   172.16.254.111(172.16.254.111:3306)  Version=10.1.19-MariaDB (oldest major version between slaves) log-bin:enabled
Tue Nov 29 15:46:35 2016 - [info]     Replicating from 172.16.254.110(172.16.254.110:3306)
Tue Nov 29 15:46:35 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Nov 29 15:46:35 2016 - [info]   172.16.254.112(172.16.254.112:3306)  Version=10.1.19-MariaDB (oldest major version between slaves) log-bin:enabled
Tue Nov 29 15:46:35 2016 - [info]     Replicating from 172.16.254.110(172.16.254.110:3306)
Tue Nov 29 15:46:35 2016 - [info]     Not candidate for the new Master (no_master is set)
Tue Nov 29 15:46:35 2016 - [info] The oldest binary log file/position on all slaves is mysql-bin.000009:23400456
Tue Nov 29 15:46:35 2016 - [info] Oldest slaves:
Tue Nov 29 15:46:35 2016 - [info]   172.16.254.111(172.16.254.111:3306)  Version=10.1.19-MariaDB (oldest major version between slaves) log-bin:enabled
Tue Nov 29 15:46:35 2016 - [info]     Replicating from 172.16.254.110(172.16.254.110:3306)
Tue Nov 29 15:46:35 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Nov 29 15:46:35 2016 - [info]   172.16.254.112(172.16.254.112:3306)  Version=10.1.19-MariaDB (oldest major version between slaves) log-bin:enabled
Tue Nov 29 15:46:35 2016 - [info]     Replicating from 172.16.254.110(172.16.254.110:3306)
Tue Nov 29 15:46:35 2016 - [info]     Not candidate for the new Master (no_master is set)
Tue Nov 29 15:46:35 2016 - [info]
Tue Nov 29 15:46:35 2016 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Tue Nov 29 15:46:35 2016 - [info]
Tue Nov 29 15:46:35 2016 - [info] Fetching dead master's binary logs..
Tue Nov 29 15:46:35 2016 - [info] Executing command on the dead master 172.16.254.110(172.16.254.110:3306): save_binary_logs --command=save --start_file=mysql-bin.000009  --start_pos=23400456 --binlog_dir=/usr/local/mariadb/data --output_file=/var/log/masterha/app1/saved_master_binlog_from_172.16.254.110_3306_20161129154635.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56
  Creating /var/log/masterha/app1 if not exists..    ok.
 Concat binary/relay logs from mysql-bin.000009 pos 23400456 to mysql-bin.000009 EOF into /var/log/masterha/app1/saved_master_binlog_from_172.16.254.110_3306_20161129154635.binlog ..
 Binlog Checksum enabled
  Dumping binlog format description event, from position 0 to 249.. ok.
  Dumping effective binlog data from /usr/local/mariadb/data/mysql-bin.000009 position 23400456 to tail(23400475).. ok.
 Binlog Checksum enabled
 Concat succeeded.
Tue Nov 29 15:46:37 2016 - [info] scp from root@172.16.254.110:/var/log/masterha/app1/saved_master_binlog_from_172.16.254.110_3306_20161129154635.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_172.16.254.110_3306_20161129154635.binlog succeeded.
Tue Nov 29 15:46:37 2016 - [info] HealthCheck: SSH to 172.16.254.111 is reachable.
Tue Nov 29 15:46:38 2016 - [info] HealthCheck: SSH to 172.16.254.112 is reachable.
Tue Nov 29 15:46:39 2016 - [info]
Tue Nov 29 15:46:39 2016 - [info] * Phase 3.3: Determining New Master Phase..
Tue Nov 29 15:46:39 2016 - [info]
Tue Nov 29 15:46:39 2016 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Tue Nov 29 15:46:39 2016 - [info] All slaves received relay logs to the same position. No need to resync each other.
Tue Nov 29 15:46:39 2016 - [info] Searching new master from slaves..
Tue Nov 29 15:46:39 2016 - [info]  Candidate masters from the configuration file:
Tue Nov 29 15:46:39 2016 - [info]   172.16.254.111(172.16.254.111:3306)  Version=10.1.19-MariaDB (oldest major version between slaves) log-bin:enabled
Tue Nov 29 15:46:39 2016 - [info]     Replicating from 172.16.254.110(172.16.254.110:3306)
Tue Nov 29 15:46:39 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Nov 29 15:46:39 2016 - [info]  Non-candidate masters:
Tue Nov 29 15:46:39 2016 - [info]   172.16.254.112(172.16.254.112:3306)  Version=10.1.19-MariaDB (oldest major version between slaves) log-bin:enabled
Tue Nov 29 15:46:39 2016 - [info]     Replicating from 172.16.254.110(172.16.254.110:3306)
Tue Nov 29 15:46:39 2016 - [info]     Not candidate for the new Master (no_master is set)
Tue Nov 29 15:46:39 2016 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Tue Nov 29 15:46:39 2016 - [info] New master is 172.16.254.111(172.16.254.111:3306)
Tue Nov 29 15:46:39 2016 - [info] Starting master failover..
Tue Nov 29 15:46:39 2016 - [info]
From:
172.16.254.110(172.16.254.110:3306) (current master)
 +--172.16.254.111(172.16.254.111:3306)
 +--172.16.254.112(172.16.254.112:3306)
To:
172.16.254.111(172.16.254.111:3306) (new master)
 +--172.16.254.112(172.16.254.112:3306)
Tue Nov 29 15:46:39 2016 - [info]
Tue Nov 29 15:46:39 2016 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Tue Nov 29 15:46:39 2016 - [info]
Tue Nov 29 15:46:39 2016 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Tue Nov 29 15:46:39 2016 - [info] Sending binlog..
Tue Nov 29 15:46:40 2016 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_172.16.254.110_3306_20161129154635.binlog to root@172.16.254.111:/var/log/masterha/app1/saved_master_binlog_from_172.16.254.110_3306_20161129154635.binlog succeeded.
Tue Nov 29 15:46:40 2016 - [info]
Tue Nov 29 15:46:40 2016 - [info] * Phase 3.4: Master Log Apply Phase..
Tue Nov 29 15:46:40 2016 - [info]
Tue Nov 29 15:46:40 2016 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Tue Nov 29 15:46:40 2016 - [info] Starting recovery on 172.16.254.111(172.16.254.111:3306)..
Tue Nov 29 15:46:40 2016 - [info]  Generating diffs succeeded.
Tue Nov 29 15:46:40 2016 - [info] Waiting until all relay logs are applied.
Tue Nov 29 15:46:40 2016 - [info]  done.
Tue Nov 29 15:46:40 2016 - [info] Getting slave status..
Tue Nov 29 15:46:40 2016 - [info] This slave(172.16.254.111)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000009:23400456). No need to recover from Exec_Master_Log_Pos.
Tue Nov 29 15:46:40 2016 - [info] Connecting to the target slave host 172.16.254.111, running recover script..
Tue Nov 29 15:46:40 2016 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mhauser' --slave_host=172.16.254.111 --slave_ip=172.16.254.111  --slave_port=3306 --apply_files=/var/log/masterha/app1/saved_master_binlog_from_172.16.254.110_3306_20161129154635.binlog --workdir=/var/log/masterha/app1 --target_version=10.1.19-MariaDB --timestamp=20161129154635 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx
Tue Nov 29 15:46:40 2016 - [info]
MySQL client version is 10.1.19. Using --binary-mode.
Applying differential binary/relay log files /var/log/masterha/app1/saved_master_binlog_from_172.16.254.110_3306_20161129154635.binlog on 172.16.254.111:3306. This may take long time...
Applying log files succeeded.
Tue Nov 29 15:46:40 2016 - [info]  All relay logs were successfully applied.
Tue Nov 29 15:46:40 2016 - [info] Getting new master's binlog name and position..
Tue Nov 29 15:46:40 2016 - [info]  mysql-bin.000013:991791
Tue Nov 29 15:46:40 2016 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.254.111', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000013', MASTER_LOG_POS=991791, MASTER_USER='repl_user', MASTER_PASSWORD='xxx';
Tue Nov 29 15:46:40 2016 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.
Tue Nov 29 15:46:40 2016 - [info] ** Finished master recovery successfully.
Tue Nov 29 15:46:40 2016 - [info] * Phase 3: Master Recovery Phase completed.
Tue Nov 29 15:46:40 2016 - [info]
Tue Nov 29 15:46:40 2016 - [info] * Phase 4: Slaves Recovery Phase..
Tue Nov 29 15:46:40 2016 - [info]
Tue Nov 29 15:46:40 2016 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Tue Nov 29 15:46:40 2016 - [info]
Tue Nov 29 15:46:40 2016 - [info] -- Slave diff file generation on host 172.16.254.112(172.16.254.112:3306) started, pid: 17907. Check tmp log /var/log/masterha/app1/172.16.254.112_3306_20161129154635.log if it takes time..
Tue Nov 29 15:46:40 2016 - [info]
Tue Nov 29 15:46:40 2016 - [info] Log messages from 172.16.254.112 ...
Tue Nov 29 15:46:40 2016 - [info]
Tue Nov 29 15:46:40 2016 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Tue Nov 29 15:46:40 2016 - [info] End of log messages from 172.16.254.112.
Tue Nov 29 15:46:40 2016 - [info] -- 172.16.254.112(172.16.254.112:3306) has the latest relay log events.
Tue Nov 29 15:46:40 2016 - [info] Generating relay diff files from the latest slave succeeded.
Tue Nov 29 15:46:40 2016 - [info]
Tue Nov 29 15:46:40 2016 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Tue Nov 29 15:46:40 2016 - [info]
Tue Nov 29 15:46:40 2016 - [info] -- Slave recovery on host 172.16.254.112(172.16.254.112:3306) started, pid: 17909. Check tmp log /var/log/masterha/app1/172.16.254.112_3306_20161129154635.log if it takes time..
Tue Nov 29 15:46:42 2016 - [info]
Tue Nov 29 15:46:42 2016 - [info] Log messages from 172.16.254.112 ...
Tue Nov 29 15:46:42 2016 - [info]
Tue Nov 29 15:46:40 2016 - [info] Sending binlog..
Tue Nov 29 15:46:41 2016 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_172.16.254.110_3306_20161129154635.binlog to root@172.16.254.112:/var/log/masterha/app1/saved_master_binlog_from_172.16.254.110_3306_20161129154635.binlog succeeded.
Tue Nov 29 15:46:41 2016 - [info] Starting recovery on 172.16.254.112(172.16.254.112:3306)..
Tue Nov 29 15:46:41 2016 - [info]  Generating diffs succeeded.
Tue Nov 29 15:46:41 2016 - [info] Waiting until all relay logs are applied.
Tue Nov 29 15:46:41 2016 - [info]  done.
Tue Nov 29 15:46:41 2016 - [info] Getting slave status..
Tue Nov 29 15:46:41 2016 - [info] This slave(172.16.254.112)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000009:23400456). No need to recover from Exec_Master_Log_Pos.
Tue Nov 29 15:46:41 2016 - [info] Connecting to the target slave host 172.16.254.112, running recover script..
Tue Nov 29 15:46:41 2016 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mhauser' --slave_host=172.16.254.112 --slave_ip=172.16.254.112  --slave_port=3306 --apply_files=/var/log/masterha/app1/saved_master_binlog_from_172.16.254.110_3306_20161129154635.binlog --workdir=/var/log/masterha/app1 --target_version=10.1.19-MariaDB --timestamp=20161129154635 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx
Tue Nov 29 15:46:42 2016 - [info]
MySQL client version is 10.1.19. Using --binary-mode.
Applying differential binary/relay log files /var/log/masterha/app1/saved_master_binlog_from_172.16.254.110_3306_20161129154635.binlog on 172.16.254.112:3306. This may take long time...
Applying log files succeeded.
Tue Nov 29 15:46:42 2016 - [info]  All relay logs were successfully applied.
Tue Nov 29 15:46:42 2016 - [info]  Resetting slave 172.16.254.112(172.16.254.112:3306) and starting replication from the new master 172.16.254.111(172.16.254.111:3306)..
Tue Nov 29 15:46:42 2016 - [info]  Executed CHANGE MASTER.
Tue Nov 29 15:46:42 2016 - [info]  Slave started.
Tue Nov 29 15:46:42 2016 - [info] End of log messages from 172.16.254.112.
Tue Nov 29 15:46:42 2016 - [info] -- Slave recovery on host 172.16.254.112(172.16.254.112:3306) succeeded.
Tue Nov 29 15:46:42 2016 - [info] All new slave servers recovered successfully.
Tue Nov 29 15:46:42 2016 - [info]
Tue Nov 29 15:46:42 2016 - [info] * Phase 5: New master cleanup phase..
Tue Nov 29 15:46:42 2016 - [info]
Tue Nov 29 15:46:42 2016 - [info] Resetting slave info on the new master..
Tue Nov 29 15:46:42 2016 - [info]  172.16.254.111: Resetting slave info succeeded.
Tue Nov 29 15:46:42 2016 - [info] Master failover to 172.16.254.111(172.16.254.111:3306) completed successfully.
Tue Nov 29 15:46:42 2016 - [info]
----- Failover Report -----
app1: MySQL Master failover 172.16.254.110(172.16.254.110:3306) to 172.16.254.111(172.16.254.111:3306) succeeded
Master 172.16.254.110(172.16.254.110:3306) is down!
Check MHA Manager logs at localhost.localdomain:/var/log/masterha/app1/app1.log for details.
Started automated(non-interactive) failover.
The latest slave 172.16.254.111(172.16.254.111:3306) has all relay logs for recovery.
Selected 172.16.254.111(172.16.254.111:3306) as a new master.
172.16.254.111(172.16.254.111:3306): OK: Applying all logs succeeded.
172.16.254.112(172.16.254.112:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
172.16.254.112(172.16.254.112:3306): OK: Applying all logs succeeded. Slave started, replicating from 172.16.254.111(172.16.254.111:3306)
172.16.254.111(172.16.254.111:3306): Resetting slave info succeeded.
Master failover to 172.16.254.111(172.16.254.111:3306) completed successfully.

The last line of the log, "Master failover to 172.16.254.111(172.16.254.111:3306) completed successfully.", confirms that 172.16.254.111 was promoted to master.
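The same check can be scripted. The following is a minimal sketch, assuming the manager log uses the message format shown above. The function name new_master_from_log is hypothetical and is not part of MHA.

```shell
#!/bin/sh
# Hypothetical helper (not part of MHA): print the host named by the most
# recent "Master failover to HOST(HOST:PORT) completed successfully." line.
new_master_from_log() {
  # $1: path to the MHA manager log
  sed -n 's/.*Master failover to \([^(]*\)(.*) completed successfully\..*/\1/p' "$1" | tail -n 1
}

# Example, using the manager_log path from this post's configuration:
#   new_master_from_log /var/log/masterha/app1/app1.log
```

The message appears twice in a failover run (once in the log body and once in the Failover Report), so the sketch keeps only the last match.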

Connecting to 172.16.254.111 and running show slave status\G returns an empty result, which means the server is no longer a slave.

MariaDB [(none)]> show slave status\G
Empty set (0.00 sec)

Running show slave status\G on 172.16.254.112 shows that the master it replicates from has changed to 172.16.254.111.

MariaDB [(none)]> show slave status\G
*** 1. row ***
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.16.254.111
                  Master_User: repl_user
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000013
          Read_Master_Log_Pos: 991791
               Relay_Log_File: mariadb_S2-relay-bin.000002
                Relay_Log_Pos: 537
        Relay_Master_Log_File: mysql-bin.000013
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 991791
              Relay_Log_Space: 840
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 3
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: No
                  Gtid_IO_Pos:
      Replicate_Do_Domain_Ids:
  Replicate_Ignore_Domain_Ids:
                Parallel_Mode: conservative
1 row in set (0.00 sec)
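Reading this output by eye gets tedious with more than a couple of slaves. The sketch below checks the three fields that matter here: Master_Host, Slave_IO_Running, and Slave_SQL_Running. The function name check_slave_status is hypothetical, and the example assumes the mysql client on the slave can log in without extra options.

```shell
#!/bin/sh
# Hypothetical helper (not part of MHA or MariaDB): read "show slave status\G"
# output on stdin and succeed only if the slave points at the expected master
# and both replication threads are running.
check_slave_status() {
  # $1: expected Master_Host
  awk -v want="$1" '
    $1 == "Master_Host:"       { host = $2 }
    $1 == "Slave_IO_Running:"  { io = $2 }
    $1 == "Slave_SQL_Running:" { sql = $2 }
    END { exit !(host == want && io == "Yes" && sql == "Yes") }
  '
}

# Example (assumption: credentials come from the client's option file):
#   mysql -e 'show slave status\G' | check_slave_status 172.16.254.111 && echo OK
```

An empty result, as on the new master, makes the function fail, because no Master_Host is found.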

That's all.

For more details on MHA, see the official site at https://code.google.com/p/mysql-master-ha/.

11 thoughts on “MariaDB (MySQL) Replication Auto Failover with MHA”

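  1. Hello, I'm leaving a comment after reading your post!
    I'm trying to build MHA on CentOS 7 with MariaDB 10.1.24.
    I've finished the replication and SSH trust setup, but getting the MHA environment working is not easy.
    I followed the post, but it isn't working.
    I'm commenting to ask whether the installation should also proceed normally in my server environment.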

    1. Which part exactly is failing?
      I had no particular trouble when I tested it.
      It should be entirely possible to build it in the environment you describe!

      1. The MHA environment is set up, and both checks pass:
        masterha_check_ssh --conf=/etc/mha.cnf -> All SSH connection tests passed successfully.
        masterha_check_repl --conf=/etc/mha.cnf -> MySQL Replication Health is OK.
        However, when I start MHA with masterha_manager --conf=/etc/mha.cnf, startup takes a very long time and it never returns a response.
        Wed Jun 21 17:08:56 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
        Wed Jun 21 17:08:56 2017 - [info] Reading application default configuration from /etc/mha.cnf..
        Wed Jun 21 17:08:56 2017 - [info] Reading server configuration from /etc/mha.cnf..
        That is the console output.
        The log file shows the same messages as in your post:
        Wed Jun 21 17:11:52 2017 - [warning] master_ip_failover_script is not defined.
        Wed Jun 21 17:11:52 2017 - [warning] shutdown_script is not defined.
        Wed Jun 21 17:11:52 2017 - [info] Set master ping interval 3 seconds.
        Wed Jun 21 17:11:52 2017 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
        Wed Jun 21 17:11:52 2017 - [info] Starting ping health check on 192.168.0.3(192.168.0.3:3306)..
        Wed Jun 21 17:11:52 2017 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
        Here is my mha.cnf:
        [server default]
        user=root
        password=1234
        ssh_user=root
        port=3306
        manager_workdir=/var/lib/mysql/
        manager_log=/var/lib/mysql/log-bin.log
        master_binlog_dir=/var/lib/mysql/
        remote_workdir=/var/lib/mysql/
        ping_interval=3
        [server1]
        hostname=192.168.0.3
        [server2]
        hostname=192.168.0.4
        candidate_master=1
        check_repl_delay=0
        [server3]
        hostname=192.168.0.5
        no_master=1
        That is how I configured it. Could you help me out?

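        1. If the SSH and replication checks both completed without problems, there shouldn't be an issue, so this is strange.
          It's hard to infer anything from the contents of the comment alone.
          I think the only option is to match your mha.cnf to the one in my post and check the logs while running it.
          Sorry that this answer isn't more helpful.
          Please match the settings, capture the logs, and post them here.
          I'll rebuild the setup and check again as well.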

    2. The slow startup is most likely a delay in the SSH check or the replication check. The delay usually comes from SSH; searching for "ssh connection delay" turns up fixes that involve editing /etc/ssh/sshd_config.

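  2. Thanks for the post.
    The setup went well, but I can't connect to MySQL through the IP address I declared as the VIP. The virtual IP responds to ping and the user privileges are fine. I'm using version 5.7.20; do I need any other VIP-related settings?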

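    1. Which part of my post does the IP you declared as a VIP correspond to?
      In a master/slave setup, I'd expect a VIP to be used for distributing reads across multiple slaves.
      MHA simply promotes one of the slaves to master automatically when the only master dies.
      Could you tell me how you declared the VIP?
      It's been a long time since I tested this setup, so my memory is hazy.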

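    2. A VIP needs a network device and an IP address to use as the VIP, and that information has to be set at the top of the failover script and the online change script.
      As for slow failover, one thing to test: if the SSH check takes a long time, you can change options in /etc/ssh/sshd_config. Searching for "ssh connection delay" should turn up the answer.
      If replication is what takes long, there is a replication option that sets the delay before reconnecting after a disconnect. I don't remember the option name, but shortening it should solve the problem.
      Most delays come from SSH, SELinux, and the like, so keep that in mind.
      Ah, I misread the question. If you can't connect, check whether the IP is actually usable and whether the system firewall is turned on.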

  3. Hello.
    I'm setting up replication on Docker and configuring MHA, but it fails with an error during the MariaDB version check and doesn't run. Has anyone confirmed that it works when built on Ubuntu?
    Ubuntu 16.04, MariaDB Server version: 10.1.29, replication
    root@Mha:/# masterha_check_repl --conf=/etc/app1.cnf
    Tue Nov 21 06:27:31 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Tue Nov 21 06:27:31 2017 - [info] Reading application default configurations from /etc/app1.cnf..
    Tue Nov 21 06:27:31 2017 - [info] Reading server configurations from /etc/app1.cnf..
    Tue Nov 21 06:27:31 2017 - [info] MHA::MasterMonitor version 0.55.
    Tue Nov 21 06:27:32 2017 - [error][/usr/local/share/perl/5.22.1/MHA/MasterMonitor.pm, ln386] Error happend on checking configurations. Redundant argument in sprintf at /usr/local/share/perl/5.22.1/MHA/NodeUtil.pm line 184.
    Tue Nov 21 06:27:32 2017 - [error][/usr/local/share/perl/5.22.1/MHA/MasterMonitor.pm, ln482] Error happened on monitoring servers.
    Tue Nov 21 06:27:32 2017 - [info] Got exit code 1 (Not master dead).
    root@Mha:/# cat /etc/app1.cnf
    [server default]
    # mysql user and password
    user=mhauser
    password=12345
    # working directory on the manager
    manager_workdir=/var/log/masterha/app1
    # manager log file
    manager_log=/var/log/masterha/app1/app1.log
    # working directory on MySQL servers
    remote_workdir=/var/log/masterha/app1
    [server1]
    hostname=192.168.31.109
    master_binlog_dir=/usr/local/mariadb/data
    candidate_master=1
    [server2]
    hostname=192.168.31.110
    master_binlog_dir=/usr/local/mariadb/data
    candidate_master=1
    [server3]
    hostname=192.168.31.107
    master_binlog_dir=/usr/local/mariadb/data
    no_master=1
    root@Mha:/# masterha_check_ssh --conf=/etc/app1.cnf
    Tue Nov 21 07:26:03 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Tue Nov 21 07:26:03 2017 - [info] Reading application default configurations from /etc/app1.cnf..
    Tue Nov 21 07:26:03 2017 - [info] Reading server configurations from /etc/app1.cnf..
    Tue Nov 21 07:26:03 2017 - [info] Starting SSH connection tests..
    Tue Nov 21 07:26:04 2017 - [debug]
    Tue Nov 21 07:26:03 2017 - [debug] Connecting via SSH from root@192.168.31.109(192.168.31.109:22) to root@192.168.31.110(192.168.31.110:22)..
    Tue Nov 21 07:26:03 2017 - [debug] ok.
    Tue Nov 21 07:26:03 2017 - [debug] Connecting via SSH from root@192.168.31.109(192.168.31.109:22) to root@192.168.31.107(192.168.31.107:22)..
    Tue Nov 21 07:26:03 2017 - [debug] ok.
    Tue Nov 21 07:26:04 2017 - [debug]
    Tue Nov 21 07:26:03 2017 - [debug] Connecting via SSH from root@192.168.31.110(192.168.31.110:22) to root@192.168.31.109(192.168.31.109:22)..
    Tue Nov 21 07:26:04 2017 - [debug] ok.
    Tue Nov 21 07:26:04 2017 - [debug] Connecting via SSH from root@192.168.31.110(192.168.31.110:22) to root@192.168.31.107(192.168.31.107:22)..
    Tue Nov 21 07:26:04 2017 - [debug] ok.
    Tue Nov 21 07:26:05 2017 - [debug]
    Tue Nov 21 07:26:04 2017 - [debug] Connecting via SSH from root@192.168.31.107(192.168.31.107:22) to root@192.168.31.109(192.168.31.109:22)..
    Tue Nov 21 07:26:04 2017 - [debug] ok.
    Tue Nov 21 07:26:04 2017 - [debug] Connecting via SSH from root@192.168.31.107(192.168.31.107:22) to root@192.168.31.110(192.168.31.110:22)..
    Tue Nov 21 07:26:04 2017 - [debug] ok.
    Tue Nov 21 07:26:05 2017 - [info] All SSH connection tests passed successfully.
    vim /usr/local/share/perl/5.22.1/MHA/NodeUtil.pm
    186 sub check_manager_version {
    187 my $manager_version = shift;
    188 if ( $manager_version < $MHA::NodeConst::MGR_MIN_VERSION ) {
    189 croak
    190 "MHA Manager version is $manager_version, but must be $MHA::NodeConst::MGR_MIN_VERSION or higher.\n";
    191 }
    192 }

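    1. Ah! Thanks for catching the typo! Unfortunately, the link worked when I wrote the post, but apparently it no longer does. I tested this a long time ago, so I don't have an example anymore... Sorry I couldn't be of help.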
