개요
Redis 와 Sentinel 로 구성된 레디스가 있습니다.
Jedis 를 사용하는 어플리케이션에서 Sentinel 에서 마스터 IP 를 조회하여 Redis 에 접속하고 있습니다.
sentinel failover 를 수행한 후 어플리케이션에서 새로운 마스터를 인지하지 못하는 상황이 발생했습니다.
모든 Sentinel 에서는 새로운 마스터 IP 로 변경되어 있는 상태였습니다.
원인에 대한 범위 좁히기
왜 어플리케이션에서 새로운 마스터를 찾지 못했을까
Jedis 라이브러리를 사용하는 어플리케이션에서 Pub/Sub 기능을 이용하여 Sentinel 의 메시지를 받고
이 Sentinel 메시지 중 +switch-master 의 메시지를 받으면 Failover 가 수행되도록 쓰레드로 구현이 되어 있습니다.
그렇다면.. 아래와 같이 2가지 원인으로 좁힐 수 있습니다.
2번의 경우에는 모든 센티넬에서 마스터 IP 가 변경되었기 때문에 1번의 원인을 좀 더 우선순위로 두었습니다.
1. Jedis 에서 +switch-master 메시지를 받지 못하였거나
2. Sentinel 에서 +switch-master 메시지를 전달을 안했다는게 원인이 됩니다.
추가 근거
redisgate (레디스 전문 사이트) 에서는 아래와 같은 문구가 기록되어 있습니다.
http://redisgate.kr/redis/clients/jedis_pubsub.php
근거에 대한 객관적 자료를 확보하기 위해 테스트를 진행해 봅니다.
환경
버전
테스트를 위해 사용한 버전은 다음과 같습니다.
종류 | 버전 |
OS | Ubuntu 18.04 |
JDK | openjdk 1.8 |
Jedis | 2.9.0 |
Redis | 2.8.4 |
Redis 와 Sentinel 구조
Redis 와 Sentinel 의 IP 와 Port 입니다.
Redis1 과 Redis2 는 마스터/슬레이브 구조입니다.
Sentinel1 / Sentinel2 / Sentinel3 에서는 위 Redis1, Redis2 를 모니터링하고 있습니다.
레디스/센티넬 | IP 와 Port |
Redis1 | 127.0.0.1 6161 |
Redis2 | 127.0.0.1 6162 |
Sentinel1 | 127.0.0.1 26379 |
Sentinel2 | 127.0.0.1 26380 |
Sentinel3 | 127.0.0.1 26381 |
Redis 와 Sentinel 의 Property 입니다.
레디스/센티넬 | 프로퍼티 |
Redis1 | $ cat /home/psh/redis/redis_6161.conf daemonize yes port 6161 save 900 1 save 300 10 save 60 10000 masterauth "P@ssword" requirepass "P@ssword" pidfile "/home/psh/redis/redis_6161.pid" logfile "/home/psh/redis/redis_6161.log" dir "/home/psh/redis" dbfilename "6161.rdb" |
Redis2 | $ cat /home/psh/redis/redis_6162.pid daemonize yes port 6162 save 900 1 save 300 10 save 60 10000 masterauth "P@ssword" requirepass "P@ssword" pidfile "/home/psh/redis/redis_6162.pid" logfile "/home/psh/redis/redis_6162.log" dir "/home/psh/redis" dbfilename "6162.rdb" |
Sentinel1 | $ cat /home/psh/redis/sentinel_26379.conf daemonize yes pidfile "/home/psh/redis/sentinel_26379.pid" logfile "/home/psh/redis/sentinel_26379.log" dir "/home/psh/redis" port 26379 sentinel monitor test 127.0.0.1 6161 2 sentinel down-after-milliseconds test 3000 sentinel auth-pass test P@ssword |
Sentinel2 | $ cat /home/psh/redis/sentinel_26380.conf daemonize yes pidfile "/home/psh/redis/sentinel_26380.pid" logfile "/home/psh/redis/sentinel_26380.log" dir "/home/psh/redis" port 26380 sentinel monitor test 127.0.0.1 6161 2 sentinel down-after-milliseconds test 3000 sentinel auth-pass test P@ssword |
Sentinel3 | $ cat /home/psh/redis/sentinel_26381.conf daemonize yes pidfile "/home/psh/redis/sentinel_26381.pid" logfile "/home/psh/redis/sentinel_26381.log" dir "/home/psh/redis" port 26381 sentinel monitor test 127.0.0.1 6161 2 sentinel down-after-milliseconds test 3000 sentinel auth-pass test P@ssword |
센티넬 Pub/Sub 메시지 받기
Java 코드에서 메시지 받기
Java 코드를 간략하게 설명하면 jedis 2.9.0 라이브러리를 import 하였고
Sentinel1 / 2 / 3 에 접속하여 쓰레드로 Sentinel 의 Pub 메시지를 받도록 구현시켜 놓은 코드입니다.
switch-master 로그만 받도록 해놓은게 아닌 모든 메시지를 받도록 구현하였습니다.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
|
import java.text.SimpleDateFormat;
import redis.clients.jedis.*;
import java.util.Set;
import java.util.HashSet;
public class test {
public static void main(String[] args) {
Runnable r1 = new classRun1();
Thread[] t1 = new Thread[1];
Runnable r2 = new classRun2();
Thread[] t2 = new Thread[1];
Runnable r3 = new classRun3();
Thread[] t3 = new Thread[1];
t1[0] = new Thread(r1);
t2[0] = new Thread(r2);
t3[0] = new Thread(r3);
t1[0].start();
t2[0].start();
t3[0].start();
}
}
class classRun1 implements Runnable {
public void run()
{
final String host1 = "127.0.0.1";
final int port1 = 26379;
final Jedis sentinelJedis1 = new Jedis(host1, port1);
SimpleDateFormat format1 = new SimpleDateFormat ( "yyyy-MM-dd HH:mm:ss" );
while(!Thread.currentThread().isInterrupted())
{
try {
Thread.sleep(1000);
System.out.println("Ping 체크시작");
for(int j = 0 ; j < 10 ; j ++)
{
try {
sentinelJedis1.ping();
break;
}catch(Exception e) {
continue;
}
}
sentinelJedis1.psubscribe ( new JedisPubSub() {
public void onMessage(String channel, String message){}
public void onSubscribe(String channel, int num){}
public void onUnsubscribe(String channel, int num){}
public void onPSubscribe(String channel, int num){}
public void onPUnsubscribe(String channel, int num){}
public void onPMessage(String pattern, String event, String message) {
String format_time = format1.format(System.currentTimeMillis());
System.out.println("[" + format_time + "] sentinelJedis1 : pattern : " + pattern);
System.out.println("[" + format_time + "] sentinelJedis1 : event : " + event);
System.out.println("[" + format_time + "] sentinelJedis1 : message : " + message);
}
}, "*");
}catch (Exception e){
System.out.println("[ERROR] Exception : " + e.getMessage());
continue;
}
}
try {
Thread.sleep(1000);
}
catch (InterruptedException e) {
System.out.println("[ERROR] InterruptedException : " + e.getMessage());
}
}
}
class classRun2 implements Runnable {
public void run()
{
final String host2 = "127.0.0.1";
final int port2 = 26380;
final Jedis sentinelJedis2 = new Jedis(host2, port2);
SimpleDateFormat format1 = new SimpleDateFormat ( "yyyy-MM-dd HH:mm:ss" );
while(!Thread.currentThread().isInterrupted())
{
try {
Thread.sleep(1000);
System.out.println("Ping 체크시작");
for(int j = 0 ; j < 10 ; j ++)
{
try {
sentinelJedis2.ping();
break;
}catch(Exception e) {
continue;
}
}
sentinelJedis2.psubscribe ( new JedisPubSub() {
public void onMessage(String channel, String message){}
public void onSubscribe(String channel, int num){}
public void onUnsubscribe(String channel, int num){}
public void onPSubscribe(String channel, int num){}
public void onPUnsubscribe(String channel, int num){}
public void onPMessage(String pattern, String event, String message) {
String format_time = format1.format(System.currentTimeMillis());
System.out.println("[" + format_time + "] sentinelJedis2 : pattern : " + pattern);
System.out.println("[" + format_time + "] sentinelJedis2 : event : " + event);
System.out.println("[" + format_time + "] sentinelJedis2 : message : " + message);
}
}, "*");
}catch (Exception e){
System.out.println("[ERROR] Exception : " + e.getMessage());
continue;
}
}
try {
Thread.sleep(1000);
}
catch (InterruptedException e) {
System.out.println("[ERROR] InterruptedException : " + e.getMessage());
}
}
}
class classRun3 implements Runnable {
public void run()
{
final String host3 = "127.0.0.1";
final int port3 = 26381;
final Jedis sentinelJedis3 = new Jedis(host3, port3);
SimpleDateFormat format1 = new SimpleDateFormat ( "yyyy-MM-dd HH:mm:ss" );
while(!Thread.currentThread().isInterrupted())
{
try {
Thread.sleep(1000);
System.out.println("Ping 체크시작");
for(int j = 0 ; j < 10 ; j ++)
{
try {
sentinelJedis3.ping();
break;
}catch(Exception e) {
continue;
}
}
sentinelJedis3.psubscribe ( new JedisPubSub() {
public void onMessage(String channel, String message){
System.out.println("sentinelJedis3 onMessage : " + message);
}
public void onSubscribe(String channel, int num){}
public void onUnsubscribe(String channel, int num){}
public void onPSubscribe(String channel, int num){}
public void onPUnsubscribe(String channel, int num){}
public void onPMessage(String pattern, String event, String message) {
String format_time = format1.format(System.currentTimeMillis());
System.out.println("[" + format_time + "] sentinelJedis3 : pattern : " + pattern);
System.out.println("[" + format_time + "] sentinelJedis3 : event : " + event);
System.out.println("[" + format_time + "] sentinelJedis3 : message : " + message);
}
}, "*");
}catch (Exception e){
System.out.println("[ERROR] Exception : " + e.getMessage());
continue;
}
}
try {
Thread.sleep(1000);
}
catch (InterruptedException e) {
System.out.println("[ERROR] InterruptedException : " + e.getMessage());
}
}
}
|
cs |
redis-cli 로 직접 붙어서 subscribe 에서 메시지 받기
센티넬 | 명령어 |
Sentinel1 | $ redis-cli -p 26379 127.0.0.1:26379> psubscribe * |
Sentinel2 | $ redis-cli -p 26380 127.0.0.1:26380> psubscribe * |
Sentinel3 | $ redis-cli -p 26381 127.0.0.1:26381> psubscribe * |
테스트 진행
Sentinel1 에서 Failover 를 실행
127.0.0.1:26379> sentinel failover test OK |
Sentinel 로그
센티넬 로그파일에 switch-master 로그내용이 기록되었습니다. (진하게 표기)
센티넬 | 로그 파일의 내용 |
Sentinel1 | [16863] 27 Dec 22:57:16.459 # Executing user requested FAILOVER of 'test' [16863] 27 Dec 22:57:16.459 # +new-epoch 15 [16863] 27 Dec 22:57:16.459 # +try-failover master test 127.0.0.1 6161 [16863] 27 Dec 22:57:16.474 # +vote-for-leader 7c8f6176dfbcf5397ddf30e42df90c2ba8778279 15 [16863] 27 Dec 22:57:16.475 # +elected-leader master test 127.0.0.1 6161 [16863] 27 Dec 22:57:16.475 # +failover-state-select-slave master test 127.0.0.1 6161 [16863] 27 Dec 22:57:16.534 # +selected-slave slave 127.0.0.1:6162 127.0.0.1 6162 @ test 127.0.0.1 6161 [16863] 27 Dec 22:57:16.534 * +failover-state-send-slaveof-noone slave 127.0.0.1:6162 127.0.0.1 6162 @ test 127.0.0.1 6161 [16863] 27 Dec 22:57:16.610 * +failover-state-wait-promotion slave 127.0.0.1:6162 127.0.0.1 6162 @ test 127.0.0.1 6161 [16863] 27 Dec 22:57:17.905 # +promoted-slave slave 127.0.0.1:6162 127.0.0.1 6162 @ test 127.0.0.1 6161 [16863] 27 Dec 22:57:17.905 # +failover-state-reconf-slaves master test 127.0.0.1 6161 [16863] 27 Dec 22:57:17.905 # +failover-end master test 127.0.0.1 6161 [16863] 27 Dec 22:57:17.905 # +switch-master test 127.0.0.1 6161 127.0.0.1 6162 [16863] 27 Dec 22:57:17.906 * +slave slave 127.0.0.1:6161 127.0.0.1 6161 @ test 127.0.0.1 6162 [16863] 27 Dec 22:57:28.096 * +convert-to-slave slave 127.0.0.1:6161 127.0.0.1 6161 @ test 127.0.0.1 6162 |
Sentinel2 | [16867] 27 Dec 22:57:17.458 # +new-epoch 15 [16867] 27 Dec 22:57:19.516 # +switch-master test 127.0.0.1 6161 127.0.0.1 6162 [16867] 27 Dec 22:57:19.517 * +slave slave 127.0.0.1:6161 127.0.0.1 6161 @ test 127.0.0.1 6162 |
Sentinel3 | [16871] 27 Dec 22:57:17.458 # +new-epoch 15 [16871] 27 Dec 22:57:19.515 # +switch-master test 127.0.0.1 6161 127.0.0.1 6162 [16871] 27 Dec 22:57:19.516 * +slave slave 127.0.0.1:6161 127.0.0.1 6161 @ test 127.0.0.1 6162 |
redis-cli 로 직접 붙어서 subscribe 에서 메시지 받았을 때
switch-master 로그가 기록되어 있는 것을 확인할 수 있습니다. (진하게 표기)
센티넬 | psubscribe 로 확인한 로그 |
Sentinel1 | 1) "pmessage" 2) "*" 3) "+new-epoch" 4) "15" 1) "pmessage" 2) "*" 3) "+try-failover" 4) "master test 127.0.0.1 6161" 1) "pmessage" 2) "*" 3) "+vote-for-leader" 4) "7c8f6176dfbcf5397ddf30e42df90c2ba8778279 15" 1) "pmessage" 2) "*" 3) "+elected-leader" 4) "master test 127.0.0.1 6161" 1) "pmessage" 2) "*" 3) "+failover-state-select-slave" 4) "master test 127.0.0.1 6161" 1) "pmessage" 2) "*" 3) "+selected-slave" 4) "slave 127.0.0.1:6162 127.0.0.1 6162 @ test 127.0.0.1 6161" 1) "pmessage" 2) "*" 3) "+failover-state-send-slaveof-noone" 4) "slave 127.0.0.1:6162 127.0.0.1 6162 @ test 127.0.0.1 6161" 1) "pmessage" 2) "*" 3) "+failover-state-wait-promotion" 4) "slave 127.0.0.1:6162 127.0.0.1 6162 @ test 127.0.0.1 6161" 1) "pmessage" 2) "*" 3) "+promoted-slave" 4) "slave 127.0.0.1:6162 127.0.0.1 6162 @ test 127.0.0.1 6161" 1) "pmessage" 2) "*" 3) "+failover-state-reconf-slaves" 4) "master test 127.0.0.1 6161" 1) "pmessage" 2) "*" 3) "+failover-end" 4) "master test 127.0.0.1 6161" 1) "pmessage" 2) "*" 3) "+switch-master" 4) "test 127.0.0.1 6161 127.0.0.1 6162" 1) "pmessage" 2) "*" 3) "+slave" 4) "slave 127.0.0.1:6161 127.0.0.1 6161 @ test 127.0.0.1 6162" 1) "pmessage" 2) "*" 3) "+convert-to-slave" 4) "slave 127.0.0.1:6161 127.0.0.1 6161 @ test 127.0.0.1 6162" |
Sentinel2 | 1) "pmessage" 2) "*" 3) "+new-epoch" 4) "15" 1) "pmessage" 2) "*" 3) "+switch-master" 4) "test 127.0.0.1 6161 127.0.0.1 6162" 1) "pmessage" 2) "*" 3) "+slave" 4) "slave 127.0.0.1:6161 127.0.0.1 6161 @ test 127.0.0.1 6162" |
Sentinel3 | 1) "pmessage" 2) "*" 3) "+new-epoch" 4) "15" 1) "pmessage" 2) "*" 3) "+switch-master" 4) "test 127.0.0.1 6161 127.0.0.1 6162" 1) "pmessage" 2) "*" 3) "+slave" 4) "slave 127.0.0.1:6161 127.0.0.1 6161 @ test 127.0.0.1 6162" |
Java 코드에서 메시지 받기
switch-master 로그가 정상적으로 기록되었습니다.
센티넬 | 자바 로그 |
Sentinel1 | [2020-12-27 22:57:16] sentinelJedis1 : pattern : * [2020-12-27 22:57:16] sentinelJedis1 : event : +new-epoch [2020-12-27 22:57:16] sentinelJedis1 : message : 15 [2020-12-27 22:57:16] sentinelJedis1 : pattern : * [2020-12-27 22:57:16] sentinelJedis1 : event : +try-failover [2020-12-27 22:57:16] sentinelJedis1 : message : master test 127.0.0.1 6161 [2020-12-27 22:57:16] sentinelJedis1 : pattern : * [2020-12-27 22:57:16] sentinelJedis1 : event : +vote-for-leader [2020-12-27 22:57:16] sentinelJedis1 : 7c8f6176dfbcf5397ddf30e42df90c2ba8778279 15 [2020-12-27 22:57:16] sentinelJedis1 : pattern : * [2020-12-27 22:57:16] sentinelJedis1 : event : +elected-leader [2020-12-27 22:57:16] sentinelJedis1 : message : master test 127.0.0.1 6161 [2020-12-27 22:57:16] sentinelJedis1 : pattern : * [2020-12-27 22:57:16] sentinelJedis1 : event : +failover-state-select-slave [2020-12-27 22:57:16] sentinelJedis1 : message : master test 127.0.0.1 6161 [2020-12-27 22:57:16] sentinelJedis1 : pattern : * [2020-12-27 22:57:16] sentinelJedis1 : event : +selected-slave [2020-12-27 22:57:16] sentinelJedis1 : message : slave 127.0.0.1:6162 127.0.0.1 6162 @ test 127.0.0.1 6161 [2020-12-27 22:57:16] sentinelJedis1 : pattern : * [2020-12-27 22:57:16] sentinelJedis1 : event : +failover-state-send-slaveof-noone [2020-12-27 22:57:16] sentinelJedis1 : message : slave 127.0.0.1:6162 127.0.0.1 6162 @ test 127.0.0.1 6161 [2020-12-27 22:57:16] sentinelJedis1 : pattern : * [2020-12-27 22:57:16] sentinelJedis1 : event : +failover-state-wait-promotion [2020-12-27 22:57:16] sentinelJedis1 : message : slave 127.0.0.1:6162 127.0.0.1 6162 @ test 127.0.0.1 6161 [2020-12-27 22:57:17] sentinelJedis1 : pattern : * [2020-12-27 22:57:17] sentinelJedis1 : event : +promoted-slave [2020-12-27 22:57:17] sentinelJedis1 : message : slave 127.0.0.1:6162 127.0.0.1 6162 @ test 127.0.0.1 6161 [2020-12-27 22:57:17] sentinelJedis1 : pattern : * [2020-12-27 22:57:17] sentinelJedis1 : event : +failover-state-reconf-slaves [2020-12-27 22:57:17] sentinelJedis1 : message : master test 127.0.0.1 6161 [2020-12-27 22:57:17] sentinelJedis1 : pattern : * [2020-12-27 22:57:17] sentinelJedis1 : event : +failover-end [2020-12-27 22:57:17] sentinelJedis1 : message : master test 127.0.0.1 6161 [2020-12-27 22:57:17] sentinelJedis1 : pattern : * [2020-12-27 22:57:17] sentinelJedis1 : event : +switch-master [2020-12-27 22:57:17] sentinelJedis1 : message : test 127.0.0.1 6161 127.0.0.1 6162 [2020-12-27 22:57:17] sentinelJedis1 : pattern : * [2020-12-27 22:57:17] sentinelJedis1 : event : +slave [2020-12-27 22:57:17] sentinelJedis1 : message : slave 127.0.0.1:6161 127.0.0.1 6161 @ test 127.0.0.1 6162 [2020-12-27 22:57:27] sentinelJedis1 : pattern : * [2020-12-27 22:57:27] sentinelJedis1 : event : +convert-to-slave [2020-12-27 22:57:27] sentinelJedis1 : message : slave 127.0.0.1:6161 127.0.0.1 6161 @ test 127.0.0.1 6162 |
Sentinel2 | [2020-12-27 22:57:17] sentinelJedis2 : pattern : * [2020-12-27 22:57:17] sentinelJedis2 : event : +new-epoch [2020-12-27 22:57:17] sentinelJedis2 : message : 15 [2020-12-27 22:57:19] sentinelJedis2 : pattern : * [2020-12-27 22:57:19] sentinelJedis2 : event : +switch-master [2020-12-27 22:57:19] sentinelJedis2 : message : test 127.0.0.1 6161 127.0.0.1 6162 [2020-12-27 22:57:19] sentinelJedis2 : pattern : * [2020-12-27 22:57:19] sentinelJedis2 : event : +slave [2020-12-27 22:57:19] sentinelJedis2 : message : slave 127.0.0.1:6161 127.0.0.1 6161 @ test 127.0.0.1 6162 |
Sentinel3 | [2020-12-27 22:57:17] sentinelJedis3 : pattern : * [2020-12-27 22:57:17] sentinelJedis3 : event : +new-epoch [2020-12-27 22:57:17] sentinelJedis3 : message : 15 [2020-12-27 22:57:19] sentinelJedis3 : pattern : * [2020-12-27 22:57:19] sentinelJedis3 : event : +switch-master [2020-12-27 22:57:19] sentinelJedis3 : message : test 127.0.0.1 6161 127.0.0.1 6162 [2020-12-27 22:57:19] sentinelJedis3 : pattern : * [2020-12-27 22:57:19] sentinelJedis3 : event : +slave [2020-12-27 22:57:19] sentinelJedis3 : message : slave 127.0.0.1:6161 127.0.0.1 6161 @ test 127.0.0.1 6162 |
레디스 작업도중 센티넬에서 어떠한 이유에 의해.. switch-master event 를 던져주지 못했다..? 로 추정..
은 다음 포스팅에서 기회되면 정리
'Database > Redis' 카테고리의 다른 글
[Redis] save 동작 시 메모리 사용률에 대한 정리 (0) | 2021.01.09 |
---|---|
[Redis] Key 관리를 통해 메모리를 효율적으로 사용하는 방법 (0) | 2021.01.08 |
[Redis] Jedis 에서 Redis Sentinel 을 통한 마스터 정보 얻는 방법 (0) | 2020.12.23 |
[Redis] 리눅스환경에서 JAVA 와 Jedis 로 Redis 접속하는 방법 (2) | 2020.12.20 |
[Redis] 레디스 failover-abort-no-good-slave 오류 해결방법 (0) | 2020.11.22 |