CENTOS 7.6 / Postgresql -14.7 기준

공통 (Master, Slave1, Slave2)

만약 password 과련 error 발생시

PGPASSWORD=비밀번호 를 명령 앞에 같이 적어주길 바람

repmgr 설치

sudo yum install -y repmgr_14.x86_64 repmgr_14-devel.x86_64
  • 원하지 않는다면 안해줘도 되지만 커맨드를 실행할 때 /usr/pgsql-14/bin/repmgr 를 입력하면서 해주어야 한다.
    ln -s /usr/pgsql-14/bin/repmgr /usr/bin/repmgr
    
ln -s /usr/pgsql-14/bin/repmgrd /usr/bin/repmgrd

sudoers 권한 주기 (기본값은 Read Only)

sudo chmod 700 /etc/sudoers

sudoers 수정하기

vim /etc/sudoers

sudoers 내용 (마지막에 넣으면 됨)

postgres ALL=NOPASSWD: /usr/bin/systemctl stop postgresql-14.service, /usr/bin/systemctl start postgresql-14.service, /usr/bin/systemctl restart postgresql-14.service, /usr/bin/systemctl reload postgresql-14.service, /usr/bin/systemctl start repmgr14.service, /usr/bin/systemctl stop repmgr14.service
sudo chmod 440 /etc/sudoers

Master 설정

계정생성

su - postgres
createuser --superuser 유저명
createdb --owner=유저명 db이름
vim /var/lib/pgsql/14/data/postgresql.conf

각각의 옵션을 찾아서 수정해줘도 되지만 기본이 주석처리라 마지막 에서 커스텀입력을 해줘도 된다

max_wal_senders = 10 
max_replication_slots = 10 
wal_level = 'replica' 
hot_standby = on 
archive_mode = on 
archive_command = '/bin/true' 
wal_log_hints = on 
shared_preload_libraries = 'repmgr'

pg_hba.conf 수정

vim /var/lib/pgsql/14/data/pg_hba.conf
# for repmgr
# TYPE  DATABASE        USER            ADDRESS                 METHOD
 local  replication     유저명                                    trust
 host   replication     유저명            127.0.0.1/32            trust
 host   replication     유저명            마스터 아이피/24           trust
 host   replication     유저명            슬레이브1아이피/24          trust
 host   replication     유저명            슬레이브2아이피/24          trust
 local  DB이름           유저명                                    trust
 host   DB이름           유저명            127.0.0.1/32            trust
 host   DB이름           유저명            마스터 아이피/24           trust
 host   DB이름           유저명            슬레이브1아이피/24          trust
 host   DB이름           유저명            슬레이브2아이피/24          trust

repmgr.conf 수정

vim /etc/repmgr/14/repmgr.conf

################################################################################################
#                                       CUSTOM SETTING                                         #
################################################################################################

node_id=1
node_name='노드이름 (여기선 master '서버의 호스트명 추천')'
conninfo='host=마스터 아이피 user=계정명 dbname=DB 이름 connect_timeout=2'
data_directory='/var/lib/pgsql/14/data'
service_start_command = 'sudo systemctl start postgresql-14.service'
service_stop_command = 'sudo systemctl stop postgresql-14.service'
service_restart_command = 'sudo systemctl restart postgresql-14.service'
service_reload_command = 'sudo systemctl reload postgresql-14.service'

환경변수에 비밀번호 입력

su - postgres
export PGPASSWORD=포스트그래스 비밀번호

repmgr master 서버를 primary 로 등록

repmgr primary register

상태확인

repmgr cluster show

data 폴더 .postgresql.conf.swp 권한 조정

chown postgres:postgres /var/lib/pgsql/14/data
chown postgres:postgres /var/lib/pgsql/14/data/.postgresql.conf.swp
chmod 700 /var/lib/pgsql/14/data/.postgresql.conf.swp
sudo chmod 750 /var/lib/pgsql/14/data
sudo rm /var/lib/pgsql/14/data/.pg_hba.conf.swo

master 재시작

systemctl restart postgresql-14.service

Slave 서버 설정

postgres 종료

systemctl stop postgresql-14.service
vim /etc/repmgr/14/repmgr.conf

slave 1

#################################################################################################
##  CUSTOM SETTING      repmgr slave1 setting                                                  ##
#################################################################################################
#
node_id=2
node_name='slave1'
conninfo='host=SLAVE1 아이피 user=계정명 dbname=DB이름 connect_timeout=2'
data_directory='/var/lib/pgsql/14/data'
service_start_command = 'sudo systemctl start postgresql-14.service'
service_stop_command = 'sudo systemctl stop postgresql-14.service'
service_restart_command = 'sudo systemctl restart postgresql-14.service'
service_reload_command = 'sudo systemctl reload postgresql-14.service'

slave 2

#################################################################################################
##  CUSTOM SETTING      repmgr slave2 setting                                                  ##
#################################################################################################
#
node_id=3
node_name='slave2'
conninfo='host=SLAVE2 아이피 user=계정명 dbname=DB이름 connect_timeout=2'
data_directory='/var/lib/pgsql/14/data'
service_start_command = 'sudo systemctl start postgresql-14.service'
service_stop_command = 'sudo systemctl stop postgresql-14.service'
service_restart_command = 'sudo systemctl restart postgresql-14.service'
service_reload_command = 'sudo systemctl reload postgresql-14.service'

master data 복사

	su - postgres
repmgr -h 마스터아이피 -U 유저이름 -d DB이름 standby clone --force

swp error 발생시

vi / vim 파일 수정 및 작성 중 정상적으로 작성하지 않고 얘기치못하게 종료되거나 하면 swp || swn || swo 등의 임시파일을 만드는데 이것을 삭제 해주면 됨 파일을 확인 하는방법은 vim 들어갔을 때 Warning 문구가 나오면 R 을 눌러 리커버리 창을 확인하고 swp 파일의 디렉토리와 이름을 기록하고 다시 터미널에 나와서 삭제 하면됨

Using swap file "/var/lib/pgsql/14/data/.pg_hba.conf.swo"

Original file "/var/lib/pgsql/14/data/pg_hba.conf"

E308: Warning: Original file may have been changed

E309: Unable to read block 1 from /var/lib/pgsql/14/data/.pg_hba.conf.swo

db 재실행

 systemctl start postgresql-14.service

DB(노드) 등록

PGPASSWORD=비밀번호 repmgr standby register 

정상적으로 등록되면

PGPASSWORD=비밀번호 repmgr cluster show


Fail over

공통 (master && slave)

repmgr.conf 설정

vim /etc/repmgr/14/repmgr.conf
failover='automatic'
promote_command='repmgr standby promote'
follow_command='repmgr standby follow --upstream-node-id=%n'
repmgrd_service_start_command = 'sudo systemctl start repmgr14.service'
repmgrd_service_stop_command = 'sudo systemctl stop repmgr14.service'
log_file='/var/log/repmgr/repmgrd.log'
reconnect_attempts = 3
reconnect_interval = 5

전에 설정한 파일과 같이보면 master node의 모습은 다음과 같다

log 설정

로그가 쌓이지 않고 주기적으로 정리될 수 있도록 설정 주기는 공식 홈페이지 설정 repmgr -h 49.50.161.76 -U admin-d stock standby clone –force

vim /etc/logrotate.d/repmgr
/var/log/repmgr/repmgrd.log { 
		missingok 
		compress 
		rotate 52 
		maxsize 100M 
		weekly 
		create 0600 postgres postgres 
		postrotate /usr/bin/killall -HUP repmgrd 
		endscript 
}

설치

systemctl start repmgr14.service 실행 실패시

vim /etc/systemd/system/repmgr14.service
[Unit] 
Description=REPMGR for PostgreSQL 14 

[Service] 
Type=simple 
User=postgres 
Group=postgres 
ExecStart=/usr/pgsql-14/bin/repmgr -f /etc/repmgr/14/repmgr.conf daemon start
ExecStop=/usr/pgsql-14/bin/repmgr -f /etc/repmgr/14/repmgr.conf daemon stop 
Restart=on-failure 
RestartSec=5s 

[Install] 
WantedBy=multi-user.target

작성 후

systemctl daemon-reload
sudo systemctl start repmgr14.service
sudo systemctl stop repmgr14.service

Fail over test

su - postgres
repmgr cluster show

Master

Slave

Master 를 정지해서 Slave가 활성화 되는지 확인 해야 함

Slave는 처음 Read Only 로 데이터 입력이 되지 않는다

Master 만 입력 가능한 상황

Master Server 정지

Master 정지 전 Slave cluster show

Master 정지 후 Slave cluster show

Slave DB에서 입력이 가능해짐

Master Server 실행

Master 실행 후 Slave Cluster Show

원래 Master 정지후 Slave로 낮춰 주어야 함

systemctl stop postgresql-14.service
repmgr -h {Slave IP} -U {유저명} -d {DB명} standby clone --force
systemctl start postgresql-14.service
repmgr standby register -F

다시 cluster show를 확인하면

Slave 였던 Server

Master 였던 Server

하지만 우리에게 필요한건 자동화


Failover 자동화

vim /etc/repmgr/14/repmgr.conf
event_notification_command='/var/lib/pgsql/failover_promote.sh %n %e'
event_notifications='standby_promote'

failover_promote 스크립트 만들기

vi /var/lib/pgsql/failover_promote.sh
#!/bin/bash
up_id=$1
name=$2
if [ $name == 'standby_promote' ]; then
declare -A servers
if [ $up_id == 1 ] ; then
        down_id=2
else
        down_id=1
fi
servers[2]="2번 노드 아이피"
servers[1]="1번 노드 아이피"
while [[ `(ssh -o ConnectTimeout=5 ${servers[$down_id]} echo ok 2>&1)` != 'ok' ]]
do
sleep 2
echo Failed node ${servers[$down_id]} is not reachable >> /var/log/repmgr/repmgrd.log
done
ssh -T ${servers[$down_id]} << EOF
 sleep 5
 sudo systemctl stop postgresql-14.service && sleep 10
 rm -rf /var/lib/pgsql/14/data
 repmgr -h ${servers[$up_id]} -d testdb -U dbuser standby clone --force &&
 sudo systemctl start postgresql-14.service && sleep 2
 repmgr standby register -F
EOF
echo ${servers[$up_id]}
fi
chmod +x /var/lib/pgsql/failover_promote.sh
#!/bin/bash

# Primary server IP address
PRIMARY_IP=192.168.1.10

# Secondary server IP address
SECONDARY_IP=192.168.1.11

# Tertiary server IP address
TERTIARY_IP=192.168.1.12

# PostgreSQL database name
DBNAME=mydatabase

# PostgreSQL superuser name
PGUSER=postgres

# Replication user name
REPLICATION_USER=repluser

# Replication user password
REPLICATION_PASSWORD=replpassword

# Path to repmgr configuration file
REPMGR_CONF=/etc/repmgr/repmgr.conf

# Path to repmgr log file
REPMGR_LOG=/var/log/repmgr/repmgr.log

# Function to promote a standby server to primary
function promote_standby() {
  echo "Promoting standby server..."
  repmgr standby promote -f $REPMGR_CONF --log-to-file >> $REPMGR_LOG 2>&1
}

# Function to follow the new primary server
function follow_primary() {
  echo "Following new primary server..."
  repmgr standby follow -f $REPMGR_CONF --log-to-file >> $REPMGR_LOG 2>&1
}

# Check if the primary server is available
echo "Checking primary server..."
nc -z -w1 $PRIMARY_IP 5432
PRIMARY_STATUS=$?

# Check if the secondary server is available
echo "Checking secondary server..."
nc -z -w1 $SECONDARY_IP 5432
SECONDARY_STATUS=$?

# Check if the tertiary server is available
echo "Checking tertiary server..."
nc -z -w1 $TERTIARY_IP 5432
TERTIARY_STATUS=$?

# If the primary server is down, promote the secondary server to primary
if [ $PRIMARY_STATUS -ne 0 ]; then
  echo "Primary server is down, promoting secondary server..."
  promote_standby
  exit 0
fi

# If the secondary server is down, promote the tertiary server to primary
if [ $SECONDARY_STATUS -ne 0 ]; then
  echo "Secondary server is down, promoting tertiary server..."
  promote_standby
  follow_primary
  exit 0
fi

# If the tertiary server is down, promote the secondary server to primary
if [ $TERTIARY_STATUS -ne 0 ]; then
  echo "Tertiary server is down, promoting secondary server..."
  promote_standby
  follow_primary
  exit 0
fi

echo "All servers are running, no action needed."

#!/bin/bash

# Set node IDs and names
node1_id=1
node1_name="node1"
node2_id=2
node2_name="node2"
node3_id=3
node3_name="node3"

# Set node IPs
node1_ip="49.50.161.76"
node2_ip="49.50.162.160"
node3_ip="49.50.165.53"

# Set PostgreSQL database and replication user information
dbname="testdb"
pguser="dbuser"
repluser="repluser"
replpassword="replpassword"

# Set repmgr configuration file and log file paths
repmgr_conf="/etc/repmgr/14/repmgr.conf"
repmgr_log="/var/log/repmgr/repmgrd.log"

# Promote standby node to primary
promote_standby() {
  local node_id=$1
  local node_name=$2
  local primary_ip=$3
  local standby_ip=$4

  # Check if standby node is reachable
  while [[ `(nc -z -w1 $standby_ip 5432; echo $?)` != '0' ]]
  do
    sleep 2
    echo "Failed node $node_name is not reachable" >> $repmgr_log
  done

  # Stop PostgreSQL service on primary node
  ssh -T $primary_ip << EOF
    sudo systemctl stop postgresql-14.service
    sleep 10
EOF

  # Remove data directory on primary node
  ssh -T $primary_ip "rm -rf /var/lib/pgsql/14/data"

  # Clone standby node from primary node
  ssh -T $standby_ip << EOF
    sudo -u postgres repmgr -h $primary_ip -d $dbname -U $pguser standby clone --force
EOF

  # Start PostgreSQL service on primary node
  ssh -T $primary_ip << EOF
    sudo systemctl start postgresql-14.service
    sleep 2
EOF

  # Register standby node as a new primary node
  ssh -T $standby_ip << EOF
    sudo -u postgres repmgr primary register
EOF

  echo "Promoted $node_name to primary node" >> $repmgr_log
}

# Follow new primary node
follow_primary() {
  local node_id=$1
  local node_name=$2
  local primary_ip=$3
  local standby_ip=$4

  # Check if standby node is reachable
  while [[ `(nc -z -w1 $standby_ip 5432; echo $?)` != '0' ]]
  do
    sleep 2
    echo "Failed node $node_name is not reachable" >> $repmgr_log
  done

  # Stop PostgreSQL service on standby node
  ssh -T $standby_ip << EOF
    sudo systemctl stop postgresql-14.service
    sleep 10
EOF

  # Remove data directory on standby node
  ssh -T $standby_ip "rm -rf /var/lib/pgsql/14/data"

  # Clone new primary node from standby node
  ssh -T $primary_ip << EOF
    sudo -u postgres repmgr -h $standby_ip -d $dbname -U $pguser standby clone --force
EOF

  # Start PostgreSQL service on standby node
  ssh -T $standby_ip << EOF
    sudo systemctl start postgresql-14
EOF

`repmgr`을 사용하여 primary 노드를 standby 노드로 낮추는 방법은 다음과 같습니다.

1.  primary 노드에서 `repmgr standby switchover` 명령어를 실행합니다.

bashCopy code

`repmgr standby switchover -f /etc/repmgr/14/repmgr.conf --log-to-file`

위 명령어를 실행하면, primary 노드가 standby 노드로 낮추어지고, 해당 노드에서 실행되던 PostgreSQL 서비스가 중지됩니다.

2.  다른 standby 노드를 새로운 primary 노드로 승격합니다.

bashCopy code

`repmgr standby promote -f /etc/repmgr/14/repmgr.conf --log-to-file`

위 명령어를 실행하면, standby 노드가 primary 노드로 승격되고, 해당 노드에서 실행되던 PostgreSQL 서비스가 중지됩니다.

3.  낮추어진 primary 노드를 standby 노드로 등록합니다.

bashCopy code

`repmgr standby follow -f /etc/repmgr/14/repmgr.conf --log-to-file --upstream-node-id=<새로운 primary 노드 ID>`

위 명령어를 실행하면, primary 노드가 standby 노드로 등록되고, 해당 노드에서 실행되던 PostgreSQL 서비스가 시작됩니다.

위 과정을 수행하면, 기존 primary 노드가 standby 노드로 낮추어지고, 새로운 primary 노드가 승격되어 replication 과정이 다시 시작됩니다.
`repmgr`을 사용하여 standby 노드를 primary 노드로 승격하고, 새로운 standby 노드를 등록하는 방법은 다음과 같습니다.

1.  standby 노드를 primary 노드로 승격합니다.

bashCopy code

`repmgr standby promote -f /etc/repmgr/14/repmgr.conf --log-to-file`

위 명령어를 실행하면 standby 노드가 primary 노드로 승격되고, 해당 노드에서 실행되던 PostgreSQL 서비스가 중지됩니다.

2.  승격된 primary 노드를 새로운 standby 노드로 등록합니다.

bashCopy code

`repmgr standby follow -f /etc/repmgr/14/repmgr.conf --log-to-file --upstream-node-id=<primary 노드 ID>`

위 명령어를 실행하면 승격된 primary 노드가 standby 노드로 등록되고, 해당 노드에서 실행되던 PostgreSQL 서비스가 시작됩니다.

위 과정을 수행하면, 기존 standby 노드가 primary 노드로 승격되고, 새로운 standby 노드가 등록되어 replication 과정이 다시 시작됩니다.