Configure High Availability using Autofailover Extension PostgreSQL14

 
*************************************************************************
Replication configuration using Auto-failover Extension PostgreSQL 14
*************************************************************************

My current environment configuration:


admin/monitor node    : 172.20.10.100
Master db                      : 172.20.10.102
Standy01                   : 172.20.10.103
Standby02                     : 172.20.10.104


Install auto-failover software on all the 4 nodes using root
**************************************************|

# wget https://install.citusdata.com/community/rpm.sh .  <--- it will create repository

[root@admin01 ~]# chmod 777 rpm.sh
[root@admin01 ~]# ./rpm.sh
Detected operating system as centos/7.
Checking for curl...
Detected curl...
Checking for postgresql14-server...
Installing pgdg14 repo... done.
Checking for EPEL repositories...
Detected EPEL repoitories
Downloading repository file: https://repos.citusdata.com/community/config_file.repo?os=centos&dist=7&source=script... done.
Installing pygpgme to verify GPG signatures... done.
Installing yum-utils... done.
Generating yum cache for citusdata_community... done.

The repository is set up! You can now install packages.
You have new mail in /var/spool/mail/root


[root@admin01 ~]# yum install pg-auto-failover16_14 full -y

Create monitor server:
=================

su - postgres

mkdir /var/lib/pgsql/14/data  (all 4 nodes)

-bash-4.2$ cat .bash_profile
export PATH=$PATH:/usr/pgsql-14/bin
export PGDATA=/var/lib/pgsql/14/data
export PGPORT=5432

-bash-4.2$ pg_autoctl --version
pg_autoctl version 1.6.3
pg_autoctl extension version 1.6
compiled with PostgreSQL 14.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
compatible with Postgres 10, 11, 12, 13, and 14


nohup pg_autoctl create monitor --ssl-self-signed --auth trust --run &   <----------- run in the background


16:43:39 4793 INFO  Using default --ssl-mode "require"
16:43:39 4793 INFO  Using --ssl-self-signed: pg_autoctl will create self-signed certificates, allowing for encrypted network traffic
16:43:39 4793 WARN  Self-signed certificates provide protection against eavesdropping; this setup does NOT protect against Man-In-The-Middle attacks nor Impersonation attacks.
16:43:39 4793 WARN  See https://www.postgresql.org/docs/current/libpq-ssl.html for details
16:43:39 4793 INFO  Initialising a PostgreSQL cluster at "/var/lib/pgsql/14/data"
16:43:39 4793 INFO  /usr/pgsql-14/bin/pg_ctl initdb -s -D /var/lib/pgsql/14/data --option '--auth=trust'
16:43:39 4793 INFO   /bin/openssl req -new -x509 -days 365 -nodes -text -out /var/lib/pgsql/14/data/server.crt -keyout /var/lib/pgsql/14/data/server.key -subj "/CN=admin01.localdomain"
16:43:39 4793 INFO  Started pg_autoctl postgres service with pid 4809
16:43:39 4793 INFO  Started pg_autoctl listener service with pid 4810
16:43:39 4809 INFO   /usr/pgsql-14/bin/pg_autoctl do service postgres --pgdata /var/lib/pgsql/14/data -v
16:43:39 4814 INFO   /usr/pgsql-14/bin/postgres -D /var/lib/pgsql/14/data -p 5432 -h *
16:44:29 4906 INFO   /usr/pgsql-14/bin/postgres -D /var/lib/pgsql/14/data -p 5432 -h *
16:44:29 4809 WARN  PostgreSQL was not running, restarted with pid 4906
16:44:29 4886 WARN  NOTICE:  installing required extension "btree_gist"
16:44:29 4886 INFO  Granting connection privileges on 172.20.10.0/24
16:44:29 4886 INFO  Reloading Postgres configuration and HBA rules
16:44:29 4886 INFO  Your pg_auto_failover monitor instance is now ready on port 5432.
16:44:29 4886 INFO  Monitor has been successfully initialized.
16:44:29 4886 INFO   /usr/pgsql-14/bin/pg_autoctl do service listener --pgdata /var/lib/pgsql/14/data -v
16:44:29 4886 INFO  Managing the monitor at postgres://autoctl_node@admin01.localdomain:5432/pg_auto_failover?sslmode=require
16:44:29 4886 INFO  Reloaded the new configuration from "/var/lib/pgsql/.config/pg_autoctl/var/lib/pgsql/14/data/pg_autoctl.cfg"
16:44:29 4886 INFO  Reloading Postgres configuration and HBA rules
16:44:30 4886 INFO  The version of extension "pgautofailover" is "1.6" on the monitor
16:44:30 4886 INFO  Contacting the monitor to LISTEN to its events.
16:44:55 5027 INFO  Using default --ssl-mode "require"
16:44:55 5027 INFO  Using --ssl-self-signed: pg_autoctl will create self-signed certificates, allowing for encrypted network traffic
16:44:55 5027 WARN  Self-signed certificates provide protection against eavesdropping; this setup does NOT protect against Man-In-The-Middle attacks nor Impersonation attacks.
16:44:55 5027 WARN  See https://www.postgresql.org/docs/current/libpq-ssl.html for details
16:44:55 5027 FATAL pg_autoctl is already running with pid 4793


-bash-4.2$ nohup pg_autoctl run &
[1] 24337
-bash-4.2$ nohup: ignoring input and appending output to ‘nohup.out’

-bash-4.2$ jobs
[1]+  Running                 nohup pg_autoctl run &


Add all the servers IP in monitor node hba configuration file.

# IPv4 local connections:
host    all             postgres        172.20.10.100/32       trust
host    all             postgres        172.20.10.101/32       trust
host    all             postgres        172.20.10.102/32       trust
host    all             postgres        172.20.10.103/32       trust

# replication privilege.
host    replication             postgres        172.20.10.100/32       trust
host    replication             postgres        172.20.10.101/32       trust
host    replication             postgres        172.20.10.102/32       trust
host    replication             postgres        172.20.10.103/32       trust


-- Roload to re-read the configurations of hba file

-bash-4.2$ /usr/pgsql-14/bin/pg_ctl -D /var/lib/pgsql/14/data/ reload
server signaled

-bash-4.2$ psql
psql (15.6, server 14.11)
Type "help" for help.

postgres=# select * from pg_hba_file_rules;
 line_number | type  |   database    |          user_name          |       address        |                 netmask                 | auth_method | options | error
-------------+-------+---------------+-----------------------------+----------------------+-----------------------------------------+-------------+---------+-------
          89 | local | {all}         | {all}                       |                      |                                         | trust       |         |
          91 | host  | {all}         | {all}                       | 127.0.0.1            | 255.255.255.255                         | trust       |         |
          93 | host  | {all}         | {all}                       | ::1                  | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | trust       |         |
          96 | local | {replication} | {all}                       |                      |                                         | trust       |         |
          97 | host  | {replication} | {all}                       | 127.0.0.1            | 255.255.255.255                         | trust       |         |
          98 | host  | {replication} | {all}                       | ::1                  | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | trust       |         |
          99 | host  | {autodb}      | {postgres}                  | master01.localdomain |                                         | trust       |         |
         100 | host  | {all}         | {pgautofailover_monitor}    | admin01.localdomain  |                                         | trust       |         |
         101 | host  | {replication} | {pgautofailover_replicator} | rep01.localdomain    |                                         | trust       |         |
         102 | host  | {autodb}      | {pgautofailover_replicator} | rep01.localdomain    |                                         | trust       |         |
         103 | host  | {replication} | {pgautofailover_replicator} | rep02.localdomain    |                                         | trust       |         |
         104 | host  | {autodb}      | {pgautofailover_replicator} | rep02.localdomain    |                                         | trust       |         |
         105 | host  | {replication} | {pgautofailover_replicator} | master01.localdomain |                                         | trust       |         |
         106 | host  | {autodb}      | {pgautofailover_replicator} | master01.localdomain |                                         | trust       |         |
(14 rows)

postgres=#


-bash-4.2$ pg_autoctl show uri
        Type |    Name | Connection String
-------------+---------+-------------------------------
     monitor | monitor | postgres://autoctl_node@admin01.localdomain:5432/pg_auto_failover
   formation | default |


-- stop firewall on all servers

systemctl stop firewalld.service
systemctl disable firewalld.service



build master server:
*********************
-bash-4.2$ cat .bash_profile
export PATH=$PATH:/usr/pgsql-14/bin
export PGDATA=/var/lib/pgsql/14/data/
export PGPORT=5432


-bash-4.2$ pg_autoctl create postgres --dbname autodb --auth trust --no-ssl --monitor postgres://autoctl_node@admin01.localdomain:5432/pg_auto_failover

6:43:39 4793 INFO  Using default --ssl-mode "require"
16:43:39 4793 INFO  Using --ssl-self-signed: pg_autoctl will create self-signed certificates, allowing for encrypted network traffic
16:43:39 4793 WARN  Self-signed certificates provide protection against eavesdropping; this setup does NOT protect against Man-In-The-Middle attacks nor Impersonation attacks.
16:43:39 4793 WARN  See https://www.postgresql.org/docs/current/libpq-ssl.html for details
16:43:39 4793 INFO  Initialising a PostgreSQL cluster at "/var/lib/pgsql/14/data"
16:43:39 4793 INFO  /usr/pgsql-14/bin/pg_ctl initdb -s -D /var/lib/pgsql/14/data --option '--auth=trust'
16:43:39 4793 INFO   /bin/openssl req -new -x509 -days 365 -nodes -text -out /var/lib/pgsql/14/data/server.crt -keyout /var/lib/pgsql/14/data/server.key -subj "/CN=admin01.localdomain"
16:43:39 4793 INFO  Started pg_autoctl postgres service with pid 4809
16:43:39 4793 INFO  Started pg_autoctl listener service with pid 4810
16:43:39 4809 INFO   /usr/pgsql-14/bin/pg_autoctl do service postgres --pgdata /var/lib/pgsql/14/data -v
16:43:39 4814 INFO   /usr/pgsql-14/bin/postgres -D /var/lib/pgsql/14/data -p 5432 -h *
16:44:29 4886 INFO  Granting connection privileges on 172.20.10.0/24
16:44:29 4886 INFO  Reloading Postgres configuration and HBA rules
16:44:29 4886 INFO  Your pg_auto_failover monitor instance is now ready on port 5432.
16:44:29 4886 INFO  Monitor has been successfully initialized.
16:44:29 4886 INFO   /usr/pgsql-14/bin/pg_autoctl do service listener --pgdata /var/lib/pgsql/14/data -v
16:44:29 4886 INFO  Managing the monitor at postgres://autoctl_node@admin01.localdomain:5432/pg_auto_failover?sslmode=require
16:44:29 4886 INFO  Reloaded the new configuration from "/var/lib/pgsql/.config/pg_autoctl/var/lib/pgsql/14/data/pg_autoctl.cfg"
16:44:29 4886 INFO  Reloading Postgres configuration and HBA rules
16:44:30 4886 INFO  The version of extension "pgautofailover" is "1.6" on the monitor
16:44:30 4886 INFO  Contacting the monitor to LISTEN to its events.
16:44:55 5027 INFO  Using default --ssl-mode "require"
16:44:55 5027 INFO  Using --ssl-self-signed: pg_autoctl will create self-signed certificates, allowing for encrypted network traffic
16:44:55 5027 WARN  Self-signed certificates provide protection against eavesdropping; this setup does NOT protect against Man-In-The-Middle attacks nor Impersonation attacks.
16:44:55 5027 WARN  See https://www.postgresql.org/docs/current/libpq-ssl.html for details
16:44:55 5027 FATAL pg_autoctl is already running with pid 4793


nohup pg_autoctl run &

->make 0.0.0.0/0 entries in pg_hba.conf file.[instead of all zeros keep each and every servers]

->reload postgresql services


check formation URI:


-bash-4.2$ pg_autoctl show uri
        Type |    Name | Connection String
-------------+---------+-------------------------------
     monitor | monitor | postgres://autoctl_node@admin01.localdomain:5432/pg_auto_failover?sslmode=require
   formation | default | postgres://master01.localdomain:5432/autodb?target_session_attrs=read-write&sslmode=require


-bash-4.2$ pg_autoctl show state
  Name |  Node |                 Host:Port |       TLI: LSN |   Connection |      Reported State |      Assigned State
-------+-------+---------------------------+----------------+--------------+---------------------+--------------------
node_1 |     1 | master01.localdomain:5432 |   1: 0/17CFAC8 |   read-write |              single |              single




build standby1:
***************
vi .bash_profile

export PATH=$PATH:/usr/pgsql-14/bin
export PGDATA=/var/lib/pgsql/14/data/
export PGPORT=5002

-bash-4.2$ pg_autoctl create postgres --dbname autodb --auth trust --no-ssl --monitor postgres://autoctl_node@admin01.localdomain:5432/pg_auto_failover

12:58:38 9894 WARN  No encryption is used for network traffic! This allows an attacker on the network to read all replication data.
12:58:38 9894 WARN  Using --ssl-self-signed instead of --no-ssl is recommend to achieve more security with the same ease of deployment.
12:58:38 9894 WARN  See https://www.postgresql.org/docs/current/libpq-ssl.html for details on how to improve
12:58:38 9894 INFO  Using default --ssl-mode "prefer"
12:58:38 9894 INFO  Connecting to 172.20.10.100 (port 5432)
12:58:38 9894 INFO  Using --hostname "rep01.localdomain", which resolves to IP address "172.20.10.102"
12:58:38 9894 INFO  Started pg_autoctl postgres service with pid 9896
12:58:38 9894 INFO  Started pg_autoctl node-init service with pid 9897
12:58:38 9896 INFO   /usr/pgsql-14/bin/pg_autoctl do service postgres --pgdata /var/lib/pgsql/14/data -v
12:58:38 9897 INFO  Registered node 2 "node_2" (rep01.localdomain:5432)in formation "default", group 0, state "wait_standby"
12:58:38 9897 INFO  Writing keeper state file at "/var/lib/pgsql/.local/share/pg_autoctl/var/lib/pgsql/14/data/pg_autoctl.state"
12:58:38 9897 INFO  Writing keeper init state file at "/var/lib/pgsql/.local/share/pg_autoctl/var/lib/pgsql/14/data/pg_autoctl.init"
12:58:38 9897 INFO  Successfully registered as "wait_standby" to the monitor.
12:58:38 9897 INFO  FSM transition from "init" to "wait_standby": Start following a primary
12:58:38 9897 INFO  Transition complete: current state is now "wait_standby"
12:58:38 9897 INFO  New state for node 1 "node_1" (master01.localdomain:5432): single ➜ wait_primary
12:58:38 9897 INFO  New state for node 1 "node_1" (master01.localdomain:5432): wait_primary ➜ wait_primary
12:58:38 9897 INFO  FSM transition from "wait_standby" to "catchingup": The primary is now ready to accept a standby
12:58:38 9897 INFO  Initialising PostgreSQL as a hot standby
12:58:38 9897 INFO   /usr/pgsql-14/bin/pg_basebackup -w -d 'application_name=pgautofailover_standby_2 host=master01.localdomain port=5432 user=pgautofailover_replicator sslmode=prefer' --pgdata /var/lib/pgsql/14/backup/node_2 -U pgautofailover_replicator --verbose --progress --max-rate 100M --wal-method=stream --slot pgautofailover_standby_2
12:58:38 9897 INFO  pg_basebackup: initiating base backup, waiting for checkpoint to complete
12:58:41 9897 INFO  pg_basebackup: checkpoint completed
12:58:41 9897 INFO  pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
12:58:41 9897 INFO  pg_basebackup: starting background WAL receiver
12:58:41 9897 INFO      0/35810 kB (0%), 0/1 tablespace (...ql/14/backup/node_2/backup_label)
12:58:41 9897 INFO  35820/35820 kB (100%), 0/1 tablespace (.../backup/node_2/global/pg_control)
12:58:41 9897 INFO  35820/35820 kB (100%), 1/1 tablespace                                         
12:58:41 9897 INFO  pg_basebackup: write-ahead log end point: 0/2000138
12:58:41 9897 INFO  pg_basebackup: waiting for background process to finish streaming ...
12:58:41 9897 INFO  pg_basebackup: syncing data to disk ...
12:58:41 9897 INFO  pg_basebackup: renaming backup_manifest.tmp to backup_manifest
12:58:41 9897 INFO  pg_basebackup: base backup completed
12:58:41 9897 INFO  Creating the standby signal file at "/var/lib/pgsql/14/data/standby.signal", and replication setup at "/var/lib/pgsql/14/data/postgresql-auto-failover-standby.conf"
12:58:41 9904 INFO   /usr/pgsql-14/bin/postgres -D /var/lib/pgsql/14/data -p 5432 -h *
12:58:42 9897 INFO  PostgreSQL started on port 5432
12:58:42 9897 INFO  Fetched current list of 1 other nodes from the monitor to update HBA rules, including 1 changes.
12:58:42 9897 INFO  Ensuring HBA rules for node 1 "node_1" (master01.localdomain:5432)
12:58:42 9897 INFO  Adding HBA rule: host replication "pgautofailover_replicator" master01.localdomain trust
12:58:42 9897 INFO  Adding HBA rule: host "autodb" "pgautofailover_replicator" master01.localdomain trust
12:58:42 9897 INFO  Writing new HBA rules in "/var/lib/pgsql/14/data/pg_hba.conf"
12:58:42 9897 INFO  Reloading Postgres configuration and HBA rules
12:58:42 9897 INFO  Transition complete: current state is now "catchingup"
12:58:42 9896 INFO  Postgres is now serving PGDATA "/var/lib/pgsql/14/data" on port 5432 with pid 9904
12:58:42 9897 INFO  keeper has been successfully initialized.
12:58:42 9894 WARN  pg_autoctl service node-init exited with exit status 0
12:58:42 9896 INFO  Postgres controller service received signal SIGTERM, terminating
12:58:42 9896 INFO  Stopping pg_autoctl postgres service
12:58:42 9896 INFO  /usr/pgsql-14/bin/pg_ctl --pgdata /var/lib/pgsql/14/data --wait stop --mode fast
12:58:42 9894 INFO  Stop pg_autoctl



-bash-4.2$ nohup pg_autoctl run &


-bash-4.2$ pg_autoctl show uri
        Type |    Name | Connection String
-------------+---------+-------------------------------
     monitor | monitor | postgres://autoctl_node@admin01.localdomain:5432/pg_auto_failover?sslmode=require
   formation | default | postgres://master01.localdomain:5432,rep01.localdomain:5432/autodb?target_session_attrs=read-write&sslmode=require



-bash-4.2$ pg_autoctl show state
  Name |  Node |                 Host:Port |       TLI: LSN |   Connection |      Reported State |      Assigned State
-------+-------+---------------------------+----------------+--------------+---------------------+--------------------
node_1 |     1 | master01.localdomain:5432 |   1: 0/3000110 |   read-write |             primary |             primary
node_2 |     2 |    rep01.localdomain:5432 |   1: 0/3000110 |    read-only |           secondary |           secondary



->To make it consitent across the cluster, add entries in pg_hba.conf file.

build standby server2:
**********************
vi .bash_profile

export PATH=$PATH:/usr/pgsql-14/bin
export PGDATA=/var/lib/pgsql/14/data/
export PGPORT=5003

-bash-4.2$ pg_autoctl create postgres --dbname autodb --auth trust --no-ssl --monitor postgres://autoctl_node@admin01.localdomain:5432/pg_auto_failover

13:00:48 10067 WARN  No encryption is used for network traffic! This allows an attacker on the network to read all replication data.
13:00:48 10067 WARN  Using --ssl-self-signed instead of --no-ssl is recommend to achieve more security with the same ease of deployment.
13:00:48 10067 WARN  See https://www.postgresql.org/docs/current/libpq-ssl.html for details on how to improve
13:00:48 10067 INFO  Using default --ssl-mode "prefer"
13:00:48 10067 INFO  Connecting to 172.20.10.100 (port 5432)
13:00:48 10067 INFO  Using --hostname "rep02.localdomain", which resolves to IP address "172.20.10.103"
13:00:48 10067 INFO  Started pg_autoctl postgres service with pid 10069
13:00:48 10067 INFO  Started pg_autoctl node-init service with pid 10070
13:00:48 10069 INFO   /usr/pgsql-14/bin/pg_autoctl do service postgres --pgdata /var/lib/pgsql/14/data -v
13:00:48 10070 INFO  Registered node 3 "node_3" (rep02.localdomain:5432)in formation "default", group 0, state "wait_standby"
13:00:48 10070 INFO  Writing keeper state file at "/var/lib/pgsql/.local/share/pg_autoctl/var/lib/pgsql/14/data/pg_autoctl.state"
13:00:48 10070 INFO  Writing keeper init state file at "/var/lib/pgsql/.local/share/pg_autoctl/var/lib/pgsql/14/data/pg_autoctl.init"
13:00:48 10070 INFO  Successfully registered as "wait_standby" to the monitor.
13:00:48 10070 INFO  FSM transition from "init" to "wait_standby": Start following a primary
13:00:48 10070 INFO  Transition complete: current state is now "wait_standby"
13:00:48 10070 WARN  Failed to connect to "postgres://pgautofailover_replicator@master01.localdomain:5432/autodb?", retrying until the server is ready
13:00:49 10070 INFO  Successfully connected to "postgres://pgautofailover_replicator@master01.localdomain:5432/autodb?" after 5 attempts in 95 ms.
13:00:50 10070 INFO  FSM transition from "wait_standby" to "catchingup": The primary is now ready to accept a standby
13:00:50 10070 INFO  Initialising PostgreSQL as a hot standby
13:00:50 10070 INFO   /usr/pgsql-14/bin/pg_basebackup -w -d 'application_name=pgautofailover_standby_3 host=master01.localdomain port=5432 user=pgautofailover_replicator sslmode=prefer' --pgdata /var/lib/pgsql/14/backup/node_3 -U pgautofailover_replicator --verbose --progress --max-rate 100M --wal-method=stream --slot pgautofailover_standby_3
13:00:50 10070 INFO  pg_basebackup: initiating base backup, waiting for checkpoint to complete
13:00:50 10070 INFO  pg_basebackup: checkpoint completed
13:00:50 10070 INFO  pg_basebackup: write-ahead log start point: 0/4000028 on timeline 1
13:00:50 10070 INFO  pg_basebackup: starting background WAL receiver
13:00:50 10070 INFO      0/35813 kB (0%), 0/1 tablespace (...ql/14/backup/node_3/backup_label)
13:00:50 10070 INFO  35823/35823 kB (100%), 0/1 tablespace (.../backup/node_3/global/pg_control)
13:00:50 10070 INFO  35823/35823 kB (100%), 1/1 tablespace                                         
13:00:50 10070 INFO  pg_basebackup: write-ahead log end point: 0/4000100
13:00:50 10070 INFO  pg_basebackup: waiting for background process to finish streaming ...
13:00:50 10070 INFO  pg_basebackup: syncing data to disk ...
13:00:50 10070 INFO  pg_basebackup: renaming backup_manifest.tmp to backup_manifest
13:00:50 10070 INFO  pg_basebackup: base backup completed
13:00:50 10070 INFO  Creating the standby signal file at "/var/lib/pgsql/14/data/standby.signal", and replication setup at "/var/lib/pgsql/14/data/postgresql-auto-failover-standby.conf"
13:00:50 10077 INFO   /usr/pgsql-14/bin/postgres -D /var/lib/pgsql/14/data -p 5432 -h *
13:00:51 10070 INFO  PostgreSQL started on port 5432
13:00:51 10070 INFO  Fetched current list of 2 other nodes from the monitor to update HBA rules, including 2 changes.
13:00:51 10070 INFO  Ensuring HBA rules for node 1 "node_1" (master01.localdomain:5432)
13:00:51 10070 INFO  Adding HBA rule: host replication "pgautofailover_replicator" master01.localdomain trust
13:00:51 10070 INFO  Adding HBA rule: host "autodb" "pgautofailover_replicator" master01.localdomain trust
13:00:51 10070 INFO  Ensuring HBA rules for node 2 "node_2" (rep01.localdomain:5432)
13:00:51 10070 INFO  Writing new HBA rules in "/var/lib/pgsql/14/data/pg_hba.conf"
13:00:51 10070 INFO  Reloading Postgres configuration and HBA rules
13:00:51 10069 INFO  Postgres is now serving PGDATA "/var/lib/pgsql/14/data" on port 5432 with pid 10077
13:00:51 10070 INFO  Transition complete: current state is now "catchingup"
13:00:51 10070 INFO  keeper has been successfully initialized.
13:00:51 10067 WARN  pg_autoctl service node-init exited with exit status 0
13:00:51 10069 INFO  Postgres controller service received signal SIGTERM, terminating
13:00:51 10069 INFO  Stopping pg_autoctl postgres service
13:00:51 10069 INFO  /usr/pgsql-14/bin/pg_ctl --pgdata /var/lib/pgsql/14/data --wait stop --mode fast
13:00:51 10067 INFO  Stop pg_autoctl


-bash-4.2$ nohup pg_autoctl run &

-bash-4.2$ pg_autoctl show uri
        Type |    Name | Connection String
-------------+---------+-------------------------------
     monitor | monitor | postgres://autoctl_node@admin01.localdomain:5432/pg_auto_failover?sslmode=require
   formation | default | postgres://rep01.localdomain:5432,master01.localdomain:5432,rep02.localdomain:5432/autodb?target_session_attrs=read-write&sslmode=require



cluster state:

-bash-4.2$ pg_autoctl show state
  Name |  Node |                 Host:Port |       TLI: LSN |   Connection |      Reported State |      Assigned State
-------+-------+---------------------------+----------------+--------------+---------------------+--------------------
node_1 |     1 | master01.localdomain:5432 |   1: 0/5000060 |   read-write |             primary |             primary
node_2 |     2 |    rep01.localdomain:5432 |   1: 0/5000060 |    read-only |           secondary |           secondary
node_3 |     3 |    rep02.localdomain:5432 |   1: 0/5000060 |    read-only |           secondary |           secondary




---> test connectivity from any of the server server , test the connectivity using connection URI. (READ-WRITE)

-bash-4.2$ psql postgres://master01.localdomain:5432,rep01.localdomain:5432,rep02.localdomain:5432/autodb?target_session_attrs=read-write
psql (15.6, server 14.11)
Type "help" for help.

autodb=# select inet_server_addr();
 inet_server_addr
------------------
 172.20.10.101
(1 row)

autodb=# \x
Expanded display is on.
autodb=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 10748
usesysid         | 16411
usename          | pgautofailover_replicator
application_name | pgautofailover_standby_2
client_addr      | 172.20.10.102
client_hostname  | rep01.localdomain
client_port      | 50833
backend_start    | 2024-03-23 12:59:01.007396-04
backend_xmin     |
state            | streaming
sent_lsn         | 0/5002170
write_lsn        | 0/5002170
flush_lsn        | 0/5002170
replay_lsn       | 0/5002170
write_lag        | 00:00:00.000468
flush_lag        | 00:00:00.001515
replay_lag       | 00:00:00.001806
sync_priority    | 1
sync_state       | quorum
reply_time       | 2024-03-23 15:30:49.360313-04
-[ RECORD 2 ]----+------------------------------
pid              | 11108
usesysid         | 16411
usename          | pgautofailover_replicator
application_name | pgautofailover_standby_3
client_addr      | 172.20.10.103
client_hostname  | rep02.localdomain
client_port      | 48331
backend_start    | 2024-03-23 13:01:03.656056-04
backend_xmin     |
state            | streaming
sent_lsn         | 0/5002170
write_lsn        | 0/5002170
flush_lsn        | 0/5002170
replay_lsn       | 0/5002170
write_lag        | 00:00:00.000523
flush_lag        | 00:00:00.001478
replay_lag       | 00:00:00.001771
sync_priority    | 1
sync_state       | quorum
reply_time       | 2024-03-23 15:30:47.256643-04



---> test connectivity from any of the server server , test the connectivity using connection URI. (READ-ONLY)

-bash-4.2$ psql postgres://master01.localdomain:5432,rep01.localdomain:5432,rep02.localdomain:5432/autodb?target_session_attrs=read-only
psql (15.6, server 14.11)
Type "help" for help.

autodb=# select inet_server_addr();
 inet_server_addr
------------------
 172.20.10.102
(1 row)

autodb=# \x
Expanded display is on.
autodb=# select * from pg_stat_replication;
(0 rows)





Switchover test:
***************************

-- connect Admin server ---


-bash-4.2$ pg_autoctl watch     <---------- to watch the live activity


Switchover Test-1:
---------------------

-bash-4.2$ pg_autoctl show state
  Name |  Node |                 Host:Port |       TLI: LSN |   Connection |      Reported State |      Assigned State
-------+-------+---------------------------+----------------+--------------+---------------------+--------------------
node_1 |     1 | master01.localdomain:5432 |   1: 0/50042B0 |   read-write |             primary |             primary
node_2 |     2 |    rep01.localdomain:5432 |   1: 0/50042B0 |    read-only |           secondary |           secondary
node_3 |     3 |    rep02.localdomain:5432 |   1: 0/50042B0 |    read-only |           secondary |           secondary


-bash-4.2$ pg_autoctl perform switchover
19:37:09 6159 INFO  Waiting 60 secs for a notification with state "primary" in formation "default" and group 0
19:37:09 6159 INFO  Listening monitor notifications about state changes in formation "default" and group 0
19:37:09 6159 INFO  Following table displays times when notifications are received
    Time |   Name |  Node |                 Host:Port |       Current State |      Assigned State
---------+--------+-------+---------------------------+---------------------+--------------------
19:37:09 | node_1 |     1 | master01.localdomain:5432 |             primary |            draining
19:37:09 | node_1 |     1 | master01.localdomain:5432 |            draining |            draining
19:37:09 | node_1 |     1 | master01.localdomain:5432 |            draining |          report_lsn
19:37:09 | node_2 |     2 |    rep01.localdomain:5432 |           secondary |          report_lsn
19:37:09 | node_3 |     3 |    rep02.localdomain:5432 |           secondary |          report_lsn
19:37:09 | node_2 |     2 |    rep01.localdomain:5432 |          report_lsn |          report_lsn
19:37:09 | node_3 |     3 |    rep02.localdomain:5432 |          report_lsn |          report_lsn
19:37:10 | node_1 |     1 | master01.localdomain:5432 |          report_lsn |          report_lsn
19:37:10 | node_2 |     2 |    rep01.localdomain:5432 |          report_lsn |   prepare_promotion
19:37:10 | node_2 |     2 |    rep01.localdomain:5432 |   prepare_promotion |   prepare_promotion
19:37:10 | node_2 |     2 |    rep01.localdomain:5432 |   prepare_promotion |        wait_primary
19:37:10 | node_1 |     1 | master01.localdomain:5432 |          report_lsn |      join_secondary
19:37:10 | node_3 |     3 |    rep02.localdomain:5432 |          report_lsn |      join_secondary
19:37:11 | node_1 |     1 | master01.localdomain:5432 |      join_secondary |      join_secondary
19:37:11 | node_3 |     3 |    rep02.localdomain:5432 |      join_secondary |      join_secondary
19:37:11 | node_2 |     2 |    rep01.localdomain:5432 |        wait_primary |        wait_primary
19:37:11 | node_1 |     1 | master01.localdomain:5432 |      join_secondary |           secondary
19:37:11 | node_3 |     3 |    rep02.localdomain:5432 |      join_secondary |           secondary
19:37:12 | node_1 |     1 | master01.localdomain:5432 |           secondary |           secondary
19:37:12 | node_3 |     3 |    rep02.localdomain:5432 |           secondary |           secondary
19:37:12 | node_2 |     2 |    rep01.localdomain:5432 |        wait_primary |             primary
19:37:12 | node_2 |     2 |    rep01.localdomain:5432 |             primary |             primary

-bash-4.2$ pg_autoctl show state
  Name |  Node |                 Host:Port |       TLI: LSN |   Connection |      Reported State |      Assigned State
-------+-------+---------------------------+----------------+--------------+---------------------+--------------------
node_1 |     1 | master01.localdomain:5432 |   2: 0/5004568 |    read-only |           secondary |           secondary
node_2 |     2 |    rep01.localdomain:5432 |   2: 0/5004568 |   read-write |             primary |             primary
node_3 |     3 |    rep02.localdomain:5432 |   2: 0/5004568 |    read-only |           secondary |           secondary


Note: Applications should have connection retry functionality, then only connection will be established to new master server once it is back up and available.

Error while database switchover:

autodb=# \l
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
psql (15.6, server 14.11)



--- test the connection post switchover


-bash-4.2$ psql postgres://master01.localdomain:5432,rep01.localdomain:5432,rep02.localdomain:5432/autodb?target_session_attrs=read-write
psql (15.6, server 14.11)
Type "help" for help.

autodb=# select inet_server_addr();
 inet_server_addr
------------------
 172.20.10.102
(1 row)

autodb=# \x
Expanded display is on.
autodb=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 31245
usesysid         | 16411
usename          | pgautofailover_replicator
application_name | pgautofailover_standby_3
client_addr      | 172.20.10.103
client_hostname  | rep02.localdomain
client_port      | 38929
backend_start    | 2024-03-23 15:37:11.999628-04
backend_xmin     |
state            | streaming
sent_lsn         | 0/5006070
write_lsn        | 0/5006070
flush_lsn        | 0/5006070
replay_lsn       | 0/5006070
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 1
sync_state       | quorum
reply_time       | 2024-03-23 15:40:23.13997-04
-[ RECORD 2 ]----+------------------------------
pid              | 31246
usesysid         | 16411
usename          | pgautofailover_replicator
application_name | pgautofailover_standby_1
client_addr      | 172.20.10.101
client_hostname  | master01.localdomain
client_port      | 55647
backend_start    | 2024-03-23 15:37:12.026932-04
backend_xmin     |
state            | streaming
sent_lsn         | 0/5006070
write_lsn        | 0/5006070
flush_lsn        | 0/5006070
replay_lsn       | 0/5006070
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 1
sync_state       | quorum
reply_time       | 2024-03-23 15:40:23.121922-04


Info: I have performed multiple switchover test by the switchver always happens between node-1 and node-2, it will never pick the node-3 because of the priority settings.



How to set the priority for failover and switchover:
*****************************************************


-bash-4.2$ pg_autoctl show settings
  Context |    Name |                   Setting | Value                                                       
----------+---------+---------------------------+-------------------------------------------------------------
formation | default |      number_sync_standbys | 1                                                           
  primary |  node_2 | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_1, pgautofailover_standby_3)'
     node |  node_1 |        candidate priority | 50                                                          
     node |  node_2 |        candidate priority | 50                                                          
     node |  node_3 |        candidate priority | 50                                                          
     node |  node_1 |        replication quorum | true                                                        
     node |  node_2 |        replication quorum | true                                                        
     node |  node_3 |        replication quorum | true                                                        


-bash-4.2$ pg_autoctl set node candidate-priority --name node_3 70
19:44:04 11861 INFO  Waiting for the settings to have been applied to the monitor and primary node
19:44:04 11861 INFO  New state is reported by node 2 "node_2" (rep01.localdomain:5432): "apply_settings"
19:44:04 11861 INFO  Setting goal state of node 2 "node_2" (rep01.localdomain:5432) to primary after it applied replication properties change.
19:44:04 11861 INFO  New state is reported by node 2 "node_2" (rep01.localdomain:5432): "primary"
70

-bash-4.2$ pg_autoctl show settings
  Context |    Name |                   Setting | Value                                                       
----------+---------+---------------------------+-------------------------------------------------------------
formation | default |      number_sync_standbys | 1                                                           
  primary |  node_2 | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_3, pgautofailover_standby_1)'
     node |  node_1 |        candidate priority | 50                                                          
     node |  node_2 |        candidate priority | 50                                                          
     node |  node_3 |        candidate priority | 70                                                          
     node |  node_1 |        replication quorum | true                                                        
     node |  node_2 |        replication quorum | true                                                        
     node |  node_3 |        replication quorum | true     



--- Now try another switchover post setting the candidate priority

-bash-4.2$ pg_autoctl perform switchover



-bash-4.2$ pg_autoctl show state
  Name |  Node |                 Host:Port |       TLI: LSN |   Connection |      Reported State |      Assigned State
-------+-------+---------------------------+----------------+--------------+---------------------+--------------------
node_1 |     1 | master01.localdomain:5432 |   2: 0/5006240 |    read-only |           secondary |           secondary
node_2 |     2 |    rep01.localdomain:5432 |   2: 0/5006240 |   read-write |             primary |             primary
node_3 |     3 |    rep02.localdomain:5432 |   2: 0/5006240 |    read-only |           secondary |           secondary

-bash-4.2$ pg_autoctl perform switchover
19:49:27 16218 INFO  Waiting 60 secs for a notification with state "primary" in formation "default" and group 0
19:49:27 16218 INFO  Listening monitor notifications about state changes in formation "default" and group 0
19:49:27 16218 INFO  Following table displays times when notifications are received
    Time |   Name |  Node |                 Host:Port |       Current State |      Assigned State
---------+--------+-------+---------------------------+---------------------+--------------------
19:49:27 | node_2 |     2 |    rep01.localdomain:5432 |             primary |            draining
19:49:27 | node_2 |     2 |    rep01.localdomain:5432 |            draining |            draining
19:49:27 | node_1 |     1 | master01.localdomain:5432 |           secondary |          report_lsn
19:49:27 | node_2 |     2 |    rep01.localdomain:5432 |            draining |          report_lsn
19:49:27 | node_3 |     3 |    rep02.localdomain:5432 |           secondary |          report_lsn
19:49:28 | node_1 |     1 | master01.localdomain:5432 |          report_lsn |          report_lsn
19:49:28 | node_3 |     3 |    rep02.localdomain:5432 |          report_lsn |          report_lsn
19:49:28 | node_2 |     2 |    rep01.localdomain:5432 |          report_lsn |          report_lsn
19:49:28 | node_3 |     3 |    rep02.localdomain:5432 |          report_lsn |   prepare_promotion
19:49:28 | node_3 |     3 |    rep02.localdomain:5432 |   prepare_promotion |   prepare_promotion
19:49:28 | node_3 |     3 |    rep02.localdomain:5432 |   prepare_promotion |        wait_primary
19:49:29 | node_2 |     2 |    rep01.localdomain:5432 |          report_lsn |      join_secondary
19:49:29 | node_1 |     1 | master01.localdomain:5432 |          report_lsn |      join_secondary
19:49:29 | node_3 |     3 |    rep02.localdomain:5432 |        wait_primary |        wait_primary
19:49:29 | node_2 |     2 |    rep01.localdomain:5432 |      join_secondary |      join_secondary
19:49:29 | node_2 |     2 |    rep01.localdomain:5432 |      join_secondary |           secondary
19:49:30 | node_1 |     1 | master01.localdomain:5432 |      join_secondary |      join_secondary
19:49:30 | node_1 |     1 | master01.localdomain:5432 |      join_secondary |           secondary
19:49:30 | node_1 |     1 | master01.localdomain:5432 |           secondary |           secondary
19:49:30 | node_3 |     3 |    rep02.localdomain:5432 |        wait_primary |             primary
19:49:30 | node_2 |     2 |    rep01.localdomain:5432 |           secondary |           secondary
19:49:30 | node_3 |     3 |    rep02.localdomain:5432 |             primary |             primary


-bash-4.2$ pg_autoctl show state
  Name |  Node |                 Host:Port |       TLI: LSN |   Connection |      Reported State |      Assigned State
-------+-------+---------------------------+----------------+--------------+---------------------+--------------------
node_1 |     1 | master01.localdomain:5432 |   3: 0/50064F8 |    read-only |           secondary |           secondary
node_2 |     2 |    rep01.localdomain:5432 |   3: 0/50064F8 |    read-only |           secondary |           secondary
node_3 |     3 |    rep02.localdomain:5432 |   3: 0/50064F8 |   read-write |             primary |             primary    ---------- it picked node-3




Failover test:
************
To simulate Master database crash, I will shut down master node to perform the failover

$ pg_autoctl stop    --------- run on master node

bash-4.2$ pg_autoctl show state
  Name |  Node |                 Host:Port |       TLI: LSN |   Connection |      Reported State |      Assigned State
-------+-------+---------------------------+----------------+--------------+---------------------+--------------------
node_1 |     1 | master01.localdomain:5432 |   4: 0/5006820 | read-write ! |             primary |             demoted   ---------- demoted
node_2 |     2 |    rep01.localdomain:5432 |   5: 0/5006B48 |    read-only |           secondary |           secondary
node_3 |     3 |    rep02.localdomain:5432 |   5: 0/5006B48 |   read-write |             primary |             primary   ------------- primary



Rejoin the failed node to cluster:
**********************************

start services on failed node:

-bash-4.2$ nohup pg_autoctl run &
[1] 3111
-bash-4.2$ nohup: ignoring input and appending output to ‘nohup.out’


bash-4.2$ pg_autoctl show state
  Name |  Node |                 Host:Port |       TLI: LSN |   Connection |      Reported State |      Assigned State
-------+-------+---------------------------+----------------+--------------+---------------------+--------------------
node_1 |     1 | master01.localdomain:5432 |   5: 0/7000000 |    read-only |           secondary |           secondary   --- rejoined as a Secondary
node_2 |     2 |    rep01.localdomain:5432 |   5: 0/7000000 |    read-only |           secondary |           secondary
node_3 |     3 |    rep02.localdomain:5432 |   5: 0/7000000 |   read-write |             primary |             primary



Note: Automatically failed node will be added as a secondary.


What if both standby's down at a time, all the writes will be hung state in master, in that case we need to change the sync configuration.


-bash-4.2$ pg_autoctl show settings
  Context |    Name |                   Setting | Value                                                       
----------+---------+---------------------------+-------------------------------------------------------------
formation | default |      number_sync_standbys | 1                                                           <---------- modify this value to 0
  primary |  node_3 | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_1, pgautofailover_standby_2)'   
     node |  node_1 |        candidate priority | 50                                                          
     node |  node_2 |        candidate priority | 50                                                          
     node |  node_3 |        candidate priority | 70                                                          
     node |  node_1 |        replication quorum | true                                                        
     node |  node_2 |        replication quorum | true                                                        
     node |  node_3 |        replication quorum | true                                                        


like set the value of number_sync_standbys to 0.


-bash-4.2$ pg_autoctl set formation number-sync-standbys 0
20:02:25 26392 INFO  Waiting for the settings to have been applied to the monitor and primary node
20:02:25 26392 INFO  New state is reported by node 3 "node_3" (rep02.localdomain:5432): "apply_settings"
20:02:25 26392 INFO  Setting goal state of node 3 "node_3" (rep02.localdomain:5432) to primary after it applied replication properties change.
20:02:25 26392 INFO  New state is reported by node 3 "node_3" (rep02.localdomain:5432): "primary"
20:02:25 26392 INFO  primary node has now set synchronous_standby_names = 'ANY 1 (pgautofailover_standby_1, pgautofailover_standby_2)'
0
-bash-4.2$ pg_autoctl show settings
  Context |    Name |                   Setting | Value                                                       
----------+---------+---------------------------+-------------------------------------------------------------
formation | default |      number_sync_standbys | 0                                                            <----------  the value set to 0
  primary |  node_3 | synchronous_standby_names | 'ANY 1 (pgautofailover_standby_1, pgautofailover_standby_2)'
     node |  node_1 |        candidate priority | 50                                                          
     node |  node_2 |        candidate priority | 50                                                          
     node |  node_3 |        candidate priority | 70                                                          
     node |  node_1 |        replication quorum | true                                                        
     node |  node_2 |        replication quorum | true                                                        
     node |  node_3 |        replication quorum | true     


If we want to do so maintenance activities on the server, we have to set the cluster to maintenance state.

set maintance mode for the servers:
************************************


--- connect to a server where you want to enable maintenance


-bash-4.2$ pg_autoctl show state
  Name |  Node |                 Host:Port |       TLI: LSN |   Connection |      Reported State |      Assigned State
-------+-------+---------------------------+----------------+--------------+---------------------+--------------------
node_1 |     1 | master01.localdomain:5432 |   5: 0/7000230 |    read-only |           secondary |           secondary
node_2 |     2 |    rep01.localdomain:5432 |   5: 0/7000230 |    read-only |           secondary |           secondary
node_3 |     3 |    rep02.localdomain:5432 |   5: 0/7000230 |   read-write |             primary |             primary

-bash-4.2$ pg_autoctl enable maintenance
16:05:23 2811 INFO  Listening monitor notifications about state changes in formation "default" and group 0
16:05:23 2811 INFO  Following table displays times when notifications are received
    Time |   Name |  Node |                 Host:Port |       Current State |      Assigned State
---------+--------+-------+---------------------------+---------------------+--------------------
16:05:23 | node_2 |     2 |    rep01.localdomain:5432 |           secondary |         maintenance
16:05:23 | node_2 |     2 |    rep01.localdomain:5432 |         maintenance |         maintenance


-bash-4.2$ pg_autoctl show state
  Name |  Node |                 Host:Port |       TLI: LSN |   Connection |      Reported State |      Assigned State
-------+-------+---------------------------+----------------+--------------+---------------------+--------------------
node_1 |     1 | master01.localdomain:5432 |   5: 0/7000230 |    read-only |           secondary |           secondary
node_2 |     2 |    rep01.localdomain:5432 |   5: 0/7000230 |         none |         maintenance |         maintenance   ---------- it's now in maintenance state
node_3 |     3 |    rep02.localdomain:5432 |   5: 0/7000230 |   read-write |             primary |             primary


--- disable the maintenance state once the activity completed.



-bash-4.2$ pg_autoctl disable maintenance
16:06:32 2906 INFO  Listening monitor notifications about state changes in formation "default" and group 0
16:06:32 2906 INFO  Following table displays times when notifications are received
    Time |   Name |  Node |                 Host:Port |       Current State |      Assigned State
---------+--------+-------+---------------------------+---------------------+--------------------
16:06:33 | node_2 |     2 |    rep01.localdomain:5432 |         maintenance |          catchingup
16:06:33 | node_2 |     2 |    rep01.localdomain:5432 |          catchingup |          catchingup
16:06:33 | node_2 |     2 |    rep01.localdomain:5432 |          catchingup |           secondary
16:06:33 | node_2 |     2 |    rep01.localdomain:5432 |           secondary |           secondary

-bash-4.2$ pg_autoctl show state
  Name |  Node |                 Host:Port |       TLI: LSN |   Connection |      Reported State |      Assigned State
-------+-------+---------------------------+----------------+--------------+---------------------+--------------------
node_1 |     1 | master01.localdomain:5432 |   5: 0/9000000 |    read-only |           secondary |           secondary
node_2 |     2 |    rep01.localdomain:5432 |   5: 0/9000000 |    read-only |           secondary |           secondary
node_3 |     3 |    rep02.localdomain:5432 |   5: 0/9000000 |   read-write |             primary |             primary



Comments

Popular posts from this blog

Auditing in Oracle database

Tuples in Python

rs.stepDown() in MongoDB replication