Description
What happened?
After starting the Patroni service to create a standby with the basebackup create-replica method, the standby does not stay in sync.
Patroni appears to create an additional WAL segment file that the primary is unaware of, so the standby database falls out of sync and reports the error below:
2024-08-20 06:37:05.605 EDT [372906] LOG: started streaming WAL from primary at E/3C000000 on timeline 30
2024-08-20 06:37:05.605 EDT [372906] FATAL: could not receive data from WAL stream: ERROR: requested starting point E/3C000000 is ahe
2024-08-20 06:37:05.606 EDT [372738] LOG: waiting for WAL to become available at E/3C0000B8
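The LSN in these messages can be related to the segment file name that the startup process reports in the Patroni log section (`startup recovering 0000001E0000000E0000003C`). The following is a minimal sketch, assuming the default 16 MB WAL segment size (the actual value on this cluster is not stated in this report); `wal_segment_name` is a hypothetical helper, not part of Patroni or PostgreSQL.

```python
# Assumption: default PostgreSQL WAL segment size of 16 MB.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def wal_segment_name(timeline: int, lsn: str,
                     segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the name of the WAL segment file that contains the given LSN."""
    high_hex, low_hex = lsn.split("/")
    # An LSN is a 64-bit byte position written as two hex halves.
    position = (int(high_hex, 16) << 32) | int(low_hex, 16)
    segment_number = position // segment_size
    segments_per_xlogid = 0x100000000 // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segment_number // segments_per_xlogid,
        segment_number % segments_per_xlogid,
    )


if __name__ == "__main__":
    # Both LSNs from the log fall into the same segment on timeline 30.
    print(wal_segment_name(30, "E/3C000000"))
    print(wal_segment_name(30, "E/3C0000B8"))
```

Under that assumption, both the requested starting point `E/3C000000` and the awaited position `E/3C0000B8` map to the segment the startup process is recovering, so the standby seems to be asking for the very beginning of that segment.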
How can we reproduce it (as minimally and precisely as possible)?
Steps:
Set up the Patroni YAML file and start the Patroni service. The standby then starts logging FATAL: could not receive data from WAL stream:
What did you expect to happen?
To compare, I simply stopped the Patroni service and started the PostgreSQL service.
I then performed the steps manually by running this pg_basebackup command on the standby:
pg_basebackup -h 10.114.16.28 -D /data/ctdatabase -P -U replica_user -R -X stream -c fast --create-slot --slot=testbed04
With the manual approach, the standby starts syncing with the primary.
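To make the difference between the manual command and the `basebackup` section of the Patroni configuration easier to see, the options can be compared side by side. This is only a sketch: `parse_options` is a hypothetical helper that understands just the flags used in this report, and it says nothing about which extra options Patroni adds on its own when it runs pg_basebackup.

```python
import shlex

# Short-to-long names for the pg_basebackup flags used in this report.
SHORT_TO_LONG = {
    "-h": "host", "-D": "pgdata", "-P": "progress", "-U": "username",
    "-R": "write-recovery-conf", "-X": "wal-method", "-c": "checkpoint",
}
# Flags that take no value.
BOOLEAN_FLAGS = {"progress", "write-recovery-conf", "create-slot"}


def parse_options(command: str) -> dict:
    """Parse the pg_basebackup command line into {long_name: value}."""
    tokens = shlex.split(command)[1:]  # drop the program name
    options = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("--"):
            name, _, value = token[2:].partition("=")
        else:
            name, value = SHORT_TO_LONG[token], ""
        if name in BOOLEAN_FLAGS:
            options[name] = True
        elif value:
            options[name] = value
        else:
            i += 1  # the value is the next token
            options[name] = tokens[i]
        i += 1
    return options


MANUAL_COMMAND = (
    "pg_basebackup -h 10.114.16.28 -D /data/ctdatabase -P -U replica_user "
    "-R -X stream -c fast --create-slot --slot=testbed04"
)
# Copied from postgresql.basebackup in the configuration file below.
PATRONI_BASEBACKUP = {"wal-method": "stream", "checkpoint": "fast"}

manual = parse_options(MANUAL_COMMAND)
shared = {k: v for k, v in manual.items() if PATRONI_BASEBACKUP.get(k) == v}
only_manual = sorted(set(manual) - set(PATRONI_BASEBACKUP))

if __name__ == "__main__":
    print("same in both:", shared)
    print("only in manual command:", only_manual)
```

The WAL method and checkpoint mode match; the manual run additionally names a replication slot (`--create-slot --slot=testbed04`) and writes the recovery configuration (`-R`), which may be worth checking against what Patroni does for the same member.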
Patroni/PostgreSQL/DCS version
- Patroni version: 3.3.2
- PostgreSQL version: 16.3
- DCS (and its version): etcdctl version: 3.5.9 API version: 3.5
Patroni configuration file
scope: postgres-cluster2
namespace: /mydb2/
name: testbed04
log:
  format: '%(asctime)s %(levelname)s: %(message)s'
  level: INFO
  max_queue_size: 1000
  traceback_level: ERROR
  type: plain
restapi:
  connect_address: 10.114.16.33:8008
  listen: 10.114.16.33:8008
etcd3:
  host:
    10.114.16.51:2379
# The bootstrap configuration. Works only when the cluster is not yet initialized.
# If the cluster is already initialized, all changes in the `bootstrap` section are ignored!
bootstrap:
  # This section will be written into <dcs>:/<namespace>/<scope>/config after initializing
  # new cluster and all other cluster members will use it as a `global configuration`.
  # WARNING! If you want to change any of the parameters that were set up
  # via `bootstrap.dcs` section, please use `patronictl edit-config`!
  dcs:
    loop_wait: 10
    retry_timeout: 10
    ttl: 30
    postgresql:
      parameters:
        DateStyle: ISO, MDY
        TimeZone: America/New_York
        cluster_name: ''
        default_text_search_config: pg_catalog.english
        dynamic_shared_memory_type: posix
        hot_standby: 'on'
        lc_messages: en_US.UTF-8
        lc_monetary: en_US.UTF-8
        lc_numeric: en_US.UTF-8
        lc_time: en_US.UTF-8
        log_destination: stderr
        log_directory: log
        log_filename: postgresql-%a.log
        log_line_prefix: '%m [%p] '
        log_rotation_age: 1d
        log_rotation_size: '0'
        log_timezone: America/New_York
        log_truncate_on_rotation: 'on'
        logging_collector: 'on'
        max_connections: '100'
        max_locks_per_transaction: '64'
        max_prepared_transactions: '200'
        max_replication_slots: '10'
        max_wal_senders: '10'
        max_wal_size: 1GB
        max_worker_processes: '8'
        min_wal_size: 80MB
        shared_buffers: 128MB
        shared_preload_libraries: citus
        track_commit_timestamp: 'off'
        wal_keep_size: '0'
        wal_level: replica
        wal_log_hints: 'on'
      use_slots: true
citus:
  group: 0 # 0 for coordinator and 1, 2, 3, etc for workers
  database: ctafiniti # must be the same on all nodes
postgresql:
  authentication:
    replication:
      password: test
      username: replica_user
    superuser:
      password: test
      username: postgres
  create_replica_methods:
    - basebackup
  basebackup:
    wal-method: 'stream'
    checkpoint: 'fast'
  bin_dir: /usr/pgsql-16/bin
  connect_address: 10.114.16.33:5432
  data_dir: /data/ctdatabase
  listen: 10.114.16.33:5432
  parameters:
    hba_file: /data/ctdatabase/pg_hba.conf
    ident_file: /data/ctdatabase/pg_ident.conf
  pg_hba:
    - local all all peer
    - host all all 10.0.0.0/8 trust
    - host all all 127.0.0.1/32 trust
    - host all all ::1/128 trust
    - local replication all peer
    - host replication replica_user 10.114.16.28/32 md5
    - host replication replica_user 10.114.16.33/32 md5
tags:
  nofailover: true
  noloadbalance: false
  nostream: false
  nosync: false
patronictl show-config
loop_wait: 10
maximum_lag_on_failover: 1048576
postgresql:
  parameters:
    archive_command: cp %p /data/archive/%f
    archive_mode: 'on'
    archive_timeout: 1800s
    hot_standby: 'on'
    max_connections: 100
    max_locks_per_transaction: 64
    max_replication_slots: 10
    max_wal_senders: 10
    max_worker_processes: 8
    shared_preload_libraries: citus,pg_cron
    ssl_dh_params_file: /data/ctdatabase/dhparams.pem
    synchronous_commit: 'on'
    synchronous_standby_names: '*'
    unix_socket_directories: /run/postgresql
    wal_keep_size: 16
    wal_level: replica
  use_pg_rewind: true
  use_slots: true
retry_timeout: 10
synchronous_mode: true
ttl: 30
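As the comment in the configuration file notes, the `bootstrap` section is ignored once the cluster is initialized, so the values that actually apply are the ones from `patronictl show-config`. A small sketch of comparing the two for the parameters that appear in both places; the dictionaries are copied by hand from this report, and `differing_parameters` is a hypothetical helper.

```python
# Parameters from bootstrap.dcs.postgresql.parameters in patroni.yml
# (only those that also appear in `patronictl show-config`).
BOOTSTRAP_PARAMS = {
    "hot_standby": "on",
    "max_connections": "100",
    "max_locks_per_transaction": "64",
    "max_replication_slots": "10",
    "max_wal_senders": "10",
    "max_worker_processes": "8",
    "shared_preload_libraries": "citus",
    "wal_keep_size": "0",
    "wal_level": "replica",
}

# The same parameters as reported by `patronictl show-config`.
LIVE_PARAMS = {
    "hot_standby": "on",
    "max_connections": 100,
    "max_locks_per_transaction": 64,
    "max_replication_slots": 10,
    "max_wal_senders": 10,
    "max_worker_processes": 8,
    "shared_preload_libraries": "citus,pg_cron",
    "wal_keep_size": 16,
    "wal_level": "replica",
}


def differing_parameters(bootstrap: dict, live: dict) -> dict:
    """Return {name: (bootstrap_value, live_value)} where the values differ.

    Values are compared as strings so that '100' and 100 count as equal.
    """
    return {
        name: (bootstrap[name], live[name])
        for name in sorted(bootstrap.keys() & live.keys())
        if str(bootstrap[name]) != str(live[name])
    }


if __name__ == "__main__":
    for name, (old, new) in differing_parameters(
            BOOTSTRAP_PARAMS, LIVE_PARAMS).items():
        print(f"{name}: bootstrap={old!r} live={new!r}")
```

For the values in this report, only `shared_preload_libraries` and `wal_keep_size` differ between the two listings.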
Patroni log files
● patroni.service - Runners to orchestrate a high-availability PostgreSQL
Loaded: loaded (/usr/lib/systemd/system/patroni.service; disabled; preset: disabled)
Active: active (running) since Tue 2024-08-20 06:31:20 EDT; 20min ago
Main PID: 372548 (patroni)
Tasks: 13 (limit: 98870)
Memory: 10.2G
CPU: 33.143s
CGroup: /system.slice/patroni.service
├─372548 /usr/bin/python3 /usr/bin/patroni /etc/patroni/patroni.yml
├─372730 /usr/pgsql-16/bin/postgres -D /data/ctdatabase --config-file=/data/ctdatabase/postgresql.conf --listen_addresses>
├─372735 "postgres: postgres-cluster2: logger "
├─372736 "postgres: postgres-cluster2: checkpointer "
├─372737 "postgres: postgres-cluster2: background writer "
├─372738 "postgres: postgres-cluster2: startup recovering 0000001E0000000E0000003C"
└─372745 "postgres: postgres-cluster2: postgres postgres 10.114.16.33(45616) idle"
Aug 20 06:49:51 testbed04 patroni[372548]: 2024-08-20 15:49:51,660 INFO: no action. I am (testbed04), a secondary, and following a lea>
Aug 20 06:50:01 testbed04 patroni[372548]: 2024-08-20 15:50:01,706 INFO: no action. I am (testbed04), a secondary, and following a lea>
Aug 20 06:50:11 testbed04 patroni[372548]: 2024-08-20 15:50:11,660 INFO: no action. I am (testbed04), a secondary, and following a lea>
Aug 20 06:50:21 testbed04 patroni[372548]: 2024-08-20 15:50:21,704 INFO: no action. I am (testbed04), a secondary, and following a lea>
Aug 20 06:50:31 testbed04 patroni[372548]: 2024-08-20 15:50:31,660 INFO: no action. I am (testbed04), a secondary, and following a lea>
Aug 20 06:50:41 testbed04 patroni[372548]: 2024-08-20 15:50:41,704 INFO: no action. I am (testbed04), a secondary, and following a lea>
Aug 20 06:50:51 testbed04 patroni[372548]: 2024-08-20 15:50:51,660 INFO: no action. I am (testbed04), a secondary, and following a lea>
Aug 20 06:51:01 testbed04 patroni[372548]: 2024-08-20 15:51:01,704 INFO: no action. I am (testbed04), a secondary, and following a lea>
Aug 20 06:51:11 testbed04 patroni[372548]: 2024-08-20 15:51:11,660 INFO: no action. I am (testbed04), a secondary, and following a lea>
Aug 20 06:51:21 testbed04 patroni[372548]: 2024-08-20 15:51:21,705 INFO: no action. I am (testbed04), a secondary, and following a lea>
PostgreSQL log files
2024-08-20 06:36:45.598 EDT [372854] LOG: started streaming WAL from primary at E/3C000000 on timeline 30
2024-08-20 06:36:45.598 EDT [372854] FATAL: could not receive data from WAL stream: ERROR: requested starting point E/3C000000 is ahe
2024-08-20 06:36:45.598 EDT [372738] LOG: waiting for WAL to become available at E/3C0000B8
2024-08-20 06:36:50.600 EDT [372877] LOG: started streaming WAL from primary at E/3C000000 on timeline 30
2024-08-20 06:36:50.600 EDT [372877] FATAL: could not receive data from WAL stream: ERROR: requested starting point E/3C000000 is ahe
2024-08-20 06:36:50.600 EDT [372738] LOG: waiting for WAL to become available at E/3C0000B8
2024-08-20 06:36:55.605 EDT [372882] LOG: started streaming WAL from primary at E/3C000000 on timeline 30
2024-08-20 06:36:55.605 EDT [372882] FATAL: could not receive data from WAL stream: ERROR: requested starting point E/3C000000 is ahe
2024-08-20 06:36:55.605 EDT [372738] LOG: waiting for WAL to become available at E/3C0000B8
2024-08-20 06:37:00.609 EDT [372898] LOG: started streaming WAL from primary at E/3C000000 on timeline 30
2024-08-20 06:37:00.609 EDT [372898] FATAL: could not receive data from WAL stream: ERROR: requested starting point E/3C000000 is ahe
2024-08-20 06:37:00.609 EDT [372738] LOG: waiting for WAL to become available at E/3C0000B8
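The excerpt repeats the same three messages. A short sketch that summarizes the streaming attempts makes the pattern explicit; the log lines are copied from above, and `summarize` and its regular expression are hypothetical helpers written only for this log format.

```python
import re
from datetime import datetime

LOG_EXCERPT = """\
2024-08-20 06:36:45.598 EDT [372854] LOG: started streaming WAL from primary at E/3C000000 on timeline 30
2024-08-20 06:36:50.600 EDT [372877] LOG: started streaming WAL from primary at E/3C000000 on timeline 30
2024-08-20 06:36:55.605 EDT [372882] LOG: started streaming WAL from primary at E/3C000000 on timeline 30
2024-08-20 06:37:00.609 EDT [372898] LOG: started streaming WAL from primary at E/3C000000 on timeline 30
"""

# Matches only the "started streaming" lines: timestamp, LSN, timeline.
PATTERN = re.compile(
    r"^(\S+ \S+) \w+ \[\d+\] LOG: started streaming WAL from primary "
    r"at (\S+) on timeline (\d+)$"
)


def summarize(log_text: str):
    """Return (seconds between attempts, set of (LSN, timeline) requested)."""
    attempts = []
    for line in log_text.splitlines():
        match = PATTERN.match(line)
        if match:
            timestamp = datetime.strptime(match.group(1),
                                          "%Y-%m-%d %H:%M:%S.%f")
            attempts.append((timestamp, match.group(2), int(match.group(3))))
    gaps = [
        round((later[0] - earlier[0]).total_seconds(), 3)
        for earlier, later in zip(attempts, attempts[1:])
    ]
    start_points = {(lsn, timeline) for _, lsn, timeline in attempts}
    return gaps, start_points


if __name__ == "__main__":
    gaps, start_points = summarize(LOG_EXCERPT)
    print("seconds between attempts:", gaps)
    print("requested starting points:", start_points)
```

Each attempt is made by a new walreceiver process roughly five seconds after the previous one, which is consistent with the default `wal_retrieve_retry_interval` of 5 seconds (assuming it was not changed), and every attempt requests the same starting point, so the standby never advances.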
Have you tried to use GitHub issue search?
- Yes
Anything else we need to know?
patroni_create_replica_method_issue.docx
Please find attached the details of the WAL segment files, as screenshots.