
FATAL: could not receive data from WAL stream: #3131

Open
@adnanhamdussalam

Description

What happened?

After starting the Patroni service to create a standby via the create-replica method,
Patroni creates additional WAL segment files that the primary side is unaware of, so the standby database gets out of sync and hits the error below:
2024-08-20 06:37:05.605 EDT [372906] LOG: started streaming WAL from primary at E/3C000000 on timeline 30
2024-08-20 06:37:05.605 EDT [372906] FATAL: could not receive data from WAL stream: ERROR: requested starting point E/3C000000 is ahe
2024-08-20 06:37:05.606 EDT [372738] LOG: waiting for WAL to become available at E/3C0000B8
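For scale, the LSN in these messages can be converted to an absolute byte offset (standard PostgreSQL LSN semantics: the part before the slash is the high 32 bits, the part after is the low 32 bits); a small bash sketch, nothing Patroni-specific:

```shell
#!/usr/bin/env bash
# Convert a PostgreSQL LSN such as E/3C000000 to an absolute byte offset.
lsn_to_bytes() {
  local hi=${1%/*} lo=${1#*/}
  echo $(( 16#$hi * 4294967296 + 16#$lo ))
}

lsn_to_bytes E/3C000000   # → 61136175104
```

Comparing this value against `pg_current_wal_lsn()` on the primary shows how far ahead the standby's requested starting point is.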

How can we reproduce it (as minimally and precisely as possible)?

Steps: configure the Patroni YAML file and start the Patroni service; the error appears right away: FATAL: could not receive data from WAL stream:

What did you expect to happen?

I expected the standby to start streaming from the primary. As a workaround, I simply stopped the Patroni service and started the PostgreSQL service directly.

I also tried creating the replica manually by running pg_basebackup on the standby:

pg_basebackup -h 10.114.16.28 -D /data/ctdatabase -P -U replica_user -R -X stream -c fast --create-slot --slot=testbed04

After that, the standby starts syncing with the primary.

Patroni/PostgreSQL/DCS version

  • Patroni version: 3.3.2
  • PostgreSQL version: 16.3
  • DCS (and its version): etcdctl version: 3.5.9 API version: 3.5

Patroni configuration file

scope: postgres-cluster2
namespace: /mydb2/
name: testbed04

log:
  format: '%(asctime)s %(levelname)s: %(message)s'
  level: INFO
  max_queue_size: 1000
  traceback_level: ERROR
  type: plain

restapi:
  connect_address: 10.114.16.33:8008
  listen: 10.114.16.33:8008

etcd3:
  host:
     10.114.16.51:2379

# The bootstrap configuration. Works only when the cluster is not yet initialized.
# If the cluster is already initialized, all changes in the `bootstrap` section are ignored!
bootstrap:
  # This section will be written into <dcs>:/<namespace>/<scope>/config after initializing
  # new cluster and all other cluster members will use it as a `global configuration`.
  # WARNING! If you want to change any of the parameters that were set up
  # via `bootstrap.dcs` section, please use `patronictl edit-config`!
  dcs:
    loop_wait: 10
    retry_timeout: 10
    ttl: 30
    postgresql:
      parameters:
        DateStyle: ISO, MDY
        TimeZone: America/New_York
        cluster_name: ''
        default_text_search_config: pg_catalog.english
        dynamic_shared_memory_type: posix
        hot_standby: 'on'
        lc_messages: en_US.UTF-8
        lc_monetary: en_US.UTF-8
        lc_numeric: en_US.UTF-8
        lc_time: en_US.UTF-8
        log_destination: stderr
        log_directory: log
        log_filename: postgresql-%a.log
        log_line_prefix: '%m [%p] '
        log_rotation_age: 1d
        log_rotation_size: '0'
        log_timezone: America/New_York
        log_truncate_on_rotation: 'on'
        logging_collector: 'on'
        max_connections: '100'
        max_locks_per_transaction: '64'
        max_prepared_transactions: '200'
        max_replication_slots: '10'
        max_wal_senders: '10'
        max_wal_size: 1GB
        max_worker_processes: '8'
        min_wal_size: 80MB
        shared_buffers: 128MB
        shared_preload_libraries: citus
        track_commit_timestamp: 'off'
        wal_keep_size: '0'
        wal_level: replica
        wal_log_hints: 'on'
      use_slots: true
citus:
  group: 0  # 0 for coordinator and 1, 2, 3, etc for workers
  database: ctafiniti  # must be the same on all nodes


postgresql:
  authentication:
    replication:
      password: test
      username: replica_user
    superuser:
      password: test
      username: postgres
  create_replica_methods:
        - basebackup
  basebackup:
        wal-method: 'stream'
        checkpoint: 'fast'
  bin_dir: /usr/pgsql-16/bin
  connect_address: 10.114.16.33:5432
  data_dir: /data/ctdatabase
  listen: 10.114.16.33:5432
  parameters:
    hba_file: /data/ctdatabase/pg_hba.conf
    ident_file: /data/ctdatabase/pg_ident.conf
  pg_hba:
  - local   all             all                                     peer
  - host    all             all             10.0.0.0/8              trust
  - host    all             all             127.0.0.1/32            trust
  - host    all             all             ::1/128                 trust
  - local   replication     all                                     peer
  - host    replication     replica_user         10.114.16.28/32              md5
  - host    replication     replica_user         10.114.16.33/32              md5

tags:
  nofailover: true
  noloadbalance: false
  nostream: false
  nosync: false

patronictl show-config

loop_wait: 10
maximum_lag_on_failover: 1048576
postgresql:
  parameters:
    archive_command: cp %p /data/archive/%f
    archive_mode: 'on'
    archive_timeout: 1800s
    hot_standby: 'on'
    max_connections: 100
    max_locks_per_transaction: 64
    max_replication_slots: 10
    max_wal_senders: 10
    max_worker_processes: 8
    shared_preload_libraries: citus,pg_cron
    ssl_dh_params_file: /data/ctdatabase/dhparams.pem
    synchronous_commit: 'on'
    synchronous_standby_names: '*'
    unix_socket_directories: /run/postgresql
    wal_keep_size: 16
    wal_level: replica
  use_pg_rewind: true
  use_slots: true
retry_timeout: 10
synchronous_mode: true
ttl: 30

Patroni log files

● patroni.service - Runners to orchestrate a high-availability PostgreSQL
     Loaded: loaded (/usr/lib/systemd/system/patroni.service; disabled; preset: disabled)
     Active: active (running) since Tue 2024-08-20 06:31:20 EDT; 20min ago
   Main PID: 372548 (patroni)
      Tasks: 13 (limit: 98870)
     Memory: 10.2G
        CPU: 33.143s
     CGroup: /system.slice/patroni.service
             ├─372548 /usr/bin/python3 /usr/bin/patroni /etc/patroni/patroni.yml
             ├─372730 /usr/pgsql-16/bin/postgres -D /data/ctdatabase --config-file=/data/ctdatabase/postgresql.conf --listen_addresses>
             ├─372735 "postgres: postgres-cluster2: logger "
             ├─372736 "postgres: postgres-cluster2: checkpointer "
             ├─372737 "postgres: postgres-cluster2: background writer "
             ├─372738 "postgres: postgres-cluster2: startup recovering 0000001E0000000E0000003C"
             └─372745 "postgres: postgres-cluster2: postgres postgres 10.114.16.33(45616) idle"

Aug 20 06:49:51 testbed04 patroni[372548]: 2024-08-20 15:49:51,660 INFO: no action. I am (testbed04), a secondary, and following a lea>
Aug 20 06:50:01 testbed04 patroni[372548]: 2024-08-20 15:50:01,706 INFO: no action. I am (testbed04), a secondary, and following a lea>
Aug 20 06:50:11 testbed04 patroni[372548]: 2024-08-20 15:50:11,660 INFO: no action. I am (testbed04), a secondary, and following a lea>
Aug 20 06:50:21 testbed04 patroni[372548]: 2024-08-20 15:50:21,704 INFO: no action. I am (testbed04), a secondary, and following a lea>
Aug 20 06:50:31 testbed04 patroni[372548]: 2024-08-20 15:50:31,660 INFO: no action. I am (testbed04), a secondary, and following a lea>
Aug 20 06:50:41 testbed04 patroni[372548]: 2024-08-20 15:50:41,704 INFO: no action. I am (testbed04), a secondary, and following a lea>
Aug 20 06:50:51 testbed04 patroni[372548]: 2024-08-20 15:50:51,660 INFO: no action. I am (testbed04), a secondary, and following a lea>
Aug 20 06:51:01 testbed04 patroni[372548]: 2024-08-20 15:51:01,704 INFO: no action. I am (testbed04), a secondary, and following a lea>
Aug 20 06:51:11 testbed04 patroni[372548]: 2024-08-20 15:51:11,660 INFO: no action. I am (testbed04), a secondary, and following a lea>
Aug 20 06:51:21 testbed04 patroni[372548]: 2024-08-20 15:51:21,705 INFO: no action. I am (testbed04), a secondary, and following a lea>

PostgreSQL log files

2024-08-20 06:36:45.598 EDT [372854] LOG:  started streaming WAL from primary at E/3C000000 on timeline 30
2024-08-20 06:36:45.598 EDT [372854] FATAL:  could not receive data from WAL stream: ERROR:  requested starting point E/3C000000 is ahe
2024-08-20 06:36:45.598 EDT [372738] LOG:  waiting for WAL to become available at E/3C0000B8
2024-08-20 06:36:50.600 EDT [372877] LOG:  started streaming WAL from primary at E/3C000000 on timeline 30
2024-08-20 06:36:50.600 EDT [372877] FATAL:  could not receive data from WAL stream: ERROR:  requested starting point E/3C000000 is ahe
2024-08-20 06:36:50.600 EDT [372738] LOG:  waiting for WAL to become available at E/3C0000B8
2024-08-20 06:36:55.605 EDT [372882] LOG:  started streaming WAL from primary at E/3C000000 on timeline 30
2024-08-20 06:36:55.605 EDT [372882] FATAL:  could not receive data from WAL stream: ERROR:  requested starting point E/3C000000 is ahe
2024-08-20 06:36:55.605 EDT [372738] LOG:  waiting for WAL to become available at E/3C0000B8
2024-08-20 06:37:00.609 EDT [372898] LOG:  started streaming WAL from primary at E/3C000000 on timeline 30
2024-08-20 06:37:00.609 EDT [372898] FATAL:  could not receive data from WAL stream: ERROR:  requested starting point E/3C000000 is ahe
2024-08-20 06:37:00.609 EDT [372738] LOG:  waiting for WAL to become available at E/3C0000B8
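The file name in the systemd status above ("startup recovering 0000001E0000000E0000003C") is consistent with these errors: it encodes timeline 30 (0x1E) and the 16 MB segment containing LSN E/3C000000. A sketch of the mapping, assuming the default 16 MB `wal_segment_size`:

```shell
#!/usr/bin/env bash
# Build a WAL segment file name from a timeline ID and an LSN,
# assuming the default 16 MB (0x1000000) segment size.
wal_filename() {
  local tli=$1 lsn=$2
  local hi=${lsn%/*} lo=${lsn#*/}
  printf '%08X%08X%08X\n' "$tli" $(( 16#$hi )) $(( 16#$lo / 16777216 ))
}

wal_filename 30 E/3C000000   # → 0000001E0000000E0000003C
```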

Have you tried to use GitHub issue search?

  • Yes

Anything else we need to know?

patroni_create_replica_method_issue.docx
Please find attached screenshots with the details of the WAL segment files.
