BGP daemon does not check last keepalive > hold time #634

Open
@floatingstatic

Description
I have a scenario where the passive BGP speaker in pmacct (sfacctd) sometimes doesn't receive a BGP notification or TCP reset to signal that a neighbor has gone away. This is due to the host running the non-pmacct BGP daemon being taken offline in a very ungraceful way. When this happens, sfacctd seems to consider the BGP session established for a very long time, which appears to be a product of the Linux default TCP timers (net.ipv4.tcp_keepalive_time = 7200 in the upstream docker container for sfacctd). While I can mess around with the TCP stack settings to try to step around this, it seems this is something the BGP daemon in pmacct should handle.
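
For reference, the TCP-level workaround mentioned above can also be applied per socket rather than through sysctl. The following is only a sketch of that idea on Linux: `enable_tcp_keepalive` is a hypothetical helper and not a pmacct function, and the timer values are up to the caller.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of pmacct): enable TCP keepalive probes on an
   already-created socket and override the system-wide timers for that socket.
   TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific options.
   Returns 0 on success, -1 if any setsockopt() call fails. */
static int enable_tcp_keepalive(int fd, int idle_s, int intvl_s, int probes)
{
  int on = 1;

  /* Turn keepalive probing on for this socket. */
  if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0) return -1;

  /* Seconds of idle time before the first probe is sent. */
  if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0) return -1;

  /* Seconds between successive probes. */
  if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof(intvl_s)) < 0) return -1;

  /* Number of unanswered probes before the connection is dropped. */
  if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) < 0) return -1;

  return 0;
}
```

With values such as 60, 10 and 3, the kernel would give up on a silent peer after roughly 60 + 3 * 10 = 90 seconds. This still only papers over the problem at the TCP layer and does not replace a BGP hold timer.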

To reproduce this I built a small docker-compose environment that has two containers, pmacct/sfacctd:latest and gobgpd.
sfacctd runs on ip 10.200.0.10
gobgp runs on 10.200.0.20

The sfacctd config looks like:

!
daemonize: false
sfacctd_port: 6343
!
bgp_daemon: true
bgp_daemon_max_peers: 4
bgp_daemon_as: 65534
bgp_daemon_port: 179
bgp_agent_map: /etc/pmacct/bgp.map

The gobgp config looks like:

[global.config]
  as = 65534
  router-id = "10.200.0.20"
  port = 179

[[peer-groups]]
  [peer-groups.config]
    peer-group-name = "telemetry"
    peer-as = 65534
    local-as = 65534
    [peer-groups.transport.config]
        local-address = "10.200.0.20"
        remote-port = 179
    [[peer-groups.afi-safis]]
      [peer-groups.afi-safis.config]
        afi-safi-name = "ipv4-unicast"
    [[peer-groups.afi-safis]]
      [peer-groups.afi-safis.config]
        afi-safi-name = "ipv6-unicast"

[[neighbors]]
  [neighbors.config]
    neighbor-address = "10.200.0.10"
    peer-group = "telemetry"

I spin up both containers and let them form an MP-BGP session (no routes are actually exchanged). Once the session is established, I reproduce the behavior by null routing 10.200.0.10/32 on the gobgp container. We see something like this:

nco-4618-sfacctd-keepalive-sfacctd-1  | INFO ( default/core/BGP ): maximum BGP peers allowed: 4
nco-4618-sfacctd-keepalive-sfacctd-1  | INFO ( default/core/BGP ): waiting for BGP data on :::179
nco-4618-sfacctd-keepalive-sfacctd-1  | INFO ( default/core ): waiting for sFlow data on :::6343
nco-4618-sfacctd-keepalive-sfacctd-1  | OK ( default_memory/memory ): waiting for data on: '/tmp/collect.pipe'
nco-4618-sfacctd-keepalive-sfacctd-1  | INFO ( default/core/BGP ): [10.200.0.20] BGP peers usage: 1/4
nco-4618-sfacctd-keepalive-sfacctd-1  | INFO ( default/core/BGP ): [10.200.0.20] Capability: MultiProtocol [1] AFI [1] SAFI [1]
nco-4618-sfacctd-keepalive-sfacctd-1  | INFO ( default/core/BGP ): [10.200.0.20] Capability: MultiProtocol [1] AFI [2] SAFI [1]
nco-4618-sfacctd-keepalive-sfacctd-1  | INFO ( default/core/BGP ): [10.200.0.20] Capability: 4-bytes AS [41] ASN [65534]
nco-4618-sfacctd-keepalive-sfacctd-1  | INFO ( default/core/BGP ): [10.200.0.20] BGP_OPEN: Local AS: 65534 Remote AS: 65534 HoldTime: 90
nco-4618-sfacctd-keepalive-gobgp-1    | time="2022-08-16T16:12:26Z" level=info msg="Peer Up" Key=10.200.0.10 State=BGP_FSM_OPENCONFIRM Topic=Peer
nco-4618-sfacctd-keepalive-gobgp-1    | time="2022-08-16T16:14:26Z" level=warning msg="hold timer expired" Key=10.200.0.10 State=BGP_FSM_ESTABLISHED Topic=Peer
nco-4618-sfacctd-keepalive-gobgp-1    | time="2022-08-16T16:14:26Z" level=warning msg="sent notification" Code=4 Data="[]" Key=10.200.0.10 State=BGP_FSM_ESTABLISHED Subcode=0 Topic=Peer
nco-4618-sfacctd-keepalive-gobgp-1    | time="2022-08-16T16:14:26Z" level=info msg="Peer Down" Key=10.200.0.10 Reason=hold-timer-expired State=BGP_FSM_ESTABLISHED Topic=Peer

gobgp successfully detects that the neighbor is down once the 90 second hold time expires, but sfacctd logs nothing. The local Linux TCP stack on the sfacctd container shows the following and stays this way for hours:

root@sfacctd:/# netstat -an | grep 10.200.0.20
tcp6       0      0 10.200.0.10:179         10.200.0.20:48777       ESTABLISHED

From sfacctd's perspective the neighbor is "stuck" this way for a long time. The kicker is that in my production environment I use tmp_bgp_lookup_compare_ports (not used in the experiment above), which means that we only compare the router id. As a result, a new connection attempt coming in after my remote BGP speaker reboots is rejected by sfacctd, because it thinks there is already a BGP session from that router id.

Would it be possible to add some sort of background thread that compares the time since the last keepalive against the hold time for each neighbor? Something similar to

if ((now - peers[peers_check_idx].last_keepalive) > peers[peers_check_idx].ht) {
but as a background thread?
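
To make the request concrete, here is a minimal sketch of a periodic sweep over all peer slots. It is not pmacct code: `struct peer_sketch`, `peer_hold_expired` and `sweep_expired_peers` are hypothetical names, and the struct only borrows the `last_keepalive` and `ht` field names from the line above. Treating a hold time of 0 as "timer disabled" is an assumption based on how BGP defines a zero hold time.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified peer record; the real pmacct peer structure
   carries far more state than this. */
struct peer_sketch {
  int fd;                /* socket descriptor, -1 when the slot is unused */
  time_t last_keepalive; /* time the last KEEPALIVE (or UPDATE) was received */
  unsigned int ht;       /* negotiated hold time in seconds, 0 = no hold timer */
};

/* Returns 1 if the peer's hold timer has expired at time `now`, else 0. */
static int peer_hold_expired(const struct peer_sketch *p, time_t now)
{
  if (p->fd < 0 || p->ht == 0) return 0;
  return (now - p->last_keepalive) > (time_t) p->ht;
}

/* Walks every slot and marks expired peers as closed; returns how many
   peers expired. A real daemon would also send a NOTIFICATION, close the
   socket and release the peer's routing state at this point. */
static size_t sweep_expired_peers(struct peer_sketch *peers, size_t n, time_t now)
{
  size_t expired = 0;

  for (size_t i = 0; i < n; i++) {
    if (peer_hold_expired(&peers[i], now)) {
      peers[i].fd = -1; /* free the slot so a reconnecting peer is accepted */
      expired++;
    }
  }

  return expired;
}
```

A sweep like this could run from a dedicated thread or from a timeout in the daemon's existing event loop, for example once per second with `time(NULL)` as `now`; which of the two fits pmacct better is for the maintainers to judge.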

Version

root@sfacctd:/# sfacctd -V
sFlow Accounting Daemon, sfacctd 1.7.7-git [20211107-0 (ef37a415)]

Arguments:
 '--enable-mysql' '--enable-pgsql' '--enable-sqlite3' '--enable-kafka' '--enable-geoipv2' '--enable-jansson' '--enable-rabbitmq' '--enable-nflog' '--enable-ndpi' '--enable-zmq' '--enable-avro' '--enable-serdes' '--enable-redis' '--enable-gnutls' 'AVRO_CFLAGS=-I/usr/local/avro/include' 'AVRO_LIBS=-L/usr/local/avro/lib -lavro' '--enable-l2' '--enable-traffic-bins' '--enable-bgp-bins' '--enable-bmp-bins' '--enable-st-bins'

Libs:
cdada 0.3.5
libpcap version 1.8.1
MariaDB 10.3.31
PostgreSQL 110013
sqlite3 3.27.2
rabbimq-c 0.11.0
rdkafka 1.8.2
jansson 2.14
MaxmindDB 1.6.0
ZeroMQ 4.3.2
Redis 1.0.3
GnuTLS 3.6.7
avro-c
serdes
nDPI 3.4.0
netfilter_log

Plugins:
memory
print
nfprobe
sfprobe
tee
mysql
postgresql
sqlite
amqp
kafka

System:
Linux 5.10.104-linuxkit #1 SMP Thu Mar 17 17:08:06 UTC 2022 x86_64

Compiler:
gcc 8.3.0

Appreciation
Please consider starring this project to boost our reach on github!

👍🏼 DONE!
