BGP daemon does not check last keepalive > hold time #634
Description
I have a scenario where the passive BGP speaker in pmacct (sfacctd) sometimes doesn't receive a BGP notification or TCP reset to signal that a neighbor has gone away. This happens because the host running the non-pmacct BGP daemon is taken offline in a very ungraceful way. When it happens, sfacctd considers the BGP session established for a very long time, which appears to be a product of the default Linux TCP timers (net.ipv4.tcp_keepalive_time = 7200 in the upstream docker container for sfacctd). While I can adjust the TCP stack settings to work around this, it seems like something the BGP daemon in pmacct should handle itself.
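For a sense of scale, here is a rough sketch of how long the kernel alone would take to notice a silently dead peer, compared with the BGP hold time. The 7200 second idle time is the value quoted above. The 75 second probe interval and 9 probes are the usual Linux defaults for net.ipv4.tcp_keepalive_intvl and net.ipv4.tcp_keepalive_probes and are assumed here, not read from the container. The estimate also assumes SO_KEEPALIVE is enabled on the socket at all.

```python
# Rough estimate of how long the kernel needs to declare an idle TCP peer dead.
# Only TCP_KEEPALIVE_TIME comes from the report; the other two sysctl values
# are the common Linux defaults and are assumptions.

def keepalive_detection_seconds(idle_time: int, probe_interval: int, probe_count: int) -> int:
    """Idle period before the first probe, plus one interval per unanswered probe."""
    return idle_time + probe_interval * probe_count


TCP_KEEPALIVE_TIME = 7200   # net.ipv4.tcp_keepalive_time, as reported above
TCP_KEEPALIVE_INTVL = 75    # net.ipv4.tcp_keepalive_intvl (assumed default)
TCP_KEEPALIVE_PROBES = 9    # net.ipv4.tcp_keepalive_probes (assumed default)
BGP_HOLD_TIME = 90          # HoldTime negotiated in the BGP OPEN exchange

kernel_detection = keepalive_detection_seconds(
    TCP_KEEPALIVE_TIME, TCP_KEEPALIVE_INTVL, TCP_KEEPALIVE_PROBES
)
print(f"kernel keepalive detection: {kernel_detection} s")
print(f"BGP hold time:              {BGP_HOLD_TIME} s")
print(f"ratio:                      {kernel_detection / BGP_HOLD_TIME:.1f}x")
```

Under those assumptions the kernel needs a little over two hours, which matches the session looking established "for hours" while the remote side gives up after 90 seconds.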
To reproduce this I built a small docker-compose environment with two containers, pmacct/sfacctd:latest and gobgpd.
sfacctd runs on ip 10.200.0.10
gobgp runs on 10.200.0.20
The sfacctd config looks like:
!
daemonize: false
sfacctd_port: 6343
!
bgp_daemon: true
bgp_daemon_max_peers: 4
bgp_daemon_as: 65534
bgp_daemon_port: 179
bgp_agent_map: /etc/pmacct/bgp.map
The gobgp config looks like:
[global.config]
as = 65534
router-id = "10.200.0.20"
port = 179
[[peer-groups]]
[peer-groups.config]
peer-group-name = "telemetry"
peer-as = 65534
local-as = 65534
[peer-groups.transport.config]
local-address = "10.200.0.20"
remote-port = 179
[[peer-groups.afi-safis]]
[peer-groups.afi-safis.config]
afi-safi-name = "ipv4-unicast"
[[peer-groups.afi-safis]]
[peer-groups.afi-safis.config]
afi-safi-name = "ipv6-unicast"
[[neighbors]]
[neighbors.config]
neighbor-address = "10.200.0.10"
peer-group = "telemetry"
I spin up both containers and let them form an MP-BGP session (no routes are actually exchanged). Once the session is established, I null route 10.200.0.10/32 on the gobgp container to reproduce the behavior. The logs look something like this:
nco-4618-sfacctd-keepalive-sfacctd-1 | INFO ( default/core/BGP ): maximum BGP peers allowed: 4
nco-4618-sfacctd-keepalive-sfacctd-1 | INFO ( default/core/BGP ): waiting for BGP data on :::179
nco-4618-sfacctd-keepalive-sfacctd-1 | INFO ( default/core ): waiting for sFlow data on :::6343
nco-4618-sfacctd-keepalive-sfacctd-1 | OK ( default_memory/memory ): waiting for data on: '/tmp/collect.pipe'
nco-4618-sfacctd-keepalive-sfacctd-1 | INFO ( default/core/BGP ): [10.200.0.20] BGP peers usage: 1/4
nco-4618-sfacctd-keepalive-sfacctd-1 | INFO ( default/core/BGP ): [10.200.0.20] Capability: MultiProtocol [1] AFI [1] SAFI [1]
nco-4618-sfacctd-keepalive-sfacctd-1 | INFO ( default/core/BGP ): [10.200.0.20] Capability: MultiProtocol [1] AFI [2] SAFI [1]
nco-4618-sfacctd-keepalive-sfacctd-1 | INFO ( default/core/BGP ): [10.200.0.20] Capability: 4-bytes AS [41] ASN [65534]
nco-4618-sfacctd-keepalive-sfacctd-1 | INFO ( default/core/BGP ): [10.200.0.20] BGP_OPEN: Local AS: 65534 Remote AS: 65534 HoldTime: 90
nco-4618-sfacctd-keepalive-gobgp-1 | time="2022-08-16T16:12:26Z" level=info msg="Peer Up" Key=10.200.0.10 State=BGP_FSM_OPENCONFIRM Topic=Peer
nco-4618-sfacctd-keepalive-gobgp-1 | time="2022-08-16T16:14:26Z" level=warning msg="hold timer expired" Key=10.200.0.10 State=BGP_FSM_ESTABLISHED Topic=Peer
nco-4618-sfacctd-keepalive-gobgp-1 | time="2022-08-16T16:14:26Z" level=warning msg="sent notification" Code=4 Data="[]" Key=10.200.0.10 State=BGP_FSM_ESTABLISHED Subcode=0 Topic=Peer
nco-4618-sfacctd-keepalive-gobgp-1 | time="2022-08-16T16:14:26Z" level=info msg="Peer Down" Key=10.200.0.10 Reason=hold-timer-expired State=BGP_FSM_ESTABLISHED Topic=Peer
gobgp successfully detects that the neighbor is down after the 90 second hold time expires, but sfacctd logs nothing. The local Linux stack on the sfacctd container shows the following and stays this way for hours:
root@sfacctd:/# netstat -an | grep 10.200.0.20
tcp6 0 0 10.200.0.10:179 10.200.0.20:48777 ESTABLISHED
From sfacctd's perspective the neighbor is "stuck" this way for a long time. The kicker is that in my production environment I use tmp_bgp_lookup_compare_ports (not used in the experiment above), which means that we only compare the router id, so a new connection attempt coming in after my remote BGP speaker reboots is rejected by sfacctd because it thinks there is an existing BGP session from that router id.
Would it be possible to add some sort of background thread to check keepalive vs hold time for neighbors? Something similar to
Line 893 in cd2dc26
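To make the request concrete, here is a minimal sketch of the kind of periodic check I have in mind. It is written in Python for brevity and is not pmacct code: the Peer class, expired_peers and reap are hypothetical names, and the only protocol facts it relies on are that the hold timer restarts on every KEEPALIVE or UPDATE and that a negotiated hold time of 0 disables it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Peer:
    """Hypothetical per-neighbor state; not pmacct's actual structure."""
    router_id: str
    hold_time: float      # negotiated hold time in seconds; 0 disables the timer
    last_message: float   # timestamp of the last KEEPALIVE or UPDATE received


def expired_peers(peers: Dict[str, Peer], now: float) -> List[str]:
    """Return the router ids whose hold timer has run out at time `now`."""
    return [
        router_id
        for router_id, peer in peers.items()
        if peer.hold_time > 0 and now - peer.last_message > peer.hold_time
    ]


def reap(
    peers: Dict[str, Peer],
    now: float,
    on_expire: Optional[Callable[[Peer], None]] = None,
) -> List[str]:
    """Remove expired peers so a reconnect from the same router id is accepted."""
    dead = expired_peers(peers, now)
    for router_id in dead:
        peer = peers.pop(router_id)
        if on_expire is not None:
            on_expire(peer)  # e.g. log the event and close the TCP socket
    return dead


# Example: a peer with a 90 second hold time that went silent at t=0.
peers = {"10.200.0.20": Peer("10.200.0.20", hold_time=90.0, last_message=0.0)}
still_alive = reap(peers, now=60.0)    # inside the hold time, nothing removed
timed_out = reap(peers, now=91.0)      # past the hold time, peer removed
```

A real daemon would call something like reap from a timer or background thread with a monotonic clock, and would also send a NOTIFICATION (hold timer expired) before closing the socket. The point of the sketch is only that the stale entry disappears after the hold time instead of waiting on the kernel.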
Version
root@sfacctd:/# sfacctd -V
sFlow Accounting Daemon, sfacctd 1.7.7-git [20211107-0 (ef37a415)]
Arguments:
'--enable-mysql' '--enable-pgsql' '--enable-sqlite3' '--enable-kafka' '--enable-geoipv2' '--enable-jansson' '--enable-rabbitmq' '--enable-nflog' '--enable-ndpi' '--enable-zmq' '--enable-avro' '--enable-serdes' '--enable-redis' '--enable-gnutls' 'AVRO_CFLAGS=-I/usr/local/avro/include' 'AVRO_LIBS=-L/usr/local/avro/lib -lavro' '--enable-l2' '--enable-traffic-bins' '--enable-bgp-bins' '--enable-bmp-bins' '--enable-st-bins'
Libs:
cdada 0.3.5
libpcap version 1.8.1
MariaDB 10.3.31
PostgreSQL 110013
sqlite3 3.27.2
rabbimq-c 0.11.0
rdkafka 1.8.2
jansson 2.14
MaxmindDB 1.6.0
ZeroMQ 4.3.2
Redis 1.0.3
GnuTLS 3.6.7
avro-c
serdes
nDPI 3.4.0
netfilter_log
Plugins:
memory
print
nfprobe
sfprobe
tee
mysql
postgresql
sqlite
amqp
kafka
System:
Linux 5.10.104-linuxkit #1 SMP Thu Mar 17 17:08:06 UTC 2022 x86_64
Compiler:
gcc 8.3.0