BGP daemon does not check last keepalive > hold time

**Description**
I have a scenario where the passive BGP speaker in pmacct (sfacctd) sometimes doesn't receive a BGP NOTIFICATION or TCP reset to signal that a neighbor has gone away. This happens because the host running the remote (non-pmacct) BGP daemon is taken offline in a very ungraceful way. When this happens, sfacctd seems to consider the BGP session established for a very long time, which appears to be a product of the default Linux TCP timers (`net.ipv4.tcp_keepalive_time = 7200` in the upstream Docker container for sfacctd). While I can tune the TCP stack settings to work around this, it seems like something the BGP daemon in pmacct should handle itself.
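For reference, the TCP-level workaround I mention could also be applied per socket rather than through sysctls. The following is only a minimal sketch under assumptions: `enable_tcp_keepalive` is a hypothetical helper that does not exist in pmacct, the options are Linux-specific, and I have not checked whether pmacct enables `SO_KEEPALIVE` on its BGP sockets at all.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper, not part of pmacct: enable TCP keepalive on one
 * socket with short timers, so an unreachable peer is detected after roughly
 * idle_s + intvl_s * cnt seconds instead of the 7200 s system default.
 * Uses Linux-specific socket options.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int enable_tcp_keepalive(int fd, int idle_s, int intvl_s, int cnt)
{
  int on = 1;

  /* Turn keepalive probing on for this socket. */
  if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0) return -1;
  /* Idle time in seconds before the first probe is sent. */
  if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0) return -1;
  /* Interval in seconds between unanswered probes. */
  if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof(intvl_s)) < 0) return -1;
  /* Number of unanswered probes before the connection is dropped. */
  if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0) return -1;

  return 0;
}
```

Calling something like `enable_tcp_keepalive(fd, 30, 10, 3)` on an accepted BGP socket would have the kernel tear the connection down after about a minute of silence. This only masks the problem at the TCP layer, though; it does not make the daemon honor the negotiated BGP hold time.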

To reproduce this I built a small docker-compose environment that has two containers, `pmacct/sfacctd:latest` and `gobgpd`:

- sfacctd runs on 10.200.0.10
- gobgp runs on 10.200.0.20

The sfacctd config looks like:

```
!
daemonize: false
sfacctd_port: 6343
!
bgp_daemon: true
bgp_daemon_max_peers: 4
bgp_daemon_as: 65534
bgp_daemon_port: 179
bgp_agent_map: /etc/pmacct/bgp.map
```

The gobgp config looks like:

```
[global.config]
  as = 65534
  router-id = "10.200.0.20"
  port = 179

[[peer-groups]]
  [peer-groups.config]
    peer-group-name = "telemetry"
    peer-as = 65534
    local-as = 65534
    [peer-groups.transport.config]
        local-address = "10.200.0.20"
        remote-port = 179
    [[peer-groups.afi-safis]]
      [peer-groups.afi-safis.config]
        afi-safi-name = "ipv4-unicast"
    [[peer-groups.afi-safis]]
      [peer-groups.afi-safis.config]
        afi-safi-name = "ipv6-unicast"

[[neighbors]]
  [neighbors.config]
    neighbor-address = "10.200.0.10"
    peer-group = "telemetry"

```

I spin up both containers and let them form an MP-BGP session (no routes are actually exchanged). Once the session is established, I reproduce the behavior by null-routing 10.200.0.10/32 on the gobgp container. The logs then look something like this:

```
nco-4618-sfacctd-keepalive-sfacctd-1  | INFO ( default/core/BGP ): maximum BGP peers allowed: 4
nco-4618-sfacctd-keepalive-sfacctd-1  | INFO ( default/core/BGP ): waiting for BGP data on :::179
nco-4618-sfacctd-keepalive-sfacctd-1  | INFO ( default/core ): waiting for sFlow data on :::6343
nco-4618-sfacctd-keepalive-sfacctd-1  | OK ( default_memory/memory ): waiting for data on: '/tmp/collect.pipe'
nco-4618-sfacctd-keepalive-sfacctd-1  | INFO ( default/core/BGP ): [10.200.0.20] BGP peers usage: 1/4
nco-4618-sfacctd-keepalive-sfacctd-1  | INFO ( default/core/BGP ): [10.200.0.20] Capability: MultiProtocol [1] AFI [1] SAFI [1]
nco-4618-sfacctd-keepalive-sfacctd-1  | INFO ( default/core/BGP ): [10.200.0.20] Capability: MultiProtocol [1] AFI [2] SAFI [1]
nco-4618-sfacctd-keepalive-sfacctd-1  | INFO ( default/core/BGP ): [10.200.0.20] Capability: 4-bytes AS [41] ASN [65534]
nco-4618-sfacctd-keepalive-sfacctd-1  | INFO ( default/core/BGP ): [10.200.0.20] BGP_OPEN: Local AS: 65534 Remote AS: 65534 HoldTime: 90
nco-4618-sfacctd-keepalive-gobgp-1    | time="2022-08-16T16:12:26Z" level=info msg="Peer Up" Key=10.200.0.10 State=BGP_FSM_OPENCONFIRM Topic=Peer
nco-4618-sfacctd-keepalive-gobgp-1    | time="2022-08-16T16:14:26Z" level=warning msg="hold timer expired" Key=10.200.0.10 State=BGP_FSM_ESTABLISHED Topic=Peer
nco-4618-sfacctd-keepalive-gobgp-1    | time="2022-08-16T16:14:26Z" level=warning msg="sent notification" Code=4 Data="[]" Key=10.200.0.10 State=BGP_FSM_ESTABLISHED Subcode=0 Topic=Peer
nco-4618-sfacctd-keepalive-gobgp-1    | time="2022-08-16T16:14:26Z" level=info msg="Peer Down" Key=10.200.0.10 Reason=hold-timer-expired State=BGP_FSM_ESTABLISHED Topic=Peer
```

gobgp successfully detects that the neighbor is down once the 90-second hold time expires, but sfacctd logs nothing. The Linux TCP stack in the sfacctd container shows the following, and it stays this way for hours:

```
root@sfacctd:/# netstat -an | grep 10.200.0.20
tcp6       0      0 10.200.0.10:179         10.200.0.20:48777       ESTABLISHED
```

From sfacctd's perspective, the neighbor is "stuck" this way for a long time. The kicker is that in my production environment I use `tmp_bgp_lookup_compare_ports` (not used in the experiment above), which means that only the router ID is compared. As a result, a new connection attempt that comes in after my remote BGP speaker reboots is rejected by sfacctd, because it thinks there is already a BGP session from that router ID.

Would it be possible to add some sort of background thread that compares the time since the last keepalive against the hold time for each neighbor? Something similar to https://github.com/pmacct/pmacct/blob/cd2dc261f682563de52829799aee40ed8bd12f13/src/bgp/bgp.c#L893 but running as a background thread?
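To make the request concrete, here is a rough sketch of the kind of periodic check I have in mind. Everything in it is an assumption for illustration: `struct peer_state`, `hold_timer_expired` and `sweep_peers` are hypothetical names, and pmacct's real peer structure and timer handling are certainly different.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified peer record; not pmacct's actual peer structure. */
struct peer_state {
  int    established;     /* non-zero while the session is considered up */
  time_t last_keepalive;  /* when the last KEEPALIVE or UPDATE was received */
  int    hold_time;       /* negotiated hold time in seconds, 0 = timer disabled */
};

/* Returns non-zero if more than hold_time seconds have passed since the
 * peer was last heard from. Peers that are down, or that negotiated a
 * hold time of 0, never expire. */
int hold_timer_expired(const struct peer_state *p, time_t now)
{
  if (!p->established || p->hold_time == 0) return 0;
  return (now - p->last_keepalive) > p->hold_time;
}

/* Walks all peers once, marks the expired ones as down and returns how many
 * were closed. A real daemon would also send a NOTIFICATION (error code 4,
 * Hold Timer Expired) and close the socket at this point. */
size_t sweep_peers(struct peer_state *peers, size_t n, time_t now)
{
  size_t closed = 0;

  for (size_t i = 0; i < n; i++) {
    if (hold_timer_expired(&peers[i], now)) {
      peers[i].established = 0;
      closed++;
    }
  }
  return closed;
}
```

A background thread (or the existing main loop on a timer) could call something like `sweep_peers(peers, n, time(NULL))` every few seconds. The important part is that the check runs on a clock rather than only when data arrives from the peer, so a silently vanished neighbor is still cleaned up and its slot freed for the reconnect.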

**Version**
```
root@sfacctd:/# sfacctd -V
sFlow Accounting Daemon, sfacctd 1.7.7-git [20211107-0 (ef37a415)]

Arguments:
 '--enable-mysql' '--enable-pgsql' '--enable-sqlite3' '--enable-kafka' '--enable-geoipv2' '--enable-jansson' '--enable-rabbitmq' '--enable-nflog' '--enable-ndpi' '--enable-zmq' '--enable-avro' '--enable-serdes' '--enable-redis' '--enable-gnutls' 'AVRO_CFLAGS=-I/usr/local/avro/include' 'AVRO_LIBS=-L/usr/local/avro/lib -lavro' '--enable-l2' '--enable-traffic-bins' '--enable-bgp-bins' '--enable-bmp-bins' '--enable-st-bins'

Libs:
cdada 0.3.5
libpcap version 1.8.1
MariaDB 10.3.31
PostgreSQL 110013
sqlite3 3.27.2
rabbimq-c 0.11.0
rdkafka 1.8.2
jansson 2.14
MaxmindDB 1.6.0
ZeroMQ 4.3.2
Redis 1.0.3
GnuTLS 3.6.7
avro-c
serdes
nDPI 3.4.0
netfilter_log

Plugins:
memory
print
nfprobe
sfprobe
tee
mysql
postgresql
sqlite
amqp
kafka

System:
Linux 5.10.104-linuxkit #1 SMP Thu Mar 17 17:08:06 UTC 2022 x86_64

Compiler:
gcc 8.3.0
```

**Appreciation**
Please consider starring this project to boost our reach on github!

👍🏼 DONE!


BGP daemon does not check last keepalive > hold time #634
