
Segfault during cleanup_server when bidirectional or parallel stream tests are ended early #1696

Open
MattCatz opened this issue May 10, 2024 · 5 comments


@MattCatz
Contributor

Context

  • Version of iperf3: 3.16

  • Hardware: N/A

  • Operating system (and distribution, if any): 6.5.0-26-generic #26~22.04.1-Ubuntu

  • Other relevant information (for example, non-default compilers,
    libraries, cross-compiling, etc.): N/A

Bug Report

While doing some testing, I would occasionally use the wrong iperf3 flags/parameters and terminate the test early rather than wait for it to run to completion.

  • Expected Behavior: Terminating a test early causes the client and server to stop testing. The client cleans up and terminates; the server cleans up and prepares for the next test.

  • Actual Behavior: The server segfaults during cleanup.

  • Steps to Reproduce

    1. Simulate a moderately high-latency link on the loopback interface: `tc qdisc add dev lo root netem delay 50ms`
    2. Start the server: `iperf3 -s`
    3. Start a client and terminate the test early: `iperf3 -c 127.0.0.1 -t 10 -P 10` or `iperf3 -c 127.0.0.1 -t 10 --bidir`
      • It appears to be a race condition, so to improve the odds of hitting it I typically run something like `for i in $(seq 100); do iperf3 -c 127.0.0.1 -t 10 -P 10; done` and then repeatedly use Ctrl-C to kill tests.
    4. Observe that the server crashed.
  • Possible Solution
    Adding an assert into the code here, something like `assert(sp->thr != 0);`, exposes the root cause: a NULL value is being passed to `pthread_cancel`. A possible solution would be a NULL check before attempting to cancel the thread (see the sketch after this list).

  • Other observations
    I was not able to reproduce the issue using 3.15 as the server.
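
A minimal sketch of the suggested guard, assuming iperf3's stream list and field names (`test->streams`, `sp->thr`) as they appear in `iperf_server_api.c`; this is illustrative, not the exact patch:

```c
/* Hedged sketch: in cleanup_server(), skip streams whose worker thread
 * was never created instead of handing a zero pthread_t to
 * pthread_cancel(). Comparing a pthread_t against 0 mirrors the
 * assert(sp->thr != 0) suggested above, even though pthread_t is
 * formally an opaque type. */
SLIST_FOREACH(sp, &test->streams, streams) {
    if (sp->thr != 0) {
        pthread_cancel(sp->thr);        /* ask the worker to exit */
        pthread_join(sp->thr, NULL);    /* reap it before freeing the stream */
    }
}
```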

@MattCatz
Contributor Author

You can also get a similar crash on the client side here if you queue up a bunch of client-side tests (e.g. `for i in $(seq 100); do iperf3 -c 127.0.0.1 -t 10 -P 10; done`) and then repeatedly start and kill the server.

@davidBar-On
Contributor

Can you try running these tests using the PR #1654 code? The issues may be related, so it seems worth testing whether that PR also fixes this issue. (I am using WSL, which does not support `tc qdisc ... netem ...`.)

@MattCatz
Contributor Author

It does not.

You can see your changes working correctly in test #1, but it still segfaults in test #2. (I added an assert to show where it was failing.)

```
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
get_parameters:
{
	"tcp":	true,
	"omit":	0,
	"time":	10,
	"num":	0,
	"blockcount":	0,
	"parallel":	10,
	"len":	131072,
	"pacing_timer":	1000,
	"client_version":	"3.16+"
}
SNDBUF is 16384, expecting 0
RCVBUF is 131072, expecting 0
Accepted connection from 127.0.0.1, port 52362
Congestion algorithm is cubic
[  5] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52376
Congestion algorithm is cubic
[  8] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52382
Congestion algorithm is cubic
[ 10] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52384
Congestion algorithm is cubic
[ 12] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52396
Congestion algorithm is cubic
[ 14] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52412
Congestion algorithm is cubic
[ 16] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52414
Congestion algorithm is cubic
[ 18] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52430
Congestion algorithm is cubic
[ 20] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52440
Congestion algorithm is cubic
[ 22] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52452
Congestion algorithm is cubic
[ 24] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52462
Thread number 1 FD 5 created
Thread number 2 FD 8 created
Thread number 3 FD 10 created
Thread number 4 FD 12 created
Thread number 5 FD 14 created
Thread number 6 FD 16 created
Thread number 7 FD 18 created
Thread number 8 FD 20 created
Thread number 9 FD 22 created
Thread number 10 FD 24 created
All threads created
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100211
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100202
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100178
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100119
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100222
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100203
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100278
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100274
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100295
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100248
interval_len 1.001124 bytes_transferred 11272192
interval forces keep
interval_len 1.001168 bytes_transferred 11403264
interval forces keep
interval_len 1.001173 bytes_transferred 11403264
interval forces keep
interval_len 1.001177 bytes_transferred 10616832
interval forces keep
interval_len 1.001181 bytes_transferred 11403264
interval forces keep
interval_len 1.001186 bytes_transferred 11141120
interval forces keep
interval_len 1.001194 bytes_transferred 10747904
interval forces keep
interval_len 1.001241 bytes_transferred 10747904
interval forces keep
interval_len 1.001247 bytes_transferred 9699328
interval forces keep
interval_len 1.001252 bytes_transferred 7733248
interval forces keep
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  10.8 MBytes  90.1 Mbits/sec                  
[  8]   0.00-1.00   sec  10.9 MBytes  91.1 Mbits/sec                  
[ 10]   0.00-1.00   sec  10.9 MBytes  91.1 Mbits/sec                  
[ 12]   0.00-1.00   sec  10.1 MBytes  84.8 Mbits/sec                  
[ 14]   0.00-1.00   sec  10.9 MBytes  91.1 Mbits/sec                  
[ 16]   0.00-1.00   sec  10.6 MBytes  89.0 Mbits/sec                  
[ 18]   0.00-1.00   sec  10.2 MBytes  85.9 Mbits/sec                  
[ 20]   0.00-1.00   sec  10.2 MBytes  85.9 Mbits/sec                  
[ 22]   0.00-1.00   sec  9.25 MBytes  77.5 Mbits/sec                  
[ 24]   0.00-1.00   sec  7.38 MBytes  61.8 Mbits/sec                  
[SUM]   0.00-1.00   sec   101 MBytes   848 Mbits/sec                  
interval_len 1.001124 bytes_transferred 11272192
interval forces keep
interval_len 1.001168 bytes_transferred 11403264
interval forces keep
interval_len 1.001173 bytes_transferred 11403264
interval forces keep
interval_len 1.001177 bytes_transferred 10616832
interval forces keep
interval_len 1.001181 bytes_transferred 11403264
interval forces keep
interval_len 1.001186 bytes_transferred 11141120
interval forces keep
interval_len 1.001194 bytes_transferred 10747904
interval forces keep
interval_len 1.001241 bytes_transferred 10747904
interval forces keep
interval_len 1.001247 bytes_transferred 9699328
interval forces keep
interval_len 1.001252 bytes_transferred 7733248
interval forces keep
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  10.8 MBytes  90.1 Mbits/sec                  
[  8]   0.00-1.00   sec  10.9 MBytes  91.1 Mbits/sec                  
[ 10]   0.00-1.00   sec  10.9 MBytes  91.1 Mbits/sec                  
[ 12]   0.00-1.00   sec  10.1 MBytes  84.8 Mbits/sec                  
[ 14]   0.00-1.00   sec  10.9 MBytes  91.1 Mbits/sec                  
[ 16]   0.00-1.00   sec  10.6 MBytes  89.0 Mbits/sec                  
[ 18]   0.00-1.00   sec  10.2 MBytes  85.9 Mbits/sec                  
[ 20]   0.00-1.00   sec  10.2 MBytes  85.9 Mbits/sec                  
[ 22]   0.00-1.00   sec  9.25 MBytes  77.5 Mbits/sec                  
[ 24]   0.00-1.00   sec  7.38 MBytes  61.8 Mbits/sec                  
[SUM]   0.00-1.00   sec   101 MBytes   848 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  27.9 MBytes   234 Mbits/sec                  receiver
[  8]   0.00-1.00   sec  28.0 MBytes   235 Mbits/sec                  receiver
[ 10]   0.00-1.00   sec  28.0 MBytes   235 Mbits/sec                  receiver
[ 12]   0.00-1.00   sec  26.8 MBytes   224 Mbits/sec                  receiver
[ 14]   0.00-1.00   sec  28.0 MBytes   235 Mbits/sec                  receiver
[ 16]   0.00-1.00   sec  27.6 MBytes   231 Mbits/sec                  receiver
[ 18]   0.00-1.00   sec  27.4 MBytes   229 Mbits/sec                  receiver
[ 20]   0.00-1.00   sec  27.5 MBytes   230 Mbits/sec                  receiver
[ 22]   0.00-1.00   sec  26.6 MBytes   223 Mbits/sec                  receiver
[ 24]   0.00-1.00   sec  24.0 MBytes   201 Mbits/sec                  receiver
[SUM]   0.00-1.00   sec   272 MBytes  2.28 Gbits/sec                  receiver
iperf3: the client has terminated
Thread number 1 FD 5 stopped
Thread number 2 FD 8 stopped
Thread number 3 FD 10 stopped
Thread number 6 FD 16 terminated unexpectedly
Thread number 4 FD 12 stopped
Thread number 5 FD 14 stopped
Thread number 6 FD 16 stopped
Thread number 7 FD 18 stopped
Thread number 8 FD 20 stopped
Thread number 9 FD 22 stopped
Thread number 10 FD 24 stopped
All threads stopped
-----------------------------------------------------------
Server listening on 5201 (test #2)
-----------------------------------------------------------
get_parameters:
{
	"tcp":	true,
	"omit":	0,
	"time":	10,
	"num":	0,
	"blockcount":	0,
	"parallel":	10,
	"len":	131072,
	"pacing_timer":	1000,
	"client_version":	"3.16+"
}
SNDBUF is 16384, expecting 0
RCVBUF is 131072, expecting 0
Accepted connection from 127.0.0.1, port 55980
Congestion algorithm is cubic
[  5] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 55982
ignoring short interval with no data
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-0.00   sec  0.00 Bytes  0.00 bits/sec                  receiver
[SUM]   0.00-0.00   sec  0.00 Bytes  0.00 bits/sec                  receiver
iperf3: the client has terminated
iperf3: iperf_server_api.c:433: cleanup_server: Assertion `sp->thr != 0' failed.
Aborted (core dumped)
```

@davidBar-On
Contributor

Thanks for testing. The second test failed because the termination happened before all threads were created. I enhanced PR #1654 to also handle this case. Can you check whether the PR now fully resolves the issue?
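
The race described here (teardown running before all worker threads exist) is commonly handled by recording whether each thread was actually started. The self-contained sketch below illustrates that general pattern; the `thread_created` flag and the function names are hypothetical and are not taken from PR #1654:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-stream state; iperf3's real struct iperf_stream differs. */
struct stream {
    pthread_t thr;
    bool thread_created;    /* set only after pthread_create() succeeds */
};

/* Creation: mark the stream only once the thread really exists. */
static int start_stream(struct stream *sp, void *(*fn)(void *))
{
    if (pthread_create(&sp->thr, NULL, fn, sp) != 0)
        return -1;
    sp->thread_created = true;
    return 0;
}

/* Cleanup: safe even if the client terminated before every thread started. */
static void stop_stream(struct stream *sp)
{
    if (sp->thread_created) {
        pthread_cancel(sp->thr);
        pthread_join(sp->thr, NULL);
        sp->thread_created = false;
    }
}
```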

@MattCatz
Contributor Author

> Thanks for testing. The second test failed because the termination happened before all threads were created. I enhanced PR #1654 to also handle this case. Can you check whether the PR now fully resolves the issue?

I am not able to recreate the issue using the most recent changes in PR #1654. Seems fixed.
