Rare intermittent DTLS failure in compat.sh #9816

Open
gilles-peskine-arm opened this issue Dec 2, 2024 · 0 comments

This issue documents an unexplained failure of compat.sh on an unremarkable DTLS test case on the CI, with few clues as to the reason. This is very likely to be a problem in the test environment. We are unlikely to spend any more time on it unless the symptoms become more frequent.

In a run of compat.sh, on one test case, the client reports a failed handshake because the server apparently didn't reply (MBEDTLS_ERR_NET_RECV_FAILED). The server logs show no attempt to connect, although there is no conclusive evidence that the server didn't receive anything: it may have died before it was able to print logs.

This happened in the mbedtls-3.6 nightly tests on the evening of 2024-12-01, in ./tests/scripts/all.sh --seed 4 --keep-going test_psa_crypto_config_reference_cipher_aead_cmac. Extract from the console log:

01:05:55.291  m->m dtls12,no TLS_PSK_WITH_AES_128_CBC_SHA ............................ PASS
01:05:55.291  m->m dtls12,no TLS_PSK_WITH_AES_256_CBC_SHA ............................ PASS
01:05:55.291  m->m dtls12,no TLS_DHE_PSK_WITH_AES_128_CBC_SHA ........................ PASS
01:05:55.291  m->m dtls12,no TLS_DHE_PSK_WITH_AES_128_CBC_SHA256 ..................... PASS
01:05:55.291  m->m dtls12,no TLS_DHE_PSK_WITH_AES_128_CCM ............................ PASS
01:05:55.291  m->m dtls12,no TLS_DHE_PSK_WITH_AES_128_CCM_8 .......................... PASS
01:05:55.291  m->m dtls12,no TLS_DHE_PSK_WITH_AES_128_GCM_SHA256 ..................... PASS
01:05:55.551  m->m dtls12,no TLS_DHE_PSK_WITH_AES_256_CBC_SHA ........................ PASS
01:05:55.551  m->m dtls12,no TLS_DHE_PSK_WITH_AES_256_CBC_SHA384 ..................... PASS
01:05:55.551  m->m dtls12,no TLS_DHE_PSK_WITH_AES_256_CCM ............................ FAIL
01:05:55.551    ! outputs saved to c-srv-186.log, c-cli-186.log
01:05:55.551  m->m dtls12,no TLS_DHE_PSK_WITH_AES_256_CCM_8 .......................... PASS
01:05:55.551  m->m dtls12,no TLS_DHE_PSK_WITH_AES_256_GCM_SHA384 ..................... PASS
01:05:55.551  m->m dtls12,no TLS_DHE_PSK_WITH_CAMELLIA_128_CBC_SHA256 ................ PASS
01:05:55.551  m->m dtls12,no TLS_DHE_PSK_WITH_CAMELLIA_128_GCM_SHA256 ................ PASS
01:05:55.551  m->m dtls12,no TLS_DHE_PSK_WITH_CAMELLIA_256_CBC_SHA384 ................ PASS
01:05:55.807  m->m dtls12,no TLS_DHE_PSK_WITH_CAMELLIA_256_GCM_SHA384 ................ PASS
01:05:55.807  m->m dtls12,no TLS_ECDHE_PSK_WITH_AES_128_CBC_SHA ...................... PASS
01:05:55.807  m->m dtls12,no TLS_ECDHE_PSK_WITH_AES_128_CBC_SHA256 ................... PASS
01:05:55.807  m->m dtls12,no TLS_ECDHE_PSK_WITH_AES_256_CBC_SHA ...................... PASS

Note that all of this happens in less than 1 second, so the failure is not a timeout.
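
The timing observation can be checked mechanically. Below is a minimal sketch that parses console lines in the format of the extract above and reports the elapsed time and the failing test cases. The function names and the regular expression are illustrative and not part of compat.sh or all.sh. The timestamps come from the CI console and several lines share the same value, so they probably reflect output batching and only give an upper bound on the elapsed time.

```python
import re
from datetime import datetime

# Matches "HH:MM:SS.mmm  <test name> ..... PASS|FAIL|SKIP" (format assumed from the extract).
LINE_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}\.\d{3})\s+(.*?)\s+\.+\s+(PASS|FAIL|SKIP)\s*$"
)

# A few lines copied from the console extract, including one non-result line.
SAMPLE = [
    "01:05:55.291  m->m dtls12,no TLS_PSK_WITH_AES_128_CBC_SHA ............................ PASS",
    "01:05:55.551  m->m dtls12,no TLS_DHE_PSK_WITH_AES_256_CCM ............................ FAIL",
    "01:05:55.551    ! outputs saved to c-srv-186.log, c-cli-186.log",
    "01:05:55.807  m->m dtls12,no TLS_ECDHE_PSK_WITH_AES_256_CBC_SHA ...................... PASS",
]


def parse_console(lines):
    """Return (timestamp, test name, status) for every result line; skip the rest."""
    results = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S.%f")
            results.append((stamp, match.group(2), match.group(3)))
    return results


def summarize(results):
    """Return (seconds between first and last result, names of failed tests)."""
    elapsed = (results[-1][0] - results[0][0]).total_seconds()
    failures = [name for _, name, status in results if status == "FAIL"]
    return elapsed, failures


if __name__ == "__main__":
    elapsed, failures = summarize(parse_console(SAMPLE))
    print(f"elapsed: {elapsed:.3f} s, failures: {failures}")
```

On the sample lines, the span between the first and last result is about half a second, and the only failure is the TLS_DHE_PSK_WITH_AES_256_CCM case.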

Client and server logs:
all_u16-test_psa_crypto_config_reference_cipher_aead_cmac-c-cli-186.log.txt
all_u16-test_psa_crypto_config_reference_cipher_aead_cmac-c-srv-186.log.txt

The server reports successful connections with several cipher suites (this is normal: we keep the server running within a batch of cipher suites with similar parameters), the last one being TLS_DHE_PSK_WITH_AES_256_CBC_SHA384. The next one should have been TLS_DHE_PSK_WITH_AES_256_CCM, but the server logs stop there.

The least implausible explanations I can think of:

  • The OS dropped the ClientHello packet. OK, but then why wasn't it retransmitted?
  • The shell script killed the server too early. OK, but why? The script only ever attempts to kill the server after it has processed the output from the client.
  • The server died. OK, but why, and why don't the server logs and console show anything?
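
One way to tell these explanations apart is to look at how quickly a UDP client can learn that nobody is listening. The sketch below is a standalone experiment, not Mbed TLS code: it sends one datagram from a connected UDP socket to a loopback port with no listener and reports whether the receive call fails immediately or times out. On Linux a connected UDP socket typically gets a "connection refused" error almost at once, which would be consistent with a sub-second failure if the server was gone. The assumptions here are that the test client uses a connected UDP socket and that its network layer reports such an error as MBEDTLS_ERR_NET_RECV_FAILED; neither has been verified for this failure. The function name `probe_closed_udp_port` is hypothetical.

```python
import socket
import time


def probe_closed_udp_port(timeout=2.0):
    """Send a datagram to a loopback UDP port with no listener.

    Returns (outcome, elapsed_seconds), where outcome is "refused",
    "timeout", or "reply" (the last only if another process reused the port).
    """
    # Reserve an ephemeral port, then close it so that nothing listens there.
    placeholder = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    placeholder.bind(("127.0.0.1", 0))
    port = placeholder.getsockname()[1]
    placeholder.close()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(timeout)
    start = time.monotonic()
    try:
        # connect() on UDP only fixes the peer address; no packet is sent yet.
        client.connect(("127.0.0.1", port))
        client.send(b"stand-in for a ClientHello")
        client.recv(1024)
        outcome = "reply"
    except ConnectionRefusedError:
        # The OS reported that the destination port is unreachable.
        outcome = "refused"
    except socket.timeout:
        # Nothing came back at all, as with a silently dropped packet.
        outcome = "timeout"
    finally:
        client.close()
    return outcome, time.monotonic() - start


if __name__ == "__main__":
    outcome, elapsed = probe_closed_udp_port()
    print(f"{outcome} after {elapsed:.3f} s")
```

A "refused" outcome would point towards the server not listening at the time of the ClientHello (killed or dead), whereas a silently dropped packet would look like the "timeout" outcome and should take far longer than the failure observed here. The behaviour is platform-dependent, so the result is only a hint.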