-
Notifications
You must be signed in to change notification settings - Fork 838
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix VST / HTTP2 idle connection keep alive behaviour #12549
Conversation
another one with go-tests enabled: http://jenkins01.arangodb.biz:8080/job/arangodb-matrix-pr/11664/ |
@@ -330,7 +330,7 @@ class GeneralConnection : public fuerte::Connection { | |||
Socket<ST> _proto; | |||
|
|||
/// elements to send out | |||
boost::lockfree::queue<RT*, boost::lockfree::capacity<64>> _queue; | |||
boost::lockfree::queue<RT*, boost::lockfree::capacity<32>> _queue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason why it was 64 and why it is now 32?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to match what happens on the server, the number is sort of arbitrary
@@ -86,7 +86,7 @@ enum class EncodingType { | |||
UNSET | |||
}; | |||
|
|||
enum class AuthenticationMethod { BASIC, JWT, NONE }; | |||
enum class AuthenticationMethod : uint8_t { BASIC = 1, JWT = 2, NONE = 0 }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this change necessary? Did I miss something?
} | ||
|
||
auto& me = static_cast<HttpCommTask<T>&>(*s); | ||
if ((wasReading && me._reading) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand, this code will always call asyncReadNextRequest
and thus _reading
almost always true.
Now, previously I added the _requestCount
which is now gone again. The reason was the following: imagine the a new request coming in and the timeout expires at the same time. Now both completion handlers are on the queue, first the read completion runs and decides to execute the request, and then restart reading. Now the timeout completion runs, sees that it is still reading and closes the request. Thus we have the same bug as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are mixing these, this is the http 1.1 version there is no asyncReadNextRequest
in this class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VST / HTTP2 now have their own setIOTimeout()
implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_reading is not true unless we are onto the next request actually
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I thouht we start reading the next request as soon as we called processRequest
. However, that is indeed not the case here.
arangod/GeneralServer/H2CommTask.cpp
Outdated
const bool wasWriting = this->_writing; | ||
TRI_ASSERT(wasReading || wasWriting); | ||
if (wasWriting) { | ||
secs = std::max(300.0, secs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be configurable?
arangod/GeneralServer/H2CommTask.cpp
Outdated
const bool wasWriting = this->_writing; | ||
TRI_ASSERT(wasReading || wasWriting); | ||
if (wasWriting) { | ||
secs = std::max(300.0, secs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the timeout at least 300 seconds here? Is there any specific reason for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because the idle timeout may be super low, so this is a 300 second write timeout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's a different timeout here (not an idle timeout, but a write timeout), shouldn't it be configurable?
DTraceHttpCommTaskResponseWritten((size_t) self.get()); | ||
|
||
auto* thisPtr = static_cast<HttpCommTask<T>*>(self.get()); | ||
auto& me = static_cast<HttpCommTask<T>&>(*self); | ||
me._writing = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation level looks strange here.
if (ec || !thisPtr->_shouldKeepAlive || err != HPE_PAUSED) { | ||
thisPtr->close(ec); | ||
llhttp_errno_t err = llhttp_get_errno(&me._parser); | ||
if (ec || !me._shouldKeepAlive || err != HPE_PAUSED) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation level looks strange here
const bool wasWriting = this->_writing; | ||
TRI_ASSERT(wasReading || wasWriting); | ||
if (wasWriting) { | ||
secs = std::max(300.0, secs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is the timeout at least 300 seconds hee? the 300 seconds minimum seems to be in place for HTTP/2 and VST, but not for HTTP/1. I don't see a reason for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because i am not gonna change the http 1.1 behaviour
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, but then this still leaves us with protocol-specific behavior:
- HTTP/1.1 doesn't have a write timeout
- HTTP/2 and VST have a write timeout which is max(idle timeout, 300)
Ideally we wouldn't have differences between the protocols, but try to establish the same behavior for all protocols. If that requires a write-timeout to be configurable, we should add it as a new configuration option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nah some people are sending GB image files as part of foxx apps and stuff like that (and we are not even using sendfile here). I'd rather keep it sane
} | ||
|
||
auto& me = static_cast<HttpCommTask<T>&>(*s); | ||
if ((wasReading && me._reading) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I thouht we start reading the next request as soon as we called processRequest
. However, that is indeed not the case here.
LGTM except that the WriteTimeout is still hard-coded to at least 300 seconds. @fceller @graetzer : ideally we should also make it possible to enforce this write timeout for HTTP/1 connections too, because that would mean all protocols will have the same behavior.
And in a future version, we can deprecate the |
i don't know this is just a fix for buggy idle timeouts, i have too many diverging branches, I understand customers are waiting for this |
idgaf if I have to wait for this PR to be merged I cannot work |
(cherry picked from commit 8dfe68f)
Scope & Purpose
Make VST and HTTP2 connection have the same keep-alive behaviour as a HTTP 1 connection. Namely only truely cancel an IDLE connection
Backports:
Should be more or less covered by existing tests
http://jenkins01.arangodb.biz:8080/job/arangodb-matrix-pr/11663/