Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix VST / HTTP2 idle connection keep alive behaviour #12549

Merged
merged 6 commits into from
Sep 1, 2020
Merged

Conversation

graetzer
Copy link
Contributor

@graetzer graetzer commented Aug 27, 2020

Scope & Purpose

Make VST and HTTP2 connection have the same keep-alive behaviour as a HTTP 1 connection. Namely only truely cancel an IDLE connection

  • 💩 Bugfix

Backports:

  • Backports required for: 3.7

Should be more or less covered by existing tests
http://jenkins01.arangodb.biz:8080/job/arangodb-matrix-pr/11663/

@graetzer graetzer changed the title Bug fix/bts 173 Fix VST / HTTP2 idle connection keep alive behaviour Aug 27, 2020
@graetzer
Copy link
Contributor Author

another one with go-tests enabled: http://jenkins01.arangodb.biz:8080/job/arangodb-matrix-pr/11664/

@@ -330,7 +330,7 @@ class GeneralConnection : public fuerte::Connection {
Socket<ST> _proto;

/// elements to send out
boost::lockfree::queue<RT*, boost::lockfree::capacity<64>> _queue;
boost::lockfree::queue<RT*, boost::lockfree::capacity<32>> _queue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason why it was 64 and why it is now 32?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to match what happens on the server, the number is sort of arbitrary

@@ -86,7 +86,7 @@ enum class EncodingType {
UNSET
};

enum class AuthenticationMethod { BASIC, JWT, NONE };
enum class AuthenticationMethod : uint8_t { BASIC = 1, JWT = 2, NONE = 0 };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this change necessary? Did I miss something?

}

auto& me = static_cast<HttpCommTask<T>&>(*s);
if ((wasReading && me._reading) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand, this code will always call asyncReadNextRequest and thus _reading almost always true.

Now, previously I added the _requestCount which is now gone again. The reason was the following: imagine the a new request coming in and the timeout expires at the same time. Now both completion handlers are on the queue, first the read completion runs and decides to execute the request, and then restart reading. Now the timeout completion runs, sees that it is still reading and closes the request. Thus we have the same bug as before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are mixing these, this is the http 1.1 version there is no asyncReadNextRequest in this class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VST / HTTP2 now have their own setIOTimeout() implementation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_reading is not true unless we are onto the next request actually

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I thouht we start reading the next request as soon as we called processRequest. However, that is indeed not the case here.

const bool wasWriting = this->_writing;
TRI_ASSERT(wasReading || wasWriting);
if (wasWriting) {
secs = std::max(300.0, secs);
Copy link
Contributor

@fceller fceller Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be configurable?

const bool wasWriting = this->_writing;
TRI_ASSERT(wasReading || wasWriting);
if (wasWriting) {
secs = std::max(300.0, secs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the timeout at least 300 seconds here? Is there any specific reason for it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the idle timeout may be super low, so this is a 300 second write timeout

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's a different timeout here (not an idle timeout, but a write timeout), shouldn't it be configurable?

DTraceHttpCommTaskResponseWritten((size_t) self.get());

auto* thisPtr = static_cast<HttpCommTask<T>*>(self.get());
auto& me = static_cast<HttpCommTask<T>&>(*self);
me._writing = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation level looks strange here.

if (ec || !thisPtr->_shouldKeepAlive || err != HPE_PAUSED) {
thisPtr->close(ec);
llhttp_errno_t err = llhttp_get_errno(&me._parser);
if (ec || !me._shouldKeepAlive || err != HPE_PAUSED) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation level looks strange here

const bool wasWriting = this->_writing;
TRI_ASSERT(wasReading || wasWriting);
if (wasWriting) {
secs = std::max(300.0, secs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the timeout at least 300 seconds hee? the 300 seconds minimum seems to be in place for HTTP/2 and VST, but not for HTTP/1. I don't see a reason for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because i am not gonna change the http 1.1 behaviour

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, but then this still leaves us with protocol-specific behavior:

  • HTTP/1.1 doesn't have a write timeout
  • HTTP/2 and VST have a write timeout which is max(idle timeout, 300)

Ideally we wouldn't have differences between the protocols, but try to establish the same behavior for all protocols. If that requires a write-timeout to be configurable, we should add it as a new configuration option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nah some people are sending GB image files as part of foxx apps and stuff like that (and we are not even using sendfile here). I'd rather keep it sane

}

auto& me = static_cast<HttpCommTask<T>&>(*s);
if ((wasReading && me._reading) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I thouht we start reading the next request as soon as we called processRequest. However, that is indeed not the case here.

@graetzer
Copy link
Contributor Author

@graetzer graetzer requested a review from jsteemann August 31, 2020 13:53
@jsteemann
Copy link
Contributor

LGTM except that the WriteTimeout is still hard-coded to at least 300 seconds.
I am fine with this being the default value, but I think we should make this value configurable via a startup option.

@fceller @graetzer : ideally we should also make it possible to enforce this write timeout for HTTP/1 connections too, because that would mean all protocols will have the same behavior.
As that would cause downwards-incompatibility for HTTP/1, this probably needs to be opt-in.
What about having the following options:

  • --http.request-timeout: config option to set this timeout
  • --http.request-timeout-behavior: values: "enforce", "compat", with "enforce" enforcing the timeout for all protocols, and "compat" enforcing it just for HTTP/2 and VST (downwards-compatible)

And in a future version, we can deprecate the --http.* options and rename them to something else, because they are used for VST as well. Probably something like --protocol.*?

@graetzer
Copy link
Contributor Author

graetzer commented Aug 31, 2020

i don't know this is just a fix for buggy idle timeouts, i have too many diverging branches, I understand customers are waiting for this

@graetzer graetzer mentioned this pull request Aug 31, 2020
8 tasks
@jsteemann
Copy link
Contributor

Has this PR been obsoleted by #12579?
Because #12579 contains the changes made in this PR as well. 🤔

@graetzer
Copy link
Contributor Author

idgaf if I have to wait for this PR to be merged I cannot work

@jsteemann
Copy link
Contributor

@graetzer : I am confused now.
Is your preference to merge this PR here sooner than #12579, and merge devel into 12579 once we have merged this PR here?
Just not clear to me, so an explanation would help. Thanks!

@jsteemann jsteemann merged commit 8dfe68f into devel Sep 1, 2020
@jsteemann jsteemann deleted the bug-fix/bts-173 branch September 1, 2020 07:47
graetzer added a commit that referenced this pull request Sep 1, 2020
KVS85 added a commit that referenced this pull request Sep 7, 2020
* Fix VST / HTTP2 idle connection keep alive behaviour (#12549)

(cherry picked from commit 8dfe68f)

* remove wrong merge

Co-authored-by: KVS85 <vadim@arangodb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants