Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replaying requests for large files #217

Conversation

ironsmile
Copy link

I wanted to use gor for testing a HTTP server while mirroring real traffic. Unfortunately the traffic was for relatively big media files (from 100MB up to few GBs). In that case the buffer for reading the response body was too small and gor replayer was failing to replicate the load which real users are generating on the mirrored server.

An other limitation was the fixed deadline for reading the response body. With very large files a timeout of 5s, 30s or whatever is not practical as very big files can take some time to be downloaded. On the other hand a relatively small timeout is required to stop stalled connections which do not move any data around anymore. Maybe because of a faulty HTTP server under test, network problems or something else.

This pull request aims to remedy both of the problems. It does it by

  • Reading the "whole" body of the response. Or at least trying to do so. The reading is done in small chunks which are discarded instantly in order not to consume too much memory for large responses. An upper limit of the read data is still present to make sure a faulty server will not create an never ending chunked response or some other problem of the kind.
  • Instead of a fixed deadline, a rolling timeout is introduced while reading the response. The connection will timeout only if there is no network activity at all for the set amount of time.
  • The behaviour regarding small files is not changed in any way.
  • Fixes few crashes which were happening from time to time during prolonged tests.

Every packet which had 4 bytes of data or less was
causing a panic - slice bounds out of range.

Closes #215
This makes sure replaying with big responses work well. Without
reading the whole response the server under test was behaving
differently from the one which traffic is mirrored.
The --output-http-timeout option mieaning is changed. Previously it was
a hard limit for which the whole body should have been read. But most of
the time this does not work for large bodies.

Tho timeout is now set before every Read operation effectively meaning
the body will continue to be read as long as there is any data flowing.
The timeout will be triggered only after an period of inactivity is reached.
}
toConn.readTimeout = c.config.Timeout

var readBytes int64 = 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should drop = 0 from declaration of var readBytes; it is the zero value

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. Forgot to run go vet against it.

@ironsmile
Copy link
Author

The tests failed. This happened for me too few times. But other times they all pass. Should I look into the tests as well?

Unreachable code and default value for the int64 type.
@buger
Copy link
Owner

buger commented Oct 8, 2015

Thanks for doing it! Can you clarify how you use response from http client? input-raw is unable to handle such big files by design, so it means you using another input? Basically pls describe how you run gor.

@ironsmile
Copy link
Author

Well, the set up is this:

A production machine which serves big files has an gor instance running with

gor --http-original-host --input-raw :80 -output-tcp-stats --output-tcp <replayer:port>|20%

An other machine, the replayer, is running

gor --http-original-host --input-tcp <replayer:port> -output-http-timeout 10s --output-http <server-under-test:port>

As far as I can tell it works great with more than 5 Gbps of traffic and the replayer is not struggling at all.

UPDATE:
I have some more numbers now. -input-raw may be missing some of the requests but those it catches are good enough for me. In my setup the production server has about 9k concurrent connections and is serving about 8GBps. The server under test receives about 300 concurrent connections. This means -input-raw is able to relay about 16% of the supposed requests, right? Because 20% of 9000 is 1800 and 300 is 16% of 1800.

@buger
Copy link
Owner

buger commented Oct 8, 2015

Well its not really about number of requests, but their consistency. Payloads larger then 200kb (or those who have more then 3 packets (64kb each), have big chances for corruption. See #167

@ironsmile
Copy link
Author

Unfortunatelly our users do not do POST requests. Pretty much all of the requests are GETs and HEADs which do not have a body. So far I haven't noticed any broken requests, in HTTP return code 400 sense.

Maybe we can devise some test for consistency? And try it this way?

@buger
Copy link
Owner

buger commented Oct 8, 2015

Then i do not really get what reading of whole response gives to you. Can you clarify?

@ironsmile
Copy link
Author

The gor replayer is reading the whole response from the HTTP request it makes to its --output-http. Previously it was closing the connection without reading the whole body.

@buger
Copy link
Owner

buger commented Oct 9, 2015

So main problem for you that it closes connection, and not properly emulate user behavior, right?

Sorry if i'm too pedantic :)

@ironsmile
Copy link
Author

Yes, this is my problem which I am trying to solve with this pull request. This is not a small change in the behaviour of the software. You are right in your effort to understand it completely.

@buger
Copy link
Owner

buger commented Oct 9, 2015

I'm not sure if custom conn timeout implementation needed here, few lines above it sets timeout c.conn.SetReadDeadline(timeout). As you mentioned it is fixed, but when you read in batches using io.CopyN, it should apply separate timeout for each, chunk, so it will not be really fixed, right?

@ironsmile
Copy link
Author

The custom timeout is required. The documentation of net.Conn about deadlines read this:

        // A deadline is an absolute time after which I/O operations
        // fail with a timeout (see type Error) instead of
        // blocking. The deadline applies to all future I/O, not just
        // the immediately following call to Read or Write.

An absolute time deadline is set for all I/O no matter how many times Read is called (io.CopyN uses the connection's Read method). This means that if now is 15:30:05 and the deadline is set to 15:30:10 any Reads after 15:30:10 will result in a timeout error. If reading the whole body takes more than 5 seconds, then the first read after 15:30:10 will timeout.

@buger
Copy link
Owner

buger commented Oct 9, 2015

I see, but you can just call c.conn.SetReadDeadline(timeout) inside for after each io.CopyN call, then it will be reseted each time.

@ironsmile
Copy link
Author

Yes, the c.conn.SetReadDeadline call on line 47 is the thing which makes sure reading the body will not timeout unless there is no network activity at all.

Read again the second bullet in the pull request description and my last comment. Or maybe you think this approach is not enough? Or does not work as intended?

@buger
Copy link
Owner

buger commented Oct 9, 2015

I see what you mean, but in my view inlining it in for loop will make things easier, like this:

for {
    if n, err := io.CopyN(ioutil.Discard, c.conn, readChunkSize); err == io.EOF {
        break
    } else if err != nil {
        Debug("[HTTPClient] Read the whole body error:", err, c.baseURL)
        break
    } else {
        readBytes += n

        // Setting new timeout for next chunk
        timeout = time.Now().Add(c.config.Timeout)
        c.conn.SetReadDeadline(timeout)
    }

    if readBytes >= maxResponseSize {
        Debug("[HTTPClient] Body is more than the max size", maxSizeFromBody,
            c.baseURL)
        break
    }
}

@@ -184,6 +184,12 @@ func (o *HTTPOutput) Read(data []byte) (int, error) {

func (o *HTTPOutput) sendRequest(client *HTTPClient, request []byte) {
meta := payloadMeta(request)

if len(meta) < 2 {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meta is constructed manually, and always should be (type, uuid, time), do you know a repeatable case when meta gets corrupted (because it never should) ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does happen fairly often my setup. Normally it takes from few minutes to few hours to happen. I was not able to reproduce it in an automatic test.

But I have stack traces from crashes and can reproduce it easily.

@ironsmile
Copy link
Author

Ah, I see. I didn't understand you before. Yes, it does make it a lot easier to read. Will amend the pull request shortly.

@buger
Copy link
Owner

buger commented Nov 18, 2015

@ironsmile ping?:)

@salimane
Copy link
Collaborator

any update on this ?

@buger
Copy link
Owner

buger commented May 19, 2016

Fixed in #277

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants