Perf test: WebSockets (http4s) #3520
Conversation
```scala
.toWebSocketRoutes(
  wsEndpoint.serverLogicSuccess(_ =>
    IO.pure { (in: Stream[IO, Long]) =>
      Http4sCommon.wsResponseStream.evalMap(_ => IO.realTime.map(_.toMillis)).concurrently(in.as(()))
```
Is this the same as the http4s version? Why the `concurrently`?

Second question: isn't `IO.realTime` "slow" by itself, especially when called under stress? For better isolation of the purely-WS performance, would it make a difference if we just returned `IO.pure(1823)`?
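As a rough sanity check of the "is the clock read slow" question, one can measure the raw cost of a wall-clock read on the JVM (`IO.realTime` ultimately delegates to `System.currentTimeMillis`). `ClockCost` below is a hypothetical helper, not part of the benchmark code:

```scala
// Hypothetical micro-check of wall-clock read cost (not part of the benchmark).
object ClockCost {
  // Returns the average cost of one System.currentTimeMillis() call, in nanoseconds.
  def avgNanosPerRead(n: Int): Long = {
    val start = System.nanoTime()
    var i = 0
    var acc = 0L
    while (i < n) { acc += System.currentTimeMillis(); i += 1 }
    val elapsed = System.nanoTime() - start
    if (acc == 42) println(acc) // keep the loop from being optimized away
    elapsed / n
  }
}
```

On typical hardware this tends to be tens of nanoseconds per call, so the clock read itself is unlikely to dominate millisecond-scale latencies, though contention under heavy load is a separate question.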
This has been entirely copied from https://github.com/kamilkloch/websocket-benchmark/

> Is this the same as the http4s version? why the "concurrently"?

The behavior of the stream should be the same. The vanilla implementation requires a `send: Stream[F, WebSocketFrame]` and a `receive: Pipe[F, WebSocketFrame, Unit]`. The Tapir logic has to return a `Stream[F, Long]` based on an input stream, so it may need the additional `concurrently` operator to correctly make the result stream "run after the input stream is read". I wanted to keep the implementation 1:1 with the base benchmark for a start.

> second question

The returned timestamp is used to calculate latency on the client. I found that suspicious too, but if Kamil's tests result in latencies of a few milliseconds, then it might not be that expensive after all.
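To illustrate the shape difference described above, here is a minimal sketch using `LazyList` as a dependency-free stand-in for `fs2.Stream` (an assumption for the example; names like `tapirStyle` are hypothetical):

```scala
// Sketch: vanilla http4s-style WS logic is a send stream plus an independent receive pipe;
// Tapir-style logic is a single function from the input stream to the output stream.
object WsShapes {
  type Pipe[A, B] = LazyList[A] => LazyList[B]

  // Vanilla style: output is produced independently of the input.
  val send: LazyList[Long] = LazyList.from(0).map(_.toLong)
  val receive: Pipe[Long, Unit] = in => in.map(_ => ())

  // Tapir style: one function; here the output ignores the input,
  // which is why the real code needs `concurrently` to keep draining it.
  val tapirStyle: Pipe[Long, Long] = _ => send
}
```

With lazy streams the "ignored" input costs nothing, but with effectful fs2 streams the input must still be consumed, hence the `concurrently(in.as(()))` in the endpoint logic.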
OK, so we ignore the input and produce timestamps as the output. But how do we calculate the latency from that? The scenario reads a single message, compares timestamps, and waits for a second, right? But isn't that skewed by any delay there might be when establishing the web socket? That is, this assumes that the WS sends its timestamp and that the scenario ends its sleep at exactly the same time (to properly measure latency).
My understanding is that the server returns its timestamp after the connection is established, so the client receives it immediately and compares it to its own timestamp. The `await` call is unclear to me; maybe it only means "await at most", but I need to double-check.
I found this example in the Gatling docs:

```scala
// expecting 2 messages
// 1st message will be validated against wsCheck1
// 2nd message will be validated against wsCheck2
// whole sequence must complete within 30 seconds
exec(ws("Send").sendText("hello")
  .await(30)(wsCheck1, wsCheck2))
```

It confirms what I suggested: the `await(1.second)` operator only means that the response check has to complete within 1 second.
Ah, OK ... so we want to measure the total latency of: establishing a WS connection, sending one message, receiving one message? Why not simply do it on the client side: capture the timestamp, do the operations, and then get the local timestamp again?
The connection part is not measured. After connecting, the client sends requests and does the checks in a loop. It looks like the intention was to measure the latency of only one direction: not the full request → response cycle, but just the server → client part.
As far as I understand the response stream, it emits a timestamp every 100 ms. It looks like we're simply ignoring requests, emitting a timestamp every 100 ms, and measuring the time between the server sending it and the client receiving it.
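Under that interpretation, the client-side check reduces to subtracting the server-side timestamp from the local receive time. A minimal sketch, assuming the two clocks are synchronized (e.g. client and server on the same host):

```scala
// Sketch of the one-way latency computation the client performs.
object LatencySketch {
  // serverTsMillis: epoch millis embedded in the WS frame by the server
  // clientRecvMillis: epoch millis captured by the client on receipt
  def oneWayLatencyMillis(serverTsMillis: Long, clientRecvMillis: Long): Long =
    clientRecvMillis - serverTsMillis
}
```

Note that this only yields a meaningful number because both timestamps come from the same (or a synchronized) clock; across machines, clock skew would be added to every sample.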
I guess this entire scheme may be caused by Gatling's communication model, where you can't simply connect and receive messages; you have to send a request and receive a response. The test, however, is interested only in the "response generated → response received" part, and the responses are generated in an independent stream.
```diff
@@ -516,7 +516,7 @@ lazy val perfTests: ProjectMatrix = (projectMatrix in file("perf-tests"))
   "nl.grons" %% "metrics4-scala" % Versions.metrics4Scala % Test,
   "com.lihaoyi" %% "scalatags" % Versions.scalaTags % Test,
   // Needs to match version used by Gatling
-  "com.github.scopt" %% "scopt" % "4.1.0",
+  "com.github.scopt" %% "scopt" % "3.7.1",
```
pin in steward's config?
It's pinned, but I pinned it after steward had updated it.
```scala
object Http4sCommon {
  // Websocket response is returned with a lag, so that we can have more concurrent users talking to the server.
  // This lag is not relevant for measurements, because the server returns a timestamp after having a response ready to send back,
  // so the client can measure only the latency of the server stack handling the response.
```
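A minimal model of why the lag does not distort the measurement, with `Thread.sleep` standing in for the stream's scheduling delay (an assumption for the sketch; the real code is fs2-based):

```scala
// Sketch: the timestamp is captured only after the (unmeasured) lag,
// i.e. at the moment the response is ready to be sent.
object LaggedResponse {
  def respond(nowMillis: () => Long, lagMillis: Long): Long = {
    Thread.sleep(lagMillis) // deliberate lag: excluded from the measured latency
    nowMillis()             // timestamp taken when the response is ready
  }
}
```

Because the clock is read after the sleep, only what happens between this read and the client's receipt (i.e. the server stack's send path plus the network) shows up in the latency histogram.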
ah now it's clear, thanks :)
This PR adds an endpoint for performance tests of WebSockets: `ws://127.0.0.1:8080/ws/ts`, returning the current timestamp. A special simulation called `WebSocketsSimulation` can then be executed; it builds a histogram reporting latency percentiles. This simulation cannot be configured and run in the standard way (`PerfTestSuiteRunner`), so the process is described in `perfTests/README.md`.

Based on https://github.com/kamilkloch/websocket-benchmark

The WS endpoint will be added to other servers in separate PRs.