Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

telemetry: add TCP RTT info collection #4745

Merged
merged 20 commits into from
Nov 17, 2022
Merged

Conversation

cce
Copy link
Contributor

@cce cce commented Nov 3, 2022

Summary

This collects RTT information from TCP and reports it in PeerConnectionDetails, if telemetry is enabled.

Test Plan

Added test for setting up a pair of networks connected to each other and gathering telemetry from open and closed connections. Reduced some code duplication in wsNetwork_test.go for setting up two networks.

@cce cce added the Enhancement label Nov 3, 2022
util/rtt_linux.go Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Nov 3, 2022

Codecov Report

Merging #4745 (c07523c) into master (23890a8) will increase coverage by 0.00%.
The diff coverage is 23.40%.

@@           Coverage Diff           @@
##           master    #4745   +/-   ##
=======================================
  Coverage   54.68%   54.69%           
=======================================
  Files         414      416    +2     
  Lines       53550    53596   +46     
=======================================
+ Hits        29286    29313   +27     
- Misses      21836    21856   +20     
+ Partials     2428     2427    -1     
Impacted Files Coverage Δ
network/limitlistener/rejectingLimitListener.go 85.71% <0.00%> (-3.18%) ⬇️
network/wsPeer.go 68.44% <ø> (-0.49%) ⬇️
util/tcpinfo.go 0.00% <0.00%> (ø)
util/tcpinfo_linux.go 0.00% <0.00%> (ø)
network/wsNetwork.go 66.69% <83.33%> (+1.35%) ⬆️
network/requestTracker.go 71.24% <100.00%> (+0.12%) ⬆️
ledger/blockqueue.go 85.63% <0.00%> (-2.88%) ⬇️
agreement/cryptoVerifier.go 67.60% <0.00%> (-2.12%) ⬇️
agreement/proposalManager.go 96.07% <0.00%> (-1.97%) ⬇️
catchup/peerSelector.go 98.95% <0.00%> (-1.05%) ⬇️
... and 6 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

jannotti
jannotti previously approved these changes Nov 4, 2022
Copy link
Contributor

@jannotti jannotti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just bike shedding because it's pleasant to code review a small PR!

Ship it.

util/rtt_darwin.go Outdated Show resolved Hide resolved
network/wsNetwork.go Outdated Show resolved Hide resolved
@cce cce marked this pull request as ready for review November 5, 2022 11:26
@AlgoAxel
Copy link
Contributor

AlgoAxel commented Nov 7, 2022

Could you attach a sample of what the new PeerConnections payload looks like? (or, highlights of what changes)?

@cce
Copy link
Contributor Author

cce commented Nov 9, 2022

Example output from the test I added, from a Linux CI build:

wsNetwork_test.go:2715: detailsA {"IncomingPeers":[{"Address":"127.0.0.1","HostName":"","InstanceName":"","ConnectionDuration":0,"DuplicateFilterCount":0,"TCP":{"RTT":13,"RTTVar":4,"RTTMin":10,"SndMSS":32768,"RcvMSS":536,"SndCwnd":10,"RcvWnd":65536,"Rate":3276800000}}],"OutgoingPeers":null}
wsNetwork_test.go:2716: detailsB {"IncomingPeers":null,"OutgoingPeers":[{"Address":"127.0.0.1","HostName":"","InstanceName":"","ConnectionDuration":0,"Endpoint":"http://127.0.0.1:42987","DuplicateFilterCount":0,"TCP":{"RTT":28,"RTTVar":16,"RTTMin":13,"SndMSS":32768,"RcvMSS":536,"SndCwnd":10,"RcvWnd":65536,"Rate":2520615384}}]}

network/wsNetwork.go Outdated Show resolved Hide resolved
// underlying network implementation, using a system call on Linux and Mac
// and returning an error for unsupported platforms.
func GetConnTCPInfo(conn net.Conn) (*TCPInfo, error) {
sysconn, ok := conn.(syscall.Conn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is the answer: unwrap until you get a syscall.Conn not as long as you the type implements underlyingConn.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I could change it so GetConnTCPInfo required a syscall.Conn and then the loop in wsNetwork resolved and broke out of the loop with a syscall.Conn

util/tcpinfo_linux.go Show resolved Hide resolved
util/tcpinfo_linux.go Show resolved Hide resolved
brianolson
brianolson previously approved these changes Nov 16, 2022
Copy link
Contributor

@brianolson brianolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -1821,6 +1830,18 @@ func (wn *WebsocketNetwork) sendPeerConnectionsTelemetryStatus() {
InstanceName: peer.InstanceName,
DuplicateFilterCount: peer.duplicateFilterCount,
}
// unwrap websocket.Conn, requestTrackedConnection, rejectingLimitListenerConn
var uconn net.Conn = peer.conn.UnderlyingConn()
for {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an expected upper bound on this? If so, will be safer if this loop is bounded to say 10x that expected bound, just to avoid the infinite loop situation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Shant and JJ, need to bound here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a for-1-to-10 around it, WDYT

Copy link
Contributor

@algorandskiy algorandskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but consider adding a bound while unwinding connections.

@@ -1821,6 +1830,18 @@ func (wn *WebsocketNetwork) sendPeerConnectionsTelemetryStatus() {
InstanceName: peer.InstanceName,
DuplicateFilterCount: peer.duplicateFilterCount,
}
// unwrap websocket.Conn, requestTrackedConnection, rejectingLimitListenerConn
var uconn net.Conn = peer.conn.UnderlyingConn()
for {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Shant and JJ, need to bound here

network/wsNetwork_test.go Show resolved Hide resolved
@algorandskiy algorandskiy merged commit 526cb89 into algorand:master Nov 17, 2022
@cce cce deleted the tcp-rtt-info branch March 1, 2023 16:59
@cce cce mentioned this pull request Sep 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants