Relayer slows down exponentially in some circumstances #2008
Description
Summary of Bug
The relayer sometimes slows down exponentially.
The cause of the slowness is that the height of the events we pull from the subscriptions drifts steadily further from the real latest height, because we are not pulling them from the event monitor stream fast enough.
We are trying to get an event from the stream of events (with try_recv_multiple) every 500ms.
So with two chains we have:
- call try_recv_multiple and get a NewBlock from chain A
- wait 500ms
- call try_recv_multiple and get a NewBlock from chain B
- wait 500ms
- call try_recv_multiple and get a NewBlock from chain A
- etc.
So we get an event per chain roughly once every second.
With three chains:
- call try_recv_multiple and get a NewBlock from chain A
- wait 500ms
- call try_recv_multiple and get a NewBlock from chain B
- wait 500ms
- call try_recv_multiple and get a NewBlock from chain C
- wait 500ms
- call try_recv_multiple and get a NewBlock from chain A
- etc.
So we only get a NewBlock event per chain every 1.5s.
But since the block time for testing is 1s, we end up drifting behind more and more.
I guess that's why we only see this in testing and not in prod, because in prod we query often enough that we are always up to date.
The problem gets worse the lower the block time and the higher the number of chains the relayer is connected to.
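The timing argument above can be checked with a small simulation. This is only a sketch of the polling pattern described in this issue, not Hermes code: the function name simulate_lag and the round-robin model (one event taken per poll, chains served in turn) are assumptions made for illustration.

```rust
use std::collections::VecDeque;

/// Hypothetical model of the polling loop: every `poll_interval_ms` we take at
/// most ONE pending event, serving chains in round-robin order, while every
/// chain emits one NewBlock per `block_time_ms`. Both intervals must be non-zero.
/// Returns, per chain, how many blocks the consumer is behind after `elapsed_ms`.
fn simulate_lag(
    chains: usize,
    poll_interval_ms: u64,
    block_time_ms: u64,
    elapsed_ms: u64,
) -> Vec<u64> {
    // One queue of pending NewBlock heights per chain.
    let mut queues: Vec<VecDeque<u64>> = vec![VecDeque::new(); chains];
    let mut latest_height = 0u64;
    let mut last_seen = vec![0u64; chains];
    let mut next = 0usize; // chain to try first on the next poll

    for t in 1..=elapsed_ms {
        // All chains produce a block at the same time in this simplified model.
        if t % block_time_ms == 0 {
            latest_height += 1;
            for q in queues.iter_mut() {
                q.push_back(latest_height);
            }
        }
        // Non-blocking poll: take a single event, then "sleep" until the next poll.
        if t % poll_interval_ms == 0 {
            for i in 0..chains {
                let idx = (next + i) % chains;
                if let Some(height) = queues[idx].pop_front() {
                    last_seen[idx] = height;
                    next = (idx + 1) % chains;
                    break;
                }
            }
        }
    }

    last_seen.iter().map(|&h| latest_height - h).collect()
}
```

Under this model, simulate_lag(3, 500, 1_000, 300_000) leaves every chain on the order of 100 blocks behind after five minutes, while simulate_lag(2, 500, 1_000, 300_000) stays within a block of the tip, which matches the two-chain and three-chain cases described above.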
To fix this, we should use a blocking recv_multiple on the subscriptions stream so that we get the events as fast as they are emitted, which solves the drift.
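A minimal sketch of the blocking approach, using only the Rust standard library. It does not reproduce the relayer's recv_multiple. As an assumption made for illustration, all chains fan in to a single std::sync::mpsc channel, and the names NewBlock, spawn_chains and consume_blocking are hypothetical.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a NewBlock event from one chain.
#[derive(Debug)]
struct NewBlock {
    chain: &'static str,
    height: u64,
}

/// Spawns one producer thread per chain. Each thread emits `blocks` events,
/// one per `block_time`, into a shared channel (fan-in).
fn spawn_chains(
    chains: &[&'static str],
    blocks: u64,
    block_time: Duration,
) -> Receiver<NewBlock> {
    let (tx, rx) = mpsc::channel();
    for &chain in chains {
        let tx = tx.clone();
        thread::spawn(move || {
            for height in 1..=blocks {
                thread::sleep(block_time);
                // Stop producing if the consumer has gone away.
                if tx.send(NewBlock { chain, height }).is_err() {
                    break;
                }
            }
        });
    }
    // The original `tx` is dropped when this function returns, so the
    // receiver disconnects once every producer thread has finished.
    rx
}

/// Blocking consumer: `recv` parks the thread until an event is available,
/// so events are handled as soon as they are emitted and there is no fixed
/// sleep between two events.
fn consume_blocking(rx: Receiver<NewBlock>) -> Vec<NewBlock> {
    let mut seen = Vec::new();
    while let Ok(event) = rx.recv() {
        seen.push(event);
    }
    seen
}
```

With this shape, the rate at which events are consumed is bounded by how fast the chains emit them rather than by a polling interval, so adding chains or lowering the block time does not by itself cause the consumer to fall behind.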
Version
v0.13.0-rc0
Steps to Reproduce
- Spawn 3 chains with a block time of 1s
- Create a channel between 2 chains
- Start Hermes
- Wait a few minutes
- Do a ft-transfer
- See that the relayer only processes the transfer after a long time
- Wait more
- Do another ft-transfer
- It takes even longer until the relayer processes the transfer
Acceptance Criteria
The relayer does not exhibit this issue anymore.
For Admin Use
- Not duplicate issue
- Appropriate labels applied
- Appropriate milestone (priority) applied
- Appropriate contributors tagged
- Contributor assigned/self-assigned