fix nodes panic during synchronization #5081

pfi79 · 2024-12-18T17:15:17Z

What's going on?

During synchronisation, the synchroniser reads blocks from another node and writes them to its own ledger.
Block writes are performed asynchronously, and we have no control over when the goroutine that writes a block will actually run.
The synchroniser does not wait for the block to be written before proceeding to the next step.
When it finishes, the synchroniser exits, the code immediately detects that the node has not yet reached the target height, and a new synchronisation cycle begins.

The start of the range to synchronise is taken from `startHeight := s.Support.Height()`. However, the previous block may not have been written yet (it will be shortly), so the number of the block that is about to be written is chosen as the start. As a result, the same block can be written twice in a row.
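A minimal Go sketch of the race described above, for illustration only. `ledger`, `writeAsync` and `raceDemo` are hypothetical names, not Fabric's API; `Height()` stands in for `s.Support.Height()`, and a `release` channel holds the write goroutine back so that the unlucky interleaving happens deterministically:

```go
package main

import (
	"fmt"
	"sync"
)

// ledger is a hypothetical block store; Height() plays the role of
// s.Support.Height(), i.e. the number of blocks committed so far.
type ledger struct {
	mu     sync.Mutex
	blocks []uint64 // block numbers in commit order
}

func (l *ledger) Height() uint64 {
	l.mu.Lock()
	defer l.mu.Unlock()
	return uint64(len(l.blocks))
}

// writeAsync hands the write to a goroutine and returns immediately.
// The goroutine waits on release so the demo controls when it runs.
func (l *ledger) writeAsync(num uint64, release <-chan struct{}, wg *sync.WaitGroup) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		<-release
		l.mu.Lock()
		l.blocks = append(l.blocks, num)
		l.mu.Unlock()
	}()
}

// raceDemo replays the bad interleaving and returns the start height chosen
// by the second cycle plus the final contents of the store.
func raceDemo() (uint64, []uint64) {
	l := &ledger{}
	release := make(chan struct{})
	var wg sync.WaitGroup

	// Cycle 1: block 0 is queued for writing; the synchroniser returns at once.
	l.writeAsync(0, release, &wg)

	// Cycle 2 starts before that write lands: Height() still reports 0,
	// so block 0 is picked again as the start of the range.
	startHeight := l.Height()
	l.writeAsync(startHeight, release, &wg)

	close(release) // let both pending writes run
	wg.Wait()
	return startHeight, l.blocks
}

func main() {
	start, blocks := raceDemo()
	fmt.Println(start, blocks) // 0 [0 0]
}
```

Both the pending write and the new cycle target block 0, so the store ends up with the same block twice, which is presumably what leads to the panic named in the title.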

Proposal
Switch block writing to synchronous mode.
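With synchronous writing, the height is already up to date when a synchronisation cycle returns, so the next cycle starts from the correct block. A sketch under the same kind of assumptions: `store`, `syncRound` and `syncTo` are hypothetical, the puller is replaced by a function that simply returns the requested block number, and `maxPerRound` is assumed to be positive:

```go
package main

import "fmt"

// store is a hypothetical block store; Height() mirrors s.Support.Height().
type store struct{ blocks []uint64 }

func (s *store) Height() uint64 { return uint64(len(s.blocks)) }

// writeBlock commits synchronously: when it returns, Height() already
// reflects the new block.
func (s *store) writeBlock(num uint64) { s.blocks = append(s.blocks, num) }

// syncRound pulls at most maxPerRound blocks starting at Height() and writes
// each one before moving on. pull is a hypothetical stand-in for the puller.
func syncRound(s *store, target uint64, maxPerRound int, pull func(uint64) uint64) {
	for n, i := s.Height(), 0; n < target && i < maxPerRound; n, i = n+1, i+1 {
		s.writeBlock(pull(n))
	}
}

// syncTo keeps starting new cycles until the target height is reached.
func syncTo(s *store, target uint64, maxPerRound int) {
	pull := func(n uint64) uint64 { return n }
	for s.Height() < target { // "has the node reached the target height?"
		syncRound(s, target, maxPerRound, pull)
	}
}

func main() {
	s := &store{}
	syncTo(s, 5, 2) // three cycles: [0 1], [2 3], [4]
	fmt.Println(s.blocks) // [0 1 2 3 4]
}
```

Because each cycle reads `Height()` only after the previous cycle's writes have completed, no block number is chosen twice.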

yacovm · 2024-12-18T17:54:04Z

Thanks for looking into this.

What about a unit test that reproduces the problem?
I don't understand why we need such a complex solution, when we can just write the blocks synchronously?

pfi79 · 2024-12-18T17:59:59Z

> I don't understand why we need such a complex solution, when we can just write the blocks synchronously?

I was thinking about synchronous writing, but I think the idea here was to parallelize getting the next block from the puller and writing the current one to state. That is a time saver.
But if other maintainers agree to switch to synchronous writing, I will.

pfi79 · 2024-12-18T18:02:25Z

> What about a unit test that reproduces the problem?

I'm working on a test.
I opened the PR with the changes first so that the fix could go in right away.

yacovm · 2024-12-18T18:06:42Z

> > I don't understand why we need such a complex solution, when we can just write the blocks synchronously?

> I was thinking about synchronous writing, but I think the idea here was to parallelize getting the next block from the puller and writing the current one to state. That is a time saver. But if other maintainers agree to switch to synchronous writing, I will.

We do get the next block from the remote node though, whether you are doing a sync write or an async one.

Because when you retrieve a block from the sync buffer, a new block is fed into it.
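The sync-buffer argument can be sketched with a buffered Go channel, used here as an assumption standing in for Fabric's actual sync buffer: the puller goroutine keeps fetching while the consumer writes synchronously, because every receive frees a slot for the next block. `fetchInto` and `consume` are hypothetical names:

```go
package main

import "fmt"

// fetchInto is a hypothetical puller: it fetches blocks [from, to) from a
// remote node and pushes them into buf, closing buf when done. It only
// stalls when buf is full, so it keeps fetching while the consumer writes.
func fetchInto(buf chan<- uint64, from, to uint64) {
	defer close(buf)
	for n := from; n < to; n++ {
		buf <- n // blocks only when the buffer has no free slot
	}
}

// consume takes blocks from the buffer and writes each one synchronously.
// Each receive frees a slot, letting the puller fetch the next block while
// the current one is being written.
func consume(buf <-chan uint64, write func(uint64)) {
	for b := range buf {
		write(b)
	}
}

func main() {
	buf := make(chan uint64, 4) // hypothetical buffer capacity
	var written []uint64
	go fetchInto(buf, 10, 15)
	consume(buf, func(n uint64) { written = append(written, n) })
	fmt.Println(written) // [10 11 12 13 14]
}
```

Fetching and writing still overlap here, so a synchronous write keeps most of the parallelism the asynchronous version was meant to provide.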

pfi79 · 2024-12-18T18:08:46Z

> We do get the next block from the remote node though, whether you are doing a sync write or an async one.

> Because when you retrieve a block from the sync buffer, a new block is fed into it.

That's right. I'll redo it.

Signed-off-by: Fedor Partanskiy <fedor.partanskiy@atme.com>

yacovm · 2024-12-18T19:20:21Z

I know you didn't provide a test, but I am perfectly fine with using synchronous block writing when we synchronize.

pfi79 requested a review from a team as a code owner December 18, 2024 17:15

pfi79 force-pushed the fix-sync-bft branch 2 times, most recently from fdb6cf8 to 315657f December 18, 2024 18:33

fix nodes panic during synchronization

25e8517

Signed-off-by: Fedor Partanskiy <fedor.partanskiy@atme.com>

pfi79 force-pushed the fix-sync-bft branch from 315657f to 25e8517 December 18, 2024 18:53

yacovm approved these changes Dec 18, 2024

yacovm merged commit 131f9bc into hyperledger:main Dec 18, 2024
15 checks passed

pfi79 deleted the fix-sync-bft branch December 18, 2024 19:50
