If the dolos node is killed with kill -9, for example, it is unable to start up and re-establish the socket.
In particular, the socket file already exists, so the serve thread hits an "address already in use" error, but that error is not printed until the node is gracefully shut down.
The logs on startup don't indicate that anything is amiss:
2024-07-13T20:16:03.425860Z INFO dolos::serve: found gRPC config
2024-07-13T20:16:03.425830Z WARN serve: dolos::relay: relay not enabled, skipping serve
2024-07-13T20:16:03.426886Z INFO stage{stage="roll"}: gasket::runtime: stage bootstrap ok
2024-07-13T20:16:03.426904Z INFO stage{stage="roll"}: gasket::runtime: switching stage phase prev_phase=Bootstrap next_phase=Working
2024-07-13T20:16:03.427317Z INFO stage{stage="ledger"}:bootstrap: dolos::sync::ledger: cursor found cursor=Some(ChainPoint(54245659, Hash<32>("c488856f1695fbc2a73c8c0809e804ff9a15295f29eed20872460b3a73eabbae")))
2024-07-13T20:16:03.427505Z INFO stage{stage="ledger"}:bootstrap: dolos::sync::ledger: wal sequence found seq=2285931
2024-07-13T20:16:03.427518Z INFO stage{stage="ledger"}: gasket::runtime: stage bootstrap ok
2024-07-13T20:16:03.427524Z INFO stage{stage="ledger"}: gasket::runtime: switching stage phase prev_phase=Bootstrap next_phase=Working
2024-07-13T20:16:03.429317Z INFO dolos::serve::grpc: serving via gRPC on address: [::]:50051
2024-07-13T20:16:03.431380Z INFO dolos::serve: found Ouroboros config
2024-07-13T20:16:03.858860Z INFO stage{stage="pull"}:bootstrap: dolos::sync::pull: connected to peer address="preview-node.world.dev.cardano.org:30002" magic=2
2024-07-13T20:16:03.972779Z INFO stage{stage="pull"}:bootstrap: dolos::sync::pull: found intersection intersection=(54245659, c488856f1695fbc2a73c8c0809e804ff9a15295f29eed20872460b3a73eabbae)
2024-07-13T20:16:03.972917Z INFO stage{stage="pull"}: gasket::runtime: stage bootstrap ok
2024-07-13T20:16:03.972949Z INFO stage{stage="pull"}: gasket::runtime: switching stage phase prev_phase=Bootstrap next_phase=Working
2024-07-13T20:16:03.973013Z INFO stage{stage="pull"}:execute: dolos::sync::pull: pulling block batch from upstream peer
2024-07-13T20:16:04.041244Z INFO stage{stage="pull"}:execute: dolos::sync::pull: pulling block batch from upstream peer
2024-07-13T20:16:04.042483Z INFO stage{stage="roll"}:execute: dolos::sync::roll: rolling back wal point=Specific(54245659, Hash<32>("c488856f1695fbc2a73c8c0809e804ff9a15295f29eed20872460b3a73eabbae"))
2024-07-13T20:16:04.501124Z INFO stage{stage="pull"}:execute: dolos::sync::pull: block batch pulled from peer len=3
2024-07-13T20:16:04.503961Z INFO stage{stage="pull"}:execute: dolos::sync::pull: reached tip, waiting for new block
2024-07-13T20:16:04.503976Z INFO stage{stage="roll"}:execute: dolos::sync::roll: extending wal block.slot=54245712 block.hash=5e55089ba23f129d23bc0092a15cb93880aaaf8fd86e2597a6b6da87c08f397a
2024-07-13T20:16:04.512811Z INFO stage{stage="ledger"}:execute: dolos::sync::ledger: applying block slot=54245712
2024-07-13T20:16:04.512710Z INFO stage{stage="roll"}:execute: dolos::sync::roll: extending wal block.slot=54245759 block.hash=39ffd38eb7ffc0ab98290610a74124caf32289fa162fa12783ff9b39849c62d0
2024-07-13T20:16:04.519305Z INFO stage{stage="roll"}:execute: dolos::sync::roll: extending wal block.slot=54245761 block.hash=80fe04c697f00399bcc60cb77aa1887b4f13d5aca03e2c8c90e24b2d1715c4b8
But once a SIGTERM is delivered, the shutdown output reveals what the problem is:
2024-07-13T20:17:17.507180Z WARN dolos::common: SIGTERM detected
2024-07-13T20:17:17.507242Z INFO gasket::daemon: dismissing stage stage="pull" state=Alive(Working)
2024-07-13T20:17:17.507257Z INFO gasket::daemon: dismissing stage stage="roll" state=Alive(Working)
2024-07-13T20:17:17.507264Z INFO gasket::daemon: dismissing stage stage="ledger" state=Alive(Working)
2024-07-13T20:17:17.507270Z INFO gasket::daemon: waiting for stages to end
2024-07-13T20:17:47.373730Z INFO stage{stage="pull"}:execute: dolos::sync::pull: new block sent by upstream peer point=(54245867, 469d14d0625f2185a517fda5d498894e5bc4836bee3e857917f89255004db3f4)
2024-07-13T20:17:47.441494Z INFO stage{stage="pull"}: gasket::runtime: stage dismissed
2024-07-13T20:17:47.441647Z INFO stage{stage="pull"}: gasket::runtime: switching stage phase prev_phase=Working next_phase=Teardown
2024-07-13T20:17:47.441692Z INFO stage{stage="pull"}: gasket::runtime: stage teardown ok
2024-07-13T20:17:47.441706Z INFO stage{stage="pull"}: gasket::runtime: switching stage phase prev_phase=Teardown next_phase=Ended
2024-07-13T20:17:47.441620Z INFO stage{stage="roll"}: gasket::runtime: stage dismissed
2024-07-13T20:17:47.441795Z INFO stage{stage="roll"}: gasket::runtime: switching stage phase prev_phase=Working next_phase=Teardown
2024-07-13T20:17:47.441813Z INFO stage{stage="roll"}: gasket::runtime: stage teardown ok
2024-07-13T20:17:47.441824Z INFO stage{stage="roll"}: gasket::runtime: switching stage phase prev_phase=Teardown next_phase=Ended
2024-07-13T20:17:47.442449Z ERROR stage{stage="ledger"}:schedule: gasket::framework: x=error receiving work unit through input port
2024-07-13T20:17:47.442477Z ERROR stage{stage="ledger"}: gasket::runtime: stage should stop retry=Retry(None)
2024-07-13T20:17:47.442491Z INFO stage{stage="ledger"}: gasket::runtime: switching stage phase prev_phase=Working next_phase=Teardown
2024-07-13T20:17:47.442528Z INFO stage{stage="ledger"}: gasket::runtime: stage teardown ok
2024-07-13T20:17:47.442541Z INFO stage{stage="ledger"}: gasket::runtime: switching stage phase prev_phase=Teardown next_phase=Ended
Error: × serve thread
├─▶ serving Ouroboros
╰─▶ server error: Address already in use (os error 48)
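For context on where that error comes from, here is a minimal sketch (not dolos code; the path is made up) showing that binding a Unix socket to a path whose file already exists fails with "Address already in use", which is what happens when a kill -9 leaves the previous socket file behind:

```rust
use std::os::unix::net::UnixListener;

fn main() -> std::io::Result<()> {
    let path = "/tmp/example-node.socket"; // hypothetical socket path

    // First bind succeeds and creates the socket file on disk.
    let _listener = UnixListener::bind(path)?;

    // Binding again while the file still exists fails with
    // "Address already in use", the same error dolos reports.
    match UnixListener::bind(path) {
        Err(e) => eprintln!("second bind failed: {e}"),
        Ok(_) => eprintln!("unexpected: second bind succeeded"),
    }

    // Removing the leftover file is what makes a re-bind possible again.
    std::fs::remove_file(path)?;
    Ok(())
}
```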
One way to solve this is to use a lockfile containing the owning process's PID to decide whether it is safe to delete the existing node socket and recover.
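A rough sketch of that idea (hypothetical names, assuming a libc dependency; not an existing dolos API) could look like this:

```rust
use std::{fs, io, path::Path, process};

// Hypothetical helper: before binding, decide whether a leftover socket file
// is stale by checking whether the PID recorded in the lockfile still refers
// to a live process.
fn recover_stale_socket(socket: &Path, lockfile: &Path) -> io::Result<()> {
    if socket.exists() {
        let owner_alive = fs::read_to_string(lockfile)
            .ok()
            .and_then(|s| s.trim().parse::<i32>().ok())
            // kill(pid, 0) delivers no signal; it only probes for existence
            // (a simplification: EPERM would also mean the process exists).
            .map(|pid| unsafe { libc::kill(pid, 0) } == 0)
            .unwrap_or(false);

        if owner_alive {
            return Err(io::Error::new(
                io::ErrorKind::AddrInUse,
                "another process appears to own the socket",
            ));
        }
        // Previous owner is gone: the socket file is stale and safe to remove.
        fs::remove_file(socket)?;
    }

    // Record our own PID so the next startup can make the same decision.
    fs::write(lockfile, process::id().to_string())?;
    Ok(())
}
```

On a graceful shutdown the lockfile and socket would be cleaned up as they are today; after a kill -9, the stale PID lets the next startup remove the leftover socket instead of dying with the error above.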