-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix multiple writes to branch ref key when using source_root. #110
base: master
Are you sure you want to change the base?
Conversation
When using source_root paths outside that source_root would be processed and due to errors in the processing loop logic would trigger the callback that would ultimately trigger multiple writes to the branch ref key in consul. It also prematurely releases the lock on the branch. This happens because the logic for skipping paths outside the source_path calls check_pending which decrements the pending_record count. When the record count is at or below zero then it would call the upstream callback and trigger the ref key write and the removal of the locks. Additionally, the callback was wrapped in _.once() implying an attempt to prevent the callback from being called multiple times. But this fails because the code called the cb() function and passed the return to _.once, instead of using _.once to create a new variable. In the handling for the common_properties file, errors in processing the consul updates were prefixed with "Some consul updates failed:", but these errors are appended to the existing errors which would then be prefixed by the same string further up, resulting in the prefix appearing multiple times in the logs. Fix removes the pending_records counter entirely and uses _.after() to wrap the callback so that the callback is only called after all records are processed. This means that it is important that calling check_pending() must be called on every record exactly once. The old logic technically would allow you to call it multiple times as long as you incremented the same number of times. Finally, move the error handler for an unknown git status to also call check_pending and provide the record.path for easier debugging. While we're at it make it obvious which records are skipped due to being outside the source_root when doing trace logging.
Some notes about this. The bug here was causing us serious problems with starting git2consul. We were running it in a container with one repo that had different trees for different consul clusters, and using source_root. The constant writing of the branch reference key was causing serious performance problems with git2consul and consul itself for us since we had rather large numbers of entries and every key regardless of it's path was triggering a write in every repo. If git2consul didn't crash then once it was up would run fine. But with these fixes the problem is now gone. |
On further thought. I've realized this doesn't necessarily just break for people using source_root. Since the existing code runs the loop synchronously and increments as it goes. Each iteration of the loop regardless of source_root is going to increment the pending_records count to 1 and then decrement it to zero and trigger the callback. I haven't bothered to test this. But I thought it was important to point out the impact of the bug fixed by this PR is much greater than I originally thought. |
Can someone review this and at least provide some feedback? It's been a month. We've been running this in production for nearly as long as this Pull Request has been here and it's resolved all of the issues we had with git2consul crashing. |
Sorry, I did glance at this change, but there we no similar reports or any comments that were added to this PR so I had it sit around for a while. Can you provide a test case in |
I'll see what I can do, might take me a bit to figure out your test suite. Something you can do in the meantime to replicate the issue. Setup a repo with a source_root and multiple paths outside the source root. Turn on trace logging. You'll see a write to master.ref for every entry outside the source_root without this change. Correct behavior would be a single write to master.ref after finishing processing all the files in the repo. Another method to demonstrate how messed up the processing logic is by this flaw would be to setup a repo with enough entries that it would take longer to process than the polling period you setup. You should see it process a poll while it's still doing the initial load. This happens because the write to master.ref also triggers the removal of the lock preventing this. I believe this should happen regardless of using source_root because of the way it's handling the determination of when the processing is complete. In increments as it goes and decrements as it goes, so it should hit 0 and trigger the callback on every iteration. Hopefully that helps you consider the problems in the meantime. |
When using source_root paths outside that source_root would be processed
and due to errors in the processing loop logic would trigger the
callback that would ultimately trigger multiple writes to the branch ref
key in consul. It also prematurely releases the lock on the branch.
This happens because the logic for skipping paths outside the
source_path calls check_pending which decrements the pending_record
count. When the record count is at or below zero then it would call
the upstream callback and trigger the ref key write and the removal
of the locks.
Additionally, the callback was wrapped in _.once() implying an attempt
to prevent the callback from being called multiple times. But this
fails because the code called the cb() function and passed the return
to _.once, instead of using _.once to create a new variable.
In the handling for the common_properties file, errors in processing
the consul updates were prefixed with "Some consul updates failed:",
but these errors are appended to the existing errors which would then
be prefixed by the same string further up, resulting in the prefix
appearing multiple times in the logs.
Fix removes the pending_records counter entirely and uses _.after()
to wrap the callback so that the callback is only called after all
records are processed. This means that it is important that calling
check_pending() must be called on every record exactly once. The old
logic technically would allow you to call it multiple times as long
as you incremented the same number of times.
Finally, move the error handler for an unknown git status to also
call check_pending and provide the record.path for easier debugging.
While we're at it make it obvious which records are skipped due to
being outside the source_root when doing trace logging.