Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix multiple writes to branch ref key when using source_root. #110

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

breser
Copy link
Owner

@breser breser commented Oct 29, 2016

When using source_root paths outside that source_root would be processed
and due to errors in the processing loop logic would trigger the
callback that would ultimately trigger multiple writes to the branch ref
key in consul. It also prematurely releases the lock on the branch.

This happens because the logic for skipping paths outside the
source_path calls check_pending which decrements the pending_record
count. When the record count is at or below zero then it would call
the upstream callback and trigger the ref key write and the removal
of the locks.

Additionally, the callback was wrapped in _.once() implying an attempt
to prevent the callback from being called multiple times. But this
fails because the code called the cb() function and passed the return
to _.once, instead of using _.once to create a new variable.

In the handling for the common_properties file, errors in processing
the consul updates were prefixed with "Some consul updates failed:",
but these errors are appended to the existing errors which would then
be prefixed by the same string further up, resulting in the prefix
appearing multiple times in the logs.

Fix removes the pending_records counter entirely and uses _.after()
to wrap the callback so that the callback is only called after all
records are processed. This means that it is important that calling
check_pending() must be called on every record exactly once. The old
logic technically would allow you to call it multiple times as long
as you incremented the same number of times.

Finally, move the error handler for an unknown git status to also
call check_pending and provide the record.path for easier debugging.

While we're at it make it obvious which records are skipped due to
being outside the source_root when doing trace logging.

When using source_root paths outside that source_root would be processed
and due to errors in the processing loop logic would trigger the
callback that would ultimately trigger multiple writes to the branch ref
key in consul.  It also prematurely releases the lock on the branch.

This happens because the logic for skipping paths outside the
source_path calls check_pending which decrements the pending_record
count.  When the record count is at or below zero then it would call
the upstream callback and trigger the ref key write and the removal
of the locks.

Additionally, the callback was wrapped in _.once() implying an attempt
to prevent the callback from being called multiple times.  But this
fails because the code called the cb() function and passed the return
to _.once, instead of using _.once to create a new variable.

In the handling for the common_properties file, errors in processing
the consul updates were prefixed with "Some consul updates failed:",
but these errors are appended to the existing errors which would then
be prefixed by the same string further up, resulting in the prefix
appearing multiple times in the logs.

Fix removes the pending_records counter entirely and uses _.after()
to wrap the callback so that the callback is only called after all
records are processed.  This means that it is important that calling
check_pending() must be called on every record exactly once. The old
logic technically would allow you to call it multiple times as long
as you incremented the same number of times.

Finally, move the error handler for an unknown git status to also
call check_pending and provide the record.path for easier debugging.

While we're at it make it obvious which records are skipped due to
being outside the source_root when doing trace logging.
@breser
Copy link
Owner Author

breser commented Oct 29, 2016

Some notes about this. The bug here was causing us serious problems with starting git2consul. We were running it in a container with one repo that had different trees for different consul clusters, and using source_root. The constant writing of the branch reference key was causing serious performance problems with git2consul and consul itself for us since we had rather large numbers of entries and every key regardless of it's path was triggering a write in every repo. If git2consul didn't crash then once it was up would run fine. But with these fixes the problem is now gone.

@breser
Copy link
Owner Author

breser commented Nov 2, 2016

On further thought. I've realized this doesn't necessarily just break for people using source_root. Since the existing code runs the loop synchronously and increments as it goes. Each iteration of the loop regardless of source_root is going to increment the pending_records count to 1 and then decrement it to zero and trigger the callback. I haven't bothered to test this. But I thought it was important to point out the impact of the bug fixed by this PR is much greater than I originally thought.

@breser
Copy link
Owner Author

breser commented Dec 3, 2016

Can someone review this and at least provide some feedback? It's been a month. We've been running this in production for nearly as long as this Pull Request has been here and it's resolved all of the issues we had with git2consul crashing.

@calvn
Copy link
Contributor

calvn commented Dec 3, 2016

Sorry, I did glance at this change, but there we no similar reports or any comments that were added to this PR so I had it sit around for a while. Can you provide a test case in test/ where it would fail without this change, and pass with it?

@breser
Copy link
Owner Author

breser commented Dec 3, 2016

I'll see what I can do, might take me a bit to figure out your test suite.

Something you can do in the meantime to replicate the issue. Setup a repo with a source_root and multiple paths outside the source root. Turn on trace logging. You'll see a write to master.ref for every entry outside the source_root without this change. Correct behavior would be a single write to master.ref after finishing processing all the files in the repo.

Another method to demonstrate how messed up the processing logic is by this flaw would be to setup a repo with enough entries that it would take longer to process than the polling period you setup. You should see it process a poll while it's still doing the initial load. This happens because the write to master.ref also triggers the removal of the lock preventing this. I believe this should happen regardless of using source_root because of the way it's handling the determination of when the processing is complete. In increments as it goes and decrements as it goes, so it should hit 0 and trigger the callback on every iteration.

Hopefully that helps you consider the problems in the meantime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants