Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix "detached head" issue #8296

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

oron-peled
Copy link

Now we set the correct commit via git reset --hard <rev>:

  • This will move HEAD to correct commit
    Unlike checkout -f <rev> that create "detached head"
  • If <rev> isn't found on <branch> it will fail loudly
    This is the correct behavior when contradicting parameters are provided

Fixes #8295

@oron-peled oron-peled force-pushed the fix-detached-head-issue branch from 9212244 to 68b6f1b Compare January 11, 2025 14:05
@oron-peled
Copy link
Author

As it's my first pull-request, I did all possible mistakes -- forgot updating tests in the beginning, then missing some, finally prepared local test environment (excellent, just a piece of marvel)

Now all tests are working

@p12tic
Copy link
Member

p12tic commented Jan 12, 2025

The problem with the current solution is that it does not check what branch the existing codebase is on. It just blindly does git reset --hard <rev>. In many cases this will cause the branch name from the previous build to be used. See e.g. test_mode_incremental test - it does not switch the branch to the branch of the build.

I think a better idea would be to keep git checkout -f <rev> as is, but add: git branch -D <branch> and git checkout -b <branch> if the branch is provided.

@oron-peled
Copy link
Author

@p12tic -- the problem with your proposal is that it will checkout the tip of the branch and not the wanted <rev>
OTOH, my proposal leave the correct git clone -b <branch> <url> as it is, but the git reset --hard <rev> assures that git "rewind back" to the wanted <rev> (but stay on the wanted <branch>)

@p12tic
Copy link
Member

p12tic commented Jan 15, 2025

the problem with your proposal is that it will checkout the tip of the branch and not the wanted

I don't think this is the case. The sequence of commands which I'm suggesting is this:

# go to <rev>, detached HEAD state
git checkout -f <rev>   

# remove branch in case it exists.
git branch -D <branch> 

# creates a new branch at HEAD, which is the same as <rev>. 
# Also switches to the branch, so HEAD is no longer detached
git checkout -b <branch>  

In step 3 we create branch at HEAD (which is the same as <rev>). We disregard the previous position of the branch entirely.

Here's a list of commands for reproduction:

mkdir gittest
cd gittest
git init
# create 10 commits
for i in $(seq 1 10); do echo $i > file; git add file; git commit -m "$i" ; done
# go back to 5th commit
git checkout -f HEAD^^^^^
git branch -D master
git checkout -b master
# verify branch is on 5th commit, not 10th commit
git log --pretty=oneline

@oron-peled
Copy link
Author

@p12tic -- I agree your sequence result in the same state, so it's a matter of taste.
I don't have any strong objection, except that I think my sequence is simpler:

  • The original git clone -b <branch> <url> already bring us to the correct branch
  • Then a single git reset --hard <rev> rewind back to the wanted <rev>

If you prefer your sequence, are you going to add it? (tons of matching fixes in the tests/...)

@oron-peled
Copy link
Author

oron-peled commented Jan 16, 2025

@p12tic -- btw, there's a small semantic difference between the two sequences in the case that, by mistake, the provided <rev> is not on <branch>:

  • With the git reset --hard <rev> approach, such a mistake will lead to loud failure.
  • With the git checkout -b <branch> approach, a local <branch> will silently be created, which is different from origin/<branch>

@p12tic
Copy link
Member

p12tic commented Jan 16, 2025

Thanks, now I understand why we arrive to different conclusions.

The original git clone -b already bring us to the correct branch

The issue is that Git step supports reusing existing repository. So depending on the options sent to Git step, clone command may have happened in the past with a different branch. This is called incremental mode in buildbot documentation.

This is the reason why just git reset --hard <rev> is not enough.

@p12tic
Copy link
Member

p12tic commented Jan 16, 2025

such a mistake will lead to loud failure.

In incremental mode the failure may not happen because the step reuses existing git repository, so the commit may exist even if the branch does not contain it.

@oron-peled
Copy link
Author

@p12tic -- so maybe we need git checkout <branch> + git reset --hard <rev>
This will still be more deterministic in case of a mistake (see my second comment above)

@oron-peled
Copy link
Author

Re-reading the code, I can see now this delicate dance at the end of the _fetch() method:

        # Rename the branch if needed.
        if res == RC_SUCCESS and self.branch != 'HEAD':
            # Ignore errors
            yield self._dovccmd(['checkout', '-B', self.branch], abandonOnFailure=False)

This is really strange -- why do this at the end, after we already sit on the exact wanted <rev> ?
I'd say the simpler strategy is first making sure we're on the wanted <branch> (and refreshed if needed) and the last thing would be to reset our position back to <rev>

@p12tic
Copy link
Member

p12tic commented Jan 16, 2025

so maybe we need git checkout + git reset --hard

I agree that this is more correct due to the reasons you've outlined.

My concern is that if <rev> is not the tip of the branch, we are doing two checkouts with a lot of potentially unnecessary file operations. Especially if the difference between <rev> and <branch> is large.

I think a better idea would be to have an explicit check if <branch> contains <rev>, though I'm not sure yet if it really makes sense inside a Git step, need to think.

@p12tic
Copy link
Member

p12tic commented Jan 16, 2025

I'd say the simpler strategy is first making sure we're on the wanted (and refreshed if needed) and the last thing would be to reset our position back to

This leads to more file operations. Best to checkout to <rev> doing only minimal set of file operations and then play with the branches.

By the way, the fact that call to self._dovccmd(['checkout', '-B', self.branch] exists mean that correct branch is already checked out, no?

@oron-peled
Copy link
Author

@p12tic -- but the git checkout -B <branch> is at the end, after <rev> was already handled. My point it that the order of handling should be reverse -- first set the <branch> and then rewind to <rev>

About amount of file operations (assuming a big rewind is needed) -- let's assume we do this as you initially proposed:

  • It will first checkout the correct <rev>
  • But then it will branch to a local <branch> which is different from origin/<branch> (only has the same name, but not the same hash -- this is totally misleading

I think that if the tip of <branch> and <rev> are far apart, the file operations are unavoidable, otherwise we do a wrong semantics trying to adapt <branch> to the wanted <rev>

@p12tic
Copy link
Member

p12tic commented Jan 16, 2025

But then it will branch to a local which is different from origin/ (only has the same name, but not the same hash -- this is totally misleading

Isn't git reset --hard <rev> doing that as well anyway? The only difference between our approaches is that my approach does not setup origin/<branch> as the upstream for the created branch.

@p12tic
Copy link
Member

p12tic commented Jan 16, 2025

I think that if the tip of and are far apart, the file operations are unavoidable, otherwise we do a wrong semantics trying to adapt to the wanted

Yes, but the git checkout -f <rev> has fewer operations. Consider how we move from <rev_old> to <rev>:

git checkout -f approach: <rev_old> -> <rev>

git reset --hard approach: <rev_old> -> <branch> -> <rev>.

On clean clone both approaches are the same, I'm concerned with incremental checkouts. git checkout -f avoids moving files to <branch>.

@oron-peled
Copy link
Author

@p12tic -- for my use-case, any approach will work, so you can take your pick.
But I think a git checkout -f <rev_old> <rev> is jumping between revisions that nobody guarantee are on the same branch -- so for optimization it sacrifice correctness.

@p12tic
Copy link
Member

p12tic commented Jan 16, 2025

I would say that git reset --hard <rev> does not guarantee that commit is on branch either. It only guarantees that commit <rev> exists in repository (<rev> actually can be any commit, even in completely unrelated history path). Because we do fetch the commit from remote, we will always have <rev> and using git reset --hard thus does not improve correctness.

The correct way would be to use git branch <branch> --contains <rev> to check if the branch contains commit.

@oron-peled
Copy link
Author

@p12tic -- just tested and you are correct.
So what is the workflow?

  • git branch <branch> --contains <rev>
  • git checkout -f <rev>
  • git checkout -D <branch>
  • git checkout -b <branch>

And what about the final git checkout -B ... chunk at the end? Should it be removed?

@p12tic
Copy link
Member

p12tic commented Jan 16, 2025

I'm wondering if we are observing some kind of a bug, because git checkout -B ... is the same as git checkout -D <branch> and git checkout -b <branch>, so code should already correctly put branch on the current commit. Would it be possible for you to debug what your Buildbot instance does?

@p12tic
Copy link
Member

p12tic commented Jan 16, 2025

Regarding git branch <branch> --contains <rev> let's skip it for now. Including it has the downside that this will break some user configurations that worked perfectly well before. Need to think about best way here.

@oron-peled
Copy link
Author

OK, so the revised workflow would be:

  • git checkout -f <rev>
  • git checkout -D <branch>
  • git checkout -b <branch>

And what about the git checkout -B ... fragment at the end?

@p12tic
Copy link
Member

p12tic commented Jan 17, 2025

I'm saying that git checkout -B ... is equivalent to git checkout -D ... and git checkout -b ... and thus it is unclear why there are issue for you as current code should already work. Which is weird, because you did see some issue. I suggest doing some debugging and figuring out why git checkout -B ... is insufficient.

@oron-peled
Copy link
Author

@p12tic -- my problem is not with git checkout -B ... itself, but about why it happen in the end, after all the previous checkouts already happened. I.e: the resulting proposed workflow is:

  • git checkout -f <rev>
  • git checkout -D <branch>
  • git checkout -b <branch>
  • And now at the end git checkout -B <branch> which you claim is equivalent to the previous 2 commands -- why? what does it achieve? why do the previous 2 commands if git checkout -B <branch> and why it is conditional?

@p12tic
Copy link
Member

p12tic commented Jan 20, 2025

The communication between us broke a bit :)

Initially I suggested git checkout -D <branch> and git checkout -b <branch> to setup branch. You pointed out that git checkout -B <branch> exists. git checkout -B <branch> does the same as git checkout -D <branch> and git checkout -b <branch> . So now it seems that current code already should do what we want to do. I'm thus confused why we would like to change anything and suggest to debug why git checkout -B <branch> is not called. If you find a code path where git checkout -B <branch> is not called, then fix is to simply call it.

@oron-peled
Copy link
Author

oron-peled commented Jan 22, 2025

For my specific use-case, your solution would be equivalent -- so if you prefer it, go ahead.

But generally, there are two points I want to raise:

  • [EDITED - ignore this paragraph, see at the end] The first is that both in my proposal and your proposal -- there's no conditional git checkout -B ... as in the current code (so it seems it should be removed with either proposal)
  • The second is that the behavior is different if by accident the given <rev> is not part of the given <branch>
    • In my proposal, the git reset --hard <rev> create a detached HEAD (just like today's code)
    • In your proposal, a fake local <branch> is created that points to <rev> and has nothing to do with origin/<branch>
    • That's why I don't like the semantics of your proposal (again, for my use-case they are equivalent, so I'll accept anything that solve the detached HEAD situation)

As you said, some miscommunication -- if you intend to use git checkout -B ... instead of your original two git operations (-D + -b) then they are equivalent as you said. I was just wandering why git checkout -B ... is used after a set of git operations already did git checkout -f <rev> and git branch -D <branch> and git checkout -b <branch> -- but if you intend to replace the -D + -b with a single -B then I have no further questions related to this part.

@p12tic
Copy link
Member

p12tic commented Jan 24, 2025

Still some miscommunication :)

What I'm saying is that the current code already has git checkout -B <branch>. This means that in theory #8295 should not exist. There should not be detached HEAD issue.

However, reality is different. This is why I'm saying you should debug your builds and figure out why you have detached HEAD. Maybe git checkout -B <branch> is simply not called for some reason. Maybe other bug.

@p12tic
Copy link
Member

p12tic commented Jan 24, 2025

Regarding doing git reset --hard <rev> to solve the issue when "by accident the given is not part of the given ".

This does not fully solve the issue. Why? Because in incremental Git step, some commits will be already fetched. So doing git reset --hard <rev> does not guarantee that <rev> will be part of any branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Git checkout with "detached head"
2 participants