Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix handling of renamed branches for clone/fetch #1493

Merged
merged 3 commits into from
May 3, 2024

Conversation

dh2i-sam
Copy link
Contributor

@dh2i-sam dh2i-sam commented Apr 7, 2024

The bug that this is intended to fix appears to be documented in #1029. I've been trying to migrate a TFS repository to Git, and need complete history. My repository has maybe five branches that have been renamed, and I've lost patience with working around this bug by manually initializing remotes of the deleted former branch names to ensure that preceding changesets are present, as well as manually enumerating all of the branches.

The fix has three facets:

In GitTfsRemote.FetchWithMerge(), apply special handling to initial branch commits that are renaming, as rename destination branches will have these.

In GitTfsRemote.InitTfsRemoteOfChangeset(), when iterating over branch ancestry, call InitBranch() on the ancestor branch path, not the target branch path. Also only request fetching of parents for the most distant ancestor. Also when handling renamed branches, set the renameResult parameter for the successor.

In GitTfsRemote.InitTfsBranch(), pull the initial commit ID out of renameResult, if provided.

branch commits that are renaming, as rename destinations will have
these.
In GitTfsRemote.InitTfsRemoteOfChangeset(), when iterating over branch
ancestry, call InitBranch() on the ancestor branch, not the target
branch.  Also only request fetching of parents for the most distant
ancestor.  Also when handling renamed branches, assign the renameResult
for successors.
In GitTfsRemote.InitTfsBranch(), pull the initial commit hash out of
renameResult, if provided.
@bramborman
Copy link
Contributor

This seems to at least partially cover #835 as well, am I right?
Thus my #1488 would be suppressed by this..?

@bramborman
Copy link
Contributor

Just checked this on my case and it seems to work 😊👍 My case is a bit different to yours, though:
For some reason the new branch was created empty, deleted right after, then created empty again, then part of another branch with no common predecessor with this new one, also in a different path, was moved to this one.
So something like the following:

C1 Create $/Project/Branch
C2 Create $/Project/NewBranch (not merge/branch from $/Project/Branch, create it from scratch)
C3 Delete $/Project/NewBranch
C4 Create $/Project/NewBranch
C5 Merge $/Project/Branch into $/Project/NewBranch

@dh2i-sam do you use the --gitignore option when importing? I do and in the aforementioned scenario the .gitignore commit is a bit off - in the /Project/NewBranch branch, basically before C2. It should be before C1 in the old branch.

I will try to find and test the gitignore support on a simpler case that doesn't contain a merge from branch with different root, but do worry such scenario might even fail completely 🤔

@pmiossec
Copy link
Member

pmiossec commented Apr 8, 2024

@dh2i-sam Thanks for the PR.
@bramborman Thanks for taking the time to test it, I was going to ask you to do it..

@@ -637,10 +647,12 @@ private IGitTfsRemote InitTfsRemoteOfChangeset(IBranchObject tfsBranch, int pare
var branchesDatas = Tfs.GetRootChangesetForBranch(tfsBranch.Path, parentChangesetId);

IGitTfsRemote remote = null;
bool first = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a better name for the first variable: isFirstBranchChangeset? (or something better)

Comment on lines 668 to 672
if (lastFetch != null &&
lastFetch.IsProcessingRenameChangeset)
renameResult = lastFetch;
else
renameResult = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ternary operator?

Suggested change
if (lastFetch != null &&
lastFetch.IsProcessingRenameChangeset)
renameResult = lastFetch;
else
renameResult = null;
renameResult = lastFetch?.IsProcessingRenameChangeset
? lastFetch
: null;

Comment on lines 989 to 993
if (renameResult != null &&
renameResult.IsProcessingRenameChangeset)
sha1RootCommit = renameResult.LastParentCommitBeforeRename;
else
sha1RootCommit = Repository.FindCommitHashByChangesetId(rootChangesetId);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

Suggested change
if (renameResult != null &&
renameResult.IsProcessingRenameChangeset)
sha1RootCommit = renameResult.LastParentCommitBeforeRename;
else
sha1RootCommit = Repository.FindCommitHashByChangesetId(rootChangesetId);
sha1RootCommit = renameResult?.IsProcessingRenameChangeset
? renameResult.LastParentCommitBeforeRename;
: Repository.FindCommitHashByChangesetId(rootChangesetId);

@bramborman
Copy link
Contributor

bramborman commented Apr 8, 2024

About the --gitignore issue... While the commit adding .gitignore is only in one branch, the gitignore is still getting applied to the other branch as well. I'm not sure if it was really understandable from the words, so have a look at this visualization, plese 🙂
This is how the Git graph looks like:

graph

The Add .gitignore commit should be a parent of Add $/Proj/B1/File branch. Or at least should be there for every branch that has no parent, or else a .gitignore that's not at all in the /Proj/B1 branch gets applied to it.

Btw, I've tried running this change against a simpler TFVC branch history but had no luck - the rename wasn't followed. I'm just about debugging it right now, so will get back with my findings :)

@bramborman
Copy link
Contributor

So I'm debugging and I have this basically:

C1 Add $/Project/Branch
C2 Add $/Project/Branch/File
C3 Move $/Project/Branch to $/Project/NewBranch
C4 Edit $/Project/NewBranch/File

and I use the $/Project/NewBranch as the root provided to git-tfs.
But in git I only see C3 and C4, not the previous ones 🙁

@dh2i-sam how do you invoke git-tfs? I use this:

git tfs clone ORGANIZATION_URL TFVC_BRANCH_PATH TARGET_DIR --branches=all --all

All in all, I think I know what to do in order to maybe fix both the gitignore and this not working for me issues. Will play with it a bit and get back 🙂

@bramborman
Copy link
Contributor

Well basically I think the issue with the aforementioned scenario is simply that the original branch does not exist in the Git repo yet. It would be fine if the first changeset in $/Project/NewBranch was a merge changeset as for that there is the ProcessMergeChangeset method that fetches the other branch, but there's no such method for if it's just a rename changeset and thus we don't get the history of it.

It would all maybe work if there was the history of the original branch in the Git repo already and then we asked git-tfs to fetch the branch from the new location.

@bramborman
Copy link
Contributor

Let's say we have the following structure:

C1 Add $/Project/Branch
C2 Add $/Project/Branch/File
C3 Move $/Project/Branch to $/Project/NewBranch
C4 Edit $/Project/NewBranch/File

Cloned using:

git tfs clone ORGANIZATION_URL $/Project/NewBranch TARGET_DIR --branches=all --all

The above starts fetching $/Project/NewBranch from its oldest changeset, so C3, and if it doesn't come along a merge changeset that would bring them in, git-tfs never goes back to pick up C1 - C2 as they are "out of the NewBranch scope".

To fix this we first have to detect that the NewBranch history actually started on a different branch - $/Project/Branch, fetch that one first, and just after when we have C2 in place, build on that.

I was looking into that a bit yesterday. There is an ITfsChangeset.IsRenameChangeset property, but unfortunately it isn't set a value until ITfsChangeset.Apply is called. And I'm not really sure about this, but it seems to me that the Apply method actually commits the changes to Git.
So I'm gonna try to extract the functionality out of it and get the IsRenameChangeset set before the changeset is committed to the Git repo.

@bramborman
Copy link
Contributor

I got some time to look at this today and managed to get the value of IsRenameChangeset before the commit is made and am now working on fetching the branch in its previous location before the current branch - the one in new location - is fetched.

@bramborman
Copy link
Contributor

So I found a method VersionControlServer.GetBranchHistory. When I feed it with the renamed branch path, it gets whole history following all previous renames:

VersionControlServer.GetBranchHistory(new[] { new ItemSpec("$/Project/RenamedBranch", RecursionType.None) }, VersionSpec.Latest)

Returns a structure like (simplified a lot):

$/Project/Branch
- $/Project/RenamedBranch
  - $/Project/BranchCreatedFromTheRenamedBranch

with all other branches ever created from the descendants of the original $/Project/Branch.

So maybe I will use that - when starting the clone, will use this to find the oldest branch and work bottom-up, following all renames and everything.

@bj185066
Copy link

bj185066 commented May 1, 2024

Just wanted to pitch in to comment that this PR fixed a problem with a customer TFS library I was having because of a large number of branches the customer had renamed. Built @dh2i-sam's repo and ran using that git-tfs.exe and went from the clone of their whole repo going from taking 9.5 hours and having several orphaned branches and merge changesets being treated as having no parent branch to a perfectly run clone that finished in just under 2 hours.

@bramborman
Copy link
Contributor

@bj185066 hi,can you please show me the command you ran this tool with? I haven't had such luck, unfortunately, so I built other changes on top of these (and will submit a PR hopefully this or next week) to fix my use case but I really wonder what do you do differently that it works for you 😁 Or whether it's just the way the TFS repo was used...

@bj185066
Copy link

bj185066 commented May 1, 2024

@bj185066 hi,can you please show me the command you ran this tool with? I haven't had such luck, unfortunately, so I built other changes on top of these (and will submit a PR hopefully this or next week) to fix my use case but I really wonder what do you do differently that it works for you 😁 Or whether it's just the way the TFS repo was used...

My command (stripped of customer specifics) was as follows
git-tfs clone http://our.tfs.server/Collection $/old_trunk c:\migration\customer-test --branches=all --ignore-branches-regex=^\$\/PATH_THEY_SHOULDN'T_SEE.+

The short explanation on the ignore regex is that there are merges to the customer's repos from our main repos that they don't have read access to anymore by their customer agreement, so we're skipping those branches. Everything else we want. The general layout along the problem areas was as follows

$/old_trunk
\_$/updated/old_trunk
  \_$/current/trunk
   |\_$/bugfix/branch1
   |  \_$/bugfix/branch1_renamed - this gets merged back to $/current/trunk
    \_$/bugfix/branch2
      \_$/bugfix/branch2_renamed - this gets merged back to $/current/trunk

Every time it would hit those, it was able to successfully get the branch and rename commit on $/bugfix/branch1 and $/bugfix/branch2, but would then hit $/bugfix/branch1_renamed and $/bugfix/branch2_renamed and would throw out warning: root commit not found corresponding to changeset 1421523, and then when hitting the merge changeset would throw out warning: this changeset 1421560 is a merge changeset. But git-tfs failed to find and fetch the parent changeset 1421523. Parent changeset will be ignored.... Using the code as of this PR, those warning are gone and it properly handled those renamed branches.

@bramborman
Copy link
Contributor

Thanks a lot @bj185066!! That's what I thought, you go bottom-up so to say - you tell git-tfs to clone the original location, not the new one as I do. I haven't tried this myself actually, but I always started with the renamed location, going up-to-bottom basically, so that's why these changes didn't work for me and why I developed on them further! Thanks a lot for clarification, though, I was really wondering if I'm just missing something here 😅

@pmiossec
Copy link
Member

pmiossec commented May 2, 2024

Thanks @bj185066 and @bramborman for your feedback that helps me evaluate the value of this PR.
I think I will merge it.

I was waiting for @dh2i-sam feedback on code review few nit-pickings but we get no feedback ( @dh2i-sam ?)
I have pushed the little adjustments and I think we are good to go... @dh2i-sam, it's the good moment if you want to share your point of (especially on the fixup commits I pushed). Feedback on the source code from all of you is much welcomed 😉

@bramborman
Copy link
Contributor

Agree on merging, but please don't close the related issues as they are imho not completely solved - depends on how you clone and I have a fix for that locally :)

@pmiossec
Copy link
Member

pmiossec commented May 2, 2024

Agree on merging, but please don't close the related issues as they are imho not completely solved

I was not planning to do it but thanks for the head up: I will be careful about what GitHub will do automatically...

@pmiossec pmiossec merged commit 16fa7a3 into git-tfs:master May 3, 2024
1 check passed
@pmiossec
Copy link
Member

pmiossec commented May 3, 2024

Thanks all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants