Git on Windows client corrupts files > 4Gb #2434
First, you can confirm that BitBucket is sending the data correctly by checking the object in .git/lfs/objects:
$ cd .git/lfs/objects/aa/6d
$ shasum -a 256 aa6d2a8e9acbb78895b3d2c6ae3cb0db737344aa82b2859d31f757deec931049
aa6d2a8e9acbb78895b3d2c6ae3cb0db737344aa82b2859d31f757deec931049  aa6d2a8e9acbb78895b3d2c6ae3cb0db737344aa82b2859d31f757deec931049
Next, the fact that it works with "git lfs pull" and "git lfs clone" suggests the download and local object storage code is fine. So, this leaves the git filters. There are two modes that could be causing problems: the smudge filter and the process filter. You can test the smudge filter by hand like this:
$ GIT_LFS_SKIP_SMUDGE=1 git clone https://git-server/your/repo
$ cd repo
$ cat path/to/file
version https://git-lfs.github.com/spec/v1
oid sha256:98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
size 3
$ cat path/to/file | git lfs smudge > smudged-file.bin
$ shasum -a 256 smudged-file.bin
98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4  smudged-file.bin
If the hash matches the oid in the pointer, the smudge filter produced the right content. Based on that, I think one of the following could be happening:
Some questions:
It'd be really helpful if we could get a sample file that exhibits this behavior. I imagine that's a no-go, so we may have to come up with a special build of LFS with special tracing powers. @ttaylorr, any thoughts? Did I miss any debugging questions or trial commands to run?
@technoweenie that looks pretty comprehensive. My hunch is that it's related to one of the three issues you described as being process filter-related. @obe1line do you have a copy of the file or repository that you could share? I think that would be the easiest way for me to debug this going forward.
Hey @obe1line, I ran this by a Git core dev, and he mentioned that Git on Windows does not support files over 4GB. Unfortunately, this is not something we can fix in LFS. The best we can do for now is add a warning when large objects are added. As a workaround, I think you should disable smudging completely: $ git lfs install --skip-smudge
$ git lfs env
... snip
git config filter.lfs.process = "git-lfs filter-process --skip"
git config filter.lfs.smudge = "git-lfs smudge --skip -- %f"
After that, you'll have to run "git lfs pull" manually to get the actual file contents. @ttaylorr I think we should add an early warning in the filter smudge and process code, perhaps linking to a page offering this workaround.
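For anyone following along, a minimal sketch of that workaround end to end (the repository URL and file path below are placeholders):
$ git lfs install --skip-smudge
$ git clone https://git-server/your/repo
$ cd repo
$ git lfs pull                        # writes the content itself, bypassing Git's smudge path
$ shasum -a 256 path/to/large-file    # optional: compare against the oid in the pointer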
@technoweenie Thanks for the detailed comment.
I used GIT_TRACE_PACKET, GIT_TRACE and GIT_CURL_VERBOSE to produce output previously and yes, it is rather large (~12Gb from memory). @ttaylorr It can be reproduced with any file >4Gb, nothing special about the repository. The file is pushed with the Linux client, and fetched with the Windows client - I assume if I had used Windows to push, then the file may not have transferred fully into the repository. @technoweenie Is the workaround only applicable to fetch? i.e. would uploading a file via "git lfs push" work correctly even with the Git 4Gb limit?
It probably won't work on Windows. Added files are passed through the LFS clean filter, which is also invoked by Git, so they hit the same limitation.
Hi @ndebard -- thanks for commenting here. I think what you're experiencing is the correct behavior, even though the output can be a bit confusing. This message appears when a file greater than 4.0 GB is copied into your working tree. Though the message isn't pertinent to you on Ubuntu (?), it will be relevant for any colleagues of yours using Windows.
Hi @ttaylorr
It is safe to ignore the message for yourself -- since you're not on Windows, your repository should work as expected even though it contains large files in the working copy. The warning is to remind you that a checkout of your repository may not work on a Windows machine.
Could someone please clarify what this limitation actually means? Would turning off the smudge filter allow storing files bigger than 4GB on Windows?
Hi @luckydonald, thanks for asking! I don't think that turning off the smudge filter would make Windows able to store bigger files. The issue is rather that Git on Windows has a limitation: it cannot correctly smudge files when the size of the outgoing content is larger than 4GB. This isn't an inherent limit of the file system, but rather an implementation detail of Git.
So there's no viable workaround for this? Basically, if you have 4GB files in your repo, you can clone it if you disable the smudge filter, but you can't commit or push?
Not quite. The issue is with 4 GiB files from any source; their coming from Git LFS is only one half of the problem. If another filter puts them there, or that's how they're stored in your Git repository, then it is not guaranteed that they will be checked out correctly by Git on Windows.
With regards to the Git LFS part of that problem, if you have a >4 GiB LFS object (read: not a Git object, but an LFS one), you can avoid introducing it into your local copy by passing GIT_LFS_SKIP_SMUDGE=1 (or by running "git lfs install --skip-smudge").
One thing that I think is important to remember is that this issue does not cause problems on Unix, macOS, or other platforms that don't have the >4 GiB file-size limitation. So, if you have a >4 GiB file in your repository (LFS or otherwise), it should work fine on platforms other than Windows. If you're on Windows, we are stuck with this behavior, so skipping the smudge filter and fetching large files manually is the best option for now.
For the record, I have been using Git LFS on Windows by disabling the smudge filter and the process filter. Files >4GB seem to work fine. It just means you need to manually run "git lfs pull" to get the actual file contents.
First: my colleagues and I encounter this problem on Windows 10 with the newest NTFS filesystem, which is without any doubt capable of handling files > 4 GB and even files up to 16 exabytes (see here http://www.ntfs.com/ntfs_vs_fat.htm). Second: the newest Git (including the LFS feature) is great. Third: thanks for all those proposals for avoiding the problem, but in the end we don't want to avoid using/downloading/cloning/pulling files > 4 GB. Is there a plan and timeframe to fix this problem on Windows?
So as I understand this issue, it's due to Git on Windows not supporting files greater than 4 GB properly. The issue is that the smudge and clean filters are invoked by Git, and Git itself doesn't handle this gracefully. Git LFS does handle this gracefully, but because it's invoked by Git (unless you disable the filters), the data is corrupted before it makes it to Git LFS.
To explain the issue with Git, it's because the Git codebase uses the C type unsigned long for object and buffer sizes in many places, and on Windows that type is only 32 bits wide, so sizes over 4 GB get truncated. Git for Windows is already tracking this issue as git-for-windows/git#1063.
The good news is that when this is fixed in Git, everything should automatically work with any version of Git LFS. In the meantime, there isn't anything we as Git LFS developers can do to fix it.
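As a rough illustration of that truncation (this is just arithmetic, not Git's actual code): storing a 64-bit byte count in a 32-bit unsigned long keeps only the low 32 bits, so a 6 GiB size collapses to 2 GiB:
$ size=6442450944                   # 6 GiB
$ echo $(( size & 0xFFFFFFFF ))     # what survives in a 32-bit unsigned long
2147483648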
@bk2204 That sounds correct to me. I just wanted to reiterate that the workaround from @technoweenie does work.
So it is possible to still work with Git LFS on Windows for large files, but you must disable the smudge and process filters and fetch the content manually with "git lfs pull".
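For reference, a sketch of what that configuration can look like for a single repository rather than globally (the filter values mirror the "git lfs env" output quoted earlier; adjust to taste):
$ cd repo
$ git config filter.lfs.process "git-lfs filter-process --skip"
$ git config filter.lfs.smudge "git-lfs smudge --skip -- %f"
$ git lfs pull    # then fetch large file contents manually when needed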
Thank you very much for clarifying the situation. I think it was necessary to state quite clearly what we "Big File Users" are waiting for :-)
@ttaylorr @technoweenie Unless I'm mistaken, the workaround of disabling smudge only sort of works. If you have a nice big new file upstream, and you git pull / git lfs pull, everything is great. But then if you do another checkout that touches that file, the smudge filter still gets involved and the file ends up corrupted.
To recover from this, you basically need to nuke your .git/lfs directory, as the presence of the file in the local lfs objects folder means smudge runs for it even though the skip options are set.
One important caveat: I'm on an older git/git-lfs version. I'll upgrade tomorrow, but just based on comparing the old and current source of git-lfs, I don't expect different behavior. If what I'm talking about sounds wacky / not the behavior you currently expect, perhaps it has already been fixed.
Hey. This issue should not be closed, as the Windows client still corrupts files > 4GB.
If I recall correctly, this is not related to a bug in Git, but rather is an inherent limitation of the Windows filesystem.
The issue as described in the Git for Windows bug mentioned earlier in this thread (git-for-windows/git#1063) points to a problem with incorrect datatypes in the Git code, not with Windows filesystem limitations.
The limitation on files larger than 4 GiB is for FAT, not NTFS. NTFS is capable of large files, but you're correct that if you're using a flash drive for your LFS-using Git repository, then you probably have a file system limitation. I think that most Windows users are using NTFS for their systems, though. I believe in this case the issue is as @shabbyrobe quoted: we use unsigned long for sizes in many places, and that type is only 32 bits on Windows. There are a handful of patches going into Git 2.21 to address some of these issues, although they may not be complete, so it may be useful to keep an eye out for improvements in that regard.
I'm not sure why the post-merge hook isn't running. Also note that even if it does run, there may be other scenarios (adding a new LFS file, doing a checkout of an older commit hash which has a previous version of a large file) that may still cause problems and not be solved with a post-merge hook. At minimum, those scenarios should be checked before rolling an automated workaround out to developers with the hope that the devs won't have to worry about the >4GB issue.
@aggieNick02, the post-merge hook seems to be omitted in some cases.
I just really want to know what we're supposed to do to use 4GB+ files in Git.
Can Git LFS work around this by reassembling the files itself?
Well, if you're using Git LFS, you can either use GIT_LFS_SKIP_SMUDGE=1 when cloning or run "git lfs install --skip-smudge", and then fetch the large files manually with "git lfs pull".
No, because the problem isn't Git LFS. The problem is that Git itself will truncate these files, so Git LFS reassembling them will still result in Git truncating them.
Thank you for the follow-up.
Even when using the workarounds, you will still get warnings from Git LFS about the possible problem, but the corruption typically doesn't actually happen when the workarounds are in place.
@marbx Apologies for not responding months ago - I saw your question and thought about it again with the recent activity.
Hi @aggieNick02, thank you for your reply. For me, sadly, foolproof is mandatory. I recommended we stop using skip-smudge after it corrupted a ps1 file, maybe because Windows (Server 2012R2) assumed UTF-16. Even before that, skip-smudge was perceived as a process risk. Until someone (or a group) fixes (or obsoletes) the root cause - Git wrongly assuming that long in C always has 64 bits - 4GB files are unavailable for us.
If you (or your team, or where you work) are in any way able to help, then that would be useful. Even the project scope isn't sufficiently fleshed out to identify a sub-MVP (smallest demonstrable progress - SDP?). I have lots of personal notes, but it's quite a big job.
Hi @PhilipOakley, I guess the team "at hand" is more suitable than the one "at work". I believe I read your rejected PR in which you replaced long with long long. I tried to identify a smaller change set by running a debugger with breakpoints on any long, but I didn't find a suitable command to debug. What do you think of that? Do you have a note about a planned "direct file to file" method to copy files from LFS to Git? I only have a vague memory of it.
Hi @marbx,
My 'at work' aside was for those readers who might have a workplace that would benefit from a >4GB resolution, and who could maybe get their local management to let them do a few hours on company time, on a win-win basis. Often there is more flexibility than one may imagine (companies cost things funnily ;-)
Which PR were you looking at (so we are talking about the same one)? The idea, as per the C89/99 standard, is to use size_t rather than long for in-memory sizes, so the type is wide enough on every platform.
I don't have any notes on a '"direct file to file" method to copy files from lfs to git'. I had been just focussed on the internals of Git/Git-for-Windows. Having been a Systems Engineer I tend to work from outside in (big picture), while git patching tends to work from inside to out (fine details picture). So I tend to want to be able to know when the job is complete, rather than small items started.
One thought of an approachable activity is to look at commands which should NOT involve the 4GB limit and check that that's actually true, or just annotate (split) the list of commands into those same two groups (e.g. shouldn't, maybe, and probably, for 4GB testing - yes that's three, but it should be just two ;-)
Yes, I meant your PR with size_t.
Does anybody else remember anything about a direct file-to-file transfer?
Perhaps it refers to
So that's git-for-windows/git#2179
I'll happily fix this stuff. Since I live in the here and now I would probably just merge your patch if it passes tests, and otherwise fix the issues the tests show. Based on my cursory reading it sounds like there are some style issues and the patch splitting thing. You don't have to repeat / rehearse for me though.
My team installs the Git for Windows SDK for a good development environment. If it's as easy as a
That would be a useful step. Even if you just pull out the zlib changes and (using the git CI) confirm that regular git still works OK, allowing those parts to be [ready for] upstream early.
That's excellent.
Yep, that's part of the long-run issue. But if you start somewhere it is still valuable.
One command that may be worth starting with, as it has a core element to it, is the
Pick the most simple of options to limit the areas of the code base that it could explore (e.g. tweak the
Is this fixed in Git 2.34?
It is fixed in Git for Windows 2.34, but not in Git 2.34. The patch was specifically applied to Git for Windows, but has not been released in upstream Git yet.
Oh, I thought that this was a Windows-only bug, sorry! Cheers
It is indeed a Windows-only bug, but in the event you're building your own Git on Windows from the Git source, not the Git for Windows source, then there's a difference. Probably 99% of users on Windows use Git for Windows, so for them, this issue will be fixed in 2.34.
Since this is now fixed upstream, I'm closing this issue. |
When cloning a file larger than 4GB from a BitBucket server repository (LFS enabled), the file is not reconstructed correctly. E.g. a 6GB file results in a 700MB file. The lfs/objects folder, however, contains the correct file.
Steps to reproduce:
Server: Basic install of Debian with Bitbucket Server 4.6, Git 2.13 (64-bit)
Client 1: Ubuntu 16.02 64 bit, Git 2.13 and Git-LFS 2.2.1 (both 64-bit)
Client 2: Windows 64 bit (2012), Git 2.13 and Git-LFS 2.2.1 (both 64-bit)
Client 1 works correctly
Client 2 pulls down the file correctly into .git/lfs/objects/aa/6d/aa6d2a8e9acbb78895b3d2c6ae3cb0db737344aa82b2859d31f757deec931049 but does not reconstruct/copy it correctly to the destination folder (it results in a 1.82GB file).
Atlassian have looked into the problem and believe that the BitBucket server is working correctly, due to the fact that the correct content is retrieved over the network into the temporary object file (the CRCs match the original file).
Note that no Git configuration has changed (smudge filters etc are the default).
If the file is removed and "git lfs pull" performed, the file is created correctly. Using "git lfs clone" also works.
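A sketch of that check, using the object from this report (the working-tree path is a placeholder):
$ rm path/to/large-file
$ git lfs pull
$ shasum -a 256 path/to/large-file
aa6d2a8e9acbb78895b3d2c6ae3cb0db737344aa82b2859d31f757deec931049  path/to/large-file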