I have files with special names containing escape characters stored in a Git repository on Debian 10 Linux.
Problem: it is not possible to git checkout these files on Windows, because their filenames contain characters that are incompatible with Windows.
Example:
git log --all --name-only -m --pretty= '*\\*'
"systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
"systemd/system/multi-user.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
"systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"
I get the following Git errors on checkout in Windows:
C:\Git\bin\git.exe reset --hard "5ef1cac3a03304c35b455edf32bd1bb78060c5b9" --
error: invalid path 'systemd/system/default.target.wants/snap-git\x2dfilter\x2drepo-7.mount'
fatal: Could not reset index file to revision '5ef1cac3a03304c35b455edf32bd1bb78060c5b9'.
Done
Steps to reproduce the problem:
# Clone the repository, so that the commands are executed on a safe copy:
git clone --no-local /source/repo/path/ /target/path/to/repo/clone/
# Cloning into '/target/path/to/repo/clone'...
# remote: Enumerating objects: 9534, done.
# remote: Counting objects: 100% (9534/9534), done.
# remote: Compressing objects: 100% (4776/4776), done.
# remote: Total 9534 (delta 4215), reused 8043 (delta 3136), pack-reused 0
# Receiving objects: 100% (9534/9534), 7.41 MiB | 16.78 MiB/s, done.
# Resolving deltas: 100% (4215/4215), done.
cd /target/path/to/repo/clone/
# List the files with escapes from the repo history into a list file:
git log --all --name-only -m --pretty= '*\\*' | sort -u >/opt/git_repo_files_w_escape.txt
# Remove the files with escape from repo history:
git filter-repo --invert-paths --paths-from-file /opt/git_repo_files_w_escape.txt
Parsed 592 commits
New history written in 0.25 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 71128f3 .gitignore: ADD snap-git to be ignored
Enumerating objects: 9354, done.
Counting objects: 100% (9354/9354), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3694/3694), done.
Writing objects: 100% (9354/9354), done.
Total 9354 (delta 4085), reused 9354 (delta 4085), pack-reused 0
Completely finished after 0.55 seconds.
# List files with escapes to check the result:
git log --format="reference" --name-status --diff-filter=A '*\\*'
# "systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
# "systemd/system/multi-user.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
# "systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"
# Unfortunately, filter-repo ran, but the log still lists the filenames with escapes :-(
Question:
1) How can I remove from the Git repo history all files whose filename contains at least one escape character?
(Reason: those files cannot be checked out on Windows, because their filenames contain incompatible characters.)
UPDATE1:
Tried replacing the \\x2d string with - in the input file list as suggested, but removing the files from the Git history was still unsuccessful:
# List the files with escape from repo history into a list file:
git log --all --name-only -m --pretty= '*\\*' | sort -u >/opt/git_repo_files_w_escape.txt
# Replace the \\x2d string with - in git_repo_files_w_escape.txt:
sed -i 's/\\\\x2d/-/g' /opt/git_repo_files_w_escape.txt
# Remove the listed files from repo history:
git filter-repo --invert-paths --paths-from-file /opt/git_repo_files_w_escape.txt
Parsed 592 commits
New history written in 0.25 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 71128f3 .gitignore: ADD snap-git to be ignored
Enumerating objects: 9354, done.
Counting objects: 100% (9354/9354), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3694/3694), done.
Writing objects: 100% (9354/9354), done.
Total 9354 (delta 4085), reused 9354 (delta 4085), pack-reused 0
Completely finished after 0.55 seconds.
# List files with escapes to check the result:
git log --format="reference" --name-status --diff-filter=A '*\\*'
# "systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
# "systemd/system/multi-user.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
# "systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"
# Unfortunately, the log still lists the filenames with \\x2d :-(
UPDATE2:
Tried replacing \\x2d in git_repo_files_w_escape.txt with \\\\x2d or \x2d, but neither of them removed the files having \x2d in their filename from the Git history.
UPDATE3:
I’m looking for a working solution based on git filter-repo.
Any more ideas?
3 Answers
FWIW, this worked on a Linux system; it allowed me to rewrite the HEAD commit without having the files checked out on disk:
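A minimal sketch of one way to do this, assuming the index currently matches HEAD and GNU grep is available. It takes the raw, NUL-separated path list from git ls-files -z, keeps the entries containing a backslash, drops them from the index only, and amends the commit. The function name is hypothetical.

```shell
# Hypothetical helper: remove every path containing a backslash from the
# index (the working tree is not touched) and amend HEAD with the result.
# Assumption: the index currently matches HEAD.
drop_backslash_paths_from_head() {
    # -z prints raw, unquoted names separated by NUL bytes.
    git ls-files -z |
        grep -z -F '\' |
        git update-index --force-remove -z --stdin
    # Rewrite only the HEAD commit, keeping its message.
    git commit --quiet --amend --no-edit
}
```

Only the tip commit is rewritten this way; older commits still contain the files.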
The same commands should work in git-bash for Windows.
Assuming you have many files that you want to fix scattered in the hierarchy, a solution with git filter-repo looks tedious. You can instead use a combination of git fast-export and git fast-import to modify file names in the whole history. Start by exporting the history to a file.
Now delete the file entries containing a backslash:
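A minimal sketch of the export and delete steps. In a fast-export stream a file change is a line of the form M <mode> <dataref> <path> and a deletion is D <path>; a path containing a backslash is written in double quotes with each backslash doubled. The helper name and the file names exported.txt and cleaned.txt are placeholders, and using --no-data (to keep blob contents out of the stream) is an assumption of this sketch.

```shell
# Hypothetical helper: drop fast-export file entries whose quoted path
# contains an escaped backslash (written as \\ inside the quotes).
drop_backslash_entries() {
    sed -e '/^M [0-7]* [^ ]* ".*\\\\.*"$/d' \
        -e '/^D ".*\\\\.*"$/d'
}

# Usage, inside the repository (file names are placeholders):
#   git fast-export --no-data --all > ../exported.txt
#   drop_backslash_entries < ../exported.txt > ../cleaned.txt
```

A commit message line that happens to look like a file entry would be deleted too, so the result should be compared against the export before importing it.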
Instead of removing the files, you can also modify the file names, for example by replacing the backslash with a dash (-).
You may now inspect the difference between the two files to ensure that no commit messages were modified:
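A sketch of the renaming variant and of the comparison, under the same assumptions about the stream format (quoted paths carry each backslash as \\). The helper name and the file names are placeholders.

```shell
# Hypothetical helper: on file entries with a quoted path, replace every
# escaped backslash (\\) with a dash. Other lines are passed through.
replace_backslash_in_entries() {
    sed -e '/^M [0-7]* [^ ]* ".*"$/s/\\\\/-/g' \
        -e '/^D ".*"$/s/\\\\/-/g'
}

# Usage (file names are placeholders):
#   replace_backslash_in_entries < ../exported.txt > ../renamed.txt
#   diff ../exported.txt ../renamed.txt
```

Every line reported by diff should be an M or D file entry; anything else means a commit message was touched.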
Now attempt to import the modified history. This will stop with an error telling you that the branches will not be modified because the old branch heads are not subsets of the new heads. If there are no other errors, you can now force the modification:
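A sketch of both import attempts, assuming the edited stream was produced with --no-data from this same repository so that the referenced blobs already exist. The helper name and the file name cleaned.txt are placeholders.

```shell
# Hypothetical helper: feed an edited stream to git fast-import.
# $1 is the stream file; any further arguments go to git fast-import.
import_stream() {
    stream=$1
    shift
    git fast-import --quiet "$@" < "$stream"
}

# Usage (file name is a placeholder):
#   import_stream ../cleaned.txt           # expected to refuse to move the branches
#   import_stream ../cleaned.txt --force   # only after checking for other errors
```

fast-import updates the refs only; the index and working tree still reflect the old history until you reset them.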
You fed bad input into filter-repo, based on a common but incorrect assumption about how git log works.
Look at your own output:
"systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
"systemd/system/multi-user.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
"systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"
Let’s look at the first line as an example. If you were to store that in a file, which you pass to --paths-from-file, then git-filter-repo is going to be looking for a file named
"systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
to remove. You have no such file in your repository. Instead you have one named
systemd/system/default.target.wants/snap-git\x2dfilter\x2drepo-7.mount
(Note that I have removed both " characters and two of the \ characters.)
The problem here is that you assumed git log would list filenames as-is, which it won’t do whenever there are special characters. You can often get around this by setting core.quotepath=false (this particularly helps when you have non-ASCII characters), but even that is ignored when you have backslashes.
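To make the difference concrete: inside the quotes, git writes each backslash of the real name twice. The sketch below turns such a quoted line back into the real name. It is deliberately partial and handles only the surrounding quotes and doubled backslashes, not the other escapes git can emit (such as \" or octal sequences); the function name is hypothetical.

```shell
# Hypothetical helper: undo git's quoting for names whose only special
# character is the backslash. Other escape sequences are NOT handled.
unquote_git_paths() {
    sed -e 's/^"\(.*\)"$/\1/' -e 's/\\\\/\\/g'
}

printf '%s\n' '"systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"' | unquote_git_paths
# prints: systemd/system/snap-git\x2dfilter\x2drepo-7.mount
```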
Here’s something that might work better for you for generating the list of filenames to exclude:
but it assumes you do not have filenames with newline characters. (If you do have files with newline characters, though, then --paths-from-file won’t work for you.)
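One way to build such a list, as a sketch rather than the exact command from this answer: with -z, git log prints names unquoted and NUL-terminated, so converting the NULs to newlines yields one real filename per line. The function name is hypothetical, and the backslash filter is done with grep instead of a pathspec.

```shell
# Hypothetical helper: list, one per line, every path in the history
# that contains a backslash, exactly as it is stored in the repository.
# Assumption: no filename contains a newline character.
list_backslash_paths() {
    git log --all --name-only -m --pretty= -z |
        tr '\0' '\n' |
        grep -F '\' |
        sort -u
}

# Usage, in a fresh clone (list file path taken from the question):
#   list_backslash_paths > /opt/git_repo_files_w_escape.txt
#   git filter-repo --invert-paths --paths-from-file /opt/git_repo_files_w_escape.txt
```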
Even simpler would be bypassing creating a list of files with bad names and just programmatically removing them by pattern:
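A sketch of the pattern-based variant, assuming a git-filter-repo version that supports --path-regex. The regular expression \\ matches any path containing a literal backslash, and --invert-paths turns the selection into a removal. The variable and function names are hypothetical.

```shell
# Regular expression matching any path that contains a backslash.
# (Hypothetical variable name.)
BACKSLASH_RE='\\'

# Hypothetical helper: remove all matching paths from the whole history.
# Run it in a fresh clone; extra arguments are passed to git filter-repo.
remove_backslash_paths() {
    git filter-repo --invert-paths --path-regex "$BACKSLASH_RE" "$@"
}
```

Afterwards, the check from the question (git log with the backslash pathspec) should print nothing.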