
I have files whose names contain escape characters (backslashes) stored in a Git repository on Debian 10 Linux.

Problem: files whose names contain characters that are invalid on Windows cannot be checked out there with git checkout.

Example:

git log --all --name-only -m --pretty= '*\\*'
"systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
"systemd/system/multi-user.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
"systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"

I get the following Git errors on checkout on Windows:

C:\Git\bin\git.exe reset --hard "5ef1cac3a03304c35b455edf32bd1bb78060c5b9" --
error: invalid path 'systemd/system/default.target.wants/snap-git\x2dfilter\x2drepo-7.mount'
fatal: Could not reset index file to revision '5ef1cac3a03304c35b455edf32bd1bb78060c5b9'.
Done

Steps to reproduce the problem:

# Clone repository, to be executed on a safe repo:
git clone --no-local /source/repo/path/ /target/path/to/repo/clone/
# Cloning into '/target/path/to/repo/clone'...
# remote: Enumerating objects: 9534, done.
# remote: Counting objects: 100% (9534/9534), done.
# remote: Compressing objects: 100% (4776/4776), done.
# remote: Total 9534 (delta 4215), reused 8043 (delta 3136), pack-reused 0
# Receiving objects: 100% (9534/9534), 7.41 MiB | 16.78 MiB/s, done.
# Resolving deltas: 100% (4215/4215), done.

cd /target/path/to/repo/clone/

# List the files with escape characters in their names from repo history into a list file:
git log --all --name-only -m --pretty= '*\\*' | sort -u >/opt/git_repo_files_w_escape.txt

# Remove the files with escape characters from repo history:
git filter-repo --invert-paths --paths-from-file /opt/git_repo_files_w_escape.txt
Parsed 592 commits
New history written in 0.25 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 71128f3 .gitignore: ADD snap-git to be ignored
Enumerating objects: 9354, done.
Counting objects: 100% (9354/9354), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3694/3694), done.
Writing objects: 100% (9354/9354), done.
Total 9354 (delta 4085), reused 9354 (delta 4085), pack-reused 0
Completely finished after 0.55 seconds.


# List files with escape characters to check the result:
git log --format="reference" --name-status --diff-filter=A '*\\*'
# "systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
# "systemd/system/multi-user.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
# "systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"

# Unfortunately, filter-repo ran, but the log still lists filenames with escape characters :-(

Question:

1) How can I remove from the Git repo history all files whose path contains at least one escape character?

(Reason: files with such incompatible characters in their names cannot be checked out on Windows.)

UPDATE1:

I tried replacing the \x2d string with - in the input file list as suggested, but removing the files from the Git history was still unsuccessful:

# List the files with escape characters in their names from repo history into a list file:
git log --all --name-only -m --pretty= '*\\*' | sort -u >/opt/git_repo_files_w_escape.txt

# Replace the \x2d string with - in git_repo_files_w_escape.txt:
sed -i 's/\\x2d/-/g' /opt/git_repo_files_w_escape.txt

# Remove the listed files from repo history:
git filter-repo --invert-paths --paths-from-file /opt/git_repo_files_w_escape.txt
Parsed 592 commits
New history written in 0.25 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 71128f3 .gitignore: ADD snap-git to be ignored
Enumerating objects: 9354, done.
Counting objects: 100% (9354/9354), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3694/3694), done.
Writing objects: 100% (9354/9354), done.
Total 9354 (delta 4085), reused 9354 (delta 4085), pack-reused 0
Completely finished after 0.55 seconds.


# List files with escape characters to check the result:
git log --format="reference" --name-status --diff-filter=A '*\\*'
# "systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
# "systemd/system/multi-user.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
# "systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"

# Unfortunately, the log still lists filenames with \x2d :-(

UPDATE2:

I tried replacing \x2d in git_repo_files_w_escape.txt with \\x2d or x2d, but neither removed the files having \x2d in their filename from the Git history.

UPDATE3:

I’m looking for a working solution based on git filter-repo.

Any more ideas?

3 Answers


  1. FWIW, this worked on a Linux system. It allowed me to rewrite the HEAD commit without having the files checked out on disk:

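    # git ls-files prints such names quoted, with each backslash doubled
    git ls-files | grep -a -e '\\' | while read f; do
        # plain read (no -r) has already turned each \\ back into \; now drop the quotes
        f=$(echo "$f" | sed -e 's|"||g')
        new=$(echo "$f" | sed -e 's|\\x2d|-|g')
        git show "@:$f" > "$new"
        git rm --cached "$f"
        git add "$new"
    done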
    
    git status
    git commit --amend
    

    The same commands should work in Git Bash for Windows.
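
    The loop relies on a detail that is easy to miss: `read` without `-r` treats the backslash as an escape character, so the doubled backslashes that git prints in a quoted name collapse back to single ones. The following is a minimal sketch of just that step, using the path from the question as a hard-coded string instead of real `git ls-files` output; the variable names are illustrative only.

    ```shell
    # A name as git prints it when quoting: wrapped in double quotes, with each
    # backslash doubled. The path is the one from the question.
    quoted='"systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"'

    # Plain `read` treats backslash as an escape character, so "\\" becomes "\".
    with_plain_read=$(printf '%s\n' "$quoted" | { read f; printf '%s\n' "$f"; })

    # `read -r` keeps the line exactly as printed, doubled backslashes included.
    with_read_r=$(printf '%s\n' "$quoted" | { read -r f; printf '%s\n' "$f"; })

    # Strip the quotes, then turn the literal \x2d sequence into a dash.
    real_name=$(printf '%s\n' "$with_plain_read" | sed -e 's|"||g')
    new_name=$(printf '%s\n' "$real_name" | sed -e 's|\\x2d|-|g')

    printf '%s\n' "$with_plain_read" "$with_read_r" "$real_name" "$new_name"
    ```

    If the loop were changed to use `read -r`, the name passed to `git show` would still contain doubled backslashes and would presumably not match any file in the index.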

  2. Assuming you have many files that you want to fix scattered in the hierarchy, a solution with git filter-repo looks tedious. You can instead use a combination of git fast-export and git fast-import to modify file names in the whole history.

    git fast-export --no-data --all > exported
    

    Now delete the file entries containing a backslash:

    grep -v '^[DM] .*\\' exported > fixed
    

    Instead of removing the files, you can also modify the file names. For example, to replace the backslash by a dash -, you could try this:

    sed -e '/^[DM] /s,\\,-,g' < exported > fixed
    

    You may now investigate the difference between the two files to ensure that no commit messages were modified:

    diff -u exported fixed | less
    

    Now attempt to import the modified history:

    git fast-import < fixed
    

    This will stop with an error that tells you that the branches will not be modified because the old branch heads are not subsets of the new heads. If there are no other errors, you can now force the modification:

    git fast-import --force < fixed
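
    To see what the grep filter does before running it on a real export, here is a small sketch on a hand-written fragment shaped like `git fast-export --no-data` output. The hashes are placeholders and the fragment omits the author, committer, and message lines of a real stream, so treat it as an illustration of the pattern only.

    ```shell
    # Hand-written fragment shaped like `git fast-export --no-data` output
    # (hashes are placeholders, not from a real repository).
    sample_stream() {
        printf '%s\n' 'commit refs/heads/master'
        printf '%s\n' 'M 100644 1111111111111111111111111111111111111111 README.md'
        printf '%s\n' 'M 100644 2222222222222222222222222222222222222222 "systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"'
        printf '%s\n' 'D "systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"'
    }

    # Drop every M (modify) or D (delete) entry whose path contains a backslash.
    fixed=$(sample_stream | grep -v '^[DM] .*\\')
    printf '%s\n' "$fixed"
    ```

    Only the commit line and the README.md entry survive. The pattern is purely line-based, so a commit message line that happens to start with "M " or "D " and contains a backslash would be dropped as well, which is why comparing the two files with diff is worth doing.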
    
  3. You fed bad input into filter-repo, based on a common but incorrect assumption about how git log works.

    Look at your own output:

    $ git log --format="reference" --name-status --diff-filter=A '*\\*'
    "systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
    "systemd/system/multi-user.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
    "systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"
    

    Let’s look at the first line as an example. If you were to store that in a file, which you pass to --paths-from-file, then git-filter-repo is going to be looking for a file named "systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount" to remove. You have no such file in your repository. Instead you have one named systemd/system/default.target.wants/snap-git\x2dfilter\x2drepo-7.mount. (Note that I have removed both " characters and two of the \ characters.)

    The problem here is that you assumed git log would list filenames as-is, which it won’t do whenever there are special characters. You can often get around this by setting core.quotepath=false (this particularly helps when you have non-ascii characters), but even that is ignored when you have backslashes.

    Here’s something that might work better for you for generating the list of filenames to exclude:

    git log -z --all --name-only -m --pretty= '*\\*' | tr '\0' '\n' | sort -u >/opt/git_repo_files_w_escape.txt
    

    but it assumes you do not have filenames with newline characters. (If you do have files with newline characters, though, then --paths-from-file won’t work for you.)
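
    As a rough illustration of the conversion step, the sketch below simulates NUL-separated, unquoted names (the form git emits with -z) using printf and turns them into a deduplicated newline-separated list. The `raw_names` helper is hypothetical and stands in for the real `git log -z` call; the paths are the ones from the question.

    ```shell
    # Stand-in for `git log -z --name-only ...`: raw names, each followed by a
    # NUL byte, with no quoting and no doubled backslashes.
    raw_names() {
        printf '%s\0' \
            'systemd/system/default.target.wants/snap-git\x2dfilter\x2drepo-7.mount' \
            'systemd/system/snap-git\x2dfilter\x2drepo-7.mount' \
            'systemd/system/snap-git\x2dfilter\x2drepo-7.mount'
    }

    # Convert NUL separators to newlines and drop duplicate entries.
    paths_list=$(raw_names | tr '\0' '\n' | sort -u)
    printf '%s\n' "$paths_list"
    ```

    Each resulting line is the name exactly as stored in the repository, with single backslashes and no surrounding quotes.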

    Even simpler would be to skip creating a list of files with bad names and just programmatically remove them by pattern:

    git filter-repo --filename-callback 'return None if b"\\" in filename else filename'
    