Using git clean filters to remove local path information.

Error messages in many programming languages (including R) often have the full paths to files included in the error message. (e.g. “/Users/jhester/projects/pkgname/R/pkg.R”). This feature is very helpful when working with a package locally as it makes it easy to find exactly what file has an issue. However if these errors are included in documentation examples or in check output the local directory information is extraneous and possibly sensitive.

These output files could be modified by hand before committing them, e.g. changing the above path to “…/R/pkg.R”. A script could be written to do it automatically as well, but this still requires diligence to remember to run it before committing files to the repository.

Instead, using git clean and smudge filters we can have git automatically sanitize our files before they are committed to the repository.

Setting up a clean filter

First we need to write a script that takes input from standard input and writes the cleaned output to standard output. In our case we want to substitute any text that matches the current directory with ..., which will remove the local directory information. A simple perl script which does this is below (sanitize.pl)

#!/usr/bin/env perl

use Cwd;

my $cwd = getcwd();

while(<>) {
  s{$cwd}{...}g;
  print;
}

Once we have this script we next need to tell git what file types we want to run our filter on by adding it to a .gitattributes file. In this case we are defining the filter for both markdown and HTML files.

*.md filter=sanitize-paths
*.html filter=sanitize-paths

If we want to use this only on the current repository it should be defined in a .gitattributes file in the repository root. If however we want to define this globally by setting the git config option core.attributeFile (by default this will be $HOME/.config/git/attributes).

We then need to define the sanitize-paths filter above to use our sanitize.pl file. In this case we are not using a smudge filter, so we do not need to set it explicitly. If you only want this to be set for the local repository remove --global from the command.

git config --global filter.sanitize-paths.clean sanitize.pl

With that we are done, Git will now automatically remove the path information before we commit markdown and html files to any of our repositories!

Avatar
Jim Hester
Software Engineer

I’m a Senior Software Engineer at Netflix and R package developer.

Related