Changing your SVN repository address in a git-svn setup

I use git-svn to interoperate with our Subversion repository at Sourceforge. I did the original checkout via HTTP and continued working like this until Sourceforge's recent site-wide SVN upgrade (from version 1.3 to 1.5). After the upgrade, I could no longer do Subversion commits (that is, git svn dcommit failed).

I had two options; I could:

  • reclone the SVN repository via HTTPS, or
  • hack the git-svn metadata so that git-svn would access the SVN repo via HTTPS.

I was loath to take the first route given how long it takes to clone the entire history of an SVN repository, so I opted for the second choice.

The hack

To hack the git-svn metadata, we need to know that git-svn keeps its metadata in three locations:

  1. the directory .git/svn (in your git repository), which contains internal git-svn data; if you delete this directory, git-svn will automatically rebuild its contents,
  2. various refs such as refs/remotes/trunk and refs/remotes/trunk@1234,
  3. per-commit SVN identification strings, such as git-svn-id: http://translate.svn.sourceforge.net/svnroot/translate/src/trunk/Pootle@... 54714841-351b-0410-a198-e36a94b762f5.

To perform the hack, the following things must happen:

  1. the commit messages must all be rewritten so that the URLs in the git-svn-id: strings start with https instead of http,
  2. a changed commit message will lead to a changed SHA1 value for the commit, which means that all refs (such as refs/remotes/trunk) which pointed to the old commits have to be updated to point to the new commits,
  3. the git-svn data under .git/svn must be updated,
  4. the git-svn entries in .git/config must be updated.

Steps 1 and 2 are performed by the magical tool git-filter-branch, step 3 is performed by a simple rm -rf .git/svn, and step 4 is a manual edit of .git/config. With git-filter-branch you can rewrite a git repository in interesting ways, including:

  1. removing all commits made by a particular author,
  2. removing certain files from the repository,
  3. modifying commit messages.

Of interest is the last of these points. The git-filter-branch command line for modifying commit messages has the following form:

git filter-branch --msg-filter <text filter command> <refs to rewrite>

The flag --msg-filter tells git-filter-branch to enumerate commit messages; it passes every commit message to <text filter command> via stdin and takes the stdout of <text filter command> to be the new commit message. <refs to rewrite> is a list of git refs whose histories are to be modified.

To replace "http" with "https" in every string resembling "git-svn-id: http://translate.svn.sourceforge.net/svnroot/translate/src/trunk/Pootle@... 54714841-351b-0410-a198-e36a94b762f5" a sed command suffices:

sed "s/git-svn-id: http/git-svn-id: https/g"
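
Before letting a filter loose on a real history, it is worth trying it on a single message. What follows is a minimal sketch, not part of the original procedure: the commit message, URL path and revision number are made up, and the pattern is a slightly stricter variant that matches "http:" at the start of the line, so that running it a second time does not turn "https" into "httpss".

```shell
# Hypothetical commit message in the shape git-svn records
# (subject, blank line, git-svn-id trailer).
msg='Fix typo in README

git-svn-id: http://example.org/svnroot/project/trunk@1234 54714841-351b-0410-a198-e36a94b762f5'

# Stricter variant of the sed expression: only lines that begin with
# "git-svn-id: http:" are touched.
filter='s|^git-svn-id: http:|git-svn-id: https:|'

# Feed the message through stdin and capture stdout, the same way
# --msg-filter would.
rewritten=$(printf '%s\n' "$msg" | sed "$filter")

# Run the filter again on its own output to check that it is idempotent.
twice=$(printf '%s\n' "$rewritten" | sed "$filter")

printf '%s\n' "$rewritten"
```

The subject line comes out unchanged and the last line now starts with "git-svn-id: https://"; the second pass leaves the message exactly as the first pass produced it.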

I wanted to rewrite the histories of every reference in my git repository. Therefore, <refs to rewrite> contained all of the refs in my repository. Traditionally, git maintains ref files under .git/refs; each file is 41 bytes long (a SHA1 value followed by a newline). If you run git-gc, it creates a file called .git/packed-refs containing all of your current references and removes the files under .git/refs. Thus, a quick way to get hold of all your refs is to run git-gc and then to extract the refs from .git/packed-refs. I used a combination of awk and grep to do the extraction:

cat .git/packed-refs | awk '// {print $2}' | grep -v 'pack-refs'

cat outputs the contents of .git/packed-refs to stdout; awk prints the second column of each line; grep removes the header line (which begins with "# pack-refs").
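
To make the extraction concrete, here is a small sketch that runs a single-awk variant of the same idea against a stand-in for .git/packed-refs. The file contents and SHA1 values are invented; the awk pattern is my own variation, which skips the header line and the "^" lines that git writes after annotated tags, so no grep is needed.

```shell
# Build a small stand-in for .git/packed-refs (all SHA1 values are made up).
packed=$(mktemp)
cat > "$packed" <<'EOF'
# pack-refs with: peeled fully-peeled sorted
1111111111111111111111111111111111111111 refs/heads/master
2222222222222222222222222222222222222222 refs/remotes/trunk
3333333333333333333333333333333333333333 refs/tags/v1.0
^4444444444444444444444444444444444444444
EOF

# Skip header ("#") and peeled-tag ("^") lines; print the ref name column.
refs=$(awk '!/^[#^]/ {print $2}' "$packed")

printf '%s\n' "$refs"
rm -f "$packed"
```

This prints the three ref names, one per line. If you would rather not depend on packed-refs at all, git for-each-ref --format='%(refname)' should list the same names directly, without needing git-gc first.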

Putting this all together gives:

git filter-branch --msg-filter 'sed "s/git-svn-id: http/git-svn-id: https/g"' \
$(cat .git/packed-refs | awk '// {print $2}' | grep -v 'pack-refs')

Finally, to get git-svn to recreate its internal data, I simply executed rm -rf .git/svn. The next time git svn rebase is executed, git-svn will rebuild its internal data.

The summary

  1. Back up your git repo. If you make a mistake with repository rewriting you'll be in for a lot of fun.
  2. git gc
  3. git filter-branch --msg-filter 'sed "s/git-svn-id: http/git-svn-id: https/g"' $(cat .git/packed-refs | awk '// {print $2}' | grep -v 'pack-refs')
  4. rm -rf .git/svn
  5. edit .git/config and change "http" in all the git-svn URLs to "https"
  6. git svn rebase (to update your repo and to let the git-svn data be rebuilt)
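
Step 5 can be scripted as well. The sketch below is only an illustration and works on a throwaway file rather than your real .git/config: the section name "svn" is what git-svn uses by default, and the URL is hypothetical. It rewrites only lines of the form "url = http://..." and keeps a .bak copy of the original.

```shell
# Stand-in for the git-svn section of .git/config (URL is hypothetical).
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
[svn-remote "svn"]
    url = http://example.org/svnroot/project
    fetch = trunk:refs/remotes/trunk
EOF

# Change the scheme on "url = http://..." lines only; sed keeps the
# untouched original next to the file with a .bak suffix.
sed -i.bak 's|^\([[:space:]]*url = \)http://|\1https://|' "$cfg"

# Read the URL back to confirm the change.
new_url=$(sed -n 's|^[[:space:]]*url = ||p' "$cfg")
printf '%s\n' "$new_url"

rm -f "$cfg" "$cfg.bak"
```

On a real repository with the default remote name, running git config svn-remote.svn.url with the new HTTPS URL as its argument should achieve the same thing without touching the file by hand.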