SparkleShare and dvcs-autosync are tools that automatically commit your changes to git and keep them in sync with other repositories. Unlike git-annex, they don't store the file content on the side, but directly in the git repository. That is great for small files, but less good for big files.

Here's how to use the git-annex assistant to do the same thing, but even better!


First, get git-annex version 4.20130329 or newer.


Let's suppose you're developing a video game, written in C. You have source code, and some large game assets. You want to ensure the source code is stored in git -- that's what git's for! And you want to store the game assets in the git annex -- to avoid bloating your git repos with possibly enormous files, but still version control them.

All you need to do is configure git-annex to treat your C files as small files, and to treat any other file larger than, say, 100kb as a large file that is stored in the annex.

git config annex.largefiles "largerthan=100kb and not (include=*.c or include=*.h)"

Now if you run git annex add, it will only add the large files to the annex. You can git add the small files directly to git.

Note that in order to use git add on the small files, your repository needs to be in indirect mode, rather than direct mode. If it's in direct mode, git add will fail. You can fix that:

git annex indirect
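
Before running git annex add, it can help to preview which files the expression configured above would probably send to the annex. The following is only a rough sketch using plain find, not something git-annex provides: the function name list_annex_candidates is made up for this example, it assumes that 100kb means 100000 bytes, and it approximates the include= globs by matching on the file name suffix instead of asking git-annex itself.

```shell
#!/bin/sh
# Rough preview of: largerthan=100kb and not (include=*.c or include=*.h)
# Assumption: 100kb is treated as 100000 bytes; git-annex may count differently.
# list_annex_candidates is a hypothetical helper, not part of git-annex.
list_annex_candidates() {
    dir=${1:-.}
    # Skip .git, then keep regular files over 100000 bytes that are not *.c or *.h.
    find "$dir" -name .git -prune -o \
        -type f -size +100000c ! -name '*.c' ! -name '*.h' -print | sort
}
```

Running list_annex_candidates . at the top of the working tree prints one path per line. Anything it does not print is a file you would expect to git add by hand.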

A less manual option is to run git annex assistant. It will automatically add the large files to the annex, and store the small files in git. It'll notice every time you modify a file, and immediately commit it, too. And sync it out to other repositories you configure using git annex webapp.


It's also possible to disable the use of the annex entirely, and just have the assistant always put every file into git, no matter its size:

git config annex.largefiles "exclude=*"
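
Since the setting lives in the repository's git configuration, it can be inspected and removed again with plain git config. A minimal sketch, assuming the helper names show_largefiles and reset_largefiles (both invented for this example) and a git new enough to support the -C option:

```shell
#!/bin/sh
# Sketch: inspect and revert the annex.largefiles setting of one repository.
# show_largefiles and reset_largefiles are hypothetical helper names;
# only plain `git config` is used, git-annex itself is not called.

# Print the expression stored in the repository, or a note when none is set.
show_largefiles() {
    git -C "$1" config --local --get annex.largefiles \
        || echo "(annex.largefiles not set)"
}

# Remove the setting from the repository's own configuration.
reset_largefiles() {
    git -C "$1" config --local --unset annex.largefiles
}
```

For example, show_largefiles . run right after the command above should print exclude=*. As far as I understand, removing the setting returns git-annex to treating every file as a large file.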
I think you probably meant at least version 4.20130323 ;-)
Comment by hands Sun Mar 31 13:30:34 2013
I meant 4.20130329
Comment by joey Sun Mar 31 14:50:35 2013

I just gave this feature a try, but it seems it doesn't work as expected or maybe I don't understand it:

~/annex/largefilestest % git init
~/annex/largefilestest (git)-[master] % git annex init "test repo"
~/annex/largefilestest (git)-[master] % git config annex.largefiles "not include=*.txt"

Now I copy two files to this directory and add both to the annex

~/annex/largefilestest (git)-[master] % ll
total 100
-rw-rw-r-- 1 tobru tobru 93709 Oct 19 16:14 dpkg-get-selections.txt
-rw-rw-r-- 1 tobru tobru  7256 Jan  6 15:52 x3400.jpg
~/annex/largefilestest (git)-[master] % git annex add .
add x3400.jpg (checksum...) ok
(Recording state in git...)
~/annex/largefilestest (git)-[master] % git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
#       new file:   x3400.jpg
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       dpkg-get-selections.txt
~/annex/largefilestest (git)-[master] % ll
total 96
-rw-rw-r-- 1 tobru tobru 93709 Oct 19 16:14 dpkg-get-selections.txt
lrwxrwxrwx 1 tobru tobru   192 Jan  6 15:52 x3400.jpg -> .git/annex/objects/vf/QX/SHA256E-s7256--60e5b69ade5619e37f7fcaa964626da9c415959d861241aa13e2516fffc2dddf.jpg/SHA256E-s7256--60e5b69ade5619e37f7fcaa964626da9c415959d861241aa13e2516fffc2dddf.jpg

So the picture is added to the annex as expected. But the .txt file is not added to git. Do I have to manually add this to git? And why is the picture seen as a new file by git?

The second question could be answered by: "run git annex sync". Is this correct? Because after running this command, git does not see this file as a new file anymore:

~/annex/largefilestest (git)-[master] % git annex sync
commit  
[master (root-commit) a0afb14] git-annex automatic sync
 1 file changed, 1 insertion(+)
 create mode 120000 x3400.jpg
ok
git-annex: no branch is checked out
~/annex/largefilestest (git)-[master] % git status
# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       dpkg-get-selections.txt
nothing added to commit but untracked files present (use "git add" to track)
Comment by Tobias Sun Apr 14 09:04:55 2013

Like it says in the tip, git annex add will add the large files to the annex. You can add the small files with git add; git-annex won't do that for you.

To automatically add both sorts of files, you can use the git annex watch or git annex assistant daemons. The latter also keeps files in sync between repositories automatically.

(Why did the picture show up as a new file in git? Because you hadn't committed it. This is the same as when you git add a file; it's only staged in the index; git status will show it is new until you git commit)

Comment by joey Sun Apr 14 14:37:50 2013
Does annex.largefiles support mimetypes? For example: git config annex.largefiles "not mimetype=text/plain"
Comment by Tobias Wed May 1 16:37:33 2013
I was wondering if the annex.largefiles feature was compatible with direct mode?
Comment by binet Mon Sep 16 18:50:48 2013

annex.largefiles does not support mime types. I agree it would be a useful addition.

annex.largefiles can be used with direct mode. I would only recommend using it this way with the assistant, which will keep straight which files are which and commit them appropriately.

Comment by joeyh.name Thu Sep 19 14:03:29 2013

I've tried this with version 5.20131130, but my files disappear if I modify them on the remote end.

My setup:

- A local repository, direct mode, client group, annex.largefiles "exclude=.txt"
- A remote one, also direct mode, backup group, annex.largefiles "exclude=.txt"

Both are running the assistant.

If I create a .txt file locally, it gets committed and pushed to the remote as described. But, if I then modify that file on the remote end, the file gets deleted from both repositories. Also, if I create a file on the remote end, it's pushed to the local one (according to the log) but it never appears in the directory.

Changing the remote from 'backup' to 'client' group doesn't seem to make any difference.

Is there a 'best practice' on using git-annex like SparkleShare? I mean, syncing changes on all repositories but keeping a history of changes in git.

Thanks!

Comment by Marc Sun Dec 8 08:16:37 2013
After further testing, it seems that the setup I wanted works when both repos are set as indirect, instead of direct as comment 7 recommends. With both repos in indirect mode, the changes are propagated correctly and the files not selected by annex.largefiles are kept in git.
Comment by Marc Sun Dec 8 16:41:41 2013
The behavior in direct mode is a bug: when syncing a direct mode repository, git annex deletes new git files that are not annexed. Hopefully it will be fixed soon.
Comment by joeyh.name Thu Dec 12 13:35:28 2013
The above-mentioned bug is fixed in git, and the fix will be in a release tomorrow.
Comment by joeyh.name Thu Dec 12 15:58:53 2013