Sorry, I should perhaps have put a disclaimer in my original comment. I work for a company called StorReduce and built our replication feature* (effectively an intelligent, continuous "sync"). We currently have a patent pending on our method, so unfortunately I'm not sure I can offer any real insight.
I haven't looked at your project, but based on what you've said I agree the way you're doing it is conceptually as fast as it can be (massively parallel and leveraging metadata) whilst being a general purpose tool that "just works" and has no external dependencies or constraints.
* http://storreduce.com/blog/replication/