By default, grep goes through all of the .git directory, which is the part `git grep` filters out. `ag` also filters it out (and is only slightly slower than git grep by default, with all the colors and stuff), or you can tell grep to check only the relevant files with e.g. `grep $PATTERN $(git ls-tree --full-tree --name-only -r HEAD)`.
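One caveat with the `$(git ls-tree ...)` form is that the shell splits the file list on whitespace, so paths containing spaces break, and a very large tree can exceed the argument-length limit. A minimal sketch of a variant that avoids both, assuming an `xargs` that supports `-0` (GNU and BSD do); the function name `grep_tracked` is made up for illustration:

```shell
# grep_tracked PATTERN: hypothetical helper, not from the thread.
# Searches only the files recorded in HEAD, NUL-delimited so that paths
# with spaces or newlines survive intact.
grep_tracked() {
    (
        # --full-tree prints paths relative to the repository root,
        # so run from there regardless of the current directory.
        cd "$(git rev-parse --show-toplevel)" &&
        git ls-tree -r -z --name-only --full-tree HEAD |
        xargs -0 grep -e "$1" --
    )
}
```

Called as `grep_tracked foo`, it should print the same matches as the one-liner above, with `xargs` batching the file names if the list is too long for a single grep invocation.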
On my machine, using postgres's repository, I get the following:
> time git grep foo > /dev/null
0.22s user 0.25s system 151% cpu 0.312 total
> time ag foo > /dev/null
0.85s user 0.19s system 174% cpu 0.596 total
> time grep foo $(git ls-tree --full-tree --name-only -r HEAD) > /dev/null
0.13s user 0.10s system 93% cpu 0.255 total
grep's faster than git grep. In fact, grep is already about as fast as git grep when it just ignores .git:
> time grep foo -r . --exclude-dir=.git > /dev/null
0.15s user 0.16s system 91% cpu 0.338 total
You need to take the effect of the page cache into account. Since you are not clearing the page cache between tests, each later test benefits from whatever the earlier ones loaded. So running 'git grep' first puts it at a disadvantage compared to everything else.
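One way to take ordering out of the picture is to run every command once, untimed, before measuring any of them, so that all of them start from a warm cache. A rough sketch, where `warm_up` is a hypothetical helper and the commands in the usage comment are the ones from the timings above:

```shell
# warm_up CMD...: hypothetical helper, not from the thread.
# Runs each command string once with output discarded, so the files it
# reads are already in the page cache before any timed run starts.
warm_up() {
    for cmd in "$@"; do
        # grep exits non-zero when nothing matches; ignore that here.
        sh -c "$cmd" > /dev/null 2>&1 || true
    done
}

# Usage, from inside the repository:
#   warm_up 'git grep foo' 'ag foo' 'grep -r foo . --exclude-dir=.git'
#   time git grep foo > /dev/null
#   time grep -r foo . --exclude-dir=.git > /dev/null
```

This only compares warm-cache performance. To compare cold-cache runs on Linux you would instead drop the cache before each command, which needs root (writing to /proc/sys/vm/drop_caches).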
I ran a test on a large repository and here are my results. The repository was Hadoop, and is available from git://github.com/apache/hadoop-common.git.
This is because the data is all in the page cache at that point, so we're not actually accessing the disk.
I was curious about the true source of the speedup, and so I checked the output of the 'perf' tool. git had 1,922 CPU migrations, whereas grep had 52. Following up on this, you can see that git is spawning a bunch of threads, whereas grep only has one thread.
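For anyone who wants to reproduce that kind of comparison, something along these lines should work, assuming `perf` is installed and you are allowed to read its counters. The wrapper name `migrations` is made up, and the `--threads=1` run is an extra check not mentioned above, to see whether the migrations follow git's worker threads:

```shell
# migrations CMD...: hypothetical wrapper, not from the thread.
# Runs a command under `perf stat` and reports only the CPU-migration
# counter. perf writes its summary to stderr, so the command's own
# stdout can be thrown away.
migrations() {
    perf stat -e cpu-migrations -- "$@" > /dev/null
}

# Usage, from inside the repository ("foo" is a placeholder pattern):
#   migrations git grep foo                 # git's default thread count
#   migrations git grep --threads=1 foo     # force a single worker thread
#   migrations grep -r foo . --exclude-dir=.git
```

If the count for the `--threads=1` run falls to roughly what plain grep shows, that points at the threading as the source of the difference.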