Which is similarly terrible; it should be a flag on the `branch` command, not `s...

bacon_waffle · on July 12, 2020

Maybe this depends on your mental model of what a git checkout is? To me, "git checkout X" means: update the working directory to reflect X and update HEAD to point at X. (edit: and HEAD means "the branch to update when a new commit is made"). I use "git checkout" a lot more often than "git branch" and "git checkout -b" combined. So, to me, the fact that X is a to-be-created branch is secondary to the primary "checkout" operation.

That said, I do vaguely recall this seeming weird at first.

smichel17 · on July 13, 2020

I'm going to discuss mental models in a separate comment because they're actually more interesting and this comment is getting too long.

---Succinct argument---

What happens if you remove the flag? Put differently, how much does the flag change the semantics?

`git branch newbranch` will gracefully fall back to creating a new branch at HEAD.

`git checkout newbranch` will fail (error: pathspec 'newbranch' did not match any file(s) known to git), because the regular version of the command involves operating on something that already exists.
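For anyone who wants to check this, here is a minimal sketch in a throwaway repository. The branch names `topic-a` and `topic-b`, the temp directory, and the demo identity are placeholders, not anything from the discussion above.

```shell
# Throwaway repository so nothing real is touched.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
echo "hello" > file.txt
git add file.txt && git commit -q -m "initial"

# "git branch" without any flag: creates the branch at HEAD.
if git branch topic-a; then branch_status=0; else branch_status=$?; fi

# "git checkout" without -b: expects something that already exists, so it fails.
if err=$(git checkout topic-b 2>&1); then checkout_status=0; else checkout_status=$?; fi

echo "git branch exit status:   $branch_status"
echo "git checkout exit status: $checkout_status"
echo "$err"
```

The exact error text may differ between git versions, so the sketch only relies on the exit status.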

---Less succinct further argument---

Consider checkout's `-B` variant. Here's how the man page explains it:

> If -B is given, <new_branch> is created if it doesn’t exist; otherwise, it is reset. This is the transactional equivalent of
>
>     $ git branch -f <branch> [<start point>]
>     $ git checkout <branch>

If it were a flag on branch, you wouldn't need a separate flag, just `git branch -fs newbranch` (-s for --switch).
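To make the man page's equivalence concrete, here is a rough sketch comparing the one-command and two-command forms in a scratch repository. The branch names `one-step` and `two-step` are made up for illustration.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
echo "a" > f.txt
git add f.txt && git commit -q -m "c1"
echo "second line" >> f.txt
git commit -q -am "c2"
first=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

# Both branches start one commit behind the default branch.
git branch one-step HEAD~1
git branch two-step HEAD~1

# One command: reset the existing branch to the start point and switch to it.
git checkout -q -B one-step "$first"

# Two commands: the same result, but as separate steps.
git checkout -q "$first"
git branch -f two-step "$first"
git checkout -q two-step

git rev-parse one-step two-step "$first"   # all three should print the same hash
```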

Actually, looking through the man page for `checkout` now, I see 4 other options I didn't know about (-t|--track, --no-track, --guess, --no-guess), apparently for the sole purpose of being used with -b.

---

I don't think I thought of this argument when I originally chatted in #git-devel; maybe I'll give it another try.

smichel17 · on July 13, 2020

> HEAD means "the branch to update when a new commit is made"

Thank you for the edit; this is where our mental models diverge. Yours is the technically correct version, but when I'm working, I introduce a small abstraction on top. I've never written this particular part out explicitly, so please bear with me and feel free to suggest better names or phrasing.

I'll call it, "the checked-out branch" (or other <tree-ish>, but let's leave out detached HEAD for simplicity). It's like your model, but also includes the state of the repo at that commit. This is a really subtle difference, but the key is that checkout is one atomic operation.

I think the easiest way to explain is by analogy[0] to editing a regular text file. You've got the file on disk, and the version in your editor's buffer (working tree), and every time you save, you're making a commit (with `--amend`, on most filesystems). When you decide to edit a different file, and open it, technically there are two independent operations -- switch which file handle you're writing to, and replace the contents of the buffer with the contents of the new file -- but I seldom think of it this way (and I can't think of an editor that separates them).

This is also why I don't like overloading with `git checkout <commit> -- <files>...`, which does not update the HEAD. It shatters the "open file" analogy; it's like changing the contents of your buffer without changing which file you're writing to. In my mind, that's a totally separate function (typically accomplished by opening the other file separately and then copying its contents into the buffer, an inefficient operation that I'm grateful git provides a better alternative to).
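A small sketch of that behavior, assuming a scratch repository with a single file (`file.txt` is a placeholder name): the file's contents change, but HEAD stays where it was.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
echo "old" > file.txt
git add file.txt && git commit -q -m "old"
old=$(git rev-parse HEAD)
echo "new contents" > file.txt
git commit -q -am "new"

head_before=$(git rev-parse HEAD)
git checkout -q "$old" -- file.txt   # copy file.txt out of the older commit
head_after=$(git rev-parse HEAD)

cat file.txt                         # contents from the older commit
git diff --cached --name-only        # the old version is also staged
```

In the editor analogy, the buffer now holds the other file's contents while the "file handle" (HEAD) has not moved.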

Aside, I also prefer to think of each branch (not commit!) as having its own persistent working tree and staging area; I'd prefer not to have to push and pop a stash each time I check out a new branch if I already had some work in progress. `git worktree` comes close, but lacks the ability to navigate between worktrees but stay in the same directory relative to the root, like `checkout` does. Also you have to initialize other worktrees with a different root, which pollutes the file system or forces you to use an extra level of nesting.

It's been on my list of projects for a while to write a git porcelain that works the way I'd like, but it's pretty low priority (despite my complaints, git works pretty well for me), so I doubt I'll get to it for a few years.

[0] Come to think of it, maybe "analogy" is not the best word. It's more like, I prefer to view git as an extension of my filesystem, so I want a consistent mental model across the whole thing.

bacon_waffle · on July 16, 2020

Thanks for the post! I should mention that I'm not a git expert by any stretch, just a git user who's recently been herding some folks at work from perforce to git (so have brushed up on git explanations/internals).

It seems like the conflict is that there actually are some separate things going on "under the hood", and you're not satisfied with the way that git has combined them? To be explicit, some distinct steps we're discussing are:

1) Update the working tree to match the tree-ish

2) Create a new branch

3) Update HEAD to point at the branch (or other tree-ish)

As is currently implemented, "git checkout somebranch" does 1 and 3, "git checkout somebranch -- <files>" does just 1, "git branch somebranch" does just 2, and "git checkout -b newbranch" does 2 and 3. AFAIK, there's not a "git branch" argument that causes it to update HEAD, but the standard version of "git checkout" does exactly that. From the perspective of a new git user, maybe "git branch" is the obvious command to look at for making and using a new branch, but I think "git checkout" is the obvious command for using a branch.
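One way to see which of these commands touches HEAD is to print the current branch after each one. This is only a sketch in a scratch repository; the helper `head_ref` is something I'm introducing here, not a git command.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
echo "hello" > file.txt
git add file.txt && git commit -q -m "initial"

# Hypothetical helper: name of the branch HEAD currently points at.
head_ref() { git symbolic-ref --short HEAD; }

start=$(head_ref)

git branch somebranch            # step 2 only: new branch, HEAD untouched
after_branch=$(head_ref)

git checkout -q -b newbranch     # steps 2 and 3: new branch, HEAD moves to it
after_b=$(head_ref)

git checkout -q somebranch       # steps 1 and 3: working tree and HEAD follow
after_checkout=$(head_ref)

echo "$start -> $after_branch -> $after_b -> $after_checkout"
```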

So, perhaps "checkout" could have a better name; maybe "use" or "work" instead?

I must admit that I don't understand what you mean by "state of the repo at that commit" - is that related to the idea of each branch having a persistent working tree and staging area? When I run into a situation where the second would be relevant, I tend to do "git commit -am WIP", then on return to that branch, "git reset HEAD~1". It very rarely happens that I'm in the middle of composing a commit (staging things) but need to switch to a different branch in the same project, so it doesn't really matter that the staging area and working directory all got munged together.
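Spelled out as a sketch in a scratch repository (the branch name `other` and the file `app.txt` are placeholders), the WIP round trip looks roughly like this:

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
echo "v1" > app.txt
git add app.txt && git commit -q -m "initial"
first=$(git symbolic-ref --short HEAD)
git branch other

echo "work in progress" >> app.txt
git commit -q -am WIP        # park the change (tracked files only)
git checkout -q other        # go do something else
git checkout -q "$first"     # come back
git reset -q HEAD~1          # mixed reset: drop the WIP commit, keep its changes

git log -1 --format=%s       # back to the commit before WIP
git diff --name-only         # the parked change is in the working tree again
```

Note that `commit -a` skips untracked files, and the mixed reset brings everything back unstaged, which is the "munged together" part.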

smichel17 · on July 25, 2020

I've been quite busy; hopefully you'll see this. Like above, I agree that you're technically correct.

First, I would like to re-order those steps (and I am curious whether you intended the order to be meaningful). Then, I'll try to explain how they, while technically correct (again) to the best of my knowledge, don't match my mental model (particularly "update working tree", which is not part of "checkout" in smichel17-land). Yes — this is related to the "persistent working tree" (or maybe "working tree as a concept that doesn't exist in my mental model" would be better phrasing — but I'm getting ahead of myself).

---

Note that without flags, you'd have to "git branch" first, then "git checkout". Also, it's easier for me to think about if each command only performs consecutive steps, in order. Fortunately, we can achieve that:

1) Create a new branch

2) Update HEAD to point at the branch (or other tree-ish)

3) Update the working tree to match the tree-ish

"git branch somebranch" does just 1, "git checkout -b newbranch" does 1 and 2, "git checkout somebranch" does 2 and 3, "git checkout somebranch -- <files>" does just 3. I think this helps clarify my issues with both -b and --. They both change which step "checkout" starts on (and "-b" changes the ending, too!).

By analogy: if these commands are like functions, adding a step afterward is like a flag that modifies the output, while adding a step before is a flag that modifies the expected input. Sure, you could organize your code around the outputs it produces, and sometimes we do that (e.g., for serialization/parsing you have things like JSON.stringify, String.fromInt, different constructors for classes, etc.). But typically it makes more sense to group based on what you want to do with a given object/class/data-type; it's nice to be able to answer, "I have an X, what are all the things I can do with it?"

Maybe that's stretching the analogy a little. But it ties in with my original comment mentioning the "primary action", which I guess I could rephrase as "first action in the chain".

---

What's the primary action of "checkout"? Well, "updating" the HEAD. So here's where we get back to abstractions / mental models — I would say, moving the HEAD. But first, let's take a short detour.

What's a commit? Technically, it's a diff and some metadata (author, parent commit(s), ...) with a deterministic name. But in terms of actual use, it's a snapshot of the repository at a certain point in time. (Or, if you'll permit a little snark, a version, as in "version control".) Zoom out to the whole repository, and you can visualize it as a tree of commits, ordered along the axis of time. Visualization:

    git log --graph --format="%C(yellow)%h%Cgreen%d"

That includes branches, so what about them? Technically, they're tags. But in my mental model, they're boxes which contain mutable state that I might want to commit. They're the buffer in my text editor, where I make changes before saving to disk. As a crude visualization of making a commit, I choose a subset of those changes and put them in the bottom of the box (stage them), then chop the box in half. The bottom becomes the new commit, and the branch remains on top, still holding any unstaged changes.

Finally, back to the HEAD (our detour is over). Technically, it's just another tag. But to me, it's a camera, through which I look at a given box (or snapshot). It's the location where I am. It's $PWD.

To bring it all together:

- Branches sit on top of a commit and stay there.

- When I check out a different branch, they remain on top of the same commit, still holding the unsaved changes in their box. This is what I meant by "persistent working tree".

- Checking out a branch, conceptually, is just moving my view (HEAD) to a different branch. It doesn't involve changing files. It's the equivalent of "cd".

So this is why it barely makes sense for me to talk about "the" working tree. I have a bunch of boxes/folders/"working trees", called branches. That I have to copy them to "the working tree" in order to edit them is an implementation detail. Step 3 above (your 1, "update the working tree") doesn't exist in my mental model. You just move the HEAD. One atomic operation.

"-b" and "--" both break my mental model because they add side effects to an operation that's otherwise just "look around." "-b" isn't quite as bad, because I could imagine a flag on "cd" that makes it run "mkdir -p" first ("Go here, even if it doesn't exist yet."), but it still makes more sense as a flag on "branch" ("Create a new box, and look at it").

> So, perhaps "checkout" could have a better name; maybe "use" or "work" instead?

I think the new "switch" fits pretty well. And, "git branch [-s|--switch]" aren't taken yet :)

Aside, "git reset" is about moving which commit a branch sits on top of, and its various hardness flags are about what to do with the contents of both the box and the commits it was sitting on top of. There used to not be a command for "copy some stuff into my box from a different box or commit", so I had to use checkout, but now there's "git restore".
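To illustrate the "hardness" flags, here is a sketch that undoes the same commit three ways in a scratch repository (`f.txt` is a placeholder file). In each case the branch moves back one commit; the flag decides what happens to the undone changes.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
echo "one" > f.txt
git add f.txt && git commit -q -m "one"
echo "second version" > f.txt
git commit -q -am "two"
tip=$(git rev-parse HEAD)

# --soft: the undone commit's changes stay staged.
git reset -q --soft HEAD~1
soft_staged=$(git diff --cached --name-only)

# --mixed (the default): the changes stay in the working tree, unstaged.
git reset -q --hard "$tip"           # put everything back on "two" first
git reset -q --mixed HEAD~1
mixed_staged=$(git diff --cached --name-only)
mixed_unstaged=$(git diff --name-only)

# --hard: the changes are discarded from index and working tree.
git reset -q --hard "$tip"
git reset -q --hard HEAD~1
hard_content=$(cat f.txt)

echo "soft staged: $soft_staged"
echo "mixed unstaged: $mixed_unstaged"
echo "after hard: $hard_content"
```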

bacon_waffle · on July 27, 2020

Thanks! No, there wasn't any reason behind that ordering, as far as I remember.

> What's the primary action of "checkout"? Well, "updating" the HEAD.

This might be where we diverge - I'd say that the primary action of "git checkout" is to update the working tree, secondary action is to update HEAD.

> What's a commit? Technically, it's a diff and some metadata (author, parent commit(s), ...) with a deterministic name. But in terms of actual use, it's a snapshot of the repository at a certain point in time.

This isn't actually the case; git commits contain a "tree", which is not a diff. The tree is a table of file paths, attributes, and hashes that each identify a "blob" - the (compressed) contents of a file. The commit represents a state of the committed files (and links); to me "snapshot of the repository" would include things like the state of the branches in the repo, HEAD, and commits that are present but outside the history of the commit in question.
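This is easy to inspect with `git cat-file`. A minimal sketch in a scratch repository (the file name `greeting.txt` is a placeholder):

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
echo "hello" > greeting.txt
git add greeting.txt && git commit -q -m "first"

# The commit object: a "tree <hash>" line, then author, committer, message.
git cat-file -p HEAD

# The tree it points at: one entry per path, with mode, type, and blob hash.
git cat-file -p 'HEAD^{tree}'

commit_first_line=$(git cat-file -p HEAD | head -n 1)
tree_type=$(git cat-file -t 'HEAD^{tree}')
blob_content=$(git cat-file -p HEAD:greeting.txt)   # full file contents, not a diff
```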

---

Thanks for the explanation, I think I understand what you mean and like the idea. Maybe this is what you alluded to a few posts up, but it seems like a new subcommand could roll up the working directory and staging area into a temporary commit or two, do the normal "git checkout otherbranch", then unroll any temporary commit(s) that were present in otherbranch.
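A very rough sketch of what such a helper might look like as a shell function. Everything here is hypothetical: the name `git_swap`, the `WIP-autosave` marker, and the simplification of using one temporary commit (so the staged/unstaged distinction is lost). It also does not handle a temporary commit that is the root commit.

```shell
WIP_SUBJECT="WIP-autosave"   # assumed marker; any unlikely subject line would do

# Hypothetical helper: park uncommitted work, switch branches, unpark.
git_swap() {
    target=$1
    # Roll up: only create the temporary commit if there is something to save.
    if [ -n "$(git status --porcelain)" ]; then
        git add -A && git commit -q -m "$WIP_SUBJECT" || return 1
    fi
    git checkout -q "$target" || return 1
    # Unroll: if the target's tip is a temporary commit, turn it back into
    # plain working-tree changes.
    if [ "$(git log -1 --format=%s)" = "$WIP_SUBJECT" ]; then
        git reset -q HEAD~1
    fi
}

# Usage in a scratch repository.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
echo "base" > notes.txt
git add notes.txt && git commit -q -m "base"
first=$(git symbolic-ref --short HEAD)
git branch other

echo "half-finished" >> notes.txt
git_swap other               # work is parked on the first branch
on_other=$(cat notes.txt)
git_swap "$first"            # work is unparked on return
cat notes.txt
```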

At a previous employer, I used mercurial including a feature (perhaps provided by an extension, I can't find it at the moment) that allowed a commit to be marked as local-only, so it wouldn't be pushed. Something like this could probably be done with git hooks to prevent the temporary working/staging commits being shared unintentionally.