Well, considering that the long-term idea is to have AGI, general intelligence, it seems that the goal is also to have only a single product in the end.
There may be different ways to access it, but the product is always the same.
It's not easy to make the connection to Palmer being fired here. The case was decided much later, and it equally involved John Carmack, who stayed at Meta much longer, until he left of his own accord.
Generally, the VAE maps from a small latent space to a large image space. This means there must be a large number of images for which no reverse mapping exists.
It should be possible to identify images that have not been generated by the VAE, since they are not part of the set of images the VAE can generate. The other way round is a bit more difficult, as there may be images that can be mapped to the latent space and back without loss but have been generated in another way.
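The roundtrip test implied here can be sketched with a toy linear autoencoder standing in for the real VAE (everything below is illustrative, not Stable Diffusion's actual encoder/decoder): an image that came out of the decoder survives encode/decode with essentially no error, while an arbitrary image generally does not.

```python
import numpy as np

rng = np.random.default_rng(0)

IMG_DIM, LATENT_DIM = 64, 8  # toy sizes; real VAEs are far larger

# Toy linear "VAE": encoding projects onto 8 random orthonormal
# directions, decoding maps back. Stands in for the real encoder/decoder.
basis, _ = np.linalg.qr(rng.normal(size=(IMG_DIM, LATENT_DIM)))
encode = lambda x: basis.T @ x
decode = lambda z: basis @ z

def roundtrip_error(x):
    """L2 distance between an image and its encode/decode reconstruction."""
    return np.linalg.norm(x - decode(encode(x)))

# An image produced by the decoder reconstructs (near-)perfectly...
generated = decode(rng.normal(size=LATENT_DIM))
# ...while an arbitrary "natural" image loses information in the roundtrip.
natural = rng.normal(size=IMG_DIM)

print(roundtrip_error(generated))  # ~0
print(roundtrip_error(natural))    # clearly > 0
```

A large roundtrip error rules out the VAE as the source; a small error is only suggestive, which is exactly the asymmetry described above.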
This logic has a key flaw: just because the larger space is bigger doesn't mean that everything representable in it is something we care about. E.g. a person with three hands may have no representation in the smaller space, but we would never care about that. The actual question is: how does the difference in information between a large image and the small latent space compare to the difference in information between a large image and a small image? If those two differences are close enough, reliably telling SD-generated images apart from real ones becomes near impossible.
The logic is still the same: if the VAE is trained to be biased toward human preference, then the probability of false positives on real-world images would increase.
From what I gather, this project started out as an implementation of a code interpreter using a local LLM. Basically, the LLM turns your instructions into code, which is then executed. The idea is that it can be much more powerful to have access to your system's native shell instead of only a sandboxed Python environment.
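As I understand it, the core loop is roughly the following minimal sketch (`llm_generate` is a hypothetical placeholder for whatever local model the project actually calls; error handling and the feedback of output back to the model are omitted):

```python
import subprocess

def code_interpreter_step(instruction, llm_generate):
    """One step of the loop: the LLM turns an instruction into shell code,
    which is then run on the native system shell and the output returned.
    `llm_generate` is a stand-in for the local model, not a real API."""
    script = llm_generate(instruction)
    result = subprocess.run(script, shell=True, capture_output=True, text=True)
    return result.stdout

# Demo with a canned "model" that always answers with one shell command.
fake_llm = lambda _instruction: "echo hello from the shell"
print(code_interpreter_step("greet me", fake_llm))
```

In the real project the model's output would presumably be shown to the user for confirmation before execution, since running generated code on your actual system is the whole (double-edged) point.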
In the meantime, it seems that models with vision capability have also been added, which can be used to work with GUI-based applications, not only the shell.
It's a very exciting concept that lives in a space where open source software should have a significant advantage due to its transparency. (Or would you give a black box device access to everything on your computer?).
It also seems to be one of several emerging projects that try to sketch out how LLMs could change the way we interact with computers.
Is it uncommon to dislike the lingering smell associated with cooking? I love cooking, but I go so far as to change my clothes afterward to get rid of the smell. I could never imagine having an open kitchen.