The VSCode GitLab extension now supports getting code completions from FauxPilot (twitter.com/moyix)
190 points by moyix on Oct 15, 2022 | 65 comments


GitLab team member here.

This work was produced by our AI Assist SEG (Single-Engineer Group)[0]. The engineer behind this feature recently uploaded a quick update to YouTube[1] about this work and other things they are working on.

[0] - https://about.gitlab.com/handbook/engineering/incubation/ai-...

[1] - https://www.youtube.com/watch?v=VYl0dg8xyeE


Why do you call this a group? Why don't you say "one of our engineers did this"? I read the linked article [1] and that seems to be the accurate situation: not a group of people including one engineer, not one engineer at a time on a rotation, but literally one person. In what way is that a group?

[1]: https://about.gitlab.com/company/team/structure/#single-engi...


Great question. I'm not entirely clear on the origin of the name, and it would probably be hard for me to find the folks behind this decision on a Friday evening/Saturday, so I'll share my interpretation.

At GitLab, we have a clearly defined organizational structure. Within that structure, we have Product Groups[0], which are groups of people aligned around a specific category. The name "Single-Engineer Groups" reflects that this single engineer owns the category they're focusing on.

I'll be sure to surface your question to the leader of our Incubation Engineering org. Thanks.

[0] - https://about.gitlab.com/company/team/structure/#product-gro...


I imagine it's similar to working groups [0], but keeping the term the same when only one person is in the working group, hence single engineer (working) group. Basically, parse `group` as in `working group` as a single term of art, rather than literally meaning a colloquial group, ie 2 or more individuals.

[0] https://en.wikipedia.org/wiki/Working_group


I have nothing against a working group that happens to be one person at the moment, but this SEG is defined by being a single person. It would cease being a SEG if it got more members. Hence don't call it a group...


It is different, since working groups are not full-time roles.


jesus christ john, i cant wrap my head around the concept of single-person group!! haha


If it makes it easier, our title is Incubation Engineering, because it better reflects what we do: finding new areas to explore and kickstart, searching for new markets, or for ways to expand existing ones. We need to choose initiatives within our scope (mine is MLOps) that fit what a single person can do in a short period of time, and test the idea by putting stuff in production. These are ideas that are too uncertain or risky to dedicate a full team to, but that make sense for a single person initially.

Other recent outputs of the team are CloudSeed[0], improved GitLab Pages setup[1], and Code Reviews for Jupyter Notebooks[2].

[0] - CloudSeed: https://about.gitlab.com/handbook/engineering/incubation/clo...

[1] - https://about.gitlab.com/releases/2022/09/22/gitlab-15-4-rel...

[2] - https://docs.gitlab.com/ee/user/project/repository/jupyter_n...


Gitlab is 600Kloc of JS and 1.8Mloc of Ruby. Of course a SEG would make sense to them.


OpenAPI, tests, and {Ruby on Rails w/ Script.aculo.us built-in back in the day, JS, and rewrite in {Rust, Go,}}? There's dokku-scheduler-kubernetes; and Gitea, a fork of Gogs, which is a github clone in Go; but Gitea doesn't do inbound email to Issues, Service Desk Issues, or (Drone,) CI with deploy to k8s revid.staging and production DNS domains w/ Ingress.


> single engineer group

oxymoron


The definitions I see for group all use "a number", e.g. "a number of people or things that are located, gathered, or classed together". So if you're willing to accept 1 as a possible value for a number, it's not an oxymoron.


Is 0 people a group by the unusual definition you use?


It is, at least in principle. As I see it, there's nothing that immediately discontinues a group when the last person leaves; it only makes it inactive. But a group could also become inactive in many other ways (e.g. deciding not to meet for a long time). It's only when the organization that found a benefit in having that group no longer does, and decides to discontinue it, that it stops being a group.


What a weird world you live in. What about "the organization"? What if it has 0 people?


The world I live in is one which accepts both usual and unusual circumstances.

In the extreme case, consider a small company whose few owners and employees suddenly pass away (e.g. in a plane crash). The fact that the company suddenly has 0 employees and even 0 shareholders does not immediately terminate the company. It still continues to exist as a business and legal entity, until the government / lawyers either find new shareholders (typically heirs) who are willing to run the company, or appoint a director to initiate foreclosure.


A single item list is still a list.


1+1=1


0 × 0 = 0


John nailed it. A single engineer group is something being explored by one person, but could eventually represent a "product group" in the future. See https://about.gitlab.com/direction/#single-engineer-groups for more things that could be explored by such a person.

If you think there is a better name to capture that I'm sure it would be considered.


Tiger (team)


I’m always suspicious of “one engineer did this” projects. There are probably quite a few engineers, product people and other employees who are simply unattributed for this.


Well, on the FauxPilot side I did do the initial implementation personally (over the course of a few weeks this summer) – but of course I'm building on the work the Salesforce team put into training the CodeGen models, and the work NVIDIA did in creating the FasterTransformer inference library and Triton inference server that FauxPilot uses.

I don't know how things are organized within GitLab, but it's pretty plausible to me that one engineer could put together the completion portion of the VSCode extension – you can see the code here: https://gitlab.com/gitlab-org/gitlab-vscode-extension/-/blob...

Fred has also been providing pull requests to improve FauxPilot, which I very much appreciate! :)


Another SEG here (our title is Incubation Engineer now; my area is MLOps). It is true that we build on top of what others did. What we mean by single engineer group is that it is a one-person team: we are responsible for the project end to end, from defining the vision, to talking to customers, to implementing the solution and delivering a product. This is a relatively new team (a bit more than one year). https://about.gitlab.com/handbook/engineering/incubation/


Well, the guy that wrote this fibonacci function that the model reproduced verbatim from some "learn to program" site sure went unattributed. That's AI for you.


To try it out, I wrote a Hello World function, and it literally named the tutorial it was taken from (i.e.: Hello, Tutorialname!). So it definitely happens often enough, although usually it is probably harder to tell when straight-up plagiarism has occurred.


"one of our engineers did this" isn't a technical sounding acronym.


Does it have to be an acronym? You could totally say "Fred de Gier at GitLab is working on this".


Classic off by one error.


Related:

FauxPilot – an attempt to build a locally hosted version of GitHub Copilot - https://news.ycombinator.com/item?id=32327711 - Aug 2022 (79 comments)


Good to see the OSSification (no pun intended) of proprietary VSCode parts such as Copilot and their extension server.


The extension server has an OSS alternative now?


Yes, it's called Open VSX and it's developed by the Eclipse Foundation.

- Open VSX: https://open-vsx.org

- Source: https://github.com/eclipse/openvsx

VSCodium and Eclipse Theia use Open VSX by default.

- VSCodium: https://github.com/VSCodium/vscodium#extensions-and-the-mark...

- Eclipse: https://www.eclipse.org/community/eclipse_newsletter/2020/ma...


Open VSX allows anyone to upload an extension with the same name and description as one from the VSCode marketplace, but with silently changed code or new releases, possibly introducing misfeatures or malicious code. Users typically don't notice because they think they are installing "the same" extension from the VSCode marketplace.

https://github.com/hediet/vscode-drawio/issues/141

> Hi ;)

> I never published a version v999.0! It seems like you are using the unofficial open vsx marketplace (where, apparently, anyone can upload anything). You can find an issue here in this repository about it.

> Unfortunately, someone uploaded the extension in that version which blocks any further updates with that name.

> For now I believe in Microsofts vision. I don't think a secondary marketplace is good for the community - It just causes confusions like this.

> If you setup a github action that automatically publishes this extension to open vsx, please open a PR! ;)

The established practice of having random individuals set up ad-hoc mirrors of VSCode extensions is a serious security issue.

If Open VSX wants to mirror VSCode extensions, that's okay - but they should do so with an automated process that mirrors ALL extensions and does not allow random people to silently change the code of an extension with no clear indication to the people installing it.

If, however, someone wants to copy the code of an existing VSCode extension, change some things, and upload it to Open VSX, that's super okay too (and in the spirit of open source), but please fork it and clearly indicate in the description that the extension is a fork, linking to the source code of the original extension. The current situation is unacceptable.


And I want to add that it's Microsoft that is in the wrong here. Their policy of only allowing the usage of their package repository if you are using their proprietary build of VSCode is absurd. It's as if npm disallowed the use of their repositories by yarn and pnpm. We shouldn't tolerate this behavior, especially not from a company that claims to "love open source".

But, Open VSX could and should do more for people to verify the provenance of their packages. There are many ways to do this. Perhaps one way is to have two kinds of packages (readily apparent in the description): one automatically imported from the VSCode marketplace (and guaranteed to match the upstream package exactly), and another kind published specifically for Open VSX.

Right now it seems to be a better security practice to simply ignore the VSCode marketplace terms of use and use it anyway on open source builds (either Code - OSS or VS Codium), instead of using Open VSX. And that's a shitty situation to be in.


Thank you!


Shouldn’t this be in a separate extension?


Came here to say this.

I’d much rather see effort being put into other areas of the GitLab Workflow extension (like being able to approve / view approvals of MRs).


That would make sense... but then most people wouldn't install the base GitLab extension.


This is super awesome. Too bad an A6000 costs $4500, or I would try this out myself.


The smaller models run on smaller GPUs too! :) You can see how much VRAM is required for various models in the Documentation:

https://github.com/moyix/fauxpilot/blob/main/documentation/s...

And we have an initial GPU support matrix page here:

https://github.com/moyix/fauxpilot/wiki/GPU-Support-Matrix

A planned feature is to implement INT8 and INT4 support for the CodeGen models, which would let the models run with much less VRAM (~2x smaller for INT8 and ~4x for INT4) :)
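
For a rough sense of where that saving comes from, here is a toy weight-only INT8 quantization sketch in Python. This is illustrative numpy only, not FauxPilot or FasterTransformer code, and the exact scheme to be implemented is an assumption; INT4 would additionally pack two 4-bit values per byte for the ~4x figure.

    import numpy as np

    def quantize_int8(w):
        """Per-row weight-only INT8 quantization: int8 weights + fp16 scales."""
        scales = np.abs(w).max(axis=1, keepdims=True) / 127.0
        q = np.clip(np.round(w / scales), -127, 127).astype(np.int8)
        return q, scales.astype(np.float16)

    def dequantize(q, scales):
        """Recover an approximate fp16 matrix for use in matmuls."""
        return q.astype(np.float16) * scales

    w = np.random.randn(4096, 4096).astype(np.float16)
    q, s = quantize_int8(w)
    print("fp16 bytes:", w.nbytes, "int8 bytes:", q.nbytes + s.nbytes)  # ~2x smaller
    print("max abs error:", np.abs(w - dequantize(q, s)).max())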


The 7GB model runs great on a 3080 Ti. I am getting a lot of 'ValueError: Max tokens + prompt length' errors with larger files. Can this GitLab client also replace the vocab.bpe and tokenizer.json config like Copilot's? Thanks for your work on FauxPilot, really enjoying playing with it.


I believe right now the VSCode extension just passes along the entire file up to your cursor [1] rather than trying to figure out how much will fit into the context limit – it's definitely still very early stages :)

It would be pretty simple to run the contents through the tokenizer using e.g. this JS lib that wraps Huggingface Tokenizers [2] and then keep only the last (2048-requested_tokens) tokens in the prompt. If they don't get to it first I may try to throw this together soon.

[1] https://gitlab.com/gitlab-org/gitlab-vscode-extension/-/blob...

[2] https://www.npmjs.com/package/tokenizers
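
Roughly, the truncation idea would look like this. The extension itself is TypeScript and would use the npm wrapper linked above; this is just an illustrative Python sketch of the same logic, where the CodeGen tokenizer name and the 2048-token context window are assumptions:

    from transformers import AutoTokenizer

    CONTEXT_WINDOW = 2048  # assumed CodeGen context length

    def truncate_prompt(text, requested_tokens,
                        tokenizer_name="Salesforce/codegen-350M-multi"):
        """Keep only the last (CONTEXT_WINDOW - requested_tokens) tokens."""
        tok = AutoTokenizer.from_pretrained(tokenizer_name)
        ids = tok(text).input_ids
        budget = CONTEXT_WINDOW - requested_tokens
        if len(ids) <= budget:
            return text                   # already fits, send as-is
        return tok.decode(ids[-budget:])  # drop tokens from the start of the file

    # e.g.: prompt = truncate_prompt(open("big_file.py").read(), requested_tokens=64)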


Understood - thanks again. (plus note to self; have individual source files with less in them ;)


> A planned feature is to implement INT8 and INT4 support for the CodeGen models, which would let the models run with much less VRAM (~2x smaller for INT8 and ~4x for INT4) :)

That would certainly put the larger models within reach of most users. Part of my reluctance to deploy this myself was that I'd have to either redline the VRAM on an 8GB card or choose the smallest model.

Would this require a retrain of the models?


It shouldn't require retraining, nope. I believe for INT4 there is a small adapter layer added that needs to be trained, but it is small and wouldn't require much data or computation to do so. Will know more once we actually start implementing :)


Ahh, the 3090/4090 are conspicuously missing; those affordable 24GB cards are very common in the ML community. Please add them to the matrix.


I don't have any to test on unfortunately! But it's a wiki; if you get some of the models running on those (it should be as easy as running ./setup.sh) please add a line saying that it works!


Two 3090 GPUs could probably handle it, making the cost slightly cheaper.


Is it possible to do 2x 4090s, or are 3090s the only option for putting two together?


Yes, you can do 2x4090s as well. NVLink is not required (though it will make things a bit faster).


Sweet great to hear.


This seems to work OK on a 3080 Ti, even under WSL2. Nice! For larger files I'm getting errors for maximum token + prompt length - is there an easy way to tweak these limits for the client? I think the Copilot client needed some overrides for FauxPilot for this, so I'm hoping the GitLab client has that too.


You'd think that GitLab would pause from chasing after GitHub's tail for at least long enough to question whether or not these features are ethical.


I know there seem to be a few different models to choose from, does anyone have a resource that collects them and shows the strengths and weaknesses of the various options? I’d be particularly interested in models optimized for apple silicon.

Also, how do they compare to copilot?


Unfortunately, Copilot is a lot more capable. Most importantly, it works with many more languages out of the box, is continuously updated, has more mathematical and scientific knowledge, and is better at understanding your comments.

As for which to use, you'd want to take the largest model that you can fine-tune (not necessarily on your machine) and that fits comfortably on your machine. The small models aren't as flexible and are basically just a cleverer autocomplete. Copilot is a lot smarter than that, and smarter than all models I've tested up to 6B. It can often cleverly work out a lot of local patterns for what you're doing, and this is true surprisingly often for novel languages, math, understanding graph notation, categorization policies, and much, much more.

I'd gone in hoping I could remove my dependence on Copilot but left disappointed. The small models up to 6B just don't compare. I don't have the resources to run the 13B model, which is probably behind anyway due to not having trained as long or on as much data as Copilot (which is continuously updated). I also suspect Copilot has a great deal of hand-coded hacks and caching tricks to improve the user experience.


Yep, I definitely agree that the 6B and below models are worse than Copilot. The 16B ones are pretty good! But IMO still undertrained compared to Copilot, and of course much less accessible (though see elsewhere in this comment section; I think INT8 and even INT4 should be doable – this won't help much with inference latency but it should help most people fit the 16B model locally).

I have high hopes for the BigCode project too; they will get to take advantage of a lot of things that have been learned about how to train code models effectively.

One last note – I think that many of the improvements to Copilot since its release are attributable to "prompt engineering" and client-side smarts about what context should be included; I'm not certain, but I can believe the underlying Codex model used hasn't changed much (if at all) since code-davinci-002.


>I'd gone in hoping I could remove my dependence on Copilot

You already depend on Copilot?


This is so awesome!!! Your team rocks. Too bad it wasn't named after your cat: FelinePilot or CatPilot.


The Gitlab mascot is a raccoon dog, closely related to the true fox. FauxPilot.


I will inform Andrei and Katya that they have a fan :) Although they already have a pretty high opinion of themselves!


From https://github.com/moyix/fauxpilot

> if you have two NVIDIA RTX 3080 GPUs, you should be able to run the 6B model by putting half on each GPU.

That's quite excessive, but nice that it can be run locally.

> Note that the VRAM requirements listed by setup.sh are total -- if you have multiple GPUs, you can split the model across them.

This might not work with the upcoming RTX 40x0 GPUs unless they have enough memory, as they abolished NVLink in that series.

Edit: I'm now seeing there are smaller models too. Going to give it a try tomorrow.


You don’t need NVLink for that to work.


Apologies for my ignorance, but don't you need NVLink to pool the memory of the cards together?


Nope. The way it works is FasterTransformer splits the model across the two GPUs and runs both halves in parallel. It periodically has to sync the results from each half, so it will go faster if you have a high-bandwidth link between the GPUs like NVLink, but it will work just fine even if they have to communicate over PCIe peer-to-peer or even communicating via the CPU.
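
As a toy illustration of that split-and-sync pattern (numpy only, not FasterTransformer code; the two "GPUs" are just two halves of a weight matrix here):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal((1, 8))      # activations for one token
    w = rng.standard_normal((8, 4))      # a full weight matrix

    # Each "GPU" holds half of the weights (and sees the matching half of x).
    w_gpu0, w_gpu1 = w[:4, :], w[4:, :]
    x_gpu0, x_gpu1 = x[:, :4], x[:, 4:]

    # Both halves compute their partial products independently
    # (in parallel on real hardware).
    partial0 = x_gpu0 @ w_gpu0
    partial1 = x_gpu1 @ w_gpu1

    # The periodic sync: combine the partial results
    # (an all-reduce over NVLink, PCIe peer-to-peer, or via the CPU).
    combined = partial0 + partial1

    assert np.allclose(combined, x @ w)  # same result as the unsplit matmul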



