It's fairly easy to pay OpenAI or Mistral money to use their API's. Figuring out...

Deathmax · on June 27, 2024

Gemini models on Vertex AI can be called via a preview OpenAI-compatible endpoint [1], but shoving it into existing tooling where you don't have programmatic control over the API key and is long lived is non-trivial because GCP uses short lived access tokens (and long-lived ones are not great security-wise).

Billing for the Gemini models (on Vertex AI, the Generative Language AI variant still charges by tokens) I would argue is simpler than every other provider, simply because you're charged by characters/image/video-second/audio-second and don't need to run a tokenizer (if it's even available cough Claude 3 and Gemini) and having to figure out what the chat template is to calculate the token cost per message [2] or figure out how to calculate tokens for an image [3] to get cost estimates before actually submitting the request and getting usage info back.

[1]: https://cloud.google.com/vertex-ai/generative-ai/docs/multim...

[2]: https://platform.openai.com/docs/guides/text-generation/mana...

[3]: https://platform.openai.com/docs/guides/vision/calculating-c...

luke-stanley · on June 27, 2024

Good to know about this API preview. Hopefully the billing problem and UI maze of Vertex AI can be sorted too?

Flumio · on June 27, 2024

Google does plenty of ux studies on gcp. I took part in at least 3 of them.

I'm also not sure if I understand your problem with pricing? Depending on what you do with it, it's not just an LLM. It actually started before llms.

Pricing for image classification and other features are completely different products like an LLM.

luke-stanley · on June 27, 2024

They should do a whole lot more then! Ideally they'd have effective impact. It's a busy mess on GCP. If they wanted to compete well, they should do much better with UX design, especially for onboarding. Compare how easy setting up a Mistral account is with GCP to do some generative LLM in a Python script. GCP is a maze. Did you make an account to reply to this? I'm curious what you do with GCP? Are you a heavy user?

Flumio · on June 28, 2024

I create new accounts because I use hn too much.

I use gcp professional every day and always found it quite intuitive.

Did plenty of image classification with vertex ai too

luke-stanley · on June 28, 2024

Why would you make new accounts because you use HN too much? Doesn't make sense to me. Anyhow if you use GCP every day, you're going to have learned it's weird clunky behaviour. GCP's main problem is that they've steadily become a sprawling mess of complexity, which is in big contrast to quite a few LLM specific cloud services that are happy to take peoples money without extra complexity?

Flumio · on June 28, 2024

Not being logged in feels like a bigger hurdle to comment and check if someone responded to it.

It's a shitty solution to a stupid problem ;)

But I did mention that vertex AI is more than just hosting llms though

ankeshanand · on June 27, 2024

If you're an individual developer and not an enterprise, just go straight to Google AIStudio or GeminiAPI instead: https://aistudio.google.com/app/apikey. It's dead simple getting an API key and calling with a rest client.

luke-stanley · on June 27, 2024

Interesting but when I tried it, I couldn't figure out the billing model because it's all connected to Google projects, and there can be different billing things for each of them.

Each thing seems to have a bunch of clicks to setup that startup LLM providers don't hassle people with. They're more likely to just let you sign in with some generic third party oAuth, slap on Stripe billing, let you generate keys, show you some usage stats, getting started docs, with example queries and a prompt playground etc.

What about the Vertex models though? Are they all actually available via Google AI Studio?

lhl · on June 27, 2024

Sadly, while gemma-2-27b-it is available (as a Preview model) on the AI Studio playground, it didn't show up via API on list_models() for me.

bapcon · on June 27, 2024

I have to agree with all of this. I tried switching to Gemini, but the lack of clear billing/quotas, horrible documentation, and even poor implementation of status codes on failed requests have led me to stick with OpenAI.

I don't know who writes Google's documentation or does the copyediting for their console, but it is hard to adapt. I have spent hours troubleshooting, only to find out it's because the documentation is referring to the same thing by two different names. It's 2024 also, I shouldn't be seeing print statements without parentheses.

logankilpatrick · on June 28, 2024

We are working hard to improve this across ai.google.dev (Gemini API), Hang tight!

hnuser123456 · on June 27, 2024

I plan on downloading a Q5 or Q6 version of the 27b for my 3090 once someone puts quants on HF, loading it in LM studio and starting the API server to call it from my scripts based on openai api. Hopefully it's better at code gen than llama 3 8b.

alekandreev · on June 27, 2024

Happy to pass on any feedback to our Google Cloud friends. :)

anxman · on June 27, 2024

I also hate the billing. It feels like configuring AWS more than calling APIs.

luke-stanley · on June 27, 2024

Thank you!