
No. It will certainly be more expensive in any conceivable model that handles all three modalities. But if the model uses an architecture like current autoregressive, token-based multimodal LLMs/VLMs, tokens will make just as much sense as a pricing basis, and pricing will be just as straightforward as it already is for text and images.

