adastra22 | 4 months ago | on: Qwen3-Omni: Native Omni AI model for text, image a...
Sure, but all of these find some way of mapping inputs (any medium) into state-space concepts. That's the core of the transformer architecture.
ludwigschubert | 4 months ago
The user you originally replied to specifically mentioned:

> without going to text first
adastra22 | 4 months ago
Yeah, and that's my understanding. Nothing goes video -> text, or audio -> text, or even text -> text without first going through state space. That's where the core of the transformer architecture is.
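For what it's worth, here is a minimal sketch of that flow, assuming a PyTorch-style stack. The module names and dimensions are illustrative assumptions, not Qwen3-Omni's actual design; the point is only that each modality is projected into one shared latent space before any text is produced.

  # Hypothetical sketch, not Qwen3-Omni's actual code: each modality gets its
  # own encoder that projects into one shared latent ("state") space, and text
  # is only ever produced from that shared space, never directly from audio
  # or video. Module names and sizes are illustrative.
  import torch
  import torch.nn as nn

  D_MODEL = 512  # shared latent dimension (assumed)

  class OmniSketch(nn.Module):
      def __init__(self, vocab=32000, n_mels=80, patch_dim=768):
          super().__init__()
          # modality-specific front ends, all targeting the same D_MODEL space
          self.text_embed = nn.Embedding(vocab, D_MODEL)
          self.audio_proj = nn.Linear(n_mels, D_MODEL)
          self.video_proj = nn.Linear(patch_dim, D_MODEL)
          layer = nn.TransformerEncoderLayer(D_MODEL, nhead=8, batch_first=True)
          self.backbone = nn.TransformerEncoder(layer, num_layers=6)
          self.lm_head = nn.Linear(D_MODEL, vocab)  # text comes out only here

      def forward(self, token_ids, mel_frames, patches):
          # Every input is mapped into the SAME latent space before any text
          # exists; there is no audio->text or video->text shortcut.
          latents = torch.cat([
              self.text_embed(token_ids),   # (B, T_text, D_MODEL)
              self.audio_proj(mel_frames),  # (B, T_audio, D_MODEL)
              self.video_proj(patches),     # (B, T_video, D_MODEL)
          ], dim=1)
          hidden = self.backbone(latents)   # joint attention over all modalities
          return self.lm_head(hidden)       # logits over the text vocabulary

  # dummy call, e.g.:
  # OmniSketch()(torch.randint(0, 32000, (1, 16)),
  #              torch.randn(1, 100, 80), torch.randn(1, 64, 768))

The only place text tokens appear on the output side is the head sitting on top of the fused latent sequence, which is the "everything goes through state space first" claim in code form.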