
>As much as I would love to waste my time replying again to your nonsense, instead I'll just politely chuckle and move on. Good luck.

You have your head so far up your ass that even direct confirmation from the model builders themselves won't sway you. The comment wasn't for you. The comment linked sources for the original poster and for the curious.

You see I don't have to hide behind a veneer of "Trust me bro. It works like this".



>even direct confirmation from the model builders themselves

Linking papers that you clearly haven't read and can't contextually apply -- as with the ViT or your misunderstanding of image tiling -- is not the sound strategy you hope it is. It doesn't confirm your claims.

I'm not asking anyone to "Trust me bro". So...have you called the Gemini Pro 1.5 API and tokenized an image or a video yet?

There is a certain element of this that is just spectacularly obvious to anyone who has spent even a moment of critical thought -- if they're so capable -- on it. Your claim is that a high-resolution image is tiled to a 16x16 array...and the magic model can, at some later point, magically extract on demand any and all details, such as OCR, from that 16x16. This betrays a fundamental ignorance of even the most basic of information theory.

Again, I would love to just block you and avoid the defensive insults you keep hurling, but this site lacks the ability. Stop replying to me, no matter how many more contextually nonsensical citations you think will save face. Thanks.


>So...have you called the Gemini Pro 1.5 API and tokenized an image or a video yet?

You continue to blow my mind. Have you...have you even used the Gemini Pro API before? You can't use the API to get the image tokens.

>This betrays a fundamental ignorance of even the most basic of information theory.

Wow, something else you don't understand. Go figure.



