Their collaborator's report includes a more significant issue, an RCE on a mintlify server: https://kibty.town/blog/mintlify/


How real is it though? This blog post says

In this post, we’ll explore some of the prompting approaches we used in our Hands on with Gemini demo video.

which makes it sound like they used text + image prompts and then acted them out in the video, as opposed to Gemini interpreting the video directly.

https://developers.googleblog.com/2023/12/how-its-made-gemin...


After reading this blog post, that hands-on video is just straight-up lying to people. For the boxcar example, the narrator in the video says to Gemini:

> Narrator: "Based on their design, which of these would go faster?"

Without even specifying that those are cars! That was impressive to me, that it recognized the cars are going downhill _and_ could infer that in such a situation, aerodynamics matters. But the blog post says the real prompt was this:

> Real Prompt: "Which of these cars is more aerodynamic? The one on the left or the right? Explain why, using specific visual details."

They narrated inaccurate prompts for the Sun/Saturn/Earth example too:

> Narrator: "Is this the right order?"

> Real Prompt: "Is this the right order? Consider the distance from the sun and explain your reasoning."

If the narrator actually read the _real_ prompts they fed Gemini in these videos, this would not seem nearly as impressive!


Out of curiosity, I asked GPT-4V the same questions:

    Prompt: [attached image of the post-it notes with sun and planets] Is this right order?
    GPT-4: The image you've uploaded shows three drawings on individual pieces of paper. From left to right, there's a drawing of the sun, then what appears to be Saturn, and lastly a drawing that resembles Earth. If the intention is to display these celestial objects by their proximity to the Sun, then the order should be the Sun, Earth, and then Saturn, as Earth is closer to the Sun than Saturn. Therefore, the current order is not correct. To correct the order, you should place the Earth drawing second, between the Sun and Saturn.

    Prompt: [attached the image of the two car drawings] Based on their design, which one would go faster?
    GPT-4: The image shows two drawings of vehicles on slopes. The one on the left appears to have a more traditional box-like design, while the one on the right has a streamlined design with a pointed front. Based on principles of physics and aerodynamics, the vehicle with the streamlined design (on the right) would likely go faster down the slope. Streamlining reduces air resistance (drag), allowing the vehicle to move more efficiently. The pointed front helps to cut through the air and minimize the drag force, which would result in higher speeds compared to the boxier design on the left, which would encounter more air resistance.

I'm actually pretty impressed by how well it did with such basic prompts.


What do you mean "Real Prompt"? Nowhere does it say these are the real prompts; it says

> In this post, we’ll explore some of the prompting approaches we used in our Hands on with Gemini demo video.

Not "here are the full prompts used in the video" or something like that.

None of the entries match up 1:1. And the response to the car example in the video doesn't even make sense in response to the prompt in the post (no mention of speed), and certainly isn't a trimmed portion of the response in the post.

The video has the disclaimer "For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity". It would be weird to write that but not mention that neither the prompts nor responses shared even the same set of words in the same order with the "Real" prompts and responses.

I think your assumption is wrong on this one.


Wow, I was blown away when I watched this video.

Now that I've learned how fake it is, it's more evidence that Google is in really bad shape with this.


>If the narrator actually read the _real_ prompts they fed Gemini in these videos, this would not seem nearly as impressive!

It's crazy that this is where we are now. This is obviously still crazy impressive even if they hadn't done those edits.


It might still be crazy impressive, but nonetheless, going forward we now know that we cannot trust Google's videos about it, as they're heavily edited to make it look a lot more impressive than it is.


Those prompts aren't far off, but I still don't know how realistic the demo is. Until a product is in my hands, as far as I'm concerned it doesn't exist.


A lesson in how to commit securities fraud and get away with it.


Boo! Complete marketing garbage. May as well have been a Flash demo.


Yeah I think this comment basically sums up my cynicism about that video.

It's that you know some of this happened, but you don't know how much. So when it says "what the quack!", presumably the model was prompted with something like "give me answers in a more fun, conversational style" (since that's not the style in any of the other clips). And was it able to do that with just a little hint, or did it take a large amount of wrangling ("hey, can you say that again in a more conversational way? What if you said something funny at the beginning, like 'what the quack'?"), in which case it's totally unimpressive? I'm not saying that's what happened; I'm saying that because we know we're only seeing a very fragmentary transcript, I have no way to distinguish between the really impressive version and the really unimpressive one.

It'll be interesting to use it more as it gets more generally available though.


You can see the cracks in the feature early on:

"What do you think I'm doing? Hint: it's a game."

Anyone with as much "knowledge" as Gemini ought to know it's roshambo.

"Is this the right order? Consider the distance from the sun and explain your reasoning."

Full prompt elided from the video.


I’ve heard of roshambo mostly from South Park

https://www.urbandictionary.com/define.php?term=roshambo

I’ve vaguely heard the term before, but I don’t know which regions of the world actually use it.


Never heard it called that. Curious where you are from?


For what it's worth, I was born and raised in the Bay Area (in the 90s), and we called it ro-sham-bo growing up. Although it's incredibly strange to see that word in writing, I would always call it rock paper scissors if I were to write it.


It's always like this, isn't it? I was watching the demo and thought: why ask it what "duck" is in multiple languages? Siri can do that right now, and it's not an AI model. I really do think we're getting there with the AI revolution, but these demos are so far from exciting; they're just mundane dummy tasks that lack the nuance of the things we really interact with and would actually need an AI's help for.


How do you know though? The responses in the video were not the same as those in the blog post.


"The orders instruct him to not make public any information that would identify A.B., or the medical professionals involved, to call A.B. by the child’s preferred name and gender pronoun, and to not share his opinions of the case publicly."

"In June 2020, C.D. gave an interview to a YouTube channel, where he’s alleged to have identified health-care providers, revealed information about A.B.’s mental health, medical status or treatments, and gave out information that could reveal C.D., A.B. and the mother’s identity."

"“This order should not restrict C.D.’s right to express his opinion in his private communications with family, close friends and close advisors, provided none of these individuals is part of or connected with the media or any public forum, and provided C.D. obtain assurances from those with whom he shares information or views that they will not share that information with others,” the court said."

It sounds like his arrest was more due to him repeatedly discussing the case in public than to the pronouns he used.

https://nationalpost.com/news/b-c-father-arrested-held-in-ja...


There are two separate things here: interpenetration due to how contacts/constraints are solved, and whether you do continuous-collision sub-stepping or not.

For stability and performance reasons, physics engines usually have parameters that soften or add compliance to contacts/constraints. A bit of compliance is almost always better than infinitely stiff collisions/constraints. There are cases where infinitely stiff systems either have no solution, are very expensive to solve, or would produce very extreme impulses (causing things to explode); including some compliance fixes these issues. It is also often required to produce more realistic-looking results, since objects in real life aren't infinitely stiff: they either flex or break.

For performance reasons most physics engines also do not completely solve their constraints. They either use a fixed number of iterations (most common, including the demos here) or solve up to some specified error threshold. This tends to add some additional compliance to complex scenes (stacks/piles of objects for example).

With the right parameters a good rigid body physics engine should be able to prevent noticeable interpenetration in most situations, though the performance cost may not be worth it. In these demos if you max out Position iterations, velocity iterations, and frequency you should see significantly less interpenetration.
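
As a rough illustration of both knobs, here's a minimal 1-D sketch of a fixed-iteration, velocity-level contact solve with Baumgarte-style softening. The struct layout, beta, and all the names are invented for the example; this is not code from any particular engine:

    #include <stddef.h>

    typedef struct {
        double inv_mass_a, inv_mass_b; /* inverse masses of the two bodies */
        double v_a, v_b;               /* velocities along the contact normal */
        double penetration;            /* current overlap depth */
    } contact;

    /* Fixed iteration count: cheap, but doesn't fully converge, which adds
       some compliance of its own. beta in [0,1] softens the position
       correction, pushing overlapping bodies apart over several frames
       instead of all at once. */
    static void solve_contacts(contact *c, size_t n, int iterations,
                               double beta, double dt)
    {
        for (int it = 0; it < iterations; it++) {
            for (size_t i = 0; i < n; i++) {
                double vn   = c[i].v_b - c[i].v_a;          /* approach velocity */
                double bias = beta / dt * c[i].penetration; /* soft push-out */
                double k    = c[i].inv_mass_a + c[i].inv_mass_b;
                double j    = (bias - vn) / k;              /* impulse magnitude */
                if (j < 0.0) j = 0.0;                       /* contacts only push */
                c[i].v_a -= j * c[i].inv_mass_a;
                c[i].v_b += j * c[i].inv_mass_b;
            }
        }
    }

Raising the iteration count or beta stiffens the contacts at the cost of more work per frame, which is essentially what the iteration/frequency sliders in these demos expose.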

As for continuous collision detection/sub-stepping: this is a very common feature to prevent fast-moving objects from clipping into or tunneling through other objects. However, resting/continuous contact cannot be handled by stepping to the next intersection time, and so has to be handled differently. Also, multiple simultaneous or near-simultaneous collisions can grind things to a halt in degenerate cases (such as multiple stacked objects that are almost, but not quite, in resting contact). This is why physics engines that support continuous collision usually let you set a maximum number of sub-steps.
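
Here's a self-contained 1-D toy of the capped sub-stepping idea (everything here is invented for illustration):

    #include <stdio.h>

    /* A particle falling toward a floor at y = 0, fast enough to tunnel
       through it in a single discrete step. Advance to the time of
       impact, resolve, repeat -- but cap the number of sub-steps. */
    int main(void)
    {
        double y = 1.0, v = -50.0, dt = 0.1;
        int max_substeps = 4;

        double t = 0.0;
        for (int s = 0; s < max_substeps && t < dt; s++) {
            double rem = dt - t;
            /* time of impact with the floor, or the whole remaining step */
            double toi = (v < 0.0 && y + v * rem < 0.0) ? -y / v : rem;
            y += v * toi;
            t += toi;
            if (y <= 0.0) { y = 0.0; v = -0.5 * v; } /* resolve: bounce */
        }
        if (t < dt)
            y += v * (dt - t); /* cap exhausted: finish with a plain step */

        printf("y = %.3f, v = %.3f\n", y, v);
        return 0;
    }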


All but two states' electoral votes are decided by the result of the popular vote in that state; congressional districts have no effect on that.

Local and proportional representation is the main reason for having the House of Representatives, while the point of gerrymandering is to disproportionately favor one party or another.


No, ranges in C++ are analogous to the Iterator trait in Rust.


I'm not an expert in the topic, but my understanding is that RGB is a poor color space for computing color differences. This could be why your mosaics end up so washed out. [1] suggests using the CIELAB color space [2].

Edit: Looking at the code more closely, it looks like you were using Lab at one point but commented it out[3], so I'm guessing you're already aware of this.

1: https://stackoverflow.com/a/9019461/185171

2: https://en.wikipedia.org/wiki/CIELAB_color_space#CIELAB

3: https://github.com/worldveil/photomosaic/blob/bb720efda11383...
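
For what it's worth, here's a minimal C sketch of the standard sRGB → CIELAB conversion (D65 white point) plus the simplest CIE76 distance. This is just the textbook math from [2], not code from the linked repo:

    #include <math.h>
    #include <stdio.h>

    static double srgb_to_linear(double c)
    {
        c /= 255.0;
        return c <= 0.04045 ? c / 12.92 : pow((c + 0.055) / 1.055, 2.4);
    }

    static double lab_f(double t) /* CIELAB nonlinearity */
    {
        return t > 0.008856 ? cbrt(t) : 7.787 * t + 16.0 / 116.0;
    }

    static void rgb_to_lab(double r, double g, double b, double lab[3])
    {
        double rl = srgb_to_linear(r);
        double gl = srgb_to_linear(g);
        double bl = srgb_to_linear(b);
        /* linear RGB -> XYZ, normalized by the D65 white point */
        double x = (0.4124 * rl + 0.3576 * gl + 0.1805 * bl) / 0.95047;
        double y =  0.2126 * rl + 0.7152 * gl + 0.0722 * bl;
        double z = (0.0193 * rl + 0.1192 * gl + 0.9505 * bl) / 1.08883;
        lab[0] = 116.0 * lab_f(y) - 16.0;
        lab[1] = 500.0 * (lab_f(x) - lab_f(y));
        lab[2] = 200.0 * (lab_f(y) - lab_f(z));
    }

    /* CIE76 delta-E: plain Euclidean distance in Lab space. */
    static double delta_e76(const double p[3], const double q[3])
    {
        double dl = p[0] - q[0], da = p[1] - q[1], db = p[2] - q[2];
        return sqrt(dl * dl + da * da + db * db);
    }

    int main(void)
    {
        double lab1[3], lab2[3];
        rgb_to_lab(255, 0, 0, lab1);   /* pure red */
        rgb_to_lab(200, 30, 30, lab2); /* darker red */
        printf("dE = %.2f\n", delta_e76(lab1, lab2));
        return 0;
    }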


It didn’t make a ton of difference empirically when I tried it.

But you’re correct, generally that’s the space you want to be in.


It is quite noticeable when you are using a limited selection of tiles or the image has desaturated colors.

I've done some similar work[1], but the issue with L*a*b* color is that it's terrifically slow to calculate the diff, at least in JS.

[1] https://imgur.com/a/g3EzcSV


Totally. Everything improves with more images since that’s your palette for painting, so to speak. The ones I posted were with less than 100 images so you can definitely do better.

L*a*b* was very slow, yes. It's all done offline, but I tend to like quicker feedback...


There is no perfect way to measure color distances, mostly because any measure needs to account for human perception of color, and there are individual differences in color perception among people.


In addition, in Tim's post on the Python mailing list he says he developed the algorithm after reading through a number of papers on sorting[1]. So while the algorithm was developed with the goal of being practical, plenty of academic research went into it.

[1]: https://mail.python.org/pipermail/python-dev/2002-July/02683...


Reading the mailing list end-to-end actually shows how much research Tim did before designing it. To suggest that the "academics" had no clue is extremely disingenuous. Worse, it misleads the next generation of great would-be algorithm designers into thinking that studying the research is foolish. Studying both the theoretical and the practical is hugely important; ignore either at your peril.


I believe Tim Peters was trained as a mathematician. He is good at analyzing algorithms; often he will post on the Python lists about the subtleties of floating-point behavior. As you say, he read much of the recent research on sorting before coming up with TimSort. His special ability, I think, is making the "engineering" tradeoffs to come up with an algorithm that works well on real-world data.


You believe or you know? Nothing I've read suggests that. Please don't advance ideas like that without giving proper justification; this is exactly how myths start.


Worth noting that ucontext is quite slow (at least on linux): https://www.boost.org/doc/libs/1_67_0/libs/context/doc/html/...


It's fast enough for many applications.

I think ucontext is an excellent starting point for a general implementation. You just abstract it with a thin veneer and adopt faster implementations where needed.
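
For example, a bare-bones single-threaded veneer over ucontext might look like this (names and sizes are invented; error handling omitted):

    #include <stdio.h>
    #include <stdlib.h>
    #include <ucontext.h>

    #define CO_STACK_SIZE (64 * 1024)

    typedef struct coro {
        ucontext_t ctx;    /* the coroutine's own context */
        ucontext_t caller; /* where to return on yield */
    } coro;

    /* Single-threaded demo: track the running coroutine globally, since
       makecontext only forwards int arguments portably. */
    static coro *co_running;

    static void co_yield(void)
    {
        swapcontext(&co_running->ctx, &co_running->caller);
    }

    static void co_resume(coro *c)
    {
        co_running = c;
        swapcontext(&c->caller, &c->ctx);
    }

    static void entry(void)
    {
        for (int i = 0; i < 3; i++) {
            printf("coroutine step %d\n", i);
            co_yield();
        }
    }

    int main(void)
    {
        coro c;
        getcontext(&c.ctx);
        c.ctx.uc_stack.ss_sp   = malloc(CO_STACK_SIZE);
        c.ctx.uc_stack.ss_size = CO_STACK_SIZE;
        c.ctx.uc_link          = &c.caller; /* where to go if entry() returns */
        makecontext(&c.ctx, entry, 0);

        for (int i = 0; i < 3; i++)
            co_resume(&c);

        free(c.ctx.uc_stack.ss_sp);
        return 0;
    }

Swapping in a faster backend later just means replacing the bodies of co_yield/co_resume.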


It's slower than Boost Context for sure, but still around 10x faster than (ab)using Pthreads on Linux. Until someone releases a faster standalone C library it's still the fastest portable solution for projects like Cixl that can't afford dragging C++ around.


There are already many fast, portable, standalone C libraries for this. For instance, see table 1 of https://www.gnu.org/software/pth/rse-pmt.ps. The assembly code for a context switch is pretty minimal (on most platforms, it's just setjmp and longjmp) if you don't try to manage the signal state. I would be surprised if there were significant variance in performance (other than whether it chooses to switch the signal mask). Additionally, many language runtimes directly support it without any extra effort on your part. So if you choose to instead use one of those languages, you get a fast portable solution without needing to do any extra work to pick a support library (for example, D-Lang LDC, PyPy, Go, Julia).


I implemented coroutines for C in assembly [1] (x86, 32- and 64-bit). I took advantage of the calling convention to cut down on the amount of state to save (four registers for 32-bit x86 and six for 64-bit). Mixing this with signals is probably unwise [2]. So far I've tested the code on Linux and Mac OS X and it works (although I might not use it for C++ either).

[1] https://github.com/spc476/C-Coroutines

[2] In my not-so humble opinion, using signals at all is not wise.


The shortest context switch sequence I could come up with on x86-64 is three instructions:

  xchg  %rsp, %rdx        # install the target stack; our old rsp ends up in rdx
  leaq  1f(%rip), %rax    # save our resume address (label 1) in rax
  jmp   *%rsi             # jump to the target's saved instruction pointer
1:

It expects the target stack ptr/ip pair to be in rdx/rsi and saves the current stack ptr and ip in rdx/rax. It does not save any registers itself and uses GCC asm clobbers to instruct the compiler to save any other registers.

Code at [1]. The comments about hacks and UB are there because I'm trying to transparently propagate exceptions across coroutines; otherwise the stack switching is fairly robust (although GCC-specific).

[1] https://github.com/gpderetta/delimited/blob/master/delimited...


Signals, done correctly, are hard. I agree with your NSHO for the most part. Neat code!


Not faster than dealing directly with ucontext from what I've seen; many wrap it directly, and the rest tend to emulate it using signals, setjmp, and prayers.

I would love to be wrong though...


ucontext modifies the signal mask, requiring a syscall. That's very expensive (as shown by the Boost benchmark above), and usually unnecessary. "Emulate" is a rather negative-sounding way to describe running effectively the same code as ucontext does (which also happens to typically be the same as sigsetjmp); it's not like ucontext has some privileged permissions. It's "just" a context switch.
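
sigsetjmp makes the same tradeoff explicit via its savesigs argument; a trivial sketch:

    #include <setjmp.h>
    #include <stdio.h>

    static sigjmp_buf env;

    int main(void)
    {
        /* savesigs == 0: don't save/restore the signal mask, so no
           sigprocmask syscall -- the cheap path ucontext never takes. */
        if (sigsetjmp(env, 0) == 0) {
            puts("first pass");
            siglongjmp(env, 1); /* sigsetjmp returns 1 this time */
        }
        puts("after siglongjmp");
        return 0;
    }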


I was under the impression that Boost Context is more than that; at least that's what the amount of assembler code suggests, but I'll be the first to admit I don't have much patience for deciphering modern C++. I get that it's possible to invoke the same functionality as ucontext minus signal masks manually, but I'm not convinced it would save enough cycles to pay for the added complexity.


Saving the signal mask easily costs hundreds to thousands of cycles: one or two orders of magnitude more than the rest of the context switch.


For the record: I just benchmarked GnuPth with a NULL sigmask against ucontext for the example in the post, and it's slightly slower (3.4s vs. 3.2s).


C library public headers are commonly nested two levels deep so that projects using them can add the "include" directory to their header search path and write #include <libname/header.h> in their code. It helps avoid filename clashes.
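
For example (a hypothetical "libfoo", just to illustrate the layout):

    /*
     * libfoo/
     *   include/
     *     foo/
     *       parser.h
     *   src/
     *     parser.c
     *
     * Consumers build with -Ilibfoo/include and write:
     */
    #include <foo/parser.h> /* namespaced, so it can't clash with another parser.h */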

