Hacker News | FiniteIntegral's comments

Yet at the same time, "towards" does not equate to "nearing". Relative terms for relative statements. Until there's a light at the end of the tunnel, we don't know how far we've come.


I think part of this is due to the AI craze no longer being in the wildest west possible. Investors, or at least heads of companies, believe in this as a viable economic engine, so they are properly investing in what's there. Or at least, the hype hasn't slapped them in the face just yet.


Agreed. Security is a task that not even a group of humans can perform with the utmost scrutiny or perfection. 'Eternal vigilance is the price of liberty' and such. People want to move fast and break things without the backing infrastructure/maintenance (like... actually checking what the AI wrote).


Ah yes... Move fast and break things. Well Facebook didn't overpromise on that one...


Agreed, it's cherry-picked and strange, since a lot of Western developers already profess a lot of these practices -- especially the "extensive comments" and "descriptive naming" points.

It reminds me a lot of "innovative Japanese management solutions", which consist of MBAs learning what a bottleneck is and that DevOps is just sensible, basic business practice.


I think you need to spend some more time testing this service if you are advertising it as a service that inherently interfaces with humans. I see that others in this thread like the applications for scambaiting, but I don't fully understand the use case you have here. If it's AI on both ends of the phone, what's the point of the call in the first place? It's not that hard to get a human on the other line who is able to help me far better than any robotic agent could.

If the agent has trouble with "complex verification or (providing) documents", I doubt that a monthly fee for simple tasks is a viable and sustainable business model. It sounds like the anti-social bunch would like it, but past that it's going to be hard drumming up a lot of support.


> It's not that hard to get a human on the other line who is able to help me far better than any robotic agent could

Are we living in different universes?


My favorite is giant megacorps that literally make it impossible. One (recently) even told me, after wandering through the menu options, that they were going to text me a link to their app - and then hung up on me.

I already tried the app, their system was broken - that’s why I was trying to call and talk to a human!

Bonus - they didn’t text me either


I can think of all sorts of use cases. Imagine integrating it with an automated agentic workflow where, for some websites, you need to talk to a bot or a real human to get the job done in real time, because email takes a while and may not even be available (e.g. at a restaurant). This service can do the job as instructed by the LLM and get back to you with the status. For example, if you want to call 10 restaurants to find out if a table for 20 is available, you could just instruct it via an agent.


I think you underestimate the willingness of people to pay to troll. It may filter out some people, but an app that was (in theory) meant to be secure shouldn't treat the problem as filtering rather than securing. Admins knowing people's identities simply moves the weakest link in the chain to the admins. I think an app like this was doomed from the start, and 4chan simply pulled the plug on an already leaking bathtub.


I've thought about buying throwaway phone numbers just to troll LinkedIn. I'd be surprised if people weren't finding ways to get accounts on apps like this for trolling.

The only reason I haven't is because it feels like LinkedIn may have already jumped the shark and I wouldn't really get the value for my money.


> The only reason I haven't is because it feels like LinkedIn may have already jumped the shark and I wouldn't really get the value for my money.

You'd get the lulz. That in itself can be mentally satisfying.

Tbh I've thought about trolling LinkedIn myself. It honestly needs to die.


> Admins knowing people's identities simply moves the weakest link in the chain to the admins.

And now you have a better chance at pointing a finger at someone, at the very least. And the thought of that finger pointing would be enough to keep an admin on top of things.


Are there any premium troll sites?


Twitter with a check mark


Great point.


Apple released a paper showing the diminishing returns of "deep learning" specifically when it comes to math. For example, it has a hard time solving the Tower of Hanoi problem past 6-7 discs, and that's not even giving it the restriction of optimal solutions. The agents they tested would hallucinate steps and couldn't follow simple instructions.
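
A minimal sketch of the classic recursive solution (my own illustration, not from the paper), just to show what the models are being asked to do: the move count grows as 2^n - 1, so 7 discs already means emitting 127 perfectly ordered steps.

    def hanoi(n, source, target, spare, moves=None):
        """Recursively solve Tower of Hanoi, returning the full move list."""
        if moves is None:
            moves = []
        if n == 0:
            return moves
        hanoi(n - 1, source, spare, target, moves)  # move n-1 discs out of the way
        moves.append((source, target))              # move the largest disc
        hanoi(n - 1, spare, target, source, moves)  # move n-1 discs back on top
        return moves

    # 7 discs already requires 2**7 - 1 = 127 perfectly ordered moves.
    print(len(hanoi(7, "A", "C", "B")))  # 127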

On top of that -- rebranding "prompt engineering" as "context engineering" and pretending it's anything different is ignorant at best and destructively dumb at worst.


That's one reading of that paper.

The other is that they intentionally forced LLMs to do the things we know they are bad at (following algorithms, tasks that require more context than is available, etc.) without allowing them to solve it in the way they're optimized to do (write code that implements the algorithm).

A cynical read is that the paper is the only AI achievement Apple has managed to do in the past few years.

(There is another: they managed not to lose MLX people to Meta)


> On top of that -- rebranding "prompt engineering" as "context engineering" and pretending it's anything different is ignorant at best and destructively dumb at worst.

It is different. There are usually two main parts to the prompt:

1. The context.

2. The instructions.

The context part has to be optimized to be as small as possible, while still including all the necessary information. It can also be compressed via, e.g., LLMLingua.

On the other hand, the instructions part must be optimized to be as detailed as possible, because otherwise the LLM will fill the gaps with possibly undesirable assumptions.

So "context engineering" refers to engineering the context part of the prompt, while "prompt engineering" could refer to either engineering of the whole prompt, or engineering of the instructions part of the prompt.


I'm getting on in years so I'm becoming progressively more ignorant on technical matters. But with respect to something like software development, what you've described sounds a lot like creating a detailed design or even pseudocode. Now I've never found typing to be the bottleneck in software development, even before modern IDEs, so I'm struggling to see where all the lift is meant to be with this tech.


> But with respect to something like software development, what you've described sounds a lot like creating a detailed design or even pseudocode.

What I described applies not only to using AI for coding, but to most of the other use cases as well.

> Now I've never found typing to be the bottleneck in software development, even before modern IDEs, so I'm struggling to see where all the lift is meant to be with this tech.

There are many ways to use AI for coding. You could use something like Claude Code for more granular updates, or just copy and paste your entire codebase into, e.g., Gemini and have it one-shot a new feature (though I like to prompt it to make a checklist, and generate step by step).

And it's not just about typing; it's also about debugging, refactoring, figuring out how a certain thing works, etc. Nowadays I barely write any code by hand, and I offload most of the debugging and other miscellaneous tasks to LLMs as well. They are simply much faster and more convenient at connecting all the dots, making sure nothing is missed, etc.
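
To make the "checklist, then step by step" idea concrete, here's the shape of the two-stage prompting I mean (the wording and the feature are just an illustration, not a fixed recipe):

    # Stage 1: ask for a plan only, no code yet.
    plan_prompt = (
        "Here is my codebase:\n<codebase pasted here>\n\n"
        "I want to add CSV export to the reports page. Do not write any code yet. "
        "Produce a numbered checklist of the files you will touch and the change "
        "you will make in each."
    )

    # Stage 2: work through the checklist one item at a time,
    # feeding the agreed plan back in.
    step_prompt = (
        "Using the checklist we agreed on, implement item 1 only. "
        "Show the full contents of every file you modify."
    )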


Let's just call all aspects of LLM usage 'x-engineering' to professionalise it, even while we're barely starting to figure it out.


It’s fitting, since the industry is largely driven by hype engineering.


The dilution of the term isn’t good for engineering. We don’t really have many backup terms to switch to.

Maybe we should look to science and start using the term pseudo-engineering to dismiss the frivolous terms. I don’t really like that though, since pseudoscience has an invalidating connotation, whereas e.g. prompt engineering is not a lesser or invalid form of engineering - it’s simply not engineering at all, and no more or less “valid”. It’s like calling yourself a “canine engineer” when teaching your dog to do tricks.


We used to call both of these "being good with the Google". Equating it to engineering is both hilarious and insulting.


It is a stretch but not semantically wrong. Strictly, engineering is the practical application of science; we could say that the study of how to use a model is indeed science, and so applying that science is engineering.


Context engineering isn’t a rebranding. It’s a widening of scope.

Like how all squares are rectangles but not all rectangles are squares: prompt engineering is context engineering, but context engineering also includes other optimisations that are not prompt engineering.

That all said, I don’t disagree with your overall point regarding the state of AI these days. The industry is full of so much smoke and mirrors these days that it’s really hard to separate the actual novel uses of “AI” vs the bullshit.


Context engineering is the continual struggle of software engineers to explain themselves, in an industry composed of weak communicators that interrupt to argue before statements are complete, do not listen because they want to speak, and speak over one another. "How to use LLMs" is going to be argued forever simply because those arguing are simultaneously not listening.


I really don’t think that’s a charitable interpretation.

One thing I’ve noticed about this AI bubble is just how much people are sharing and comparing notes. So I don’t think the issue is people being too arrogant (or whatever label you’d prefer to use) to agree on a way to use them.

From what I’ve seen, the problem is more technical in nature. People have built this insanely advanced thing (LLMs) and now trying to hammer this square peg into a round hole.

The problem is that LLMs are an incredibly big breakthrough, but they’re still incredibly dumb technology in most ways. So 99% of the applications that people use it for are just a layering of hacks.

With an API, there’s generally only one way to call it. With a stick of RAM, there’s generally only one way to use it. But to make RAM and APIs useful, you need to call upon a whole plethora of other technologies too. With LLMs, it’s just hacks on top of hacks. And because it seemingly works, people move on before they question whether this hack will still work in a month’s time. Or a year’s time. Or a decade later. Because who cares, when the technology will already be old next week anyway.


It's not a charitable opinion, and it's not people being arrogant either. It's that the software industry's members were not taught how to communicate effectively, and because of that, attempts by members of the industry to explain things create arguments and confusion. We have people making declarations with very little acknowledgement of prior declarations.

LLMs are extremely subtle; they are intellectual chameleons, which is enough to break many a person's brain. They respond as one prompts them, in a reflection of how they were prompted, which is so subtle it is lost on the majority. The key to them is approaching them as statistical language constructs that use mirroring behavior as the mechanism to generate their replies.

I am very successful with them, yet my techniques seem to trigger endless debate. I treat LLMs as method actors and they respond in character and with their expected skills and knowledge. Yet when I describe how I do this, I get unwanted emotional debate, as if I'm somehow insulting others through my methods.
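
To make that less abstract, this is roughly the shape of the "method actor" framing I mean (the role and wording are just an illustration, not a prescription):

    # Give the model a concrete role, a scene, and the skills it is expected
    # to draw on, then keep the conversation inside that frame.
    system_prompt = (
        "You are a senior database reliability engineer brought in to review a "
        "postmortem. You have spent years tuning PostgreSQL under heavy write "
        "load. Stay in that role: ask the questions such an engineer would ask, "
        "and flag anything you would refuse to sign off on."
    )
    user_prompt = "Here is the incident timeline: ..."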


That's an interesting and unique perspective. I'd like to hear more.


Ouija boards with statistical machinery :)


The paper in question is atrocious.

If you assume any kind of error rate of consequence - and you will get one, especially if the temperature isn't zero - then failures at this length are inevitable, and at larger disc counts you'd start to hit context limits too.

Ask a human to repeatedly execute the Tower of Hanoi algorithm for a similar number of steps and see how many will do so flawlessly.

They didn't measure "the diminishing returns of 'deep learning'" - they measured the limitations of asking a model to act as a dumb interpreter repeatedly, with a parameter set that'd ensure errors over time.
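
The arithmetic behind that is straightforward (a back-of-the-envelope illustration with made-up per-step accuracies): if each emitted move is correct with probability p, a flawless n-disc run requires all 2^n - 1 moves to be right, so the chance of a perfect transcript is p^(2^n - 1).

    # Probability of a flawless Tower of Hanoi transcript if each of the
    # 2**n - 1 moves is independently correct with probability p.
    def flawless_run(p, n_discs):
        return p ** (2 ** n_discs - 1)

    print(flawless_run(0.999, 7))   # ~0.88 -- 127 moves, still mostly fine
    print(flawless_run(0.999, 10))  # ~0.36 -- 1023 moves
    print(flawless_run(0.995, 10))  # ~0.006 -- small per-step errors compound fast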

That a paper this poor got released at all was shocking.


At this point all of Apple's AI take-down papers have serious flaws. This one has been beaten to death. Finding citations is left to the reader.


It really says something when the instability of the dollar is (relatively) as bad as when Nixon took us off the gold standard in 1971. Trump's policies certainly have caused a large amount of instability.


It's easy to blame an individual administration, but the reality is that pure fiat currencies always end up this way. When was the last time the US had a balanced budget? Clinton? If you don't have a constraint on printing new currency, you will always print more.

A good example I heard today: imagine you have a legit money printer. Show me the purest human and eventually they will hit that button and print new money. That's what we've been doing for a long time now to finance all the wars and bailouts.

https://fred.stlouisfed.org/series/M2SL

A good book: https://www.lynalden.com/broken-money/


It isn't just one administration. There's quite a bit of consistency over which administrations are "good" for the economy and the people, and which are "bad".


Oh, we can happily blame every administration that did this. It may be the natural conclusion of this behavior, but that doesn't mean we need to continually rush headfirst into trouble. The current administration absolutely needs to shoulder more scrutiny than the past ones because they are actively making decisions. They don't get a pass just because the others did it too.


It's not surprising that responses are anecdotal. The easy way to communicate a general sentiment is often to be brief.

A majority of what makes a "better AI" can be condensed to how effective the gradient-descent algorithms are at reaching the optima we want them to reach. Until a generative model shows actual progress at "making decisions", it will forever be seen as a glorified linear-algebra solver. Generative machine learning is all about giving a pleasing answer to the end user, not about creating something on the level of human decision making.


At the risk of being annoying: answers that feel like high-quality human decision making are extremely pleasing and desirable. In the same way, image generators aren't generating six-fingered hands because they think it's more pleasing; they're doing it because they're trying to please and aren't good enough yet.

I'm just most baffled by the "flashes of brilliance" combined with utter stupidity. I remember having a run with early GPT 4 (gpt-4-0314) where it did refactoring work that amazed me. In the past few days I asked a bunch of AIs about similar characters between a popular gacha mobile game and a popular TV show. OpenAI's models were terrible and hallucinated aggressively (4, 4o, 4.5, o3-mini, o3-mini-high), with the exception of o1. DeepSeek R1 only mildly hallucinated and gave bad answers. Gemini 2.5 was the only flagship model that did not hallucinate and gave some decent answers.

I probably should have used some type of grounding, but I honestly assumed the stuff I was asking about should have been in their training datasets.


SoftBank Group is not always known for the most sound funding decisions; they did invest in the "Stargate" program, which hasn't seen a whole lot of action.

