Hacker News

I just tried Grok 4 and it's insanely good. I was able to generate 1,000 lines of Java CDK code responsible for setting up an EC2 instance with certain pre-installed software. Grok produced all the code in one iteration. 1,000 lines of code, including VPC, Security Groups, etc. Zero syntax errors! Most importantly, it generated userData (#!/bin/bash commands) with accurate `wget` pointing to valid URLs of the latest software artifacts on GitHub. Insane!


The problem is that the code is excellent as a one-off, but as a maintainable piece of code that needs to live in source control, be shared across teams, follow a standard SDLC, be immutable, and track changes in some state - it's just not there.

If an intern handed me code like this to deploy an EC2 instance in production, I would need to have a long discussion about their decisions.


How do you know without seeing the code?

How do you know the criteria you mention haven't been (or can't be) factored into prompt and context tuning?

How do you know that all the criteria that were important in the pre-LLM world still have the same priority as LLM capabilities increase?


Anyone using Java for IaC and Configuration Management in 2025 needs to reconsider their career decisions.


What does this have to do with anything? The Java constraint was supplied by a user, not the model.


Why? Modern Java - certainly since Java 8 - is pretty decent.


[flagged]


I find this comment very ironic in the context of this thread. Let's agree to disagree.


There's a chunk of the programming population who label everything they themselves didn't write as junk.


How do you know? Have you seen the code GP generated?


No, have you? They always seem to be missing from these types of posts. Personally I am skeptical, as AI has been abysmal at one-shot provisioning of actual quality cloud infrastructure. I wish it could do that, because it would make my life a lot less annoying. Unfortunately I have yet to really see it.


No, they're not. People talk about LLM-generated code the same way they talk about any code they're responsible for producing; it's not in fact the norm for any discussion about code here to include links to the code.

But if you're looking for success stories with code, they're easy to find.

https://alexgaynor.net/2025/jun/20/serialize-some-der/


> it's not in fact the norm for any discussion about code here to include links to the code.

I certainly didn't interpret "these types of posts" to mean "any discussion about code", and I highly doubt anyone else did.

The top-level comment is making a significant claim, not a casual remark about code they produced. We should expect it to be presented with substantiating artifacts.


I guess. I kind of side-eyed the original one-shotting claim, not because I don't believe it, but because I don't believe it matters. Serious LLM-driven code generation runs in an iterative process. I'm not sure why first-output quality matters that much; I care about the outcome, not the intermediate steps.

So if we're looking for stories about LLMs one-shotting high-quality code, accompanied by the generated code, I'm less sure of where those examples would be!


I could write a blog post exactly like this with my ChatGPT history handy. That wasn't the point I was making. Without seeing what they produced, I am extremely skeptical of any claim that someone can one-shot quality cloud infrastructure. I'd even take away the one-shot requirement - unless the person behind the prompt knows what they're doing, pretty much every example I've seen has been terrible.


I mean, I agree with you that the person behind the prompt needs to know what they're doing! And I don't care about 1-shotting, as I said in a sibling comment, so if that's all this is about, I yield my time. :)

There are just other comments on this thread that take as axiomatic that LLM-generated code is bad. That's obviously not true as a rule.


How do you know?


But isn't that just a few refactoring prompts away?


<3


I'd love to hear how Grok works inside agentic coders like Cursor or Copilot for production codebases.


Please share your result if possible. So many lines in a single shot with no errors would indeed be impressive. Does Grok run tools for these sorts of queries (linters, sandbox execution, web search)?


Out of curiosity, why do you use Java instead of TypeScript for CDK? Just to keep everything in one language?


Why not, I would say? What's the advantage of using TypeScript over modern Java?



