
jmtulloss · 2026-02-03T01:26:43 1770082003

The comments so far seem focused on taking a cheap shot, but as somebody working on using AI to help people with hard, long-term tasks, I think it's a valuable piece of writing.

- It's short and to the point

- It's actionable in the short term (make sure the tasks per session aren't too difficult) and useful for researchers in the long term

- It's informative on how these models work, informed by some of the best in the business

- It gives us a specific vector to look at, clearly defined ("coherence", or, more fun, "hot mess")

kernc · 2026-02-03T02:22:14 1770085334

Other actionable insights are:

- Merge amendments up into the initial prompt.

- Evaluate prompts multiple times (ensemble).
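A minimal sketch of the ensemble suggestion, assuming answers can be compared as exact strings (real outputs usually need normalizing first). `ensemble_answer`, `fake_model`, and the canned outputs are hypothetical stand-ins for an actual model call; the agreement rate is one rough signal of how consistent the model is on that prompt.

```python
import itertools
from collections import Counter
from typing import Callable, Tuple


def ensemble_answer(run_prompt: Callable[[str], str], prompt: str, n: int = 5) -> Tuple[str, float]:
    """Run the same prompt n times; return the most common answer and the share of runs that agreed."""
    if n < 1:
        raise ValueError("n must be at least 1")
    answers = [run_prompt(prompt) for _ in range(n)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n


# Hypothetical stand-in for a nondeterministic model: cycles through canned outputs.
_canned = itertools.cycle(["42", "42", "41"])


def fake_model(prompt: str) -> str:
    return next(_canned)


answer, agreement = ensemble_answer(fake_model, "What is 6 * 7?", n=6)
# answer == "42"; agreement == 4/6, since 4 of the 6 runs returned "42"
```

A low agreement rate may be a hint that the prompt or task needs to be broken down further.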

sandos · 2026-02-03T10:36:59 1770115019

Sometimes, when I've been stressed, I have used several models to verify each other's work. They usually find problems, too!

This is very useful for things that take a long time to verify. We have CI stuff that takes 2-3 hours to run, and I hate it when those runs fail because of a syntax error.

xmcqdpt2 · 2026-02-03T13:13:07 1770124387

Syntax errors should be caught by type checking, compiling, or linting. That should not take 2-3 hours!
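The thread doesn't say what stack the CI runs. Assuming a Python codebase, a parse-only check like this one runs in seconds and could sit in a pre-commit hook or the first CI step. `find_syntax_errors` is a hypothetical helper built on the standard library's `ast.parse`; it catches syntax errors only, not type errors.

```python
import ast
import pathlib
from typing import List


def find_syntax_errors(root: str) -> List[str]:
    """Parse every .py file under root without executing it; report the files that fail to parse."""
    errors = []
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        try:
            # Parsing builds the syntax tree only; no code in the file is run.
            ast.parse(source, filename=str(path))
        except SyntaxError as exc:
            errors.append(f"{path}:{exc.lineno}: {exc.msg}")
    return errors
```

A non-empty result can fail the job immediately, e.g. `sys.exit(1 if find_syntax_errors(".") else 0)`, instead of hours into the pipeline.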

nth21 · 2026-02-03T17:39:41 1770140381

There’s not a useful argument here. The article is using current AI to extrapolate future AI failure modes. If future AI models solve the ‘incoherence’ problem, that leaves bias as the primary source of failure (according to the author, these are apparently the only two possible failure modes).

toroidal_hat · 2026-02-03T21:15:48 1770153348

That doesn't seem like a useful argument either.

If future AI only manages to solve the variance problem, then it will have problems related to bias.

If future AI only manages to solve the bias problem, then it will have problems related to variance.

If problem X is solved, then the system that solved it won't have problem X. That's not very informative without some idea of how likely it is that X can or will be solved, and current AI is a better prior than "something will happen".
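The bias/variance framing in this exchange corresponds to the textbook squared-error decomposition. As a sketch, assume a fixed input x with a noise-free correct answer f(x), and a model output f̂(x) that varies from sample to sample; the expectation is over that sampling randomness. The article's own definition of incoherence may not match this form exactly.

```latex
\mathbb{E}\left[(\hat f(x) - f(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat f(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\left(\hat f(x) - \mathbb{E}[\hat f(x)]\right)^2\right]}_{\text{variance}}
```

Driving the variance term to zero leaves only the squared bias, and vice versa, which is the point made above: solving one problem leaves the other.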

nth22 · 2026-02-03T22:33:44 1770158024

> That's not very informative without some idea of how likely it is that X can or will be solved

Exactly. The author's argument would be much better qualified by addressing this assumption.

> current AI is a better prior than "something will happen".

“Current AI” is not a prior; it's a static observation.

jmtulloss · 2026-02-01T05:27:22 1769923642

This is what I'm saying. Chipmunks are not squirrels. I will do my best on this hill.

eutropia · 2026-02-01T17:33:00 1769967180

is "do my best" some kind of weird censorship-speak euphemism for "die"???

pinkmuffinere · 2026-02-01T21:43:06 1769982186

I think it's meant to convey "I'm not willing to _die_, but I do feel strongly"

jmtulloss · 2026-02-02T05:20:50 1770009650

Yes, it’s not a hill I’m willing to die on. It’s a hill I’m willing to defend until the cause is lost.

jmtulloss · 2025-12-20T04:08:01 1766203681

Obviously novel problems require novel solutions, but the vast majority of software solutions are remixes of existing methods. I don’t know your work so I may be wrong in this specific case, but there are a vanishingly small number of people pushing forward the envelope of human knowledge on a day-to-day basis.

ewoodrich · 2025-12-20T04:50:28 1766206228

My company (and others in the same sector) depends on certain proprietary enterprise software that has literally no publicly available API documentation online, anywhere.

There is barely anything that qualifies as documentation that they are willing to provide under NDA for lock-in reasons/laziness (ERPish sort of thing narrowly designed for the specific sector, and more or less in a duopoly).

The difficulty in developing solutions is 95% understanding business processes/requirements. I suspect this kind of thing becomes more common the further you get from a "software company" into specific industry niches.

jmtulloss · 2025-12-12T03:33:33 1765510413

The reason for this is that Rivian and Tesla bet big on software-defined platforms, i.e., every piece of hardware talks to a small number of central computers instead of many independent systems. This gives them a huge leg up in developing software that can actually take all the available input and use it to control all aspects of the vehicle.

Downside is all the buttons are on a screen. But I’ve grudgingly decided it’s worth it for software upgrades.

jmtulloss · 2025-12-12T03:29:11 1765510151

The current Gen 1s will start beeping at you if they can’t see the lines. If you don’t take over quickly, the car will start slowing down and beeping very insistently.

jmtulloss · 2025-12-12T03:16:51 1765509411

Not only is Rivian betting on an integrated platform being important for their own cars long term, they’ve also essentially sold that portion of their business to VW. They are investing in the software platform for a lot more cars than just the Rivian-branded ones.

jmtulloss · 2025-11-01T04:49:40 1761972580

Linear is a venture-funded company.

jmtulloss · 2025-10-31T04:07:43 1761883663

A lot of commenters are focusing on the legalities and likelihood of backpay, which is relevant, but I tend to agree with you… it’ll get paid because it’s in the interest of both parties to pay their employees what they’re owed.

We’re staring down the barrel of two missed paychecks, though. If you’re living paycheck to paycheck, you’re getting desperate. If you’re living with about one month of emergency buffer… that buffer is one paycheck away from gone. It’s a cash flow issue.

jmtulloss · 2025-10-31T03:42:12 1761882132

And the Republicans could just vote to change the rules of the Senate.

The out-of-power party gets a little veto power here. The Republicans know the day will come when they want that, so they won’t change the rules even though they have the power to do so (theoretically… there are Republicans who will never compromise on this). Unfortunately, they can’t get on the same page with their lame-duck leader.

jmtulloss · 2025-10-16T18:51:35 1760640695

My interpretation of the parent comment was that they were loading specific curl calls into context so that Claude could properly exercise the endpoints after making changes.
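The parent comment's actual calls aren't shown. As a minimal sketch, assuming a JSON HTTP API on a local dev server, this renders a list of requests as copy-pasteable curl commands to load into the model's context. The endpoints and the `curl_command`/`curl_context` helpers are hypothetical.

```python
import json
import shlex
from typing import Dict, Iterable, Optional


def curl_command(method: str, url: str, body: Optional[dict] = None,
                 headers: Optional[Dict[str, str]] = None) -> str:
    """Render one HTTP request as a shell-safe curl command string."""
    parts = ["curl", "-sS", "-X", method.upper()]
    for name, value in (headers or {}).items():
        parts += ["-H", f"{name}: {value}"]
    if body is not None:
        parts += ["-H", "Content-Type: application/json", "--data", json.dumps(body)]
    parts.append(url)
    # shlex.quote leaves safe tokens bare and single-quotes everything else.
    return " ".join(shlex.quote(p) for p in parts)


def curl_context(calls: Iterable[tuple]) -> str:
    """Join example requests into a block of text to paste into the agent's context."""
    lines = ["# Example requests for exercising the API after changes"]
    lines += [curl_command(*call) for call in calls]
    return "\n".join(lines)


# Hypothetical endpoints on a local dev server.
example = curl_context([
    ("GET", "http://localhost:8000/api/items"),
    ("POST", "http://localhost:8000/api/items", {"name": "widget"}),
])
print(example)
```

Quoting each argument with `shlex.quote` keeps JSON bodies intact when the model runs the commands in a shell.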