If you are interested in (2026-)internet-scale data engineering challenges (e.g. processing tens to hundreds of petabytes) and pre-training/mid-training/post-training scale challenges, please send me an email at d+data@krea.ai!
It's interesting to think that, regardless of whether you consider Cursor's browser implementation to be truly "from scratch" or not, people are implementing browsers from scratch with agents because of Cursor's post. In other words, in a twisted and funny way, this browser exists because of Cursor's agent.
This is how we should be thinking about AI safety!
I mean, I wanted to demonstrate further how wrong and misleading I think their initial blog post was, so yeah, I made this because of what they said and marketed :)
Care is the most important trait of people who make great things; it's not money or time. It's not even skill.
I was interviewing a candidate yesterday and I noticed that a project on his personal website was not working. I told him my opinion on care and he said that he hadn't had the time to deploy it, since he had been working on it for only 2 weeks and it was working on his local machine.
A few hours after the interview, the project was online.
The bitter pill of realizing the importance of care is that it applies not just to literary works, as in Gwern's case, but to any creative endeavor: writing, music, drawing, and yes, software engineering.
That CLI tool without a tutorial. That product with a confusing sign-up flow. That purchase with no confirmation dialog to reassure me that I wasn't just scammed.
It's all the same. Lack of care.
I've also noticed that when caring is there, skills follow.
Dang, you should change it to "Lopadotemachoselachogaleokranioleipsanodrimhypotrimmatosilphiokarabomelitokatakechymenokichlepikossyphophattoperisteralektryonoptekephalliokigklopeleiolagoiosiraiobaphetraganopterygon" via your admin superpowers!
I doubt that can happen because it would go over the title length limit. It should probably be "The Longest Word In Literature".
As for it screwing with the mobile site width: on desktop FF, making the window narrow seems to work fine, as the word seems to have soft hyphens in it. It splits at the window edge with a hyphen in place.
I wrote here, or maybe elsewhere, that the Opera browser on my phone wraps words automatically. My mobile experience is almost never broken.
A good test might be to provide it with only about a third of the tests, then, when it says it's done, run its code against the held-out two thirds and see how well it did. Of course it may have already seen the other tests during training, but that's not relevant here, since the goal is to find out whether it's just "brute force bumbling" its way through the task, relying heavily on the test suite as bumper rails for feedback, or whether it's actually writing generalizable, bug-free code with active awareness of pitfalls and corner cases. (Then again, the result might be invalidated if this specific project was part of the RL training process, which it may well have been: it's low-hanging fruit to convert any repo with a comprehensive test suite into training data.)
Either way, most tasks don't have the luxury of a thorough test suite, as the test suite itself is the product of arduous effort in debugging and identifying corner cases.
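To make the holdout idea concrete, here's a rough sketch of how the split and the comparison might look. Everything here is hypothetical: `split_tests`, `pass_rate`, and `generalization_gap` are names I made up, and it assumes you can get per-test pass/fail results out of whatever test runner the project uses.

```python
import random


def split_tests(test_ids, visible_fraction=1 / 3, seed=0):
    """Deterministically shuffle test IDs and split them into (visible, holdout)."""
    ids = sorted(test_ids)  # sort first so the split doesn't depend on input order
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * visible_fraction)
    return ids[:cut], ids[cut:]


def pass_rate(results, test_ids):
    """Fraction of the given tests that passed; results maps test ID -> bool."""
    if not test_ids:
        return 0.0
    return sum(1 for t in test_ids if results[t]) / len(test_ids)


def generalization_gap(results, visible, holdout):
    """Visible pass rate minus holdout pass rate.

    A large positive gap hints that the code was fitted to the visible tests
    rather than to the underlying behavior.
    """
    return pass_rate(results, visible) - pass_rate(results, holdout)
```

You'd hand the agent only the `visible` tests, wait until it says it's done, then run the whole suite and look at the gap. A gap near zero is what you'd hope for; how big a gap counts as "bumbling" is a judgment call.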