It's fascinating that browsers are one of the most robust and widely available sandboxing systems, yet we still don't have a claude-code/gemini-cli-like agent that runs inside the browser.
Browsers as an agent environment open up a ton of exciting possibilities. For example, agents get an instant way to offer UIs based on tech governed by standards (HTML/CSS) instead of platform-specific UI bindings. A way to run third-party code safely in wasm containers. A way to store information on disk with enough confidence that it won't explode the user's disk drive. All this basically for free.
My bet is that eventually we'll end up with a powerful agentic tool that uses the browser environment to plan and execute personal agents, or to deploy business agents that don't access system resources any more than browsers do at the moment.
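To make the storage point concrete, here's a minimal sketch of the kind of quota check an in-browser agent could run before writing anything to disk. It assumes the estimate comes from `navigator.storage.estimate()`, the browser API that reports `usage` and `quota` in bytes. `fitsInQuota`, `agentCanStore` and the 10% reserve are names and numbers made up for illustration, not part of any existing agent.

```typescript
// Shape of the object returned by navigator.storage.estimate() in browsers.
interface QuotaEstimate {
  usage?: number; // bytes already used by this origin
  quota?: number; // bytes the browser is willing to give this origin
}

// Hypothetical helper: may the agent write `bytes` more data while
// leaving `reserveFraction` of the origin's quota untouched?
function fitsInQuota(
  est: QuotaEstimate,
  bytes: number,
  reserveFraction: number = 0.1, // assumed safety margin, not a standard value
): boolean {
  // If the browser does not report numbers, be conservative and refuse.
  if (est.usage === undefined || est.quota === undefined) return false;
  const usable = est.quota * (1 - reserveFraction);
  return est.usage + bytes <= usable;
}

// Hypothetical wrapper; in a real page `estimator` would be navigator.storage.
async function agentCanStore(
  estimator: { estimate(): Promise<QuotaEstimate> },
  bytes: number,
): Promise<boolean> {
  return fitsInQuota(await estimator.estimate(), bytes);
}
```

In a page this would be called as `agentCanStore(navigator.storage, bytes)`. The decision logic is kept in a plain synchronous function so it can be checked outside a browser.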
But there is! ChatGPT.com has a canvas feature, and that can be used to render HTML and javascript, including UI controls. It's pretty neat, albeit limited.
Generated via ChatGPT, this canvas shows a basic pyramid with sliders that you can use to change the pyramid, and it lets you download the glTF to your local machine. You can also click the edit w/ ChatGPT and tweak the UI in whatever way you can prompt it to.
> It's fascinating that browsers are one of the most robust and widely available sandboxing system and we are yet to make a claude-code/gemini-cli like agent that runs inside the browser.
It's easily explained by the fact that all the javascript code is exposed in a browser and all the network connections are trivially inspectable and blockable. It's much harder to collect data and do shady things with that level of inspectability. And it's much harder to ban alternative clients for the main paid offer. Especially if AI companies want to leave the door open to pushing ads to your conversations.
I use my phone when I want to measure stuff. Not an app, just the physical phone as a ruler. Almost always the dimensions of whatever phone I've got are published on the internet. It's a quick hack and better than carrying around A4 papers ;)
This. A language that doesn't adapt (accumulate a shitpile of baggage from other languages over changing times) will be a dead language eventually.
English will always have my respect for being open/inclusive and adaptive.
Interesting fact: If you are looking for a spoken language with the cleanest, most composable grammar, it's Sanskrit. Panini's grammar is actually like a programming language where sentences are just compositions of similar lower-level units.
But like I said, it's practically dead (not used as a spoken language). Interestingly, though, it's used as a proxy language for translation and other NLP tasks due to its clean grammar :)
It'd be great if it supports a wasm/web backend as well.
I bet a lot of trivial text capabilities (grammar checking, autocomplete, etc) will benefit from this rather than sending everything to a hosted model.
It's possible right now with onnx / transformers.js / tensorflow.js - but none of them are quite there yet in terms of efficiency. Given the target for microcontrollers, it'd be great to bring that efficiency to browsers as well.
You can compile to wasm, I have done so via the XNNPACK backend - you might have to tweak the compilation settings and upgrade the XNNPACK submodule/patch some code. But this only supports CPU, not a WebGPU or WebGL backend.
Nice one. How does this relate to/differ from Chrome DevTools' built-in network throttling feature? I usually use that Chrome feature to test apps over bad/slow networks.
A while back I did this experiment where I asked ChatGPT to imagine a new language such that it's best for AI to write code in and to list the properties of such a language. Interestingly it spit out all the properties that are similar to today's functional, strongly typed, in fact dependently typed, formally verifiable / proof languages. Along with reasons why such a language is easier for AI to reason about and generate programs. I found it fascinating because I expected something like typescript or kotlin.
While I agree formal verification itself has its problems, I think the argument has merit because soon AI-generated code will surpass all human-generated code, and when that happens we at least need a way to prove that the code won't have security issues and that it adheres to compliance / policy.
AI generated code is pretty far from passing professional human generated code, unless you're talking about snippets. Who should be writing mission critical code in the next 10 years, the people currently doing it (with AI assistance), or e.g. some random team at Google using AI primarily? The answer is obvious.
I also find that sticking to a single file makes coding agents perform better (fewer surgical edits, faster outputs, sensible changes, etc).
Not sure why, but the moment the file is split into multiple files and subfolders, coding agents tend to make a lot more changes than what is absolutely necessary. That way a single html file wins!
At this point Microsoft office suite is practically a monopoly. Governments around the world rely on it. Every big enterprise and every business needs it.
The spec for office documents was authored by Microsoft (and approved by Microsoft!). The spec is basically the docx data structure published publicly as a standard, which makes building competing office suites even harder.
Given the situation, there isn't much customers can do if Microsoft decides to hike the prices however they like.
Note: Indian Government recently adopted Zoho office suite to insulate themselves from Microsoft.
But I don't think many other governments or businesses have the guts to make such a move.
> At this point Microsoft office suite is practically a monopoly.
There are loads of competitors in the space. Google Docs, LibreOffice, OnlyOffice, WPS Office, and I'm sure there are many others in the space that are lesser known. All of these are compatible with Office formats.
It's more a ton of inertia than some sort of monopoly. A lot of new companies immediately start on an alternative these days. They don't see a reason to pay the higher price.
I agree that it's a lot of work to build something that can render and edit their complex format, but quite a few companies have managed now.
> Note: Indian Government recently adopted Zoho office suite to insulate themselves from Microsoft.
India's central government didn't adopt Zoho just to insulate against Microsoft. It was done when Trump imposed a 50% tariff on imports from India. It was targeted against US IT companies in general, though the most mentioned one was Google. Zoho is an Indian company.
I had switched to Zoho about 6 months before them and it has provided a rather decent experience so far. The biggest attractions for me though, are that it's very economical and it has transactions in local currency using local payment systems. They also have a good selection of apps.
Honestly, this was a wasted opportunity for GoI. Indian domestic IT market is an untapped gold mine that they didn't promote much until recently. But better late than never, I guess.
Another relevant point here is that India is one of the countries that voted against Microsoft OOXML document format in favor of ODF at ISO. There are several central and state level government agencies that adopted ODF officially.
I remember when Microsoft Office truly felt like a monopoly. In the 90s, nothing could really read/write Microsoft formats reliably. People weren't using PDFs as much and teachers, jobs, etc. all expected you to be sending them .doc files.
Yes, Microsoft wrote the spec for .docx, but submitted it as an ECMA standard and that meant that people could create alternatives that could read/write .docx quite well. Sure, Microsoft has a little bit of a leg up, but it's nothing like the monopoly they had on .doc.
Today, we expect programs to be able to read and write Microsoft Office formats. In the 90s, we truly didn't. Yes, there might be some advanced things that don't always work, but it's so different today.
I got a bad grade in a high school English class because the teacher didn't like the doc file generated by StarOffice. My dad came round the school raising hell and got her to grade the paper on contents, saying if they wanted me to have Office they could buy a copy of it. I got an A- after that.
The criticisms here surprise me. "Programmatic Tool Calling" is a huge leap when you want AI to work with your app - like a human would.
I've been trying to get LLMs to work in our word processor documents like a human collaborator following instructions. Writing a coding agent is far more straightforward (all code is just plain strings) than getting an agent to work with rich text documents.
I imagined the only sane way is to expose a document SDK and expect AI to write programs that call those SDK APIs. That was the only way to avoid MCPs and context explosion. Claude has now made this possible and it's exciting!
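Roughly what I have in mind, as a minimal sketch: the model writes a small program against a document SDK, the program does all the rich-text work, and only a tiny summary goes back into context. Everything here (`DocumentSdk`, `find`, `boldParagraph`, `boldMentions`, the run/paragraph model) is hypothetical and made up for illustration. It is not our actual SDK and not Claude's API.

```typescript
// Hypothetical rich-text model: a paragraph is a list of styled runs.
interface Run { text: string; bold?: boolean }
interface Paragraph { id: number; runs: Run[] }

// Hypothetical document SDK; none of these names come from a real product.
class DocumentSdk {
  private paragraphs: Paragraph[];

  constructor(paragraphs: Paragraph[]) {
    this.paragraphs = paragraphs;
  }

  // Ids of paragraphs whose plain text (runs joined together) contains `needle`.
  find(needle: string): number[] {
    return this.paragraphs
      .filter(p => p.runs.map(r => r.text).join("").includes(needle))
      .map(p => p.id);
  }

  // Bold every run of the given paragraph; returns false if the id is unknown.
  boldParagraph(id: number): boolean {
    const p = this.get(id);
    if (!p) return false;
    p.runs.forEach(r => { r.bold = true; });
    return true;
  }

  // Look up a paragraph by id.
  get(id: number): Paragraph | undefined {
    return this.paragraphs.find(q => q.id === id);
  }
}

// The kind of program a model might write for the instruction
// "bold every paragraph that mentions X".
function boldMentions(doc: DocumentSdk, needle: string): { changed: number } {
  const ids = doc.find(needle);
  const changed = ids.filter(id => doc.boldParagraph(id)).length;
  return { changed }; // only this summary needs to return to the model
}
```

Note that the match works across run boundaries, which is exactly the kind of detail that makes rich text harder than plain strings and is better handled inside an SDK call than by the model editing markup directly.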
Open-source has many technical advantages over closed-source, in addition to the moral ones (which are quite powerful themselves).
Being able to inspect the software you use makes you able to trust how it works, and fix it at points where it's not working; those were the first motivators for creating the FLOSS movement.
There's also the advantage that in the long term you don't depend on the company developing the software; if the company goes under, or simply stops supporting the software, you can hire a different batch of developers to carry on maintaining it. That's the reason why many big contracts require that the software vendor puts the source code under escrow.
In reality, closing the source of software only benefits the seller; everybody else benefits from having it available. With FLOSS, you get that for free.