The author makes no effort to explain why AI *isn't* a commodity as Apple and Amazon say. I was looking forward to that. I think the article is weak for not defending its premise. Everything else is fluff.
I agree - and if the article is correct and Apple and Amazon are the losers, I fail to glean who the winners will be or how their business model will be different.
That's fair, but it wasn't the point of the article because it's messy. Many would argue that core LLMs are 'trending' toward commodity, and I'd agree.
But it's complicated because commodities don't carry brand weight, yet there's obviously a brand power law. I (like most other people) use ChatGPT. But for coding I use Claude and a bit of Gemini, etc. depending on the problem. If they were complete commodities, it wouldn't matter much what I used.
Part of the issue here is that while LLMs may be trending toward commodity, "AI" isn't. As more people use AI, they get locked into their habits, memory (customization), ecosystem, etc. And as AI improves, if everything I do has less and less to do with the hardware and I care more about everything else, then the hardware (e.g. the iPhone) becomes the commodity.
Similar with AWS: if data/workflow/memory/lock-in becomes the moat, I'll want everything where the rest of my infra is.
I think you are conflating the Closure Library with the Closure Compiler. They are related but not identical. The Compiler, I think, is what makes externs a pain; its “advanced optimizations” mode can and often does break libraries that weren’t written with the Compiler’s quirks in mind. But advanced optimizations is only an option; if you don’t need aggressive minification, function-body inlining, etc., you can opt out.
Shadow CLJS has made working with external libraries quite easy and IIRC it lets you set the compilation options for your libraries declaratively.
Yes and yes; in the past, prior to ECMAScript providing first-class inheritance, module exports/imports, etc., the Library supplied methods to achieve these in development, and the Compiler would identify those cases and perform the appropriate prototype chaining, bundling, etc. See, e.g., goog.provide.
For the most part, I would guess people still use the Closure Compiler because of its aggressive minification or for legacy reasons. I think both are probably true for ClojureScript, as well as the fact that the Compiler is Java-based so it has a Java API that (I am guessing here) made it easier to bootstrap on top of the JVM Clojure tooling / prior art.
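For reference, driving the Compiler from its Java API looks roughly like this (a sketch from memory of the com.google.javascript.jscomp classes, so treat names and details as approximate):

    import com.google.javascript.jscomp.CompilationLevel;
    import com.google.javascript.jscomp.Compiler;
    import com.google.javascript.jscomp.CompilerOptions;
    import com.google.javascript.jscomp.SourceFile;
    import java.util.List;

    public class ClosureFromJava {
        public static void main(String[] args) {
            Compiler compiler = new Compiler();
            CompilerOptions options = new CompilerOptions();
            // Opt in to the aggressive minification discussed above; a SIMPLE level
            // is the safer choice if your dependencies don't ship externs.
            CompilationLevel.ADVANCED_OPTIMIZATIONS.setOptionsForCompilationLevel(options);

            // Externs declare symbols the compiler must not rename; empty here.
            List<SourceFile> externs = List.of();
            List<SourceFile> inputs = List.of(SourceFile.fromCode(
                    "input.js",
                    "function greet(name) { return 'hi ' + name; } console.log(greet('hn'));"));

            compiler.compile(externs, inputs, options);
            System.out.println(compiler.toSource());
        }
    }

Having that kind of programmatic entry point on the JVM is presumably part of what made it easy to call from the ClojureScript compiler.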
I've been doing frontend development for over 10 years, and obviously this is anecdotal, but I've never heard of anyone using the Closure Compiler outside of ClojureScript, and I imagine that in practice most people doing frontend development are using Webpack, Vite, Parcel, etc. The idea of really small bundles sounds nice, but in practice the advanced optimizations often require manual tweaking (externs) to get working, which is something few people want to deal with, and the bundle-size improvement isn't worth it compared to standard tools like UglifyJS/Terser.
There may be other reasons, but I assume the main reason the Closure Compiler was chosen for ClojureScript is that it's Java-based, so it was straightforward to get working. Moving away from it now would be a huge breaking change, so it's unlikely to happen in the official compiler anytime soon, or ever. I think the only way it would actually happen is if an alternative like Cherry got enough traction and people moved to using mainly the alternative.
Yeah nowadays I think non-ClojureScript people use it mostly for legacy reasons or the aggressive minification. Back in the day, aside from the pre-ES5 conveniences I mentioned surrounding inheritance and module bundling, it was also a way for developers to do some basic type enforcement (via JSDoc annotations that the Compiler would check). TypeScript essentially rendered that obsolete.
* Express is free and will take you a very long way.
* SSMS is great.
* T-SQL is great.
* Integration with .Net is great.
* It’s cross platform (I’ve only ever done Windows though).
* Windows auth is pretty sweet, no passwords in your configs/repos (rough connection example after this list).
* It Just Works™ for real. You can have multiple instances on the same system, different versions and editions, and never worry about anything. Backup and restore are a breeze. Installation, uninstall, updates and upgrades are a breeze. Everything is a breeze. It’s unbelievable how little you need to worry about MSSQL instances.
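For the Windows auth bullet, the connection ends up looking roughly like this with the Microsoft JDBC driver (a sketch; server/database names are placeholders, and the driver's native auth DLL has to be available on the Windows host):

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class IntegratedAuthExample {
        public static void main(String[] args) throws Exception {
            // integratedSecurity=true authenticates as the current Windows user,
            // so no username/password ever lands in config files or the repo.
            String url = "jdbc:sqlserver://dbhost;databaseName=MyAppDb;"
                       + "integratedSecurity=true;encrypt=true;trustServerCertificate=true";
            try (Connection conn = DriverManager.getConnection(url)) {
                System.out.println("Connected as the current Windows user");
            }
        }
    }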
> It’s cross platform (I’ve only ever done Windows though).
I've tried it on Linux and simply couldn't get it working. The Microsoft package manager repos are out of date or contain buggier versions of the software. I wanted all the other benefits you've listed, but ultimately Postgres has been easier for me.
The tooling, the JIT compiler, having the CLR available in the DB engine, enterprise features like OLAP, failover, cluster management, distributed transactions, packaged DB apps, integration with Active Directory, for starters.
Similar feature offerings to Oracle, DB2, and co.
I'd like to answer for myself (I'm the one that opened the issue and reposted here for some show-and-shame in case MS reconsiders and starts supporting the project):
We have a 10+ year old desktop project (.exe) in C# that uses MS SQL Server as a database, and we need to turn it into a proper web app. We are heavy Django users and have now hit a wall. Unfortunately, because of the complexity of the project, it's not feasible to change the DB.
That is basically never true any more, even in large government and large enterprise.
Microsoft has dialled up the pricing to match Oracle, which means that now everyone has to be so frugal with cores assigned to their DB servers that any software performance benefits are simply lost. Cheaper or open source database engines can be assigned 10x or even 100x the compute capacity at the same cost.
One “trick” Microsoft pulled was to quietly change per-core licensing to per-vCPU (hyper-thread) if you run SQL in the cloud. Since each physical core shows up as two hyper-threads, this means it costs 2x as much as it used to on-prem.
Then they have the nerve to publish marketing about how you can “save money” by migrating to Azure.
In Microsoft Azure the HT-off feature has had a bunch of previews that all quietly disappeared without ever becoming generally available. I'm guessing management noticed that this capability would eat into Microsoft SQL Server (and Windows Server) licensing revenue.
Similarly, I've noticed that all of the managed Azure SQL products lag behind on the latest CPU generations by many years. "You can just scale up at your expense and our profit!" is the response when you read about this in the forums.
In those cases I tell them that I store everything in a file (SQLite) and IT can easily back up that file. If IT needs data access, it's available in the application with CSV/spreadsheet export.
I promise you, they will be super happy with that!
But you are not supposed to tell them that you use another SQL DB; you tell them you use a file, as it simplifies things and saves money. For example, you do not need to expose anything over the network, you do not need to set up a service account and password, and data access is embedded in your application, which improves latency.
And backup is a lot easier: you just create a daily dump from your application that writes to a backup folder and tell IT to back up that folder. People have been saving things to files for decades, and IT shouldn't worry about the data structure in that file.
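The whole "daily dump" can be as small as this (a sketch assuming the org.xerial sqlite-jdbc driver and an SQLite build new enough for VACUUM INTO, 3.27+; paths are made up):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;
    import java.time.LocalDate;

    public class DailyBackup {
        public static void main(String[] args) throws Exception {
            // Date-stamped target in the folder IT already backs up; VACUUM INTO
            // refuses to overwrite, so each day's file must be new.
            String backupPath = "/srv/app/backups/app-" + LocalDate.now() + ".db";
            try (Connection conn = DriverManager.getConnection("jdbc:sqlite:/srv/app/data/app.db");
                 Statement stmt = conn.createStatement()) {
                // VACUUM INTO writes a consistent snapshot even while the app keeps the DB open.
                stmt.execute("VACUUM INTO '" + backupPath + "'");
            }
        }
    }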
This is not a lie, it's about avoiding politics and fights. If they ask you to use MSSQL instead of a file, you politely ask them why they want to overengineer and delay application development.
If you’re on a Mac or iOS you could try creating a Shortcut where you input Markdown, convert to rich text, then output as a PDF. I use Shortcuts regularly. It’s pretty easy to set up. I haven’t tried it on something as large as 500 pages, though. YMMV
Anthropic uses a ton of TPUs in addition to GPUs, so presumably it has the expertise to use both and shift workloads as needed. Note that large-scale TPU use pretty much means Jax, and not just a "platform independent" flavor of Jax but Jax with TPU-specific optimizations.
Anthropic are the only (?) heavy users of Amazon's chips. Or maybe they aren't heavy users. It's hard to say, they use NVIDIA too. Amazon is a big investor.
Amazon's chips at this point are marketing for Amazon. I've seen the benchmarks, they're not quite ready for serious use yet. I suspect Anthropic got a good discount on GPUs in return for using Amazon's own chips in any possible capacity (or maybe just for the press release claiming such use).

The only real alternative to NVIDIA on the inference side that you can actually buy hardware for is Intel Gaudi, which costs less and performs rather well, but everyone seems to have written it off, along with Intel itself, and it's not available in any cloud last I checked.

On the training side there's really no alternative at all - PyTorch is the de-facto standard, and while there is PyTorch XLA, it's even less popular than Jax, which is already like 20x less popular than PyTorch. Bottom line: capable Jax engineers able to optimize distributed Jax programs on TPUs are unobtainable unicorns for anyone but the top labs and Google itself. Note that the training side has significantly different requirements than the inference side. Inference is much simpler to optimize and wring the performance out of.
Yes I've been expecting AMD to eventually get inference working because it's so much simpler. Supposedly Meta do use some AMD for inference. It's sad that you can implement llama inference on the CPU in a few thousand lines of Java yet somehow AMD isn't cleaning up there.
“Decline I” is an instruction for the student to provide the first person pronoun in all cases: I (nominative), me (accusative/dative/ablative), my (genitive), mine (genitive substantive). (I have borrowed the case names from Latin, with which I am more familiar. I think the English cases are nominative, objective, possessive.)
I believe the misspellings in the spelling section are intentional so that the student will identify them—I am guessing that’s the point.
This case is about whose interpretation gets to fill in the gaps.
The statute (APA) requires courts to form an independent judgment about the gaps.
The Chevron doctrine required courts, in certain cases, to set this judgment aside in favor of an agency’s judgment, basically on the grounds that the agencies are closer to the problems and know better.
This setting aside may be the better outcome; however, it is not explicitly specified in the statute (APA).
Ultimately, if Congress wants this to be the case, they /can/ amend the statute (APA), effectively enshrining the Chevron doctrine.
At the end of the day, the court’s decision here rests on statutory interpretation (not constitutional doctrine) so Congress could change the outcome by amending the statute (APA) to explicitly codify Chevron. This would be achieved with its ordinary legislative power (Article 1 Section 7 of the Constitution).
The court’s decision does effectively put the ball back in Congress’ court.
It struck me that Jepsen has identified clear situations leading to invariant violations, but Datomic’s approach seems to have been purely to clarify their documentation. Does this essentially mean the Datomic team accepts that the violations will happen but doesn’t care?
From the article:
> From Datomic’s point of view, the grant workload’s invariant violation is a matter of user error. Transaction functions do not execute atomically in sequence. Checking that a precondition holds in a transaction function is unsafe when some other operation in the transaction could invalidate that precondition!
As Jepsen confirmed, Datomic’s mechanisms for enforcing invariants work as designed. What does this mean practically for users? Consider the following transactional pseudo-data:
[
  [Stu favorite-number 41]
  ;; maybe more stuff
  [Stu favorite-number 42]
]
An operational reading of this data would be that early in the transaction I liked 41, and that later in the transaction I liked 42. Observers after the end of the transaction would hopefully see only that I liked 42, and we would have to worry about the conditions under which observers might see that 41.
This operational reading of intra-transaction semantics is typical of many databases, but it presumes the existence of multiple time points inside a transaction, which Datomic neither has nor wants — we quite like not worrying about what happened “in the middle of” a transaction. All facts in a transaction take place at the same point in time, so in Datomic this transaction states that I started liking both numbers simultaneously.
If you incorrectly read Datomic transactions as composed of multiple operations, you can of course find all kinds of “invariant anomalies”. Conversely, you can find “invariant anomalies” in SQL by incorrectly imposing Datomic’s model on SQL transactions. Such potential misreadings emphasize the need for good documentation. To that end, we have worked with Jepsen to enhance our documentation [1], tightening up casual language in the hopes of preventing misconceptions. We also added a tech note [2] addressing this particular misconception directly.
To build on this, Datomic includes a pre-commit conflict check that would prevent this particular example from committing at all: it detects that there are two incompatible assertions for the same entity/attribute pair, and rejects the transaction. We think this conflict check likely prevents many users from actually hitting this issue in production.
The issue we discuss in the report only occurs when the transaction expands to non-conflicting datoms--for instance:
[Stu favorite-number 41]
[Stu hates-all-numbers-and-has-no-favorite true]
These entity/attribute pairs are disjoint, so the conflict checker allows the transaction to commit, producing a record which is in a logically inconsistent state!
On the documentation front--Datomic users could be forgiven for thinking of the elements of transactions as "operations", since Datomic's docs called them both "operations" and "statements". ;-)
In order for user code to impose invariants over the entire transaction, it must have access to the entire transaction. Entity predicates have such access (they are passed the after db, which includes the pending transaction and all other transactions to boot). Transaction functions are unsuitable, as they have access only to the before db. [2]
Use entity predicates for arbitrary functional validations of the entire transaction.
Datomic transactions are not “operations to perform”; they are a set of novel facts to incorporate at a point in time.
Just as a git commit describes a set of modifications: do you, or should you, care about the order in which the adds, updates, and deletes occur within a single git commit? OMG no, that sounds awful.
The really unusual thing is that developers have come to expect, and accept, intra-transaction ordering from every other database. OMG, that sounds awful; how do you live like that?
Yeah, this basically boils down to "a potential pitfall, but consistent with documentation, and working as designed". Whether this actually matters depends on whether users are writing transaction functions which are intended to preserve some invariant, but would only do so if executed sequentially, rather than concurrently.
Datomic's position (and Datomic, please chime in here!) is that users simply do not write transaction functions like this very often. This is defensible: the docs did explicitly state that transaction functions observe the start-of-transaction state, not one another! On the other hand, there was also language in the docs that suggested transaction functions could be used to preserve invariants: "[txn fns] can atomically analyze and transform database values. You can use them to ensure atomic read-modify-update processing, and integrity constraints...". That language, combined with the fact that basically every other Serializable DB uses sequential intra-transaction semantics, is why I devoted so much attention to this issue in the report.
It's a complex question and I don't have a clear-cut answer! I'd love to hear what the general DB community and Datomic users in particular make of these semantics.
As a proponent of just such tools I would say also that "enough rope to shoot(?) yourself" is inherent in tools powerful enough to get anything done, and is not a tradeoff encountered only when reaching for high power or low ceremony.
It is worth noting here that Datomic's intra-transaction semantics are not a decision made in isolation; they emerge naturally from the information model.
Everything in a Datomic transaction happens atomically at a single point in time. Datomic transactions are totally ordered, and this ordering is visible via the time t shared by every datom in the transaction. These properties vastly simplify reasoning about time.
With this information model, intermediate database states are inexpressible. Intermediate states cannot all have the same t, because they did not happen at the same time. And they cannot have different ts, as they are part of the same transaction.
When we designed Datomic (circa 2010), we were concerned that many languages had better support for lists than for sets: in particular, list literals but no set literals.
Clojure of course had set literals from the beginning...
An advantage of using lists is that tx data tends to be built up serially in code. Having to look at your tx data in a different (set) order would make proofreading alongside the code more difficult.
Yes. Perhaps this is a performance choice for DataScript since DataScript does not keep a complete transaction history the way Datomic does? I would guess this helps DataScript process transactions faster. There is a github issue about it here: https://github.com/tonsky/datascript/issues/366
I think the article answers your question at the end of section 3.1:
> "This behavior may be surprising, but it is generally consistent with Datomic’s documentation. Nubank does not intend to alter this behavior, and we do not consider it a bug."
When you say "situations leading to invariant violations" -- that sounds like some kind of bug in Datomic, which this is not. One just has to understand how Datomic processes transactions, and code accordingly.
I am unaffiliated with Nubank, but in my experience using Datomic as a general-purpose database, I have not encountered a situation where this was a problem.
This is good to hear! Nubank has also argued that in their extensive use of Datomic, this kind of issue doesn't really show up. They suggest custom transaction functions are infrequently written, not often composed, and don't usually perform the kind of precondition validation that would lead to this sort of mistake.
Yeah, I've used transaction functions a few times but never had a case where two transaction functions within the same d/transact call interacted with each other. If I did encounter that case, I would probably just write one new transaction function to handle it.
Sounds similar to the need to know that in some relational databases, you need to SELECT ... FOR UPDATE if you intend to perform an update that depends on the values you just selected.
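Roughly the pattern in plain JDBC, against a hypothetical accounts table (a sketch using Postgres-style FOR UPDATE; connection details and schema are made up):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class ReadModifyUpdate {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/demo", "demo", "demo")) {
                conn.setAutoCommit(false);
                try (PreparedStatement select = conn.prepareStatement(
                             "SELECT balance FROM accounts WHERE id = ? FOR UPDATE");
                     PreparedStatement update = conn.prepareStatement(
                             "UPDATE accounts SET balance = ? WHERE id = ?")) {
                    select.setLong(1, 42L);
                    try (ResultSet rs = select.executeQuery()) {
                        if (rs.next()) {
                            // The row stays locked until commit, so nobody else can change
                            // the balance between this read and the write below.
                            long balance = rs.getLong(1);
                            update.setLong(1, balance - 100);
                            update.setLong(2, 42L);
                            update.executeUpdate();
                        }
                    }
                }
                conn.commit();
            }
        }
    }

Without FOR UPDATE (or an equivalent isolation level), two concurrent transactions can both read the old balance and one update silently clobbers the other.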
> things get complicated with virtual threads, they shouldn't be pooled, as they aren't a scarce resource
Why not pool virtual threads, though? I get that they’re not scarce, but if you’re looking to limit throughput anyway wouldn’t that be easier to achieve using a thread pool than semaphores?
(author here) From what I've read, beyond the documentation saying they shouldn't be pooled, it's that by design they are meant to run and then get garbage collected. There's also some overhead in managing the pool. If someone has a deeper understanding of virtual threads, I'd love to know why in more detail.
As to why use a semaphore over a thread pool for this implementation? A thread pool couples throughput to the number of running threads. A semaphore lets me couple throughput to started tasks per second. I don't care how many threads are currently running, I care about how many requests I'm making per second. Does that make more sense?
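Roughly the shape of it (a simplified sketch, not the exact code from my post; the rate, task body, and refill strategy here are made up for illustration):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.Semaphore;
    import java.util.concurrent.TimeUnit;

    public class RateLimitedStarts {
        public static void main(String[] args) throws InterruptedException {
            int startsPerSecond = 10;
            Semaphore permits = new Semaphore(startsPerSecond);

            // Reset the budget once per second: throughput is tied to started tasks
            // per second, not to how many threads happen to be running.
            ScheduledExecutorService refiller = Executors.newSingleThreadScheduledExecutor();
            refiller.scheduleAtFixedRate(() -> {
                permits.drainPermits();
                permits.release(startsPerSecond);
            }, 1, 1, TimeUnit.SECONDS);

            // One virtual thread per task; no pool, threads just run and get GC'd.
            try (ExecutorService vt = Executors.newVirtualThreadPerTaskExecutor()) {
                for (int i = 0; i < 100; i++) {
                    permits.acquire(); // blocks once this second's budget is spent
                    int id = i;
                    vt.submit(() -> System.out.println("request " + id));
                }
            }
            refiller.shutdown();
        }
    }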
Pooling virtual threads has no upside and potentially a bit of downside: 1. You hang on to unused objects for longer instead of returning them to the more general pool that is the GC; 2. You risk leaking context between multiple tasks sharing the thread, which may have security implications. Because of these and similar downsides, you should only ever pool objects that give you a benefit when they're shared -- e.g. they're expensive to create -- and shouldn't pool objects otherwise.
Thank you! You incur this risk when pooling any kind of thread, too, but with platform threads at least pooling makes sense because they're costly, so you just need to be careful with thread locals on a shared thread pool. Not needing to share threads and potentially leak context is a security advantage of virtual threads.
Aren't "virtual threads" built on a thread pool themselves? I suppose there would be no advantage in pooling an already pooled resource since presumably the runtime would manage pooling better than user code.