Why does Mill use Scala?

toprerules · 2025-02-13T21:00:59 1739480459

The best config system I've ever seen used plain old Python to generate static configs. Everyone knows Python. Python is easy to do data munging in, as demonstrated by its popularity as the #1 data science tool. There's boundless libraries to make Python more functional, use stricter typing, or reduce the amount of side effects it can cause. Even Starlark is just a dialect of Python.

You can spend decades building a complicated configuration language, use a bespoke functional language as Mill does, but if you're a single company that can enforce code quality and just wants to get the job done, I feel like everything else is just unnecessary and over-engineered to scratch some academic itch for a "better system" that enforces "purity" at the cost of velocity.

I also think that now that LLMs are all the rage, how much context do you think they have for bespoke config language vs Scala vs Python? I think we know the answer to that one.

IshKebab · 2025-02-13T22:48:12 1739486892

Python is a terrible choice for that sort of thing. Who really wants to have to set up a venv and deal with pip nonsense just to write a config file? Hell even installing Python is sometimes difficult.

have-a-break · 2025-02-14T06:18:20 1739513900

You could say the same thing about setting up a JVM...

pjmlp · 2025-02-14T08:43:26 1739522606

JVM doesn't have everything all over the place, JAVA_HOME and PATH suffice.

It doesn't require reaching out to tons of C libraries when performance is called for, and there are at least two free beer options to AOT compile to native code, if required.

IshKebab · 2025-02-14T07:18:37 1739517517

I definitely would!

fastasucan · 2025-02-15T12:35:03 1739622903

You don’t need any of that now that we have uv (https://github.com/astral-sh/uv)

IshKebab · 2025-02-15T15:01:48 1739631708

Not really. Uv is great but it doesn't really help here. If an app uses Python as its configuration file format then the app is running Python. It's going to do `python3 config.py` or similar. It doesn't know anything about uv.

So you would still need to create a uv project, run `uv sync` and `uv activate` or whatever and then run your app. Not practical.

The only option if you use Python as a config file format is to stick to old features (Python 3.6) and not use any third party libraries. But op was saying third party libraries are one of the benefits of using Python...

MathMonkeyMan · 2025-02-13T23:01:36 1739487696

Stick to the standard library as of an oldish version of python (3.6?) and it's pretty much zero-install zero-config.

threeseed · 2025-02-13T23:45:48 1739490348

On a Mac, Python has always been a challenge.

Up until recently Apple only included Python2 and so developers used Homebrew to install Python3. Now it’s very common to find two versions of Python3 installed on a Mac developer’s laptop that conflict with each other.

You really want to be using virtualenv.

tom_ · 2025-02-14T00:15:29 1739492129

Hence the advice to stick to the standard library - because then it doesn't matter all that much. I'm not sure what the full set of environments I've tested my Python 3 scripts actually is, but they've run OK on all the various cloud CI systems I've tried, plus my laptops, VMs, and work PCs.

iforgot22 · 2025-02-14T00:22:08 1739492528

All a Mac user has needed to do was install from https://www.python.org/downloads/ and then run python3 in the shell. Even if you use MacPorts or brew or conda or whatever, there's a distinct command to run Python 3 instead of 2.

I get that Python's package manager situation is terrible, but like the other user said, you only need built-in packages to spit out a config json or whatever.

threeseed · 2025-02-14T02:43:35 1739501015

So you would then end up with three Python3 installations.

And if you install from the website it doesn’t override the path. So you will still be using the Apple or Homebrew one.

iforgot22 · 2025-02-15T01:04:52 1739581492

If you install it 3 times then yeah, but even then, all 3 of them will still work.

But I could've sworn the python.org installer set the PATH. If not, that's kinda annoying.

iforgot22 · 2025-02-14T00:31:17 1739493077

If only Python had the equivalent of npm.

aiiizzz · 2025-02-16T14:30:18 1739716218

Thought that was pdm. Never saw it used so far.

MathMonkeyMan · 2025-02-13T21:12:00 1739481120

This does work well. A team I was on at a past job did exactly this. On Unix the service literally ran `std::system("python config.py >config.json")` on startup.

The problem with this is that the answer to the question "what kind of configuration can I expect?" is "simulate the script and find out."

If the script is written well, and is short, then the parameters that are filled in by the runtime environment are apparent. Over time, though, there is a risk that the script will not remain written well, and it almost certainly won't remain short.
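One way to keep the environment-dependent parameters apparent, sketched under assumptions: read every runtime input in one small function and keep the rest of the script a pure function of those inputs. The variable names `SERVICE_PORT` and `SERVICE_ENV` and the config keys are hypothetical, not taken from the service described above.

```python
import json
import os

# Hypothetical names throughout: SERVICE_PORT, SERVICE_ENV and the config keys
# are illustrative only.

def read_inputs(environ):
    """Collect every runtime-dependent value in one place."""
    return {
        "port": int(environ.get("SERVICE_PORT", "8080")),
        "env": environ.get("SERVICE_ENV", "dev"),
    }

def build_config(inputs):
    """Pure function: the same inputs always produce the same config."""
    return {
        "listen": {"host": "0.0.0.0", "port": inputs["port"]},
        "log_level": "debug" if inputs["env"] == "dev" else "info",
    }

if __name__ == "__main__":
    # Usage: python config.py >config.json
    config = build_config(read_inputs(os.environ))
    print(json.dumps(config, indent=2, sort_keys=True))
```

This only uses the standard library, and a reader can answer "what does the environment influence?" by looking at `read_inputs` alone. It does not stop the script from growing, it just gives the growth a shape.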

siriusfeynman · 2025-02-14T00:20:58 1739492458

An approach I use is splitting my config tools into 2 stages

Stage 1 creates an "explicit" config that can be exported to plaintext that contains exactly what is going to be created/modified with no abstraction/simplification

Stage 2 applies the "explicit" config

You get to be as clever as you want in stage 1 to avoid excessive copy pasting or not being able to know what your tool is going to do because all you have to go on is some homegrown DSL
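A minimal sketch of that two-stage split, assuming a made-up shorthand format: the `services`/`regions`/`replicas` keys and the `create` action are hypothetical and only illustrate the shape.

```python
import json

# Hypothetical shorthand and action names, for illustration only.

def expand(shorthand):
    """Stage 1: expand a compact description into an explicit list of actions."""
    plan = []
    for name in shorthand["services"]:
        for region in shorthand["regions"]:
            plan.append({
                "action": "create",
                "name": f"{name}-{region}",
                "replicas": shorthand["replicas"],
            })
    return plan

def apply(plan, state):
    """Stage 2: apply the explicit plan step by step, with no abstraction."""
    for step in plan:
        if step["action"] == "create":
            state[step["name"]] = {"replicas": step["replicas"]}
    return state

shorthand = {"services": ["api", "worker"], "regions": ["eu", "us"], "replicas": 2}
plan = expand(shorthand)
print(json.dumps(plan, indent=2))  # the reviewable plaintext artifact
state = apply(plan, {})
```

The printed plan is what gets reviewed or diffed; stage 2 never sees the shorthand, so any cleverness stays confined to `expand`.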

iforgot22 · 2025-02-14T00:23:19 1739492599

You run into the same problem with config DSLs, except now you're dealing with a DSL. Config is almost never going to be static.

MathMonkeyMan · 2025-02-14T01:24:56 1739496296

True. One advantage I can imagine for a DSL is that it constrains what is possible and optimizes (syntactically) what it's supposed to be for. I think that the author of Nix justified its language that way.

The counterargument is "eventually you'll need every facility provided by a programming language, so just start with a programming language."

I'm not sure how I feel about it. The YAML templating situation in Kubernetes is a [shit show][1]. Then again, I did once cave into the temptation of writing a [lisp-like XML preprocessor][2] to make my configurations less verbose. It doesn't have any access to the environment, though, so it's not a general purpose configuration language, just a shorthand for static XML.

[1]: https://www.davidgoffredo.com/no-string-templates

[2]: https://github.com/dgoffredo/llama

iforgot22 · 2025-02-15T01:12:33 1739581953

What constraints are needed? I've used DSLs that are almost Python but not quite, I think because they were hermetic and deterministic. Even those ended up being produced dynamically using some higher level config DSL or just regular code. Like once you're doing RPCs, it's general programming language territory (though there are also DSLs that do this, which is cursed).

And yes, I have very bad memories of Kubernetes YAML, also YAML itself.

lmm · 2025-02-13T23:38:50 1739489930

Scala is hardly some obscure bespoke language. It's a top-20, maybe top-10 programming language, that's been around for 20+ years (and had far fewer breaking changes than Python over that time). Most Python translates directly into Scala, but with the benefit of a proper sound type system and full IDE support. And it's a great language for data munging.

makeitdouble · 2025-02-14T00:40:11 1739493611

The claim sounded outlandish, but Scala looks indeed to be around the top 10~20 languages in hiring for instance:

https://www.devjobsscanner.com/blog/top-8-most-demanded-prog...

Scala is only in 0.5% of the scanned job offerings, and is far far behind the major languages in numbers, but I was surprised there's more demand than Rust or even Perl to be honest.

iforgot22 · 2025-02-14T00:56:24 1739494584

I'm not surprised it's above Rust and Perl, but it's below Dart?! Ouch.

camdenreslink · 2025-02-14T01:05:12 1739495112

Might as well just list it as Flutter (Dart).

bdangubic · 2025-02-13T23:50:22 1739490622

this “top 10” your LLM hallucinating? :)

threeseed · 2025-02-13T22:19:36 1739485176

> Everyone knows Python

No they don’t. Just like everyone doesn’t know Cobol, Fortran, Scala etc.

But by having a programming language as your build tool you now make it harder for new people to onboard. In order to build a project they often need to learn some unique, language-specific syntax. And in order to find this syntax they look around on GitHub, and because it’s a programming language every project has its own unique, project-specific approach.

Versus something like Cargo.toml where it’s simple and consistent regardless of which project you look at.

emidln · 2025-02-13T22:41:32 1739486492

> No they don’t. Just like everyone doesn’t know Cobol, Fortran, Scala etc.

Sure somebody might not have Python experience, but it's pretty easy to just not hire someone who says they don't know Python and isn't willing to learn for the role. I don't know that you'd filter out many candidates out of any random 100 devs.

threeseed · 2025-02-13T23:34:59 1739489699

I am talking about graduates and others new to programming.

Of course they are willing to learn for the role but making it hard for them in the beginning can forever turn them off a language. That has been a big problem with Scala and Spark.

morkalork · 2025-02-13T23:41:48 1739490108

I've never seen a language used for ancillary purposes be the make or break on hiring for a role, it's always just been expected that you'd pick it up as you go. And IMO, python is the least offensive compared to stuff like Perl, Ruby (for Chef) or whatever the heck Terraform is.

threeseed · 2025-02-13T23:56:07 1739490967

I have onboarded dozens of Data Engineering graduates in using Spark.

In the beginning this was with Scala and every single one struggled with SBT.

Giving developers unlimited flexibility in how they create build files is a bad idea.

iforgot22 · 2025-02-14T00:59:43 1739494783

Can they use pyspark?

LorenzoGood · 2025-02-14T00:33:10 1739493190

lmm · 2025-02-13T23:40:04 1739490004

> Sure somebody might not have Python experience, but it's pretty easy to just not hire someone who says they don't know Python and isn't willing to learn for the role.

This works just as well for Scala.

iforgot22 · 2025-02-14T00:24:54 1739492694

There are way more people who know Python than Scala, and it's also an easier language to get started with.

iforgot22 · 2025-02-14T00:33:35 1739493215

So then they need to know toml (Tom's Obvious Minimal Language)? https://github.com/gtk-rs/examples/blob/master/Cargo.toml I don't know what this file says.

wocram · 2025-02-13T22:41:01 1739486461

I think it's hard to argue that Cargo.toml is any simpler than Python. JSON might be ubiquitous enough for anyone to read and understand, but if Python is foreign then TOML is no better.

aidenn0 · 2025-02-13T22:40:42 1739486442

I don't know about Python specifically, but using a language I'm familiar with to generate ninja files (+ any header/environment/&c) for the build has become my go-to way of doing builds in the past 18 months or so.
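As a rough illustration of the generate-ninja-files approach, here is a sketch in Python. The `cc` commands, the file names, and the `ninja_for` helper are assumptions for the example, not the commenter's actual setup; only the basic `rule` / `build` / `default` syntax comes from ninja itself.

```python
def ninja_for(sources, binary="app"):
    """Return the text of a build.ninja that compiles and links C sources."""
    lines = [
        "rule cc",
        "  command = cc -c $in -o $out",
        "rule link",
        "  command = cc $in -o $out",
        "",
    ]
    objects = []
    for src in sorted(sources):  # sorted so the output is stable across runs
        obj = src.rsplit(".", 1)[0] + ".o"
        objects.append(obj)
        lines.append(f"build {obj}: cc {src}")
    lines.append(f"build {binary}: link {' '.join(objects)}")
    lines.append(f"default {binary}")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Usage: python gen.py >build.ninja && ninja
    print(ninja_for(["util.c", "main.c"]), end="")
```

All the logic lives in the generator; the ninja file it emits is flat and can be read or diffed without running anything.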

manoDev · 2025-02-14T00:39:03 1739493543

> I also think that now that LLMs are all the rage, how much context do you think they have for bespoke config language vs Scala vs Python? I think we know the answer to that one.

Nothing against Python, but of all the reasons to choose a technology, whatever is more represented on the dataset of some LLM is the worst reason.

This is a death spiral. There's no hope for the future of this industry if newcomers are thinking like this.

iforgot22 · 2025-02-14T00:42:51 1739493771

I cared about programming languages when I was a newcomer. Stopped caring about 10 years ago. They're just tools, each with their own gotchas and different design choices I couldn't care less about. Between two tools that both work ok, I will definitely pick whichever one my team and I can learn the easiest, and that includes LLM coverage.

asalahli · 2025-02-15T21:44:40 1739655880

> There's boundless libraries to make Python more functional, use stricter typing, or reduce the amount of side effects it can cause.

What are some examples of a library that can limit or prevent side effects of a piece of python code? I could use one right now.

eptcyka · 2025-02-13T22:34:29 1739486069

Ah yes, the age old belief that all software is complex enough that one must first run some other bespoke turing complete program to build every single piece of software.

And of all the languages to pick for this, python, with its non-hermetic execution environment is bound to bite you in the ass, once your buildscripts start depending on libraries. Oh, you could use poetry to solve the library issue with python, or maybe it'll be setuptools, pip or whatever is the flavour of the month in python packaging.

After fighting with Nix for a sufficiently long time, I think most language specific build tools are not necessarily the best solution to the problem of automating a build for a bit of software written in language X. Complex projects will eventually evolve to depend on multiple languages (unless you're the Linux kernel), at which point the specialized language build tools turn into cumbersome barriers in the build process, where different build tools are not aware of the caching, conventions and configurations of any other tool. As such, in an ideal world, any new language would come with a compiler or bundler that can be supported well by higher level build/packaging tools. And bespoke python scripts ain't that.

koito17 · 2025-02-13T21:00:12 1739480412

The article fails to mention whether Scala can ensure code is deterministic and hermetic. Starlark code is deterministic and hermetic, but the article never mentions this. Unfortunately, Starlark does not have static typing, but I think types would make Starlark one of the best languages today for build configuration.

In the Clojure community, there was a huge push for "builds are programs". I somewhat agree with this assertion, but I also think "one should restrict the class of programs a build belongs to". Neither Clojure nor Scala, compared to Starlark, seem to offer a way to ensure builds belong to a deterministic subset of programs.

Thus I am still wondering "why Scala?". I have never used Scala, but reading this whole article gives me the impression that Mill is the Scala equivalent of Clojure's tools.build. That is not what I would want in a build system.

fmbb · 2025-02-13T21:26:38 1739481998

What is special in starlark that makes it hermetic and deterministic?

Scala is deterministic.

If you call functions that have side effects and nondeterministic behavior you can fall outside the comforts of determinism. But you can stumble upon library functions someone wrote in Starlark that accidentally put you there as well.

The Starlark homepage says

> Hermetic execution - Execution cannot access the file system, network, system clock. It is safe to execute untrusted code.

But the last time I wrote Starlark it was to define build targets in Bazel. And executing the builds definitely accessed my file system and the network, otherwise builds would have no results.

thirtyseven · 2025-02-14T00:47:10 1739494030

Bazel splits the build into multiple phases. Starlark only comes into play in the first two, load and analysis. During these phases, Starlark code doesn't have access to the filesystem, except in a few very limited cases like using the glob() function to expand a wildcard to a list of source files. Furthermore, it only generates an abstract graph of build actions. The Bazel engine is responsible for executing this graph in later stages, which might result in non-hermetic things happening but usually not.

Starlark has intentionally limited functionality such as lacking Turing completeness or global variables. This provides guarantees that it can be executed in parallel and will have a finite runtime.
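The split between "describe the graph" and "execute the graph" can be illustrated with a toy sketch in Python. This is not Bazel's API or its actual data model; `analyze`, `execution_order`, and the `cc` commands are hypothetical, and `graphlib` needs Python 3.9+.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def analyze(targets):
    """Analysis phase (toy): build {output: (inputs, command)} as plain data.

    Nothing here reads files or runs commands; it only describes actions.
    """
    graph = {}
    for target in targets:
        src = target["src"]
        obj = src.replace(".c", ".o")
        graph[obj] = ([src], f"cc -c {src} -o {obj}")
    objects = sorted(graph)
    graph["app"] = (objects, "cc " + " ".join(objects) + " -o app")
    return graph

def execution_order(graph):
    """Execution phase (toy): order actions so inputs are built before use."""
    deps = {
        out: [i for i in inputs if i in graph]  # source files are not actions
        for out, (inputs, _command) in graph.items()
    }
    return list(TopologicalSorter(deps).static_order())

graph = analyze([{"src": "main.c"}, {"src": "util.c"}])
order = execution_order(graph)
```

The point of the sketch is that `analyze` can be restricted and sandboxed as much as you like, because the side effects only happen when a separate engine walks `order` and runs the commands.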

cbeach · 2025-02-13T22:04:53 1739484293

Bazel enforces a hermetic sandbox for Starlark to operate within.

All input files have to be declared, and the build process can only see files that are declared.

Starlark cannot access arbitrary files at runtime, and it deliberately has no APIs for things like system time or random number generation, or global state.

Scala, on the other hand, has no such restrictions. As much as I love Scala, I think it’s an odd choice for a pure, deterministic system. Although perhaps if you use Scala to build a DSL (an area where Scala shines) you could engineer a pure functional sandbox within Scala.

fmbb · 2025-02-13T23:40:01 1739490001

Do you only declare the file names, or also file content and owner and group and everything up front?

Sure it deliberately has a bunch of restrictions in its base form. But in order to use it for anything you have to write custom actions or download and execute custom actions others wrote. And this will mutate your file system and run arbitrary executables.

recursivecaveat · 2025-02-14T00:45:29 1739493929

Input and output files are tracked deterministically via content hashing. It cannot reach outside of its sandbox and touch the network or actual filesystem. Those are all (generally pinned) inputs to the system. For a given version of a script and a tuple of input content hashes, you get the exact same output files every time. No way to accidentally iterate over a hashmap, embed a timestamp, or leave the filesystem in a weird state because you were interrupted.
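To make the "script version plus tuple of input content hashes" idea concrete, here is a small sketch of a cache key built that way. It is an illustration of the general technique, not Bazel's actual hashing scheme; `content_hash` and `action_key` are hypothetical helpers.

```python
import hashlib

def content_hash(data: bytes) -> str:
    """Hash of a file's contents, independent of its name or timestamps."""
    return hashlib.sha256(data).hexdigest()

def action_key(script: str, inputs: dict) -> str:
    """Key that depends only on the script text and the input contents.

    Inputs are visited in sorted name order, so dict ordering cannot leak in.
    """
    h = hashlib.sha256()
    h.update(content_hash(script.encode()).encode())
    for name in sorted(inputs):
        h.update(name.encode())
        h.update(b"\0")  # separator so name and hash cannot run together
        h.update(content_hash(inputs[name]).encode())
    return h.hexdigest()

key_a = action_key("cc -c main.c", {"main.c": b"int main(){}", "util.h": b""})
key_b = action_key("cc -c main.c", {"util.h": b"", "main.c": b"int main(){}"})
key_c = action_key("cc -c main.c", {"main.c": b"int main(){return 1;}", "util.h": b""})
```

Here `key_a` and `key_b` are equal because only contents matter, while `key_c` differs because one input changed. An output stored under such a key can be reused whenever the key matches.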

winwang · 2025-02-14T00:26:22 1739492782

As someone who worked a bit with Bazel actions and uses Scala: I think Bazel cuts down on the area where arbitrary I/O happens. Also, many executables come from pre-defined targets (e.g. java toolchains). One way to deal with I/O is if Mill injected its own implementation of a JVM filesystem/network layer, which I'm pretty sure it can do.

michaelmior · 2025-02-13T22:14:04 1739484844

> Scala is deterministic.

I don't see how Scala is any more deterministic than any other language.

fmbb · 2025-02-13T23:32:42 1739489562

That’s what I’m saying. About Starlark.

fire_lake · 2025-02-14T07:53:19 1739519599

Starlark does not expose sources of randomness like date.now()

Capricorn2481 · 2025-02-15T06:38:06 1739601486

If you're deliberately using that to prove Scala doesn't work for config then sure

fire_lake · 2025-02-22T08:27:06 1740212826

It’s about guarantees vs eternal vigilance.

Kwpolska · 2025-02-13T22:31:19 1739485879

Because it’s a Scala project, written by a Scala fan, simple as that. No need to come up with extra justification.

kunley · 2025-02-13T22:21:48 1739485308

"Mill is a fast, scalable, multi-language build tool that supports Java, Scala, Kotlin, and Python".

So, while I understand this tool can resonate in the JVM world, I have no idea why one would want to pull Java into their toolset in order to build Python.

lamp_book · 2025-02-14T11:43:43 1739533423

I don’t know who is using mill but I’d guess Scala people and Scala frequently implies Spark which in turn implies PySpark. For instance I could see it being useful for a team building a source connector which will be Scala-centric but need some PySpark additions as well. Even without Spark, Scala is frequently used for data centric applications and those kinds of teams would naturally be working with python as well.

wavemode · 2025-02-13T22:42:07 1739486527

What does it even mean to "build" Python?

pletnes · 2025-02-14T07:22:39 1739517759

I imagine creating a wheel file with your code, metadata and resources? Also recall that a lot of python packages have dependencies written in C, Fortran, Rust etc.

Also, many tools exist to create executable programs - basically bundling the python interpreter with some .py files, etc.

tacticus · 2025-02-13T23:28:48 1739489328

shouting at pip inconsistencies

kunley · 2025-02-14T11:42:57 1739533377

Yeah had the same thought actually

vander_elst · 2025-02-13T21:49:41 1739483381

It seems this is mainly a scala project and then they are using scala also for the configuration, it probably makes sense for them.

rpcope1 · 2025-02-13T23:07:07 1739488027

One interesting thing is that there are comparisons to Maven and Gradle but not sbt. Do people just not use sbt anymore or is it omitted because it's also Scala and/or prone to becoming a mess?

ezst · 2025-02-15T11:37:13 1739619433

https://mill-build.org/mill/comparisons/sbt.html

videogreg93 · 2025-02-13T23:27:51 1739489271

I used sbt in my previous job and didn't hate it. All I want from build systems is to get out of my way and it let us do that pretty well. Very simple to add new tasks as well.

openplatypus · 2025-02-14T00:09:01 1739491741

We use it.

It is actively developed.

It doesn't get in the way.

It does what it says on the tin.

ATMLOTTOBEER · 2025-02-14T16:37:26 1739551046

It’s slow, and the task/setting key resolution makes no sense unless you waste several hours reading sbt docs (time I will never get back). I’m not saying dump sbt for mill but u gotta admit sbt kinda sucks. Afaik this is also the conclusion most larger orgs that use scala come to when they start calling scalac from bazel or similar.

gdgghhhhh · 2025-02-13T23:05:01 1739487901

The title really confused me until I realized this has nothing to do with https://millcomputing.com/ :-)

pabs3 · 2025-02-14T04:07:28 1739506048

Wonder when Mill Computing will become more publicly active and have hardware available.

waste_monk · 2025-02-14T07:01:59 1739516519

It seemed quite promising but from the outside momentum appears to be almost completely stalled, with only a handful of posts per year on the forum.

I'd be curious to know if there's progress being made behind the scenes.

pabs3 · 2025-02-15T00:37:30 1739579850

From forum threads in the last year or two it sounds like they have gotten quite far and only need investment to progress further.

https://millcomputing.com/topic/any-plans-for-2024/ https://millcomputing.com/topic/yearly-ping-and-see-how-thin...

Edit: posted a HN thread about getting investors for Mill:

https://news.ycombinator.com/item?id=43054697

ncgl · 2025-02-16T13:22:26 1739712146

The first thing any self respecting python dev is going to do on a new repo is implement typechecking. And the second thing they're going to do is complain about pip/pyenv.

These days I would trade my python experience for scala, even knowing it'd mean fewer job prospects. We make a lot of excuses for python.

Lyngbakr · 2025-02-13T23:45:53 1739490353

I'm surprised that Lua wasn't included in the discussion. I'm not saying they necessarily should have chosen to use it, but it's a notable omission given its popularity as a config language.

rubenvanwyk · 2025-02-15T06:41:29 1739601689

A lot of these reasons also apply to Kotlin and Kotlin is arguably simpler than Scala, why not just use Kotlin? It is much more widely adopted than Scala.

agentultra · 2025-02-13T21:49:31 1739483371

I'm not big on programming languages being used for configuration. It adds a lot of complexity and maintenance burden. Configurations are often read and your reader isn't a compiler but now they have to imagine what the final configuration state will be after evaluating the "program" that generates the configuration. I think plain old configuration languages are better, even if verbose, since the usual text-based tooling works quite well for managing, searching, etc.

I use nix a lot and the main thing that bothers me about it is the language. People are quick to get clever with it. It becomes a morass of code that is difficult to read for anyone but experts and when it breaks and you're not that expert... good luck fixing it.

winwang · 2025-02-14T00:39:05 1739493545

Unfortunately, I find that real-world configs don't quite conform to easy understandability past tutorial examples, i.e. k8s and yaml.