The best config system I've ever seen used plain old Python to generate static configs. Everyone knows Python. Python is easy to do data munging in, as demonstrated by its popularity as the #1 data science tool. There are boundless libraries to make Python more functional, use stricter typing, or reduce the amount of side effects it can cause. Even Starlark is just a dialect of Python.
You can spend decades building a complicated configuration language, use a bespoke functional language as Mill does, but if you're a single company that can enforce code quality and just wants to get the job done, I feel like everything else is just unnecessary and over-engineered to scratch some academic itch for a "better system" that enforces "purity" at the cost of velocity.
Also, now that LLMs are all the rage, how much context do you think they have for a bespoke config language vs Scala vs Python? I think we know the answer to that one.
Python is a terrible choice for that sort of thing. Who really wants to have to set up a venv and deal with pip nonsense just to write a config file? Hell even installing Python is sometimes difficult.
JVM doesn't have everything all over the place, JAVA_HOME and PATH suffices.
It doesn't require reaching out to tons of C libraries when performance is called for, and there are at least two free beer options to AOT compile to native code, if required.
Not really. Uv is great but it doesn't really help here. If an app uses Python as its configuration file format then the app is running Python. It's going to do `python3 config.py` or similar. It doesn't know anything about uv.
So you would still need to create a uv project, run `uv sync` and `uv activate` or whatever and then run your app. Not practical.
The only option if you use Python as a config file format is to stick to old features (Python 3.6) and not use any third party libraries. But op was saying third party libraries are one of the benefits of using Python...
Up until recently Apple only included Python 2 and so developers used Homebrew to install Python 3. Now it’s very common to find two versions of Python 3 installed on a Mac developer’s laptop that conflict with each other.
Hence the advice to stick to the standard library - because then it doesn't matter all that much. I'm not sure what the full set of environments I've tested my Python 3 scripts actually is, but they've run OK on all the various cloud CI systems I've tried, plus my laptops, VMs, and work PCs.
All a Mac user has needed to do was install from https://www.python.org/downloads/ and then run python3 in the shell. Even if you use MacPorts or brew or conda or whatever, there's a distinct command to run Python 3 instead of 2.
I get that Python's package manager situation is terrible, but like the other user said, you only need built-in packages to spit out a config json or whatever.
This does work well. A team I was on at a past job did exactly this. On Unix the service literally ran `std::system("python config.py >config.json")` on startup.
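A minimal sketch of what such a `config.py` might look like, using only the standard library. The service names, ports, and fields are hypothetical and not taken from that team's setup; the point is just that the script prints plain JSON and nothing else.

```python
import json
import sys

# Hypothetical service list; names and ports are made up for illustration.
SERVICES = ["auth", "billing", "search"]
BASE_PORT = 8000


def build_config(env="prod"):
    """Return a plain dict that serializes to the static config."""
    return {
        "env": env,
        "services": [
            {"name": name, "port": BASE_PORT + i, "replicas": 3 if env == "prod" else 1}
            for i, name in enumerate(SERVICES)
        ],
    }


def render(config):
    # sort_keys and a fixed indent keep the text stable from one run to the next.
    return json.dumps(config, indent=2, sort_keys=True) + "\n"


if __name__ == "__main__":
    sys.stdout.write(render(build_config()))
```

Run as `python3 config.py >config.json`, this needs no venv and no third-party packages.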
The problem with this is that the answer to the question "what kind of configuration can I expect?" is "simulate the script and find out."
If the script is written well, and is short, then the parameters that are filled in by the runtime environment are apparent. Over time, though, there is a risk that the script will not remain written well, and it almost certainly won't remain short.
An approach I use is splitting my config tools into 2 stages:
Stage 1 creates an "explicit" config that can be exported to plaintext and contains exactly what is going to be created/modified, with no abstraction/simplification.
Stage 2 applies the "explicit" config.
You get to be as clever as you want in stage 1 to avoid excessive copy pasting, without ending up unable to know what your tool is going to do because all you have to go on is some homegrown DSL.
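The two stages could be sketched roughly like this. Everything here (the `create_bucket` operation, the spec fields, the `backend` callback) is a hypothetical stand-in, not a real tool's API; stage 2 is deliberately handed only the exported text.

```python
import json


def expand(spec):
    """Stage 1: turn a compact spec into a flat, explicit list of actions."""
    actions = []
    for env in spec["envs"]:
        for bucket in spec["buckets"]:
            actions.append({
                "op": "create_bucket",  # hypothetical operation name
                "name": f"{spec['prefix']}-{env}-{bucket}",
                "versioned": env == "prod",
            })
    return actions


def export(actions):
    """Plaintext form of the explicit config, suitable for review or diffing."""
    return json.dumps(actions, indent=2, sort_keys=True)


def apply(actions, backend):
    """Stage 2: no cleverness here, just dispatch each explicit action."""
    for action in actions:
        backend(action)


spec = {"prefix": "acme", "envs": ["dev", "prod"], "buckets": ["logs", "assets"]}
plan = expand(spec)
print(export(plan))

applied = []
# Stage 2 consumes only the exported text, never the stage 1 objects.
apply(json.loads(export(plan)), applied.append)
```

Because the exported plan is plain data, it can be checked into a repo or diffed against the previous run before anything is applied.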
True. One advantage I can imagine for a DSL is that it constrains what is possible and optimizes (syntactically) what it's supposed to be for. I think that the author of Nix justified its language that way.
The counterargument is "eventually you'll need every facility provided by a programming language, so just start with a programming language."
I'm not sure how I feel about it. The YAML templating situation in Kubernetes is a [shit show][1]. Then again, I did once cave into the temptation of writing a [lisp-like XML preprocessor][2] to make my configurations less verbose. It doesn't have any access to the environment, though, so it's not a general purpose configuration language, just a shorthand for static XML.
What constraints are needed? I've used DSLs that are almost Python but not quite, I think because they were hermetic and deterministic. Even those ended up being produced dynamically using some higher level config DSL or just regular code. Like once you're doing RPCs, it's general programming language territory (though there are also DSLs that do this, which is cursed).
And yes, I have very bad memories of Kubernetes YAML, also YAML itself.
Scala is hardly some obscure bespoke language. It's a top-20, maybe top-10 programming language, that's been around for 20+ years (and had far fewer breaking changes than Python over that time). Most Python translates directly into Scala, but with the benefit of a proper sound type system and full IDE support. And it's a great language for data munging.
Scala is only in 0.5% of the scanned job offerings, and is far far behind the major languages in numbers, but I was surprised there's more demand than Rust or even Perl to be honest.
No they don’t. Just like everyone doesn’t know Cobol, Fortran, Scala etc.
But by having a programming language as your build tool you now make it harder for new people to onboard. In order to build the project they often need to learn some unique, language-specific syntax. And in order to find this syntax they look around on GitHub, and because it’s a programming language every project has its own unique, project-specific approach.
Versus something like Cargo.toml where it’s simple and consistent regardless of which project you look at.
> No they don’t. Just like everyone doesn’t know Cobol, Fortran, Scala etc.
Sure somebody might not have Python experience, but it's pretty easy to just not hire someone who says they don't know Python and isn't willing to learn for the role. I don't know that you'd filter out many candidates out of any random 100 devs.
I am talking about graduates and others new to programming.
Of course they are willing to learn for the role but making it hard for them in the beginning can forever turn them off a language. That has been a big problem with Scala and Spark.
I've never seen a language used for ancillary purposes be the make or break on hiring for a role, it's always just been expected that you'd pick it up as you go. And IMO, python is the least offensive compared to stuff like Perl, Ruby (for Chef) or whatever the heck Terraform is.
> Sure somebody might not have Python experience, but it's pretty easy to just not hire someone who says they don't know Python and isn't willing to learn for the role.
I think it's hard to argue that Cargo.toml is any simpler than Python. JSON might be ubiquitous enough for anyone to read and understand, but if Python is foreign then TOML is no better.
I don't know about Python specifically, but using a language I'm familiar with to generate ninja files (+ any header/environment/&c) for the build has become my go-to way of doing builds in the past 18 months or so.
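For illustration, here is roughly what generating a ninja file from Python can look like. This is a sketch, not the commenter's setup: the rule names, flags, and file names are assumptions, and it assumes paths contain no spaces or other characters that ninja would need escaped.

```python
import os


def ninja_text(sources, binary="app", cc="cc", cflags="-O2 -Wall"):
    """Render a build.ninja as a string for a flat list of C sources."""
    lines = [
        f"cc = {cc}",
        f"cflags = {cflags}",
        "",
        "rule compile",
        "  command = $cc $cflags -c $in -o $out",
        "  description = CC $out",
        "",
        "rule link",
        "  command = $cc $in -o $out",
        "  description = LINK $out",
        "",
    ]
    objects = []
    for src in sorted(sources):  # sorted so the output does not depend on input order
        obj = os.path.splitext(src)[0] + ".o"
        objects.append(obj)
        lines.append(f"build {obj}: compile {src}")
    lines.append(f"build {binary}: link {' '.join(objects)}")
    lines.append(f"default {binary}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(ninja_text(["main.c", "util.c"]), end="")
```

The generator can be as elaborate as it likes, while the generated file stays a dumb, inspectable list of build edges.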
> I also think that now that LLMs are on the rage, how much context do you think they have for bespoke config language vs Scala vs Python? I think we know the answer to that one.
Nothing against Python, but of all the reasons to choose a technology, whatever is more represented on the dataset of some LLM is the worst reason.
This is a death spiral. There's no hope for the future of this industry if newcomers are thinking like this.
I cared about programming languages when I was a newcomer. Stopped caring about 10 years ago. They're just tools, each with their own gotchas and different design choices I couldn't care less about. Between two tools that both work ok, I will definitely pick whichever one my team and I can learn the easiest, and that includes LLM coverage.
Ah yes, the age old belief that all software is complex enough that one must first run some other bespoke turing complete program to build every single piece of software.
And of all the languages to pick for this, python, with its non-hermetic execution environment, is bound to bite you in the ass once your build scripts start depending on libraries. Oh, you could use poetry to solve the library issue with python, or maybe it'll be setuptools, pip or whatever is the flavour of the month in python packaging.
After fighting with Nix for a sufficiently long time, I think most language specific build tools are not necessarily the best solution to the problem of automating a build for a bit of software written in language X. Complex projects will eventually evolve to depend on multiple languages (unless you're the Linux kernel), at which point the specialized language build tools turn into cumbersome barriers in the build process, where different build tools are not aware of the caching, conventions and configurations of any other tool. As such, in an ideal world, any new language would come with a compiler or bundler that can be supported well by higher level build/packaging tools. And bespoke python scripts ain't that.
The article fails to mention whether Scala can ensure code is deterministic and hermetic. Starlark code is deterministic and hermetic, but the article never mentions this. Unfortunately, Starlark does not have static typing, but I think types would make Starlark one of the best languages today for build configuration.
In the Clojure community, there was a huge push for "builds are programs". I somewhat agree with this assertion, but I also think "one should restrict the class of programs a build belongs to". Neither Clojure nor Scala, compared to Starlark, seems to offer a way to ensure builds belong to a deterministic subset of programs.
Thus I am still wondering "why Scala?". I have never used Scala, but reading this whole article gives me the impression that Mill is the Scala equivalent of Clojure's tools.build. That is not what I would want in a build system.
What is special in starlark that makes it hermetic and deterministic?
Scala is deterministic.
If you call functions that have side effects and nondeterministic behavior you can fall outside the comforts of determinism. But you can stumble upon library functions someone wrote in Starlark that accidentally put you there as well.
The Starlark homepage says
> Hermetic execution - Execution cannot access the file system, network, system clock. It is safe to execute untrusted code.
But the last time I wrote Starlark it was to define build targets in Bazel. And executing the builds definitely accessed my file system and the network, otherwise builds would have no results.
Bazel splits the build into multiple phases. Starlark only comes into play in the first two, load and analysis. During these phases, Starlark code doesn't have access to the filesystem, except in a few very limited cases like using the glob() function to expand a wildcard to a list of source files. Furthermore, it only generates an abstract graph of build actions. The Bazel engine is responsible for executing this graph in later stages, which might result in non-hermetic things happening but usually not.
Starlark has intentionally limited functionality such as lacking Turing completeness or global variables. This provides guarantees that it can be executed in parallel and will have a finite runtime.
Bazel enforces a hermetic sandbox for Starlark to operate within.
All input files have to be declared, and the build process can only see files that are declared.
Starlark cannot access arbitrary files at runtime, and it deliberately has no APIs for things like system time or random number generation, or global state.
Scala, on the other hand, has no such restrictions. As much as I love Scala, I think it’s an odd choice for a pure, deterministic system. Although perhaps if you use Scala to build a DSL (an area where Scala shines) you could engineer a pure functional sandbox within Scala.
Do you only declare the file names, or also file content and owner and group and everything up front?
Sure it deliberately has a bunch of restrictions in its base form. But in order to use it for anything you have to write custom actions or download and execute custom actions others wrote. And this will mutate your file system and run arbitrary executables.
Input and output files are tracked deterministically via content hashing. It cannot reach outside of its sandbox and touch the network or actual filesystem. Those are all (generally pinned) inputs to the system. For a given version of a script and a tuple of input content hashes, you get the exact same output files every time. No way to accidentally iterate over a hashmap, embed a timestamp, or leave the filesystem in a weird state because you were interrupted.
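To make the "script version plus tuple of input content hashes" idea concrete, here is a toy sketch. It is not how Bazel implements its action cache; `action_key` and `run_cached` are hypothetical names, and the inputs are passed as in-memory bytes rather than read from a sandbox.

```python
import hashlib


def action_key(script_version, inputs):
    """Derive a cache key from the script version and the content of each input.

    `inputs` maps a declared path to its bytes. Paths are sorted so that
    dict ordering cannot leak into the key.
    """
    h = hashlib.sha256()
    h.update(script_version.encode())
    for path in sorted(inputs):
        content_digest = hashlib.sha256(inputs[path]).hexdigest()
        h.update(b"\0" + path.encode() + b"\0" + content_digest.encode())
    return h.hexdigest()


cache = {}


def run_cached(script_version, inputs, action):
    """Run `action` only if this exact (version, inputs) tuple has not been seen."""
    key = action_key(script_version, inputs)
    if key not in cache:
        cache[key] = action(inputs)
    return cache[key]
```

The key depends only on declared content, so reordering the inputs or rerunning later gives the same key, while changing a single byte of any input or bumping the script version gives a new one.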
As someone who worked a bit with Bazel actions and uses Scala: I think Bazel cuts down on the area where arbitrary I/O happens. Also, many executables come from pre-defined targets (e.g. java toolchains).
One way to deal with I/O is if Mill injected its own implementation of a JVM filesystem/network layer, which I'm pretty sure it can do.
"Mill is a fast, scalable, multi-language build tool that supports Java, Scala, Kotlin, and Python".
So, while I understand this tool can resonate in the JVM world, I have no idea why one would want to pull Java into their toolset in order to build Python.
I don’t know who is using mill but I’d guess Scala people and Scala frequently implies Spark which in turn implies PySpark. For instance I could see it being useful for a team building a source connector which will be Scala-centric but need some PySpark additions as well. Even without Spark, Scala is frequently used for data centric applications and those kinds of teams would naturally be working with python as well.
I imagine creating a wheel file with your code, metadata and resources? Also recall that a lot of python packages have dependencies written in C, Fortran, Rust etc.
Also, many tools exist to create executable programs - basically bundling the python interpreter with some .py files, etc.
One interesting thing is that there are comparisons to Maven and Gradle but not sbt. Do people just not use sbt anymore or is it omitted because it's also Scala and/or prone to becoming a mess?
I used sbt in my previous job and didn't hate it. All I want from build systems is to get out of my way and it let us do that pretty well. Very simple to add new tasks as well.
It’s slow, and the task/setting key resolution makes no sense unless you waste several hours reading sbt docs (time I will never get back). I’m not saying dump sbt for mill but u gotta admit sbt kinda sucks. Afaik this is also the conclusion most larger orgs that use scala come to when they start calling scalac from bazel or similar.
The first thing any self respecting python dev is going to do on a new repo is implement typechecking. And the second thing they're going to do is complain about pip/pyenv.
These days I would trade my python experience for scala, even knowing it'd mean fewer job prospects. We make a lot of excuses for python.
I'm surprised that Lua wasn't included in the discussion. I'm not saying they necessarily should have chosen to use it, but it's a notable omission given its popularity as a config language.
A lot of these reasons also apply to Kotlin and Kotlin is arguably simpler than Scala, why not just use Kotlin? It is much more widely adopted than Scala.
I'm not big on programming languages being used for configuration. It adds a lot of complexity and maintenance burden. Configurations are often read and your reader isn't a compiler but now they have to imagine what the final configuration state will be after evaluating the "program" that generates the configuration. I think plain old configuration languages are better, even if verbose, since the usual text-based tooling works quite well for managing, searching, etc.
I use nix a lot and the main thing that bothers me about it is the language. People are quick to get clever with it. It becomes a morass of code that is difficult to read for anyone but experts and when it breaks and you're not that expert... good luck fixing it.