The authors implemented a stack-based virtual machine for their benchmarks, but as they briefly allude to, a register-based virtual machine should perform better. With a register-based VM there is no excessive stack fiddling, and fewer instructions are needed to perform the same task. Consider the instructions for adding two integers and storing the result in a variable. In a stack machine this requires four instructions: 'push int', 'push int', 'add', and 'pop to variable', but in a register machine it can be done with a single instruction: 'add $r1, $r2, $r3'.
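To make the difference concrete, here is a minimal sketch of a register-style dispatch loop in C (my own toy code, not from the paper; the opcode set and instruction layout are made up for illustration):

    #include <stdio.h>
    #include <stdint.h>

    typedef struct { uint8_t op, a, b, c; } Ins;   /* hypothetical 3-operand encoding */
    enum { OP_ADD, OP_HALT };

    static void run(const Ins *code, int *regs) {
        for (;;) {
            Ins ins = *code++;
            switch (ins.op) {
            case OP_ADD:                    /* regs[a] = regs[b] + regs[c] */
                regs[ins.a] = regs[ins.b] + regs[ins.c];
                break;                      /* one dispatch, no stack traffic */
            case OP_HALT:
                return;
            }
        }
    }

    int main(void) {
        int regs[4] = {0, 0, 2, 3};
        Ins prog[] = { {OP_ADD, 1, 2, 3}, {OP_HALT, 0, 0, 0} };
        run(prog, regs);
        printf("%d\n", regs[1]);            /* prints 5: r1 = r2 + r3 */
        return 0;
    }

The stack encoding would need four dispatched instructions (push, push, add, pop-to-variable) plus four operand-stack accesses to do the same assignment.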
For further reading, there is an excellent paper [1] by the authors of Lua discussing the practical benefits of implementing the language using a register machine.
I am experimenting with runtime generation of shared libs.
We have a database of rules in the form 'A OR (B AND C)' stored just like that in human readable form. The current implementation creates an AST of the expression and executes it via the visitor pattern. The problem with this is that it's slow, and recursion becomes a problem if the expression is too complex.
I've shown it's actually quite easy to translate these expressions to C code, and then compile that to machine code. My only real concern is around loading and unloading shared libs at runtime. These expressions are updated every hour or so.
It is possible to strip the .text section from a binary and then mmap() it into executable memory, which has the advantage of not having to deal with reference counts in shared libs. The disadvantage is that the generated C code must be just code, with no state of its own.
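For what it's worth, a minimal sketch of that approach in C (error handling omitted; it assumes the expression function was compiled position-independent with no data or external references, that its entry point sits at offset 0 of the blob, and the file name rules.bin is made up -- produced with something like objcopy -O binary --only-section=.text):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    typedef int (*eval_fn)(const int *inputs);

    int main(void) {
        int fd = open("rules.bin", O_RDONLY);   /* hypothetical raw .text blob */
        struct stat st;
        fstat(fd, &st);

        /* map the blob read-only and executable; "unloading" is just munmap() */
        void *code = mmap(NULL, st.st_size, PROT_READ | PROT_EXEC,
                          MAP_PRIVATE, fd, 0);
        close(fd);

        int inputs[] = { 1, 0, 1 };
        printf("result: %d\n", ((eval_fn)code)(inputs));

        munmap(code, st.st_size);
        return 0;
    }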
> I am experimenting with runtime generation of shared libs.
Sure, you can produce an executable or a shared library at runtime; after all, that is what compilers and linkers do, and those are just programs too, aren't they :). But do you really need to? As others said, you can also dlopen and rename your symbols, put them in an array, and so on, but your problem seems quite simple. Unless you need to transport that library somewhere or keep it for later use, I don't see why you would bother generating a file just to load it back into your own program.
> We have a database of rules in the form 'A OR (B AND C)' stored just like that in human readable form.
That sounds like you could construct a "bottom-up" parser, i.e. an operator-precedence parser, and just evaluate the expression directly, or emit assembly code directly into a page of memory and mark that page as executable. Unless you are doing some optimizations, do you really need an explicit AST?
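Something like this sketch (my own toy code, assuming single-letter variable names and only AND/OR/parentheses, which may not match the OP's real rule syntax): the recursive-descent parser computes the truth value as it goes and never builds a tree:

    #include <stdio.h>
    #include <string.h>

    static const char *p;        /* cursor into the rule string */
    static const int *vals;      /* truth value per letter: vals['B' - 'A'], ... */

    static int parse_or(void);
    static void skip_ws(void) { while (*p == ' ') p++; }

    static int parse_atom(void) {
        skip_ws();
        if (*p == '(') { p++; int v = parse_or(); skip_ws(); p++; return v; }  /* '(' expr ')' */
        return vals[*p++ - 'A'];                                               /* single letter */
    }

    static int parse_and(void) {
        int v = parse_atom();
        for (skip_ws(); strncmp(p, "AND", 3) == 0; skip_ws()) { p += 3; v &= parse_atom(); }
        return v;
    }

    static int parse_or(void) {
        int v = parse_and();
        for (skip_ws(); strncmp(p, "OR", 2) == 0; skip_ws()) { p += 2; v |= parse_and(); }
        return v;
    }

    int main(void) {
        int truth[26] = {0};
        truth['B' - 'A'] = 1;
        truth['C' - 'A'] = 1;
        p = "A OR (B AND C)";
        vals = truth;
        printf("%d\n", parse_or());   /* prints 1 */
        return 0;
    }

Recursion depth here is still bounded by the nesting depth of the expression; a shunting-yard style version with an explicit stack would avoid that if the rules get very deep.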
Anyway, if your AST is slow, it may of course depend on its size, but also on how you create the AST in memory and how you use it. If you are using a linked structure with pointers pointing all over memory and accessing it randomly, you can be thrashing your cache, which will be slow. But if you put it in a vector and can access it sequentially, that might help performance. I don't know, just thinking out loud; no idea what you are really doing.
I suggest looking at a good Lisp compiler like SBCL or CCL. They generate code at runtime, which is then marked as executable, and call that code. They can also save themselves into an executable (or a shared library in the case of SBCL). Writing a read-macro that transforms your human-readable strings into compiled Lisp functions would be a trivial exercise in Common Lisp. Perhaps you should try to solve your problem with that tool instead of C++? If you are going to generate code dynamically every now and then as you describe, then pick a tool that already has the infrastructure to do that in place, so you don't have to do everything from scratch; although you could do something similar based on LLVM or libgccjit too.
What's the problem with dlopen and dlclose? Or LoadLibrary/FreeLibrary on Windows? Loading and unloading shared libraries is a perfectly normal thing to do, there's no issue with it. Where it may get complex is if you want to have new and old versions active simultaneously - then you'd need to be careful about namespacing and symbol scoping.
But you didn't mention what language the rest of your server is written in. If you read the paper, it talks about Truffle/Graal, which is a compiler system that JIT compiles AST interpreters. So if you're working on the JVM, you could instead write (or reuse) a simple interpreter for your expression language that uses the Truffle annotations, and switch to the GraalVM as your JDK. Then it'd get faster automatically and you don't have to think about compilers, shared libraries or whatnot. Of course you don't need Truffle. For a simple language you could also generate bytecode and load that into a classloader. It's not particularly complex.
But if you want to stick with C then I'd just go ahead and load/unload shared libs on the fly. Just make sure you don't unload one whilst a thread is still inside it.
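The basic shape is simple enough (a sketch; the library file name and the eval_rules symbol are placeholders for whatever your generator emits):

    #include <dlfcn.h>
    #include <stdio.h>

    typedef int (*eval_fn)(const int *inputs);

    int main(void) {
        /* RTLD_LOCAL keeps symbols from different rule generations from colliding */
        void *lib = dlopen("./rules_v2.so", RTLD_NOW | RTLD_LOCAL);
        if (!lib) { fprintf(stderr, "dlopen: %s\n", dlerror()); return 1; }

        eval_fn eval = (eval_fn)dlsym(lib, "eval_rules");
        if (!eval) { fprintf(stderr, "dlsym: %s\n", dlerror()); return 1; }

        int inputs[] = { 1, 0, 1 };
        printf("result: %d\n", eval(inputs));

        dlclose(lib);   /* only after every in-flight call has returned */
        return 0;
    }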
The rest of the server is C++. And yes, the problem is that there will temporarily be two libs active simultaneously for a few seconds. I'll need to think it through. Sounds like a debugging nightmare if things go wrong.
That is okay: your libs can be uniquely named and have unique symbols.
I would echo the suggestions that you may be better off going for a bytecode interpreter, at least as a first stage. If interpreting rules is expensive enough to worry about compiling, then it’s probably expensive enough to be thinking about resource limits, interruption, and maybe profiling, and that stuff will be easier to experiment with if you have an interpreter.
I had the same thought; a simple bytecode VM shouldn't be too bad to implement (famous last words...), and the improvement in cache locality might help a lot.
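For whatever it's worth, a sketch of how small such a VM can be (my own, with a made-up opcode set): the rule A OR (B AND C) compiled to postfix form and run by one switch loop over a flat array, so evaluation is sequential in memory and never recurses:

    #include <stdio.h>
    #include <stdint.h>

    enum { OP_LOAD, OP_AND, OP_OR, OP_END };   /* hypothetical opcode set */

    static int run(const uint8_t *code, const int *vars) {
        int stack[64], sp = 0;
        for (;;) {
            switch (*code++) {
            case OP_LOAD: stack[sp++] = vars[*code++]; break;   /* push variable */
            case OP_AND:  sp--; stack[sp - 1] &= stack[sp]; break;
            case OP_OR:   sp--; stack[sp - 1] |= stack[sp]; break;
            case OP_END:  return stack[0];
            }
        }
    }

    int main(void) {
        /* A OR (B AND C), with A/B/C as variable indices 0..2 */
        uint8_t code[] = { OP_LOAD, 0, OP_LOAD, 1, OP_LOAD, 2, OP_AND, OP_OR, OP_END };
        int vars[] = { 0, 1, 1 };
        printf("%d\n", run(code, vars));       /* prints 1 */
        return 0;
    }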
Well, my thinking is that if a C compiler can do all the heavy lifting, I may as well take advantage of that. Transpiling to C also reduces, if not eliminates, any internal recursion in the expression evaluation. You get decades of compiler optimisations too, although I don't see there being much of that here.
This looks like a perfect application for wasm.
You could do ruleset -> C -> wasm and run the wasm in process with wasmer/wasmtime.
From my understanding, wasmer might be easier to use.
It sounds like this should solve the immediate problem; the main downside I can see is that it would be a significant new dependency whose cost can be hard to judge.
Compiling to C introduces the complexity of loading/unloading a shared library at runtime, though, as you and other folks in this thread have been talking about. A bytecode compiler + VM could just be built into your main program.
It might be best to emit machine code directly to memory. Template JIT style.
It removes hazards like the C compiler putting jump tables in .rodata, and it has some nice properties: you can emit calls to functions in the address space as literal addresses instead of using a loader to patch them from relocation tables.
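A minimal sketch of the idea (x86-64 System V only, entirely my own): emit the five bytes for "int add(int a, int b) { return a + b; }" into an anonymous mapping, flip it to executable, and call it:

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void) {
        unsigned char code[] = {
            0x89, 0xf8,       /* mov eax, edi */
            0x01, 0xf0,       /* add eax, esi */
            0xc3              /* ret          */
        };

        void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        memcpy(page, code, sizeof code);
        mprotect(page, 4096, PROT_READ | PROT_EXEC);   /* W^X: flip to executable */

        int (*add)(int, int) = (int (*)(int, int))page;
        printf("%d\n", add(2, 3));                     /* prints 5 */

        munmap(page, 4096);
        return 0;
    }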
> ... you can emit calls to functions in the address space as literal addresses instead of using a loader to patch them from relocation tables.
Technically, dlopen can do this too, except it'd make dlopen() slower.
My toy compiler's C backend does just that: it emits ELF files with relocation tables, but loads them and patches all relocations to direct addresses instead of going through PLT/GOT jumps.
If you want to keep the call into a C compiler, you might want to go with freestanding (no calling into glibc by accident) and position-independent code, and see if you can get everything statically resolved -- nothing in the loader's relocation tables. Then mmap the entire ELF. That gives you tables of constant data and removes all the edge cases that come from carving .text out.
If you're partly looking for an easy and sufficient answer, and are building an ELF anyway, then much like the sibling post I'd think dlopen is probably the way to go. Beware that dlclose might be a no-op, in which case the address-space leak will eventually sink you.
Compiling to C is definitely an option, but there are also plenty of low-hanging fruit that could come before. You could invest in a better interpreter or even some bytecode compilation first.
I don't know where your rules are coming from, but I would personally not be super happy about introducing a C compiler that compiles user-provided strings in the middle of my application.
I'm curious about a few things:
- How many rules are there, and how complex do they get? It sounds pretty substantial if you're running into performance issues (and/or you're working with highly demanding performance requirements)
- What do your rules look like? I'm having a bit of trouble understanding how you'd get recursion if there are just AND/OR operators. I can imagine an AST getting fairly big/deep, but the only way I can think of to get recursion would be if you're calling external functions multiple times, with different values, over the course of processing a single request.
So it sounds like the rules themselves aren't recursive, just your current strategy for evaluating them; is that correct?
Given that, I'm more inclined towards the currently upvoted reply about translating them to JS/Lua and just calling eval(). [1] It'd be very simple and _should_ be fast enough, though you might need to worry a little about security (especially depending on who's providing the rule sets and how much validation there is around them).
You could also check out libjit: https://www.gnu.org/software/libjit/ . I've not used it before but know it exists. Essentially you pass your bytecode in, and the library handles turning that into machine code and everything. So that library would handle the loading and unloading of executable memory for you; you would just have to handle translating your rules into the libjit bytecode.
I can highly recommend libtcc (https://github.com/TinyCC/tinycc.git) for this kind of thing. I recently ported code developed on Linux on an ARM Chromebook to a generic Windows box in 20 minutes.
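For reference, the in-memory flavour looks roughly like this (a sketch; eval_rules and the generated source string are made up, and note that newer tinycc versions drop the second argument to tcc_relocate()):

    #include <libtcc.h>
    #include <stdio.h>

    int main(void) {
        const char *src =
            "int eval_rules(const int *v) { return v[0] || (v[1] && v[2]); }";

        TCCState *s = tcc_new();
        tcc_set_output_type(s, TCC_OUTPUT_MEMORY);       /* JIT into this process */
        if (tcc_compile_string(s, src) < 0) return 1;
        if (tcc_relocate(s, TCC_RELOCATE_AUTO) < 0) return 1;  /* newer API: tcc_relocate(s) */

        int (*eval)(const int *) = (int (*)(const int *))tcc_get_symbol(s, "eval_rules");
        int vars[] = { 0, 1, 1 };
        printf("%d\n", eval(vars));                      /* prints 1 */

        tcc_delete(s);                                   /* frees the generated code too */
        return 0;
    }

No shared library on disk and no dlopen at all; the generated code lives and dies with the TCCState.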
Yes, the language there is Zephyr ASDL [1]. I actually changed this:
Dict(expr* keys, expr* values)
to
Dict(List[expr] keys, List[expr] values)
in our fork for https://oilshell.org/ , and it made a lot more sense to contributors.
It's funny how "sticky" syntax is -- two contributors ALSO read * as "pointer"! So I changed it to be consistent with MyPy syntax earlier this year, and now I think I should have done that a long time ago.
(It would also make sense to change it to keys: List[expr], as that's the Python 3 syntax for types)
---
The funny thing is that while the web page says "Abstract Grammar", I would not call Zephyr ASDL a grammar.
Python has a separate Grammar/Grammar file, that is distinct from this file Python/Python.asdl.
A grammar describes a set of strings, and that's not what ASDL does.
It describes the data structure that the valid strings get transformed to after parsing, and there is not a 1:1 correspondence (e.g. due to the difference between a CST and AST, and other post-parsing validation).
(Pretty sure "ungrammar" from rust-analyzer is nodding at this -- https://rust-analyzer.github.io//blog/2020/10/24/introducing... -- it's "ASDL for concrete syntax trees", i.e. it's not a grammar because it doesn't describe strings; it describes trees)
These features are always nice for meta-magic and monkey-patching but those are usually exactly the reasons why a language can only be optimized so much. It's often a worthwhile tradeoff but needs to be carefully considered.
Is "meta-magic" and "monkey-patching" what we mortals call code generation and which is an useful technique to automate coding and which can both save time and generate safer and more correct code or is it something else?
There are many ways to do code generation. For example, you can have one program write out text files that a compiler reads to create another program. Or you can have a program change itself while it's running. These represent two extremes on a static to dynamic continuum. Most "dynamic" languages like Ruby and Python take the latter approach. This approach is hard to reason about, for both an optimizing runtime and a human, and is one reason Python and Ruby have such poor performance.
Their work seems to focus on discovering phases. IMO it's better if the language allows the programmer to express phases as a language concept. There are a few languages that support this. Racket is the best example I know of.
Many programs already implicitly have phases. For example, a typical web app has one phase at what is traditionally thought of as compile time, then another at startup when it reads its configuration, and then another phase when it starts running. We smoosh together the last two phases into one "run time" because our languages usually don't support phases, but actually the configuration is known earlier: at deployment time. So we could run that phase as part of the CI / CD process and catch configuration errors earlier. Once you start thinking of phases you see them everywhere. They're in data science (loading data vs exploring data), front-end web apps ("hydration"), etc. I think it's a large productivity gain that is unexplored in commercial programming.
This is such a great comment, which I didn't expect to find in this thread at all! I've been thinking about application "phases" for a while and I wasn't aware that there was research being done in this direction. Do you happen to have any more references beyond Racket and the link above?
> IMO it's better if the language allows the programmer to express phases as a language concept.
Yeah, not just to make life easier for the compiler, but I suspect it'd also make it easier to read & reason about the code.
I mean, performance gains are nice, but sometimes performance isn't really the bottleneck; reading & maintaining that unholy cocktail of application code, bash scripts, schema files & specs, build scripts, code generators, Dockerfiles, and Gitlab YAMLs is.
This is a great paper. The general idea is often called "multi-stage programming". The academic work has mainly focused on performance, which I don't think is the most interesting use in industry.
Perhaps you wish to look at Common Lisp? I think it has you covered on all the things you mention. There are distinct read, compile, and evaluation phases, all exposed to the application code.
If you get a good compiler like SBCL, you can go a long way with the language itself, since it offers a blend of scripting-language qualities while being a compiled language. Emacs Lisp can go a long way too.
Meta-magic can include code-gen, but monkey-patching can include effectively doing AST-to-AST transforms to change the behaviour of existing code without having to actually edit the code.
Honestly, I had no idea that "monkey-patching" was a term; I thought it was something OP just made up :-).
I have just looked it up, and it is indeed a term meaning something (https://en.wikipedia.org/wiki/Monkey_patch). Even meta-magic seems to be a term, though it seems less well defined.
I don't think we need an AST for monkey-patching, but it can be argued that every program can be converted to an AST. Anyway, that's less important.
The "monkey-patching" seem to basically mean anything that changes the runtime somehow. The example in Python on Wikipedia article just changes the value of a global variable. In Lisp we have many, many tools which can change code at runtime, I would even argue that Lisp is all about flexibility and "monkey-patching" since we already write AST and have the entire symbol table and the compiler available to us at runtime. Functions are just objects like any others and we can install any object in the function slot of an symbol, we can wrap it in our own function and install that one, everything is a list and dynamic in some way, so we can add slots to classes and structs if we want etc.
So the terms mean what I thought, but by code generation I don't mean just the classical case of generating code into a file from another file; I mean transforming the code in some way, at either runtime or compile time.
It’s something else. Monkey patching comes from Ruby: since classes are open, you can change implementations at runtime. I’m not quite as familiar with meta magic, but I believe it has similar characteristics. Sorta like dynamically injected accessors via AUTOLOAD in Perl.
There are cool tricks, but they come at a cost. Both for the reader of code, and the compiler of code.
My Lisp Machine's operating system (MIT, late '70s onwards) is written in an object-oriented Lisp (using Flavors and later CLOS). Everything is open and everything can be changed at runtime. Mixins and similar stuff come from there.
I'm happy to give lisp credit for everything cool.
But Common Lisp has all the good stuff you need to manage different generations of objects under changing classes, not least the ability to jump into the running system and examine its state. You get warnings and errors about redefinition, and you can maybe switch to a different package to manage generations. State and functions are separate, thanks to dynamic dispatch.
Ruby, on the other hand, rewrites the class. Hope all the data and functions are forwards and backwards compatible! Good luck.
> Apparently from earlier guerilla patch (“code that sneakily changes other code”), understood as gorilla patch. A monkey patch was then a more carefully written instance of such a patch.[1] Or simply derived from monkey (“to mess with”, verb).
Certainly not in Lisp; I had never heard it before; I thought OP made it up :). But I looked it up on Wikipedia, and it is indeed a term. It seems to just mean altering behavior at runtime.
[1] https://www.lua.org/doc/jucs05.pdf