Hacker News: Jeaye's comments

This is superb. Thank you for making it and licensing it MIT. I think this is a contender to replace the lexer within jank. I'll do some benchmarking next year and we'll see!


Wow, that is great news! Thanks for looking at it from this perspective! There are some benchmarks already available in the project: https://github.com/DotFox/edn.c/blob/main/bench/bench_integr...

You can run them locally with `make bench bench-clj bench-wasm`.

Let me know if I can do anything to help you with support in jank.


It looks like the key missing part which would be needed for a lexer is source information (bare minimum: byte offset and size). I don't think edn.c can be used as a lexer without that, since error reporting requires accurate source information.
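To make the "byte offset and size" requirement concrete, here is a minimal C++ sketch of a lexer that records both for every token. The `token`, `is_separator`, and `lex` names are hypothetical and are not taken from edn.c or jank; the splitting rule (whitespace and commas only) is a simplification for illustration.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical token type: just enough source information for error reporting.
struct token
{
  std::size_t offset; // byte offset of the token's first byte in the source
  std::size_t size;   // token length in bytes
};

// EDN treats commas as whitespace, so they separate tokens here too.
inline bool is_separator(char const c)
{
  return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == ',';
}

// Splits the input on separators and records where each token came from.
std::vector<token> lex(std::string_view const src)
{
  std::vector<token> tokens;
  std::size_t i{};
  while(i < src.size())
  {
    if(is_separator(src[i]))
    {
      ++i;
      continue;
    }
    std::size_t const start{ i };
    while(i < src.size() && !is_separator(src[i]))
    {
      ++i;
    }
    tokens.push_back(token{ start, i - start });
  }
  return tokens;
}
```

With the span stored on the token, an error reporter can recover the offending text with `src.substr(tok.offset, tok.size)` and derive line and column numbers from the offset on demand.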

As a side note, I'm curious how much AI was used in the creation of edn.c. These days, I like to get a measure of that for every library I use.


It should be easy to add source info for every token; some of them already keep both size and offset. I can create a branch for that.

> I'm curious how much AI was used in the creation of edn.c

A fair amount. This is my first big public project written in pure C. I did consult an LLM about best practices for code organisation, memory management, differences in SIMD instructions between platforms, etc. All the things a Clojure developer typically doesn't think about (the luxury of a hosted language). Ultimately, the goal was to learn some part of C programming; a working reader is a side effect of that.

> These days, I like to get a measure of that for every library I use.

Btw, I'm curious: what kind of measure are you looking for?


Oooo that’d be nice.


Yeah, this isn't quite C++ interop on its own. It's C++ interop via C, which is an incredibly pertinent qualifier. Since we go through C, opaque pointers are needed for everything, we can't stack allocate C++ values, we need to write extern C wrappers for everything we want to do (like calling member fns), and we don't get any compile-time type/safety checking, due to the opaque pointers.

Direct C++ interop is doable, by embedding Clang into Zig and using its AST, but this is significantly more work and it needs to be done in the Zig compiler. As a Zig user, going through C is about as good as you can do, probably.


It's a bit more than your typical "interop via C". With a "sized opaque" type you actually can stack allocate C++ values in Zig (and vice versa stack allocate Zig values in C++), i.e.

fn stackExample() void {
    var some_cpp_type: c.SomeCppType = undefined;
    c.some_cpp_type_ctor(&some_cpp_type);
    defer c.some_cpp_type_dtor(&some_cpp_type);

    // ...
}
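For readers wondering what the C++ side of a "sized opaque" type might look like, here is a rough sketch. Everything in it is an assumption made for illustration: `RealCppType`, the 64-byte and 8-byte bounds, and the `some_cpp_type_count` accessor are hypothetical, and only the `SomeCppType`, `some_cpp_type_ctor`, and `some_cpp_type_dtor` names mirror the Zig snippet above. It needs C++17.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// The real C++ type (hypothetical), with a non-trivial constructor and destructor.
struct RealCppType
{
  std::vector<int> values{ 1, 2, 3 };
};

extern "C"
{
  // What the C header would expose: storage only, no C++ members.
  // 64 and 8 are assumed upper bounds, checked by the static_asserts below.
  struct SomeCppType
  {
    alignas(8) unsigned char storage[64];
  };

  void some_cpp_type_ctor(SomeCppType *self)
  {
    // Placement new constructs the C++ object inside caller-owned storage.
    new(self->storage) RealCppType{};
  }

  void some_cpp_type_dtor(SomeCppType *self)
  {
    // Runs the destructor without freeing the storage, which the caller owns.
    std::destroy_at(std::launder(reinterpret_cast<RealCppType *>(self->storage)));
  }

  // Hypothetical accessor standing in for a wrapped member function.
  std::size_t some_cpp_type_count(SomeCppType *self)
  {
    return std::launder(reinterpret_cast<RealCppType *>(self->storage))->values.size();
  }
}

static_assert(sizeof(RealCppType) <= sizeof(SomeCppType::storage), "storage too small");
static_assert(alignof(RealCppType) <= alignof(SomeCppType), "storage under-aligned");
```

The storage lives wherever the caller puts it, including the stack, but every member access still goes through an extern "C" wrapper, so the earlier point about losing compile-time type checking across the boundary still applies.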


Rocket League works just fine for me, via Proton. I have over 4k hours in, each one of them done from Linux.

BakkesMod also works, thanks to https://github.com/CrumblyLiquid/BakkesLinux

Rocket League has a platinum rating on ProtonDB: https://www.protondb.com/app/252950


Does that include multiplayer? As far as I know, multiplayer was killed a couple of years ago, which is actually what I meant by “works on Linux”.


Yeah I have hundreds of hours or more in Rocket League on Linux, all competitive multiplayer. I use the Heroic launcher: https://heroicgameslauncher.com/


YES! Use the Proton version, not the native Linux version.


A GC is nowhere near the most difficult part of this. In 2014, there was no viable technology for JIT compiling C++, and very little technology for JIT compiling native code in general.


I started with reference counting, but the amount of garbage Clojure programs churn out ends up bogging everything down unless a GC is used. jank's GC will change, going forward, and I want jank to grow to support optional affine typing, but the Clojure base is likely always going to be garbage collected.


For a novice, could you elaborate on the difference a GC makes? Naively, it seems like the only difference would be whether you pay the deallocation fee immediately or later on.

Is there less of a problem when done in bulk if the volume of trash to collect is high enough?


GCs typically fall into two categories:

1. Reference counting - tracks how many references point to each object. When references are added or removed, the count is updated. When it hits zero, the object is freed immediately. This places overhead on every operation that modifies references.

2. Mark and sweep - objects are allocated in heap regions managed by the GC. Periodically the GC traces from roots (stack, globals) to find all live objects, then frees the rest. Usually generational: new objects in a nursery/gen0 are collected frequently, survivors are promoted to older generations collected less often.

In general reference counting is favoured for predictable latency because you’re cleaning up incrementally as you go. Total memory footprint is similar to manual memory management with some overhead for counting refs. The cost is lower throughput as every reference change requires bookkeeping (see Swift ARC for a good example).

Mark and sweep GCs are favoured for throughput as allocations and reference updates have close to zero overhead - you typically just bump a pointer to allocate. When collection does occur it can cause a pause, though modern concurrent collectors have greatly reduced this (see Java G1GC or .NET for good examples). Memory footprint is usually quite a bit larger than manual management.

In the case of Clojure, which in addition to being a Lisp also uses immutable data structures, there is both object churn and frequent changes to the object graph. This makes throughput a much larger concern than in a less allocation-heavy language, favouring mark and sweep designs.
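A small C++ sketch of the two cost models described above, under loud assumptions: `std::shared_ptr` stands in for a reference-counted runtime, and the hypothetical `bump_arena` stands in for the allocation path of a tracing collector. It ignores alignment and never actually collects, so it only illustrates where the bookkeeping happens.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Reference counting: every copy or drop of a reference touches the counter.
long refcount_after_copies(std::size_t const copies)
{
  auto const obj = std::make_shared<int>(42);
  // Each element is a copy of obj, so each one increments the shared count.
  std::vector<std::shared_ptr<int>> const refs(copies, obj);
  return obj.use_count();
}

// Tracing-GC-style allocation: bump an offset, with no per-object bookkeeping.
struct bump_arena
{
  std::vector<unsigned char> buffer;
  std::size_t top{};

  explicit bump_arena(std::size_t const capacity)
    : buffer(capacity)
  {
  }

  void *allocate(std::size_t const size)
  {
    if(size > buffer.size() - top)
    {
      return nullptr; // a real collector would trace and reclaim here
    }
    void *const p{ buffer.data() + top };
    top += size;
    return p;
  }
};
```

Calling `refcount_after_copies(3)` yields a count of 4 (the original plus three copies), and each of those increments is also undone when `refs` is destroyed. The arena, by contrast, does one comparison and one addition per allocation and defers all reclamation.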


> i mean it's as prone to error as any other thing that relies on string munging.

This is misleading. Having done a great deal of both (as jank also supports C++ codegen as an alternative to IR), if the input is a fully analyzed AST, generating IR is significantly more error prone than generating C++. Why? Well, C++ is statically typed and one can enable warnings and errors for all sorts of issues. LLVM IR has a verifier, but it doesn't check that much. Handling references, pointers, closures, ABI issues, and so many more things ends up being a huge effort for IR.

For example, want to access the `foo.bar` member of a struct? In IR, you'll need to access foo, which may require loading it if it's a reference. You'll need to calculate the offset to `bar`, using GEP. You'll need to then determine if you're returning a reference to `bar` or if a copy is happening. Referencing will require storing a pointer, whereas copying may involve a lot more code. If we're generating C++, though, we just take `foo` and add a `.bar`. The C++ compiler handles the rest and will tell us if we messed anything up.
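As a rough illustration of the C++ side of that comparison, the sketch below spells the member access identically for the copying and the referencing case; the `foo_t`, `copy_bar`, and `ref_bar` names are hypothetical. The comments note what the compiler works out on its own, which an IR generator would have to emit by hand.

```cpp
#include <string>

struct foo_t
{
  int id;
  std::string bar;
};

// Copy: the compiler computes the member offset and calls std::string's
// copy constructor; the generated source only says `foo.bar`.
std::string copy_bar(foo_t const &foo)
{
  return foo.bar;
}

// Reference: the same `foo.bar`, but only the member's address is produced.
std::string const &ref_bar(foo_t const &foo)
{
  return foo.bar;
}
```

Misspelling the member, or returning a reference to a temporary, is rejected or flagged by the C++ compiler with suitable warnings enabled, which is the kind of checking the comment above says generated IR does not get.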

If you're going to hand wave and say anything that's building strings is error prone and unsafe, regardless of how richly typed and thoroughly analyzed the input is, the stance feels much less genuine.


I've pondered this for a while and I have no idea how jank is a recursive acronym. What're you seeing that I'm not?


Jank's A Native Klojure? :)


It’s a joke (hence the “/s”) on the “[PL name] is [words beginning with the rest of the letters of the PL name]” snowclone. However, as time approaches infinity I’m sure it will get a recursive backronym.


I hear you when it comes to C++ portability, ABI, and standards. I'm not sure what you would imagine jank using if not for LLVM, though.

Clojure uses the JVM, jank uses LLVM. I imagine we'd need _something_ to handle the JIT runtime, as well as jank's compiler back-end (for IR optimization and target codegen). If it's not LLVM, jank would embed something else.

Having to build both of these things myself would make an already gargantuan project insurmountable.


Which particular delta between the title and the content gave you extreme pause?


It said "jank is C++", which I assumed would be explaining that jank compiles down to C++ or something similar, i.e. there is a layer of abstraction between jank and C++, but it effectively "works like" C++.

On re-read, I recognize where it is used in the article:

"jank is C++. There is no runtime reflection, no guess work, and no hints. If the compiler can't find a member, or a function, or a particular overload, you will get a compiler error."

I assume other interop scenarios don't pull this off*, thus it is distinctive. Additionally, I'm not at all familiar with Clojure, sadly, but it also sounds like there's some special qualities there ("I think that this is an interesting way to start thinking about jank, Clojure, and static types")

Now I'll riff and just write out the first 3-5 titles that come to mind with that limited understanding:

- Implementing compile-time verifiable C++ interop in jank

- Sparks of C++ interop: jank, Clojure, & verifying interop before runtime

- jank's progress on C++ interop

- Safe C++ interop lessons from jank

* For example, I write a lot of Dart day to day and rely on Dart's "FFI" implementation to call C++, which, now that I'm thinking about it, only works because there's a code generator that creates "Dart headers" (my term) for the C++ libraries. I could totally footgun and call arbitrary functions that don't exist.


My reasoning is this:

jank is written in C++. Its compiler and runtime are both in C++. jank can compile to C++ directly (or LLVM IR). jank can reach into C++ seamlessly, which includes reaching into its own compiler/runtime. Thus, the boundary between what is C++ and what is Clojure is gone, which leaves jank as being both Clojure and C++.

Achieving this singularity is a milestone for jank and, I think, is worthy of the title.


FWIW, I saw that the title was false (after all, Jank and C++ are two different things), but I assumed it was playing on the snowclone "Are we _X_ yet?" and therefore the blog post was going to be explaining why the answer to "Is Jank C++ yet?" should be "Yes, Jank is C++ now."


To be fair, jank embeds both Clang and LLVM. We use Clang for C++ interop and JIT C++ compilation. We use LLVM for IR generation and jank's compiler back-end.

