
I wrote an operating system in a purely functional dialect of Lisp: http://losak.sf.net

There are a few things I learned in the process that I think are relevant here:

1. Forget garbage collection in the sense of any variant of mark-and-sweep. For an operating system, latency is more important than throughput, so you really want real-time memory management. If you use reference counting, then there is a lazy variant that gives you just that: no pauses due to reclaiming free memory. The downside of reference counting is that it doesn't work with cycles in heap-allocated memory, but if your data is all immutable then you can prevent cycles from being created in the first place. (There is a sketch of the lazy-freeing idea just after this list.)

2. Treat mutable memory as a form of I/O. Even if you just want to use memory-mapped I/O, you will need some way to read and write (some portion of) memory. Set aside a fixed block of memory (say, for instance, the first 10 MB), and only allow that memory to be read and written in an imperative fashion. Keep the managed heap completely separate from that mutable block, and don't allow any direct access to the managed memory. (See the second sketch after this list.)

3. You can actually perform preemptive multitasking entirely at compile time (or at program load time for apps not included when you build a binary for your OS). I've done this for both my OS and for Scheme (https://github.com/ojarjur/multischeme). It works extremely well and winds up being much faster than the native primitives usually used for building multitasking support. For more details on exactly how to do it, see the link to my Scheme implementation. (The third sketch after this list shows the rough shape of the generated code.)

4. The biggest difficulty the author will hit, and the most likely cause of performance issues, will be code generation. Since the proposal is for a new bytecode, the author will have to implement their own interpreter and/or JIT compiler for it, and writing an efficient one is an extremely large chunk of work. I've built an operating system and a programming language, and the language is by far the harder project. This is why I went from working on a new OS written in a new language to re-implementing my compile-time multitasking technique for Scheme; it allows me to take advantage of all the hard work people have done to make compilers for that language that produce fast code.
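
To make point 1 concrete, here is a minimal sketch in Rust of the lazy-freeing idea (the names Heap, Cell, dec_ref, and alloc are mine, not anything from losak): when a count hits zero the cell is only queued, and a bounded amount of deferred freeing happens on each allocation, so no single operation ever walks an arbitrarily large structure.

    // Lazy reference counting: decrements never cascade immediately.
    struct Cell {
        refcount: usize,
        children: Vec<usize>, // indices of the cells this cell references
    }

    struct Heap {
        cells: Vec<Cell>,
        to_free: Vec<usize>,   // counts hit zero, children not yet visited
        free_list: Vec<usize>, // slots available for reuse
    }

    impl Heap {
        fn dec_ref(&mut self, i: usize) {
            let cell = &mut self.cells[i];
            cell.refcount -= 1;
            if cell.refcount == 0 {
                // Constant work: just enqueue, don't cascade into children now.
                self.to_free.push(i);
            }
        }

        fn alloc(&mut self, children: Vec<usize>) -> usize {
            // Do a bounded amount of deferred freeing on every allocation.
            if let Some(dead) = self.to_free.pop() {
                let kids = std::mem::take(&mut self.cells[dead].children);
                for k in kids {
                    self.dec_ref(k); // may enqueue more work for later
                }
                self.free_list.push(dead);
            }
            for &k in &children {
                self.cells[k].refcount += 1;
            }
            let new_cell = Cell { refcount: 1, children };
            match self.free_list.pop() {
                Some(slot) => {
                    self.cells[slot] = new_cell;
                    slot
                }
                None => {
                    self.cells.push(new_cell);
                    self.cells.len() - 1
                }
            }
        }
    }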
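
For point 2, here is a sketch of what treating mutable memory as I/O could look like (again, made-up names, not losak's actual interface): one fixed block is only readable and writable through explicit calls, and nothing on the managed heap ever aliases it. The block is an ordinary array here so the sketch runs anywhere; on bare metal it would be a fixed physical range (say, the first 10 MB) accessed the same way.

    // The only two operations that ever touch mutable memory.
    const IO_BLOCK_SIZE: usize = 4096; // stand-in for the real block size

    struct IoBlock {
        bytes: [u8; IO_BLOCK_SIZE],
    }

    impl IoBlock {
        /// The only way to observe the mutable block.
        fn read(&self, offset: usize) -> Option<u8> {
            if offset >= IO_BLOCK_SIZE {
                return None;
            }
            // Volatile, so a real device access can't be cached or reordered away.
            unsafe { Some(core::ptr::read_volatile(self.bytes.as_ptr().add(offset))) }
        }

        /// The only way to change the mutable block.
        fn write(&mut self, offset: usize, value: u8) -> bool {
            if offset >= IO_BLOCK_SIZE {
                return false;
            }
            unsafe { core::ptr::write_volatile(self.bytes.as_mut_ptr().add(offset), value) };
            true
        }
    }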
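
And for point 3, a rough illustration of the idea (this is not how multischeme is actually implemented, and the names are invented): the compiler inserts checks at loops and calls so that each task hands control back after a bounded number of steps, and a plain round-robin loop does the "preemption" without timer interrupts. Here the fuel checks are written by hand just to show the shape of the generated code.

    trait Task {
        /// Run for at most `fuel` steps; return true when finished.
        fn run(&mut self, fuel: usize) -> bool;
    }

    struct Counter {
        current: u64,
        limit: u64,
    }

    impl Task for Counter {
        fn run(&mut self, fuel: usize) -> bool {
            for _ in 0..fuel {
                // The loop back-edge is where the compiler-inserted check would sit.
                if self.current == self.limit {
                    return true;
                }
                self.current += 1;
            }
            false // fuel exhausted: yield back to the scheduler
        }
    }

    fn schedule(mut tasks: Vec<Box<dyn Task>>, fuel_per_slice: usize) {
        // Round-robin over the live tasks until all of them have finished.
        while !tasks.is_empty() {
            tasks.retain_mut(|t| !t.run(fuel_per_slice));
        }
    }

    fn main() {
        let tasks: Vec<Box<dyn Task>> = vec![
            Box::new(Counter { current: 0, limit: 1_000 }),
            Box::new(Counter { current: 0, limit: 5_000 }),
        ];
        schedule(tasks, 100);
    }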



> For an operating system, latency is more important than throughput

One interesting exception to this rule is operating systems for network packet routers. These need to be optimized for throughput over latency. I mean, the difference between 10 and 20 million packets per second is important, but the difference between 10 and 20 microseconds of processing delay is not.

This is a fun area to be in at the moment, because simple, academic-looking code can actually compete with highly optimized kernels like Linux, since those have been optimized for something else.


When you're talking about an average latency of single-digit microseconds for a regular switch and under 500 ns for a low-end performance switch, the difference between 10 and 20 microseconds is massive.


Packets between internet hosts spend about 10,000,000 ns in transit, and they traverse around $10,000,000 worth of network elements on the path. Both of those numbers are worth optimizing. I'd say that the lowest-hanging fruit for operating system hackers like myself is the cost, i.e. serving more customers with fewer boxes. The latency figure is tougher because it's mostly due to the speed of light rather than processing time.


Accidentally clicked on the wrong icon and downvoted you, sorry, my bad.


I gave a corrective upvote :)


> For an operating system, latency is more important than throughput

The old OS/390 that is still processing your bank statement and your payroll is another counterexample showing that the above statement is too generic.


Network switches handle most packets in hardware, not at the OS level.


Author here. The language is the part I dread the most, and it's also the part that would be the least unique and the least interesting to "re-invent". I'll look into finding an existing language that runs on bare metal and has (or supports) immutable values.


There were some dataflow languages, like VAL and SISAL, that did exactly that. The hardware and the software on top of it were built in the '70s and '80s, but funding for that kind of parallelism research mostly dried up after the Cold War ended.

There is a resurgence of dataflow-related research, which you would probably be interested in. For example, Jack Dennis is pushing the 'Fresh Breeze' architecture: a CPU with write-once memory and in-hardware GC. This should be right up your alley:

http://csg.csail.mit.edu/Users/dennis/final-pact.pdf


Haskell


- HaLVM: https://github.com/GaloisInc/HaLVM

- House: http://programatica.cs.pdx.edu/House/

- seL4: http://ssrg.nicta.com.au/projects/seL4/

The emphasis is on the security and verifiability that typed, pure-by-default Haskell provides, rather than on OS design.


There's actually been a bit of work on this very idea!

http://programatica.cs.pdx.edu/House/


Funny side note: one of the House devs (Jérémy Bobbio) gave a talk at FOSDEM 2014 about reproducible builds for Debian, similar to Nix (also immutable in spirit), I believe.

https://fosdem.org/2014/schedule/event/reproducibledebian/


The current Haskell execution engine is very imperative indeed.

Instead of graph reduction, you could drive your execution in a completely immutable way by using string reduction. I don't think this has been explored much; I've only found mention of it in '80s FP books (and Dybvig's thesis).


What about D 2.0? I haven't used it but I think they changed everything to const by default. Maybe you would have to write a collections library, not sure.


Rust?


Have considered writing the entire kernel in Rust. Haven't considered using it for the system language; will definitely consider that.


If you do end up using Rust for any part of this project, we'd love to hear about the experience on our mailing list:

https://mail.mozilla.org/listinfo/rust-dev


Forget garbage collection in the sense of any variant of mark-and-sweep. For an operating system, latency is more important than throughput, so you really want real-time memory management.

Is the assumption that mark and sweep is slower than real-time memory management? It's actually been proven false, e.g. the .NET CLR runs faster than native C code in certain situations.

If you use reference counting, then there is a lazy variant that gives you just that: no pauses due to reclaiming free memory.

Reference counting is extremely slow because of how memory caching works. Whenever you update reference counts, you're trashing the CPU cache. It seems like there's no way refcounting could be faster than mark and sweep due to this. What am I missing?


Tracing garbage collection has better throughput than reference counting, but reference counting has lower and more predictable latency.

If you were actually going to try to do tracing GC at the operating system level, you would probably have to use one of the more involved GC algorithms that require kernel support, like Azul's C4 collector.


> you would probably have to use one of the more involved GC algorithms that require kernel support like Azul's C4 collector

Which shouldn't be such a problem if you're already writing an OS?

Also AIUI the only "hard part" of C4 is making sure objects don't get modified while they're being moved (that's what the barrier's there to detect) - which isn't a problem if everything's immutable.


> Which shouldn't be such a problem if you're already writing an OS?

I guess that could've used more context. I was trying to point out that in the larger landscape of GC algorithms, the one used by .NET is not that advanced, even though it's better than the GC used by common scripting languages.

> Also AIUI the only "hard part" of C4 is making sure objects don't get modified while they're being moved (that's what the barrier's there to detect) - which isn't a problem if everything's immutable.

Would absolutely everything be immutable? How would you implement lazy evaluation with the correct time complexity?

The expensive part of any garbage collector with short pause times is bound to be the read barrier, which is still needed with only immutable data. If the GC is compacting and wants to relocate an object, it has to rewrite all references to point to the new copy of the object.

C4 uses a read barrier to enforce the invariant that all references point to the up-to-date copy on a load of a reference, not just on a use of a reference. It relies on its ability to immediately update all references to reuse compacted pages during compaction, which means that gratuitous amounts of free memory are not required (a common problem with moving collectors).
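
To make that concrete, here is a toy version of a self-healing read barrier (my own illustration with invented names, not Azul's code): objects live at indices in a heap vector, and every load of a reference goes through the barrier, which follows the forwarding pointer left by the compactor and rewrites the slot it loaded from, so no reference to a stale copy survives. All of the actual difficulty, doing this concurrently and per page with the protection tricks C4 uses, is elided.

    struct Object {
        forwarded_to: Option<usize>, // None: not moved; Some(i): new location
        payload: i64,
    }

    /// Load the reference stored in `slot`, healing it if the target moved.
    fn load_ref(heap: &[Object], slot: &mut usize) -> usize {
        let mut target = *slot;
        // Follow forwarding pointers (in practice a chain of length at most one).
        while let Some(new_location) = heap[target].forwarded_to {
            target = new_location;
        }
        *slot = target; // heal the slot so the next load is already up to date
        target
    }

    fn main() {
        // Imagine a compaction pass relocated object 0 to index 2.
        let heap = vec![
            Object { forwarded_to: Some(2), payload: 0 },
            Object { forwarded_to: None, payload: 7 },
            Object { forwarded_to: None, payload: 42 },
        ];
        let mut slot = 0; // a field somewhere still pointing at the old copy
        let value = heap[load_ref(&heap, &mut slot)].payload;
        assert_eq!(value, 42);
        assert_eq!(slot, 2); // the slot was healed in place
    }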

You might be able to come up with an alternate collector design that uses some of the same ideas as C4 while exploiting immutability, but if you loosen the healing read barrier you weaken your ability to reuse memory and to put a bound on memory growth, since a reference to an old copy might always escape to the next round of collection.


"Tracing garbage collection has better throughput than reference counting, but reference counting has lower and more predictable latency."

Exactly this. I would actually back off a bit and only claim that reference counting has more predictable latency, but the point is to sacrifice throughput in order to reduce worst case latency.


> It's actually been proven false, e.g. the .NET CLR runs faster than native C code in certain situations.

[citation needed]

and by citation I mean example code that "proves" this.


throughput != latency



