
I wrote an operating system in a purely functional dialect of Lisp: http://losak.sf.net

There are a few things I learned in the process that I think are relevant here:

1. Forget garbage collection in the sense of any variant of mark-and-sweep. For an operating system, latency is more important than throughput, so you really want real-time memory management. If you use reference counting, then there is a lazy variant that gives you just that: no pauses due to reclaiming free memory. The downside of reference counting is that it doesn't work with cycles in heap-allocated memory, but if your data is all immutable then you can prevent cycles from being created in the first place. (There is a sketch of the lazy-freeing idea just after this list.)

2. Treat mutable memory as a form of I/O. Even if you just want to use memory-mapped I/O, you will need some way to read and write (some portion of) memory. Set aside a fixed block of memory (say, for instance, the first 10 MB), and only allow that memory to be read and written in an imperative fashion. Keep the managed heap completely separate from that mutable block, and don't allow any direct access to the managed memory. (See the second sketch after this list.)

3. You can actually perform preemptive multitasking entirely at compile time (or at program load time for apps not included when you build a binary for your OS). I've done this for both my OS and for Scheme (https://github.com/ojarjur/multischeme). It works extremely well and winds up being much faster than the native primitives usually used for building multitasking support. For more details on exactly how to do it, see the link to my Scheme implementation. (The third sketch after this list shows the rough shape of the generated code.)

4. The biggest difficulty the author will hit, and the most likely cause of performance issues, will be code generation. Since the proposal is for a new bytecode, the author will have to implement their own interpreter and/or JIT compiler for it, and writing an efficient one is an extremely large chunk of work. I've built an operating system and a programming language, and the language is by far the harder project. This is why I went from working on a new OS written in a new language to re-implementing my compile-time multitasking technique for Scheme; it allows me to take advantage of all the hard work people have done to make compilers for that language that produce fast code.
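
To make point 1 concrete, here is a minimal sketch in Rust of the lazy-freeing idea (the names Heap, Cell, dec_ref, and alloc are mine, not anything from losak): when a count hits zero the cell is only queued, and a bounded amount of deferred freeing happens on each allocation, so no single operation ever walks an arbitrarily large structure.

    // Lazy reference counting: decrements never cascade immediately.
    struct Cell {
        refcount: usize,
        children: Vec<usize>, // indices of the cells this cell references
    }

    struct Heap {
        cells: Vec<Cell>,
        to_free: Vec<usize>,   // counts hit zero, children not yet visited
        free_list: Vec<usize>, // slots available for reuse
    }

    impl Heap {
        fn dec_ref(&mut self, i: usize) {
            let cell = &mut self.cells[i];
            cell.refcount -= 1;
            if cell.refcount == 0 {
                // Constant work: just enqueue, don't cascade into children now.
                self.to_free.push(i);
            }
        }

        fn alloc(&mut self, children: Vec<usize>) -> usize {
            // Do a bounded amount of deferred freeing on every allocation.
            if let Some(dead) = self.to_free.pop() {
                let kids = std::mem::take(&mut self.cells[dead].children);
                for k in kids {
                    self.dec_ref(k); // may enqueue more work for later
                }
                self.free_list.push(dead);
            }
            for &k in &children {
                self.cells[k].refcount += 1;
            }
            let new_cell = Cell { refcount: 1, children };
            match self.free_list.pop() {
                Some(slot) => {
                    self.cells[slot] = new_cell;
                    slot
                }
                None => {
                    self.cells.push(new_cell);
                    self.cells.len() - 1
                }
            }
        }
    }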
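
For point 2, here is a sketch of what treating mutable memory as I/O could look like (again, made-up names, not losak's actual interface): one fixed block is only readable and writable through explicit calls, and nothing on the managed heap ever aliases it. The block is an ordinary array here so the sketch runs anywhere; on bare metal it would be a fixed physical range (say, the first 10 MB) accessed the same way.

    // The only two operations that ever touch mutable memory.
    const IO_BLOCK_SIZE: usize = 4096; // stand-in for the real block size

    struct IoBlock {
        bytes: [u8; IO_BLOCK_SIZE],
    }

    impl IoBlock {
        /// The only way to observe the mutable block.
        fn read(&self, offset: usize) -> Option<u8> {
            if offset >= IO_BLOCK_SIZE {
                return None;
            }
            // Volatile, so a real device access can't be cached or reordered away.
            unsafe { Some(core::ptr::read_volatile(self.bytes.as_ptr().add(offset))) }
        }

        /// The only way to change the mutable block.
        fn write(&mut self, offset: usize, value: u8) -> bool {
            if offset >= IO_BLOCK_SIZE {
                return false;
            }
            unsafe { core::ptr::write_volatile(self.bytes.as_mut_ptr().add(offset), value) };
            true
        }
    }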
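
And for point 3, a rough illustration of the idea (this is not how multischeme is actually implemented, and the names are invented): the compiler inserts checks at loops and calls so that each task hands control back after a bounded number of steps, and a plain round-robin loop does the "preemption" without timer interrupts. Here the fuel checks are written by hand just to show the shape of the generated code.

    trait Task {
        /// Run for at most `fuel` steps; return true when finished.
        fn run(&mut self, fuel: usize) -> bool;
    }

    struct Counter {
        current: u64,
        limit: u64,
    }

    impl Task for Counter {
        fn run(&mut self, fuel: usize) -> bool {
            for _ in 0..fuel {
                // The loop back-edge is where the compiler-inserted check would sit.
                if self.current == self.limit {
                    return true;
                }
                self.current += 1;
            }
            false // fuel exhausted: yield back to the scheduler
        }
    }

    fn schedule(mut tasks: Vec<Box<dyn Task>>, fuel_per_slice: usize) {
        // Round-robin over the live tasks until all of them have finished.
        while !tasks.is_empty() {
            tasks.retain_mut(|t| !t.run(fuel_per_slice));
        }
    }

    fn main() {
        let tasks: Vec<Box<dyn Task>> = vec![
            Box::new(Counter { current: 0, limit: 1_000 }),
            Box::new(Counter { current: 0, limit: 5_000 }),
        ];
        schedule(tasks, 100);
    }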



> For an operating system, latency is more important than throughput

One interesting exception to this rule is operating systems for network packet routers. These need to be optimized for throughput over latency. I mean, the difference between 10 and 20 million packets per second is important, but the difference between 10 and 20 microseconds of processing delay is not.

This is a fun area to be in at the moment, because simple, academic-looking code can actually compete with highly optimized kernels like Linux, since those have been optimized for something else.


When you're talking about an average latency of single-digit microseconds for a regular switch and under 500 ns for a low-end performance switch, the difference between 10 and 20 microseconds is massive.


Packets between internet hosts spend about 10,000,000 ns in transit, and they traverse around $10,000,000 worth of network elements on the path. Both of those numbers are worth optimizing. I'd say that the lowest-hanging fruit for operating system hackers like myself is the cost, i.e. serving more customers with fewer boxes. The latency figure is tougher because it's mostly due to the speed of light rather than processing time.


Accidentally clicked on the wrong icon and downvoted you, sorry, my bad.


I gave a corrective upvote :)


> For an operating system, latency is more important than throughput

The old OS/390 that is still processing your bank statement and your payroll is another counterexample showing that the above statement is too generic.


Network switches handle most packets in hardware, not at the OS level.


Author here. The language is the part I dread the most, and it's also the part that would be the least unique and the least interesting to "re-invent". I'll look into finding an existing language that runs on bare metal and has (or supports) immutable values.


There were some dataflow languages, like VAL and SISAL, that did exactly that. The hardware and the software on top of it were built in the '70s and '80s, but funding for that kind of parallelism research mostly dried up after the Cold War ended.

There is a resurgence of dataflow-related research, which you would probably be interested in. For example, Jack Dennis is pushing the 'Fresh Breeze' architecture: a CPU with write-once memory and in-hardware GC. This should be right up your alley:

http://csg.csail.mit.edu/Users/dennis/final-pact.pdf


Haskell


- HaLVM: https://github.com/GaloisInc/HaLVM

- House: http://programatica.cs.pdx.edu/House/

- seL4: http://ssrg.nicta.com.au/projects/seL4/

The emphasis is on the security and verifiability that typed, pure-by-default Haskell provides, rather than on OS design.


There's actually been a bit of work on this very idea!

http://programatica.cs.pdx.edu/House/


Funny side note: one of the House devs (Jérémy Bobbio) gave a talk at FOSDEM 2014 about reproducible builds for Debian, similar to Nix (also immutable in spirit), I believe.

https://fosdem.org/2014/schedule/event/reproducibledebian/


The current Haskell execution engine is very imperative indeed.

Instead of graph reduction, you could drive your execution in a completely immutable way by using string reduction. I don't think this has been explored much; I've only found mention of it in '80s FP books (and Dybvig's thesis).


What about D 2.0? I haven't used it but I think they changed everything to const by default. Maybe you would have to write a collections library, not sure.


Rust?


Have considered writing the entire kernel in Rust. Haven't considered using it for the system language; will definitely consider that.


If you do end up using Rust for any part of this project, we'd love to hear about the experience on our mailing list:

https://mail.mozilla.org/listinfo/rust-dev


Forget garbage collection in the sense of any variant of mark-and-sweep. For an operating system, latency is more important than throughput, so you really want real-time memory management.

Is the assumption that mark and sweep is slower than real-time memory management? It's actually been proven false, e.g. the .NET CLR runs faster than native C code in certain situations.

If you use reference counting, then there is a lazy variant that gives you just that: no pauses due to reclaiming free memory.

Reference counting is extremely slow because of how memory caching works. Whenever you update reference counts, you're trashing the CPU cache. It seems like there's no way refcounting could be faster than mark and sweep due to this. What am I missing?


Tracing garbage collection has better throughput than reference counting, but reference counting has lower and more predictable latency.

If you were actually going to try to do tracing GC at the operating system level, you would probably have to use one of the more involved GC algorithms that require kernel support, like Azul's C4 collector.


> you would probably have to use one of the more involved GC algorithms that require kernel support like Azul's C4 collector

Which shouldn't be such a problem if you're already writing an OS?

Also AIUI the only "hard part" of C4 is making sure objects don't get modified while they're being moved (that's what the barrier's there to detect) - which isn't a problem if everything's immutable.


> Which shouldn't be such a problem if you're already writing an OS?

I guess that could've used more context. I was trying to point out that in the larger landscape of GC algorithms, the one used by .NET is not that advanced, even though it's better than the GC used by common scripting languages.

> Also AIUI the only "hard part" of C4 is making sure objects don't get modified while they're being moved (that's what the barrier's there to detect) - which isn't a problem if everything's immutable.

Would absolutely everything be immutable? How would you implement lazy evaluation with the correct time complexity?

The expensive part of any garbage collector with short pause times is bound to be the read barrier, which is still needed with only immutable data. If the GC is compacting and wants to relocate an object, it has to rewrite all references to point to the new copy of the object.

C4 uses a read barrier to enforce the invariant that all references point to the up-to-date copy on a load of a reference, not just on a use of a reference. It relies on its ability to immediately update all references to reuse compacted pages during compaction, which means that gratuitous amounts of free memory are not required (a common problem with moving collectors).
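
To make that concrete, here is a toy version of a self-healing read barrier (my own illustration with invented names, not Azul's code): objects live at indices in a heap vector, and every load of a reference goes through the barrier, which follows the forwarding pointer left by the compactor and rewrites the slot it loaded from, so no reference to a stale copy survives. All of the actual difficulty, doing this concurrently and per page with the protection tricks C4 uses, is elided.

    struct Object {
        forwarded_to: Option<usize>, // None: not moved; Some(i): new location
        payload: i64,
    }

    /// Load the reference stored in `slot`, healing it if the target moved.
    fn load_ref(heap: &[Object], slot: &mut usize) -> usize {
        let mut target = *slot;
        // Follow forwarding pointers (in practice a chain of length at most one).
        while let Some(new_location) = heap[target].forwarded_to {
            target = new_location;
        }
        *slot = target; // heal the slot so the next load is already up to date
        target
    }

    fn main() {
        // Imagine a compaction pass relocated object 0 to index 2.
        let heap = vec![
            Object { forwarded_to: Some(2), payload: 0 },
            Object { forwarded_to: None, payload: 7 },
            Object { forwarded_to: None, payload: 42 },
        ];
        let mut slot = 0; // a field somewhere still pointing at the old copy
        let value = heap[load_ref(&heap, &mut slot)].payload;
        assert_eq!(value, 42);
        assert_eq!(slot, 2); // the slot was healed in place
    }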

You might be able to come up with an alternate collector design that uses some of the same ideas as C4 while exploiting immutability, but if you loosen the healing read barrier you weaken your ability to reuse memory and to put a bound on memory growth, since a reference to an old copy might always escape to the next round of collection.


"Tracing garbage collection has better throughput than reference counting, but reference counting has lower and more predictable latency."

Exactly this. I would actually back off a bit and only claim that reference counting has more predictable latency, but the point is to sacrifice throughput in order to reduce worst case latency.


> It's actually been proven false, e.g. the .NET CLR runs faster than native C code in certain situations.

[citation needed]

and by citation I mean example code that "proves" this.


throughput != latency



