
I believe that's an accident of the evolutionary path chosen with syscalls. If we'd instead gone with a ring buffer approach to make requests, then you'd never need to partition the memory address space; the kernel has its memory, userspace has its own, and you don't need the situation where the kernel is always mapped into every process.


Hmm. I don't understand how that would work.

I think it would be possible for e.g. microkernels to greatly reduce the size of the reservation (though not to eliminate it entirely). However, I can't imagine how you would handle the privilege escalation issue without having at least some system code in the application's virtual address space that's not modifiable by the application.


I'm not sure how privilege escalation would be an issue, since you'd never escalate privilege in the first place (I'm assuming you're talking about CPU ring privileges and not OS privileges). You'd just enqueue your operations into the shared kernel/userspace ring buffer and the kernel would pick them up on its side; you'd never jump between rings.
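Roughly the shape I have in mind -- every name below is hypothetical, loosely modeled on io_uring's submission side rather than any real kernel interface:

    /* One page of this, mapped into both the process and the kernel.
     * Userspace only writes entries and bumps tail; the kernel side
     * consumes from head. No trap, no ring transition on submit. */

    #include <stdatomic.h>
    #include <stdint.h>

    #define RING_ENTRIES 256            /* power of two, so we can mask */

    struct request {
        uint32_t opcode;                /* hypothetical OP_READ, OP_WRITE, ... */
        int32_t  fd;
        uint64_t buf;                   /* userspace buffer address */
        uint64_t len;
        uint64_t user_data;             /* echoed back in the completion */
    };

    struct request_ring {
        _Atomic uint32_t head;          /* advanced by the kernel consumer */
        _Atomic uint32_t tail;          /* advanced by the userspace producer */
        struct request   entries[RING_ENTRIES];
    };

    static int submit(struct request_ring *ring, const struct request *req)
    {
        uint32_t head = atomic_load_explicit(&ring->head, memory_order_acquire);
        uint32_t tail = atomic_load_explicit(&ring->tail, memory_order_relaxed);

        if (tail - head == RING_ENTRIES)
            return -1;                  /* ring full; caller retries later */

        ring->entries[tail & (RING_ENTRIES - 1)] = *req;
        /* release: kernel poller sees the entry only after its fields */
        atomic_store_explicit(&ring->tail, tail + 1, memory_order_release);
        return 0;
    }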

Such a design may require at least one processor dedicated to running the kernel at all times, so it might not work on a single-processor architecture. However, a single-processor architecture might still be supportable by having the "kernel process" go to sleep after arming a timer; the timer interrupt would be the only thing specially mapped, so that it can modify the page table to resume the kernel (which then drains all the ring buffers and does scheduling). As you note, there's some reserved address space, but it's a trivial amount, just enough to resume running the kernel. I don't think it has anything to do with monolithic vs microkernels.


I have for a while wondered why we don't have "security cores" that are really slow but have no caches or speculative execution, so you can run security-critical code without having to worry about CPU timing attacks.


I think that would be of little help if the faster, less-secure cores have access to the same memory system as the secure cores. If the JavaScript engine of your browser runs on the fast cores, and you're visiting a malicious website, then vulnerabilities due to speculative execution on the fast cores could still leak information that was written by the slow cores. And you really wouldn't want to run the JS engine on cores without caches and speculative execution, at least not for everyday browsing.


What problem are you trying to solve?

If you just want constant-time operations, dedicated instructions are enough. Maybe they run on dedicated cores, but that's an implementation detail. I think this is a good idea, and it feels like we're heading this way already, more or less.
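For illustration, here's the property in portable C -- the point of dedicated instructions would be to guarantee this in hardware instead of hoping the compiler and microarchitecture cooperate:

    #include <stddef.h>
    #include <stdint.h>

    /* Compare two secrets with no early exit, so the run time doesn't
     * depend on where (or whether) they differ. */
    int ct_equal(const uint8_t *a, const uint8_t *b, size_t n)
    {
        uint8_t diff = 0;
        for (size_t i = 0; i < n; i++)
            diff |= a[i] ^ b[i];        /* accumulate differences, never branch */
        return diff == 0;               /* 1 if equal, 0 otherwise */
    }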

If you want to address all of the vulnerabilities that have arisen, you need full isolation. As mentioned in a sibling comment, you can't share any memory or caches etc. Each "security core" would have to be fully isolated from every other core, including every other security core. And you also need to segregate out "sensitive" code and data from non-sensitive code and data. When all is said and done, I don't see how we're talking about anything less than scrapping all existing ISAs, and perhaps even von Neumann architecture itself. I'm sure there are some middle-ground solutions between where we are today and this extreme, but I think they're going to look more like TPM/Secure Enclave than special cores, and they're only going to partially address the vulnerabilities.


You don’t really need a security core with my proposal. Ensuring that kernel and userspace code run in completely different page table domains removes the possibility of a Meltdown-style attack, which worked precisely because kernel memory was mapped alongside userspace and relied on CPU privilege checks to keep it secret (protections that speculative execution thwarted). It’s actually a software design flaw.


True, you don't have to go full microkernel just to have messages passed through a buffer. However, if the buffer is shared by all processes, it does need some protection. I guess you could assign one buffer per process (which might end up using a lot of physical RAM), and then just crash the process if it corrupts its own buffer. The bigger issue with this approach might be adapting to asynchrony, though.


It wouldn't be shared by all processes. One per process, just like with io_uring. Not sure how it would end up being all that much physical RAM - you get a lot more memory mapped just by starting a process. Page faults might be another tricky corner case.
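For reference, the per-process version is already how liburing looks in practice -- each process that calls io_uring_queue_init gets its own ring pair (minimal example, error handling mostly elided; link with -luring):

    #include <fcntl.h>
    #include <liburing.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        struct io_uring ring;
        char buf[4096];

        if (io_uring_queue_init(8, &ring, 0) < 0)   /* this process's private ring */
            return 1;

        int fd = open("/etc/hostname", O_RDONLY);
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
        io_uring_submit(&ring);                     /* one syscall to kick things off */

        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);
        printf("read %d bytes\n", cqe->res);
        io_uring_cqe_seen(&ring, cqe);

        close(fd);
        io_uring_queue_exit(&ring);
        return 0;
    }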



