Microkernels and drivers in user-space

Inspired by the bunch of fucktards yesterday who knew nothing about device drivers, I decided to make this thread explaining why microkernels that run drivers in user-space are a meme and need to die.

Myth 1: Running a driver as a normal user-space process is safer and doesn't crash the system.

This claim is based on a number of misunderstandings. While it is true that a protection fault (e.g., from dereferencing a NULL pointer) in kernel space will crash the kernel and cause a kernel panic, whereas in user-space it only kills the offending process, this benefit is marginal.

There are two situations I'd like to address here, MMIO and DMA.

For MMIO: IO devices have on-board memory regions, described by so-called BARs (Base Address Registers), that are mapped into the physical address space by the BIOS at boot in a process called bus enumeration. This allows the CPU to read and write those addresses, with the reads and writes forwarded to the IO device itself. In other words, this is how the CPU is able to read and write registers on board the device.

Running drivers in user-space would mean exposing physical addresses to user-space. With no additional form of protection, a bad or malicious driver could potentially read and write from arbitrary locations in RAM including where the kernel resides. It would not only be able to crash the system, but also break out of user-space isolation, meaning that the separation of kernel-space and user-space is completely void.
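
To make this concrete, here's roughly what mapping a BAR from user-space looks like on Linux through sysfs. A minimal sketch; the device address 0000:03:00.0, the BAR size and the register offsets are all made up:

```c
/* Minimal sketch: map BAR0 of a PCI device from user-space on Linux.
 * The device address, BAR size and register offsets are invented. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* sysfs exposes each BAR as a resourceN file that can be mmap'd */
    int fd = open("/sys/bus/pci/devices/0000:03:00.0/resource0", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    size_t bar_size = 4096; /* real code reads the size from 'resource' */
    volatile uint32_t *regs = mmap(NULL, bar_size, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, 0);
    if (regs == MAP_FAILED) { perror("mmap"); return 1; }

    /* loads/stores through this pointer become MMIO transactions */
    uint32_t status = regs[0]; /* hypothetical status register */
    regs[1] = 0x1;             /* hypothetical control register */
    (void)status;

    munmap((void *)regs, bar_size);
    close(fd);
    return 0;
}
```

And that is exactly the kind of access I'm talking about: once a process can mmap device memory like this, the "user-space" label stops meaning much.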

1 / 4

You might argue that the kernel could provide some form of protection against this, for example by offering an API that exposes only the physical memory regions that are valid for that device alone. The issue, however, is that where in memory devices are mapped is a completely arbitrary process, done solely at the discretion of the BIOS. In other words, you'd end up with an extremely bloated API that does a bunch of redundant checking in order not to expose physical address space to user-space. This would violate the very premise of running a microkernel in the first place, not to mention that you'd be exporting a bunch of functionality (pinning pages, requesting DMA buffers at certain ranges with such and such alignment, and so on) to user-space while still having to do all the checks in the kernel. If you think syscalls are a bad idea and very monolithic, imagine this monstrosity of an API.

Of course, this only addresses the issue of a driver having access to physical address space. Then there's also the issue of DMA and the device itself. When a driver does DMA, it typically requests from the kernel a contiguous memory region which the device is able to reach (some devices, such as certain Nvidia GPUs, only have 30 address bits, meaning they can't address the entire 64-bit address space). The driver then passes the address of this region to the device (writing it into a register using MMIO), and the device will then read or write directly to RAM without involving the CPU. In other words, Direct Memory Access (DMA).
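
For reference, the kernel-side half of that dance looks roughly like this in a Linux driver. A simplified sketch; the function name, the register offset and the 30-bit mask are placeholders echoing the GPU example:

```c
/* Sketch of a typical Linux DMA buffer setup. my_setup_dma and
 * MY_DMA_ADDR_REG are placeholders, not a real device's layout. */
#include <linux/dma-mapping.h>
#include <linux/io.h>
#include <linux/kernel.h>
#include <linux/sizes.h>

#define MY_DMA_ADDR_REG 0x10 /* hypothetical register offset */

static int my_setup_dma(struct device *dev, void __iomem *regs)
{
    dma_addr_t bus_addr;
    void *cpu_addr;

    /* declare that the device only drives 30 address bits,
     * like the GPU example above */
    if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(30)))
        return -EIO;

    /* get a physically contiguous buffer the device can reach */
    cpu_addr = dma_alloc_coherent(dev, SZ_64K, &bus_addr, GFP_KERNEL);
    if (!cpu_addr)
        return -ENOMEM;

    /* hand the bus address to the device via MMIO; from now on it
     * reads/writes that RAM without involving the CPU */
    writel(lower_32_bits(bus_addr), regs + MY_DMA_ADDR_REG);
    return 0;
}
```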

2 / 4

Myth 2: IOMMUs solve everything

The problem, though, is that the driver can pass along ANY physical address and make the device read or write arbitrary memory locations (again, for example where the kernel resides). Of course, you might be thinking right now that this is where IOMMUs come in, and you're quite right.

In addition to eliminating the need for bounce buffers (the case mentioned above where the Nvidia GPU needs to address something above its 30-bit limit), an IOMMU can also provide address isolation by grouping devices into so-called domains. This prevents a driver from flushing data into random physical addresses. HOWEVER, the problem again is that setting these up is usually the task of the device driver. Exposing IOMMU access to user-space is a bad idea, so you'd end up incorporating this into the horrendously bloated API mentioned above.
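
For the curious, "setting up a domain" means roughly the following inside the kernel. A sketch against the classic in-kernel Linux IOMMU API; exact signatures vary between kernel versions, and the function name is a placeholder:

```c
/* Sketch of what "setting up a domain" involves, using the classic
 * in-kernel IOMMU API (signatures vary across kernel versions). */
#include <linux/iommu.h>

static int my_iommu_setup(struct device *dev, unsigned long iova,
                          phys_addr_t paddr, size_t size)
{
    /* give the device its own translation domain... */
    struct iommu_domain *dom = iommu_domain_alloc(dev->bus);
    if (!dom)
        return -ENOMEM;

    if (iommu_attach_device(dom, dev)) {
        iommu_domain_free(dom);
        return -EIO;
    }

    /* ...and whitelist exactly one physical range. DMA outside
     * [iova, iova + size) now faults instead of hitting RAM. */
    return iommu_map(dom, iova, paddr, size, IOMMU_READ | IOMMU_WRITE);
}
```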

This of course assumes that there is an IOMMU available in the first place, something architectures other than x86 usually don't have. There's also the issue with PCIe P2P: enabling the IOMMU means that every TLP (Transaction Layer Packet, i.e. a memory operation) is forwarded to the root complex instead of just taking the shortest path. A network card reading from a disk would experience serious performance degradation. There is stuff like ATS (Address Translation Services), but it is highly vendor-specific; an AMD implementation of ATS is not respected by Intel's VT-d, for example. In addition, only a minority of devices actually support ATS. Nvidia GPUs certainly don't.

3 / 4

Myth 3: The performance penalty of running in user-space is negligible

As I mentioned above, P2P performance is kill if you use an IOMMU. However, the biggest performance killer is the cost of context switches.

First of all, running in user-space means that your driver is subject to the scheduler and at risk of having its memory swapped out. Of course, you could solve this by giving the driver higher priority as well as pinning its memory in RAM, but then you again have a situation where there is no real separation between user-space and kernel-space. The Linux kernel also runs in virtual address space; its memory is simply always mapped into the top 1 GB of every process's address space (on 32-bit x86) and protected using the hardware page protections. So in effect you blur out the differences between kernel-space and user-space and gain none of the "benefits" of running in user-space.
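
Both of those "fixes" are ordinary POSIX calls, by the way, which is exactly the point: you end up hand-rolling kernel privileges for a supposedly normal process. A minimal sketch; the function name and the priority value are arbitrary:

```c
/* Sketch: the two "fixes" for a user-space driver getting scheduled
 * out or swapped. The priority value 90 is arbitrary. */
#include <sched.h>
#include <sys/mman.h>

static int make_driver_kernel_like(void)
{
    struct sched_param sp = { .sched_priority = 90 };

    /* real-time priority: the scheduler now treats us specially... */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return -1;

    /* ...and locked memory: we are never swapped out */
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}
```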

The real issue here, however, is that you'd need some sort of mechanism to disable interrupts (and thus preemption) from user-space, because sometimes the device driver must do something that requires atomicity and can't be interrupted by the scheduler. So add this functionality to the already bloated driver API, further blurring the hard separation between kernel-space and user-space.
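
For comparison, an in-kernel driver gets that atomicity in two lines. A sketch of the usual pattern; the lock and the state variable are placeholders:

```c
/* For contrast: the usual in-kernel pattern for an atomic section.
 * The lock and the state variable are placeholders. */
#include <linux/spinlock.h>
#include <linux/types.h>

static DEFINE_SPINLOCK(my_lock);
static u32 my_shared_state;

static void my_atomic_update(u32 val)
{
    unsigned long flags;

    /* disables local interrupts: neither an IRQ handler nor the
     * scheduler can preempt this section */
    spin_lock_irqsave(&my_lock, flags);
    my_shared_state = val;
    spin_unlock_irqrestore(&my_lock, flags);
}
```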

Secondly, there's also the issue of device-initiated interrupts. Imagine a Gigabit Ethernet controller generating an interrupt for every received packet. Normally, interrupt routines are short and do very little work. For user-space drivers, however, you'd need to context switch back into user-space and run the routine there, while at the same time providing deadline guarantees and blocking guarantees.
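
To see why the in-kernel path is cheap, look at what a typical top half actually does: a couple of MMIO accesses, then defer, with no address-space switch anywhere. A sketch with invented register offsets:

```c
/* Sketch of a top-half interrupt handler; register offsets invented.
 * Would be registered with request_irq(irq, my_isr, 0, "mydev", regs). */
#include <linux/interrupt.h>
#include <linux/io.h>

#define MY_IRQ_STATUS 0x00 /* hypothetical offsets */
#define MY_IRQ_ACK    0x04

static irqreturn_t my_isr(int irq, void *data)
{
    void __iomem *regs = data;

    if (!readl(regs + MY_IRQ_STATUS))
        return IRQ_NONE;          /* not our interrupt */

    writel(1, regs + MY_IRQ_ACK); /* ack the device */
    /* real work is deferred to a softirq/NAPI poll or workqueue;
     * no context switch happens anywhere on this path */
    return IRQ_HANDLED;
}
```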

4 / 4

tl;dr

Someone is mad as fuck

tldr

No one cares, you fucking angry turbonerd

Sage

>informative argument against microkernel design
>"tl;dr"

Millennials

I guess Linus was right, then.

>muh millennials

It's an autistic rant spanning 4 posts about a subject no one cares about. KYS

>Inspired by the bunch of fucktards yesterday who knew nothing about device drivers, I decided to make this thread explaining why microkernels that run drivers in user-space are a meme and need to die.
Now that's some dedicated shitposting

I made an informative rant about why I think device drivers in userspace is a bad idea, and that's your best comeback?

>I just slapped together some random words and hope that they sound coherent enough to seem intelligible
Fix'd. Now go back to playing with Java, pajeet, and let the grown ups discuss kernel design.

Why are you so mad? How about actually addressing the actual arguments instead of throwing semi-racist Sup Forums memes around?

>baaawwww it's racist!!!!

Did I hurt your little poo in loo feelings, you disgusting currynigger?

All of this is very easy to refute: managed code.

You wouldn't be willing to let heavy abstraction layers such as the JVM and CLR clog up your systems in exchange for a little security in the first place if monolithic kernels weren't such a massive failure.

2/10 made me reply

Cool now go make your autism kernel that will never ever be used.

Linux, Windows, Mac and the good BSDs are popular because they deliver features instead of internet autism wars

Managed code would introduce even more abstractions...

>Cool now go make your autism kernel that will never ever be used.
I'm arguing for the design that's already used in Linux and FreeBSD, instead of some autistic pet projects

So for the MMIO you just map a page to that physical address, I don't see the problem here.

>All of this is very easy to refute: managed code.
Not OP, but you'd still at some point need to deal with physical addresses and actual pages, which is OP's argument. And when you do so, it doesn't matter how "safe" the rest of the driver is; you're still able to fuck up unless you introduce layers upon layers of bloated "security" abstractions (which entirely violates the microkernel design principle in the first place).

>Not running NodeOS

>So for the MMIO you just map a page to that physical address, I don't see the problem here.
The problem is that you're exposing physical addresses to user-space, which would then be able to read and write arbitrary memory locations, thus breaking out of the user-space encapsulation.

Read part 2. It isn't a problem to map pages to physical addresses (that's already how it's done in all systems); the problem is that you're allowing user-space access to physical memory, as said.

Uh, no Richard. Linux is the operating system, not just the kernel, and correcting users to say ganoo plus Linux in a vapid attempt to stay relevant is sad.

What the fuck did you just fucking say about me, you proprietary slave? I’ll have you know I graduated top of my class at Harvard, and I’ve been involved in numerous free software projects, and I have contributed to over 300 core-utils for GNU. I am skilled in Lisp and I’m St. IGNU-cius, saint of the Church of Emacs. You are nothing to me but just another unethical non-free software advocate. I will distribute the fuck out of your source code with freedom the likes of which has never been seen before on this Earth, mark my fucking words. You think you can get away with saying that shit about me and the GPL on the Internet? Think again, fucker. As we speak I am contacting my colleagues at FSF and your binaries are being reversed engineered right now so you better prepare for the storm, maggot. The storm that wipes out the pathetic little thing you call your copyright. You're fucking dead, kid. Free software can be anywhere, anytime, and it can ensure your freedom in over four ways, and that’s just with the GPLv2. Not only am I extensively skilled at C hacking, but I have access to the source of the entire GNU userland and core-utils and I will use it to its full extent to wipe your miserable proprietary code off the face of the continent, you little shit. If only you could have known what ethical retribution your little “clever” program was about to bring down upon you, maybe you would have ensured your users' freedom. But you couldn’t, you didn’t, and now you’re paying the price, you goddamn idiot. I will shit free as in freedom all over you and you will drown in it. You’re fucking dead, kiddo.

Sup Forums in a nutshell

That's Doctor Richard Matthew Stallman, PhD to you.

>Subject no one cares about
> Sup Forums - Technology
They should rename this board to mindless consumerism or something

I use Windows, couldn't care less about this.

The kernel/user-space dichotomy shouldn't exist. There should simply be an int's worth of process spaces, with no TLB or cache flushing required to switch between them.

Processor hardware is retarded.

>I don't care about inner workings of an operating system or how device drivers work because I'm a mindless consumer
ftfy

>it's 2017 and he still uses drivers
lmao

Now this is the kind of thread I like to see. I'm actually thinking of making my own microkernel, so I've thought about this. How it would work in my kernel is that drivers get loaded either by the init system or by a command run by the user (as root, of course). Upon init, the driver then requests that the devices it wants to use be mapped into its address space. If a device is already mapped, the call fails and the driver is shut down. This prevents just any process from taking control of a device.

As for the part about exposing physical addresses to user-space processes, it is not any more dangerous than running a driver in a monolithic kernel. Drivers carry certain responsibilities, part of which is not fucking up the devices they are made for.

Fuckin nerd

>Upon init, the driver then requests that the devices it wants to use be mapped into its address space. If a device is already mapped, the call fails and the driver is shut down. This prevents just any process from taking control of a device.
So what about stuff like NVMe disks that can potentially support multiple lightweight drivers (as long as one is responsible for setting up admin queues)? What about SR-IOV capable devices that have a separate driver for each virtual function (which maps to the same physical address range)?

>As for the part about exposing physical addresses to user-space processes, it is not any more dangerous than running a driver in a monolithic kernel.
This is true, and is also part of my point. You're effectively blurring out the hard line between user-space and kernel-space.

>Drivers carry certain responsibilities, part of which is not fucking up the devices they are made for.
I'm mostly concerned with breaking out of their isolation, by accessing arbitrary regions of system memory and gaining access to kernel pages. In this case, the argument for running in user-space is entirely moot.

To reformulate my posts: I'm not saying that user-space drivers are inherently any less secure than kernel-space drivers. What I'm saying is that I fail to see how the benefits of running in user-space apply when you're dealing with stuff that has full control over physical memory. Such a driver crashing will still bring down the entire system in almost all cases.

At last, an informative quality post. Thanks for making 4chin better.

>informative quality post
>pure lies
huh.... really makes you think..

How is it lies? Or are you baiting?

>b-but I'm too retarded to read one or two paragraphs so only Sup Forums tier GPU shitposting should be allowed on Sup Forums

obvious bait

(You)

This isn't smartphones, get out

I guess the reason people do microkernels is the little added safety, such as surviving a dereferenced null pointer, or restarting drivers when they crash. Also, tell me: in what case would a driver crash the whole system?

>I guess the reason people do microkernels is the little added safety, such as surviving a dereferenced null pointer
This could be handled simply by adding a protection fault handler that does some graceful error handling in kernel space, seeing how most modern kernels run in virtual address space (like Linux does).

>Also, tell me: in what case would a driver crash the whole system?
Pointing a device's DMA at kernel memory, for example (which could be handled by an IOMMU, assuming you've set up proper domains, something your microkernel must also do, further blurring the lines between user-space and kernel-space). Or a malicious driver could retrieve the physical address of kernel pages and inject code into them.

Then there are device-initiated interrupts: with interrupt routines in user-space, you'd have to add the cost of doing a context switch.

Then there's the issue of disabling interrupts and scheduler preemption, which would blur the lines between user-space and kernel-space even further. If you disable preemption and then start a blocking call by mistake (taking a lock that's already taken, or whatever), you're fucked and have deadlocked your system no matter whether it runs in kernel space or user space.

Not the guy you're responding to, but thanks.

wtf i hate microkernels now

What does this autistic shitfest have to do with Sup Forums?

Fuck off back to your friendly linux thread generals.

>what does computer architecture and OS design have to do with technology

Thamk you based Torwalds.

Thank you based OP

So you got butthurt and rekt in the Rust thread yesterday and now you made an 8000 character long rant about how butthurt you are? keke

I think your computer science teachers are still teaching you from books written in the '80s, when the word "micro-kernel" was associated with a future utopia.

>Tanenbaum
He is a fucking retard who spouts academic memes, because he can't into real stuff.

I would even go as far as to say that anyone who would like to see some different shit should look at Terry A. Davis's TempleOS. The way he does certain things is interesting.

I was actually referring to (uninformed) opinions touted in a thread yesterday, where a bunch of Sup Forumsentoo men claimed that microkernels and user-space drivers were the best thing since sliced bread, yet they seemed completely unknowledgeable about how device drivers actually work.

Terry doesn't do things interestingly. Quite the contrary: his FAT-like file system, his 1:1 virtual memory layout, his drawing straight to the VGA buffer, etc. are just how DOS did it 30 years ago. The only improvement upon DOS is a preemptive scheduler, which isn't fucking hard to implement.

desu Minix is a lot more interesting than Terry "I can't into virtual memory" Davis

>Minix is a lot more interesting than [templeOS]
This

Sorry, I meant that more towards them, than you.

...

>However, the biggest performance killer is the cost of context switches.

No, not really. See cost of context switch on Linux vs cost of context switch on seL4.

You'd need hundreds of times as many context switches on seL4 for it to even match Linux overhead of one context switch.

>took a semester of operating systems, had to implement a microkernel
>it was literally just a rip off of MIT's 6.828
>implying I didn't steal all the code
Hardware classes are more fun than operating systems. Fuck OS niggers.

This is not true, user. The cost of context switching on Linux is actually very low, since the kernel is mapped into the top 1 GB of every process's address space and pinned to that memory.

seL4 is a meme, user

wiki.sel4.systems/FrequentlyAskedQuestions#What_about_DMA.3F

>cannot prove that DMA is well-behaved so proof assumes that it doesn't exist

Hah.

Run lmbench's lat_ctx on your Linux super workstation sometime. Compare it with the well-known seL4 values from this well-known paper.

sigops.org/sosp/sosp13/papers/p133-elphinstone.pdf

Well, DMA is more of a hardware problem.

All that drivers under seL4 can do is use the IOMMU to contain the hardware to the driver's own pagetable.

The only numbers I see in that paper are for "one-way IPC of various L4 kernels", which does not really give any indication of the cost of context switching.

>All that drivers under seL4 can do is use the IOMMU to contain the hardware to the driver's own pagetable.
It clearly states that IOMMU is experimental and for the unproved/unverified variant of seL4.

Was meant for

OP you are so intelligent. I wish I could regurgitate my Intro Operating System lecture notes, and put them in my own words as well as you!

>IOMMU to contain the hardware to the driver's own pagetable.
That's not how IOMMUs work, though. An IOMMU provides virtual IO addresses (bus addresses) which translate into physical addresses. Either the driver or the kernel needs to set up the correct IOMMU domains; it doesn't happen magically. Usually it's done by the driver in order to have full control over the device.

>It clearly states that IOMMU is experimental and for the unproved/unverified variant of seL4.

seL4 usually implements things and only verifies them sometime later. Personally, I don't care so much about verification as I do about the ongoing virtualization support, particularly the VMM, which is userland.

sel4.systems/Info/Roadmap/

> The only numbers I see in that paper are for "one-way IPC of various L4 kernels", which does not really give any indication of the cost of context switching.

That's the cost of sending one message from one process to another. That is the most relevant case of context switch in terms of microkernel design overhead. There's also the timer causing tasks to switch so that all runnable tasks get a chance to run, but that one isn't microkernel-specific.

>I think I'm not a mindless consumer because I use a broken operating system, with the compensation "At least I understand how my OS werks XD"

But it is an extremely vaguely defined metric (messages can be of arbitrary size), and it doesn't say anything about how they measured it (did they run it once? is it an average? what is the distribution? etc.). I cannot compare this number fairly; you'd need to run tests on the same hardware under fairly similar circumstances.

Anyway, as for the cost of context switching on Linux: as mentioned, it is very low because kernel memory is already mapped into the top 1 GB of every process's address space. It's only a matter of restoring a stack pointer and a couple of registers, plus flushing the cache (which is the highest cost of context switching). Not even seL4 can avoid this.
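
If you want numbers instead of assertions, the classic measurement is a pipe ping-pong between two processes, which is roughly what lmbench's lat_ctx does. A crude sketch; pin both processes to one core (e.g. with taskset) or you're measuring cross-core wakeups instead:

```c
/* Crude version of what lat_ctx measures: bounce one byte between two
 * processes through a pair of pipes and time the round trips. */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define ITERS 100000

int main(void)
{
    int p1[2], p2[2];
    char tok = 'x';

    pipe(p1);
    pipe(p2);

    if (fork() == 0) {                /* child: echo the token back */
        for (;;) {
            if (read(p1[0], &tok, 1) != 1)
                _exit(0);
            write(p2[1], &tok, 1);
        }
    }

    struct timespec a, b;
    clock_gettime(CLOCK_MONOTONIC, &a);
    for (int i = 0; i < ITERS; i++) { /* one round trip = 2 switches */
        write(p1[1], &tok, 1);
        read(p2[0], &tok, 1);
    }
    clock_gettime(CLOCK_MONOTONIC, &b);

    double ns = (b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec);
    printf("%.0f ns per switch\n", ns / (2.0 * ITERS));
    return 0;
}
```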

We're not discussing an actual OS here, we're discussing the cost of running drivers in user-space. Pay attention, you might learn something.

I was remarking your stubborn pretentiousness of not understanding people share your interests. Keep shitposting pretending to be a CS PhD, you might go somewhere.

*don't share your interests

>I was remarking your stubborn pretentiousness of not understanding people share your interests
You are free to ignore this thread completely, yet you decided to post in it, pretending to be proud of being an ignoramus.

The post literally reads "I use Windows, couldn't care less about this", yet it should be obvious that even Windows has IO device drivers and needs to do things in an optimal way.

>Keep shitposting pretending to be a CS PhD, you might go somewhere.
I am actually a PhD student.

I posted in this thread because I try ignoring idiots like you everyday. No it shouldn't be obvious, because he simply "doesn't care" something you fail to understand.
>I'm a PhD student
Okay kid

I agree OP.

You want to know what's really going on? Microsoft and other companies (and even governments) are independently trying to kill Linux and open source.

The GPL is a danger to every big tech company because it means they can't take community-driven software, re-brand it, and sell it.

Governments are also anti-FOSS because they're all abusing technology to spy on us. If all software were open source and people thought negatively of closed source, it would be much harder to spy on people (we know the NSA was working with Microsoft).

Also, these organizations use SJW tactics to try to control and destroy communities. Look what happened to Mozilla. Firefox was the alternative to the botnet; now shills post every day about how it's an SJW company, and the SJWs have really infiltrated it.

People pushing non-Linux kernels are just attempting to disrupt the Linux and open source ecosystem.

>I posted in this thread because I try ignoring idiots like you everyday.
That literally doesn't make any sense.

What's there to understand? I got tired of seeing threads like these every day, so I started shitposting in them. Was that hard, Doc?

>No it shouldn't be obvious, because he simply "doesn't care" something you fail to understand.
You're on a technology discussion board and you get angry because people are discussing how technology works?

That's some high level autism right there.

>I got tired of seeing threads like these every day,
You get tired of seeing threads discussing hardware architecture and OS design every day?

Tell me, why the fuck are you even on Sup Forums then?

>and it doesn't say anything about how they measured it

They didn't just measure it. This is WCET (Worst Case Execution Time), which is one of the proofs they do.

But still, you agree that it is a vague metric? I mean, the statement "message passing takes 0.05 microseconds" doesn't really say anything. What is a message? How long is it?

I get tired of reading pseudo-intellectuals getting butthurt at other posters because they don't share their interests.

More like "Shallow IT and PC Gaming".

Make /prog/ a thing again.

>it is very low because kernel memory is already mapped into the top 1 GB of every process's address space. It's only a matter of restoring a stack pointer and a couple of registers, plus flushing the cache (which is the highest cost of context switching)

Linux is remarkably slow at this. It takes over a whole microsecond, even on many-GHz CPUs.

Part of the reason is that the process concept is bloated on Linux. Part of it is that having kernel memory mapped isn't anywhere near as good as a whole microkernel permanently pinned in the TLB.

> Running drivers in user-space would mean exposing physical addresses to user-space. With no additional form of protection, a bad or malicious driver could potentially read and write from arbitrary locations in RAM including where the kernel resides.

How about every driver having its virtual address space just like any ordinary process?

>pseudo-intellectuals
Do you have some sort of inferiority complex, user?

I was making a case for why user-space drivers are a bad idea, by explaining how they would be implemented and why that is insufficient. What would you have me do, post memes and reaction images about Tanenbaum instead?

I'm not butthurt that you don't share my interests; all I did was comment on the ignorant statement "I use Windows therefore I don't care". Obviously Windows has IO devices and drivers too, so Windows users aren't unaffected by driver design choices.

>Linux is remarkably slow at this. It takes over a whole microsecond, even on many-GHz CPUs.
No, it literally takes nanoseconds.

>Part of it is that having kernel memory mapped isn't anywhere near as good as a whole microkernel permanently pinned in the TLB.
It *IS* permanently pinned in the TLB... Why do you think it is mapped into the same area of memory for every process?

>How about every driver having its virtual address space just like any ordinary process?
If you want an actual device to access memory, you cannot avoid having to deal with physical memory.

>No, it literally takes nanoseconds.

No, it literally takes microseconds.

blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html

>How about every driver having its virtual address space just like any ordinary process?
Drivers already run in virtual address space; they do on Linux and on most OSes I know of. The point is that you have to get physical addresses for stuff like DMA buffers.
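
And that step is unavoidable no matter how "virtual" the driver itself is. A sketch of the usual Linux streaming-DMA call that produces the bus handle (the function name is a placeholder):

```c
/* Sketch: even a driver that lives entirely in virtual addresses has
 * to produce a bus/physical handle for the device at some point. */
#include <linux/dma-mapping.h>

static dma_addr_t my_map_buffer(struct device *dev, void *buf, size_t len)
{
    /* translate the driver's virtual buffer into an address the
     * device can actually emit on the bus */
    dma_addr_t bus = dma_map_single(dev, buf, len, DMA_TO_DEVICE);

    if (dma_mapping_error(dev, bus))
        return 0;
    return bus; /* this is what gets written into a device register */
}
```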

Isn't that why IOMMUs have recently been included in new processors? Just to have an MMU for IO devices, like the one the processor has?
So if we have an IOMMU, every driver can deal with its own space without having to know the real physical address or worrying about other processes stealing or modifying its data, right?

user, this is seven years old.

If you're saying this changed, where's your newer data?

It's halfway correct, see the replies above. First of all, only x86 has IOMMUs, and they are still highly vendor-specific. In addition, enabling the IOMMU absolutely kills device-to-device access, because instead of taking the shortest PCIe path, everything is now routed through the root complex.

Secondly,
> every driver can deal with its space without having to know the real physical address or worrying about other process stealing or modifying its data, right?
Not really. Someone has to set up the correct IOMMU mappings, and this is a job for the device driver (because the device driver is aware of the device's addressing limitations, knows how many DMA buffers it needs, knows when to set up and tear down those buffers, knows when it needs to access other devices [to do RDMA], etc.). This would still lead to the situation where the device driver has control over physical address space.
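
Incidentally, this is more or less the shape of the API Linux ended up exposing for user-space drivers: VFIO. The driver asks the kernel to map one of its buffers at an IOVA of its choosing, and the kernel validates everything before programming the IOMMU. A trimmed sketch; the container/group setup (VFIO_GROUP_SET_CONTAINER, VFIO_SET_IOMMU) is assumed to have happened already:

```c
/* Trimmed sketch of a VFIO DMA mapping from user-space; assumes the
 * VFIO container and group have already been set up. */
#include <linux/vfio.h>
#include <sys/ioctl.h>

int map_dma(int container, void *buf, size_t len, unsigned long iova)
{
    struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
        .vaddr = (unsigned long)buf,
        .iova  = iova,
        .size  = len,
    };

    /* the kernel pins the pages and programs the IOMMU; the driver
     * never sees a raw physical address */
    return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
}
```

Note how much machinery sits behind that one ioctl (pinning, range checks, domain handling), which is exactly the OP's "bloated API" point, just with a concrete name.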

>This would still lead to the situation where the device driver has control over physical address space.

No, not necessarily. See OpenBSD's pledge() for the idea of dropping privileges after initialization, and the Genode handbook 16.05 for the idea of capabilities to physical memory frames.

I do share your interests, you're obviously distraught when you are calling someone a "mindless consumer".

Again, you're using imprecise metrics, because "cost of context switch" isn't clearly defined. The blog you linked, for example, says that system calls don't trigger what the author calls a "full context switch", but he never defines what he considers a full context switch.

What he measures is the time it takes for a thread to wait on a mutex, which isn't the cost of a context switch but rather how long it takes for a blocked process to be rescheduled.

Seeing that flushing the cache is the most influential cost of a context switch, you and I could agree on using the time it takes for flushing the cache as a metric of the cost of a context switch. But I imagine you wouldn't agree to this, because it is the same regardless of OS.

>enabling the IOMMU absolutely kills device-to-device access, because instead of taking the shortest PCIe path, everything is now routed through the root complex.

Please provide a common real-life example of this device to device communication which isn't DMA (as DMA always has CPU as arbiter) happening. I'm genuinely curious.