Re: [patch 00/13] devtmpfs

From: Arjan van de Ven
Date: Mon May 11 2009 - 09:49:40 EST


On Mon, 11 May 2009 15:28:33 +0200
Kay Sievers <kay.sievers@xxxxxxxx> wrote:

> On Mon, May 11, 2009 at 15:05, Arjan van de Ven <arjan@xxxxxxxxxxxxx>
> wrote:
> > On Mon, 11 May 2009 13:34:52 +0200
> > Kay Sievers <kay.sievers@xxxxxxxx> wrote:
> >>
> >> > - That the other proposals are worse than yours.
> >>
> >> I also did, in exactly this thread.
> >
> > no you have not. But I'd like you to ;)
>
> I did. It's reliability: the race with new devices coming in between
> the moment you start reading your list and the moment you finish
> creating the nodes. You will miss these devices, which we don't want
> to work around with another hack. You will have to bring up the
> machinery that listens to events for new devices before synthesizing
> the stuff that is already there.

and this is hard because?
Certainly udev already sets up the listening machinery as its first step;
after setting that up, quickly racing down a "list" (in whatever form you
make it) and then daemonizing to work through the queued events shouldn't
be a problem.

(And you are right: while this is not an issue without initramfs,
because the kernel doesn't return until all probing activity has
finished, it might be a problem with initramfs, which executes before
all the probing is done. But it's not a hard issue, just a sequencing
issue.)
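The listen-first ordering described above can be sketched in C, assuming a Linux system that delivers uevents on a NETLINK_KOBJECT_UEVENT socket. The helper names (open_uevent_socket, parse_uevent) are hypothetical and this is not udev's code. The parser also assumes the kernel puts MAJOR, MINOR and DEVNAME into the event; kernels that omit DEVNAME would need the name derived some other way.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

struct uevent_dev {
    char action[16];
    char devname[64];
    int major, minor;
};

/* Step 1: subscribe to kernel uevents BEFORE scanning existing devices,
 * so a device that shows up during the scan is queued on this socket
 * instead of being missed. */
int open_uevent_socket(void)
{
    struct sockaddr_nl addr;
    int fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_pid = 0;      /* let the kernel pick the port id */
    addr.nl_groups = 1;   /* the kernel's uevent multicast group */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Step 2 (not shown): walk /sys and create nodes for existing devices. */

/* Copy the value of a "KEY=VALUE" field if it starts with `key`. */
static int get_field(const char *field, size_t flen, const char *key,
                     char *dst, size_t dstlen)
{
    size_t klen = strlen(key);

    if (flen < klen || strncmp(field, key, klen) != 0)
        return 0;
    snprintf(dst, dstlen, "%.*s", (int)(flen - klen), field + klen);
    return 1;
}

/* Step 3: parse one queued datagram of the form
 * "ACTION@DEVPATH\0KEY=VALUE\0KEY=VALUE\0...". Returns 0 when the
 * fields needed for mknod() (MAJOR, MINOR, DEVNAME) are present. */
int parse_uevent(const char *buf, size_t len, struct uevent_dev *out)
{
    char num[16];
    size_t i = 0;

    memset(out, 0, sizeof(*out));
    out->major = out->minor = -1;
    while (i < len) {
        const char *field = buf + i;
        size_t flen = strnlen(field, len - i);

        get_field(field, flen, "ACTION=", out->action, sizeof(out->action));
        get_field(field, flen, "DEVNAME=", out->devname, sizeof(out->devname));
        if (get_field(field, flen, "MAJOR=", num, sizeof(num)))
            out->major = atoi(num);
        if (get_field(field, flen, "MINOR=", num, sizeof(num)))
            out->minor = atoi(num);
        i += flen + 1;   /* skip the field and its NUL separator */
    }
    return (out->major >= 0 && out->minor >= 0 && out->devname[0]) ? 0 : -1;
}
```

Because the socket is bound before the scan, events for devices that appear mid-scan wait in the socket's receive buffer. A long scan can still overflow that buffer, so a real daemon would likely enlarge SO_RCVBUF and rescan if recv() fails with ENOBUFS.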

>
> > You have not commented substantially on my counter-proposal to have
> > a single file with the "device list" (e.g. char/block, major,
> > minor, name) so that userland can make the nodes in 0.01 seconds or
> > less, but with the permissions/owners it wants and the tmpfs
> > mount options it wants.
>
> I did. We have all that information in /sys already. I don't see the
> reason for another file other than to provide the information to make
> just another new userspace hack a bit more efficient.
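To illustrate the claim that /sys already carries the device numbers: on kernels that provide /sys/dev/char and /sys/dev/block, each entry there is named MAJOR:MINOR. The minimal C sketch below only lists those numbers; the helper names are hypothetical, and resolving device names and creating nodes are left out.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <stdio.h>

/* Parse a /sys/dev/{char,block} entry name such as "8:0".
 * Returns 0 on success, -1 for anything else (".", "..", junk). */
int parse_majmin(const char *s, unsigned int *major, unsigned int *minor)
{
    char extra;

    /* A clean "MAJ:MIN" gives exactly two conversions; a third one
     * means there were trailing characters. */
    if (sscanf(s, "%u:%u%c", major, minor, &extra) != 2)
        return -1;
    return 0;
}

/* Print every device number found in `dir` (e.g. "/sys/dev/char").
 * Returns the number of entries, or -1 if the directory can't be opened. */
int list_sys_dev(const char *dir)
{
    DIR *d = opendir(dir);
    struct dirent *de;
    unsigned int maj, min;
    int n = 0;

    if (d == NULL)
        return -1;
    while ((de = readdir(d)) != NULL) {
        if (parse_majmin(de->d_name, &maj, &min) == 0) {
            printf("%s: %u:%u\n", dir, maj, min);
            n++;
        }
    }
    closedir(d);
    return n;
}
```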

I personally don't think the 0.06 seconds are a problem, but I got the
impression that you were trying to optimize this path with your patch.
(After all, 0.06 seconds is pretty much the cost of the thing you're
optimizing.)
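The proposed device list was only described here as "char/block, major, minor, name", so the one-entry-per-line format used below ("c 1 3 null") is an assumption, as are all helper names. The sketch shows why such a file would be cheap to consume: one sscanf() and one mknod() per line, with the permissions chosen by userspace.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* One entry of the HYPOTHETICAL device-list file, e.g. "c 1 3 null". */
struct devlist_entry {
    char type;                 /* 'c' = char device, 'b' = block device */
    unsigned int major, minor;
    char name[64];
};

/* Parse one line; returns 0 on success, -1 on a malformed line. */
int parse_devlist_line(const char *line, struct devlist_entry *e)
{
    if (sscanf(line, " %c %u %u %63s", &e->type, &e->major, &e->minor,
               e->name) != 4)
        return -1;
    if (e->type != 'c' && e->type != 'b')
        return -1;
    return 0;
}

/* Create the node under `dir` with permissions chosen by userspace.
 * Needs CAP_MKNOD; the explicit chmod() undoes the effect of the umask. */
int make_node(const char *dir, const struct devlist_entry *e, mode_t perm)
{
    char path[256];
    mode_t type = (e->type == 'c') ? S_IFCHR : S_IFBLK;

    if (snprintf(path, sizeof(path), "%s/%s", dir, e->name) >= (int)sizeof(path))
        return -1;
    if (mknod(path, type | perm, makedev(e->major, e->minor)) < 0)
        return -1;
    return chmod(path, perm);
}

/* Create every node listed in `f`; returns the number created. */
int load_devlist(FILE *f, const char *dir, mode_t perm)
{
    char line[128];
    struct devlist_entry e;
    int n = 0;

    while (fgets(line, sizeof(line), f) != NULL)
        if (parse_devlist_line(line, &e) == 0 && make_node(dir, &e, perm) == 0)
            n++;
    return n;
}
```

Names containing a slash (for example input/event0) would need their parent directory created first; the sketch does not handle that.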

>
> And the main point is the reliability; leave all the weird speed
> arguments and made-up numbers alone.

I'm sorry, but you're either trying to be obnoxious or telling us your
own numbers are made up. Since I doubt the former, and neither my numbers
nor Eric's are made up, I'll assume the latter...

> You depend on some rather complex userspace to bring up your box. And
> people have complained about that for years, and for good reason.

Prior to sysfs, people depended on MAKEDEV (and on the fact that they
chose not to use tmpfs but a real fs for /dev) for this. It's not that
much different today. Using tmpfs for /dev is a local choice. It's fully
optional, in fact, and that's a good thing.
>
> On my box we create 12152 files in /sys on bootup, and with devtmpfs
> the same code creates 218 simple device nodes with the same call, and
> this makes bootup reliable, more self-contained, and as a nice side
> effect makes it faster. Just focus on init=/bin/sh, if you want to see
> the reason behind all this.

init=/bin/sh is an interesting subcase, sure. In the "before your patch"
scenario it means that people get the "real /dev directory" and not the
tmpfs overmount. What to put there is a distro choice: Fedora puts
nothing there, Moblin puts only the statically allocated device nodes
there. I don't know what openSUSE puts there.

People who use init=/bin/sh don't expect a full system, but I suppose
they do expect enough of one to do system recovery. I don't consider the
delta between "static nodes only" and "devtmpfs" to be significant here.
In a recovery scenario, if you WANT something dynamic, you start the
dynamic machinery by hand. Otherwise you want something predictable.


> We focused on the speed here, because we want to solve the initramfs
> problem, a problem you solved by getting rid of it entirely, which is
> what I do on all my own boxes too, but what the distro guys
> never want to accept.

Now you've lost me. I am missing the link between this and initramfs
entirely. How you do /dev has very little to do with whether you use an
initramfs or not. Sure, you need the rootfs device node before you can
mount root in the initramfs case, just like you need the rootfs device
node before you can fsck it in the non-initramfs case. In both cases you
want all the device nodes as fast as you can get them, but within the
system policy for ownership, permissions, SELinux contexts, tmpfs mount
options etc. And Eric showed that this is a 0.06 second thing, not a big
deal in the grand scheme of things, even if you want to boot the whole
system in 2 seconds like we do.
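Applying system policy to the nodes might look roughly like the C sketch below. The policy table, its uid/gid values (gid 5 for "tty" and 6 for "disk" are common but not universal) and the helper names are hypothetical, and SELinux labeling is omitted. mount(2) with the tmpfs "mode=" option is one way the /dev mount options would be chosen.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mount.h>
#include <unistd.h>

/* HYPOTHETICAL policy table: ownership and permissions are decided
 * by userspace, not by the kernel. */
struct node_policy {
    const char *prefix;   /* device name prefix, e.g. "tty" */
    uid_t uid;
    gid_t gid;
    mode_t mode;
};

static const struct node_policy policy[] = {
    { "null", 0, 0, 0666 },
    { "tty",  0, 5, 0620 },   /* assumed "tty" group id */
    { "sd",   0, 6, 0660 },   /* assumed "disk" group id */
};

static const struct node_policy default_policy = { "", 0, 0, 0600 };

/* First matching prefix wins; unknown devices get root-only access. */
const struct node_policy *lookup_policy(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(policy) / sizeof(policy[0]); i++)
        if (strncmp(name, policy[i].prefix, strlen(policy[i].prefix)) == 0)
            return &policy[i];
    return &default_policy;
}

/* Mount a tmpfs on /dev with the options this system wants. */
int mount_dev(void)
{
    return mount("tmpfs", "/dev", "tmpfs", MS_NOSUID, "mode=0755");
}

/* Apply the policy to a node that already exists at `path`. */
int apply_policy(const char *path, const char *name)
{
    const struct node_policy *p = lookup_policy(name);

    if (chown(path, p->uid, p->gid) < 0)
        return -1;
    return chmod(path, p->mode);
}
```

Keeping the table in userspace means the kernel only supplies device numbers, while each distribution decides ownership, permissions and mount options.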

Also, you're rather generalizing with "the distro guys"... the Moblin
distribution already does this for the cases where it is possible. I
wouldn't be surprised if other distros figured out how to detect this
case and ditch the initramfs when possible, while keeping it (and
keeping it cheap) when an initramfs is required.

--
Arjan van de Ven Intel Open Source Technology Centre