Re: [PATCH] sysfs: Optionally count subdirectories to support buggy applications

From: Eric W. Biederman
Date: Thu Mar 08 2012 - 17:15:26 EST


Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> On Mon, Mar 5, 2012 at 8:09 AM, Greg Kroah-Hartman
> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>>
>> I don't remember. I thought there was a proposed patch for this issue
>> from Eric, but I don't see it in my queue anywhere.
>
> That patch was an abortion. Adding a config option for behavior like
> this is totally bogus, and the only reason for that config option was
> that sysfs did silly things.

The biggest reason it is bogus is that it doesn't get properly tested
or reviewed. Sigh. My first patch to fix things had a bad typo
that everyone missed.

> It's only in -next, though, I was assuming that the whole "Kill nlink
> counting" commit never makes it to me. Because I won't take it.
>
> I outlined how the counting could easily be done without actually
> having to maintain an explicit count in the sysfs.

And if you had bothered to look, you would have seen that we used to
have that code, and that it was removed because it was a performance
bottleneck.

> Or we should just keep doing the counting.

The current counting that we do gives the wrong numbers in the
edge cases. To my knowledge, a deleted sysfs directory has never
returned nlink == 0.

Keeping compatibility is easy enough that it looks worth doing, but
nlink > 1 on Unix filesystem directories exists only to maintain 30+
years of backwards compatibility. I don't see any practical sense in
keeping . and .. directory entries on disk, or in upping the Unix
directory nlink count because of them. To me it looks like just one
of those things you do, like hashing directory entries so you can
have a big directory and still hand lseek a 32-bit offset that is
stable across renames and deletes.

From the point of view of maintaining sysfs, a 32-bit nlink_t is
too small. It is wrong for sysfs to refuse to represent devices that
exist, and I have heard of machines with enough memory to make it
possible to create more than 2^32 network devices. So sysfs must handle
overflow, and sysfs must use nlink == 1 in some cases. I was just
thinking we would get better userspace test coverage if we don't bother
to handle the other cases.

Eric