Re: sysfs: tagged directories not merged completely yet

From: Tejun Heo
Date: Tue Oct 07 2008 - 08:21:56 EST


Eric W. Biederman wrote:
>> If the filler is a real concern, I think it's better to just decouple
>> it rather than making sysfs locking fine-grained. sysfs metadata
>> might as well be protected by a single spinlock if it can be decoupled
>> from vfs locking and stuff. It's just an in-memory tree which isn't
>> used too often.
>
> I think with a little care we can make the sysfs read side rcu
> protected which would remove any real locking from lookup
> and readdir.

IIRC, the original readdir implementation put a cursor entry on the
children list to walk through it. The implementation was horribly
broken in a number of different ways (ISTR problems with locking and
with multiple walkers of different types), so I just gutted all the
complexity and made it simple, as getting it correct was far more
important and there seemed to be little need for optimization.
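
To illustrate what "simple" can look like, here is a minimal userspace
sketch of a cursor-free walk. It assumes each child carries a stable key
(here an inode number) and that the sibling list is kept sorted by it, so
a walker only has to remember a position value. All names (demo_dirent,
demo_readdir_next, ...) are hypothetical and this is not the actual sysfs
code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a directory entry; not the kernel struct. */
struct demo_dirent {
    unsigned long ino;          /* stable key; the list is sorted by it */
    const char *name;
    struct demo_dirent *next;   /* sibling link */
};

/* Insert a child, keeping the sibling list in ascending ino order. */
static void demo_add(struct demo_dirent **head, struct demo_dirent *sd)
{
    while (*head && (*head)->ino < sd->ino)
        head = &(*head)->next;
    sd->next = *head;
    *head = sd;
}

/* Unlink the child with the given ino; returns it, or NULL if absent. */
static struct demo_dirent *demo_remove(struct demo_dirent **head,
                                       unsigned long ino)
{
    for (; *head; head = &(*head)->next) {
        if ((*head)->ino == ino) {
            struct demo_dirent *sd = *head;
            *head = sd->next;
            sd->next = NULL;
            return sd;
        }
    }
    return NULL;
}

/*
 * Return the first child with ino >= *pos and advance *pos past it.
 * Nothing is inserted into the list, so any number of walkers can be
 * in flight and entries may come and go between calls.
 */
static const struct demo_dirent *
demo_readdir_next(const struct demo_dirent *head, unsigned long *pos)
{
    for (; head; head = head->next) {
        if (head->ino >= *pos) {
            *pos = head->ino + 1;
            return head;
        }
    }
    return NULL;
}
```

Each call rescans from the head, which costs O(n) per entry, but the
walker's only state is a number, so there is no cursor to lock or clean up.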

Yeah, using RCU sounds like a plan.
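
For reference, a rough userspace sketch of the lock-free read side, using
C11 atomics as a stand-in for the kernel's rcu_assign_pointer() and
rcu_dereference(). It models only the publish/read half of RCU: removal
and grace periods are not modeled at all, writers are assumed to be
serialized by some lock that is not shown, and every name here is
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical child node; not a kernel structure. */
struct demo_node {
    const char *name;
    struct demo_node *_Atomic next;
};

struct demo_dir {
    struct demo_node *_Atomic children;
};

/* Writer side. Assumes writers are serialized elsewhere (not shown). */
static void demo_publish(struct demo_dir *dir, struct demo_node *n)
{
    struct demo_node *old =
        atomic_load_explicit(&dir->children, memory_order_relaxed);

    atomic_store_explicit(&n->next, old, memory_order_relaxed);
    /* Release: the node's fields become visible before the pointer. */
    atomic_store_explicit(&dir->children, n, memory_order_release);
}

/* Reader side: walks the list without taking any lock. */
static struct demo_node *demo_lookup(struct demo_dir *dir, const char *name)
{
    struct demo_node *n =
        atomic_load_explicit(&dir->children, memory_order_acquire);

    while (n) {
        if (strcmp(n->name, name) == 0)
            return n;
        n = atomic_load_explicit(&n->next, memory_order_acquire);
    }
    return NULL;
}
```

The hard part, deciding when a removed node may be freed, is exactly what
this sketch leaves out.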

>> Generally, the VFS layer isn't too easy for sysfs, which is a bit like
>> a distributed filesystem but has a stricter here-and-now rule (all
>> changes should be visible instantaneously). At the beginning, sysfs
>> didn't have much metadata itself, it just used the VFS data structures
>> but that was too large so sysfs_dirent got introduced and it tried to
>> update VFS data structures as necessary and (this is when I started
>> working on it) the current code and Eric's patchset evolved from
>> there.
>>
>> Maybe it can be done better by taking a more traditional distributed
>> filesystem approach - re/invalidation on access. I don't know whether
>> it will fit sysfs's needs but if it can be done, sysfs would be able
>> to ride along with other distributed filesystems and become much more
>> conventional in its interfacing with VFS.
>
> The revalidate on access model doesn't appear to have a way to track
> remote renames. Something sysfs supports.

Yeap, IIRC, one of the reasons why sysfs wasn't converted over to
that model was that sysfs guarantees the inode doesn't change over a
rename or move, so that notifications keep working across renames.
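
A toy userspace illustration of that guarantee, assuming a watcher keys on
the entry's identity (inode number) rather than on its path. The structs
and helpers are hypothetical and only show the invariant, not how sysfs or
any notification mechanism implements it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry: identity (ino) is separate from name and parent. */
struct demo_entry {
    unsigned long ino;
    const char *name;
    struct demo_entry *parent;
};

/* Rename and/or move in place; the identity (ino) is never touched. */
static void demo_rename(struct demo_entry *e, struct demo_entry *new_parent,
                        const char *new_name)
{
    e->parent = new_parent;
    e->name = new_name;
}

/* A watcher that remembers identity instead of a path. */
struct demo_watch {
    unsigned long ino;
};

static int demo_watch_matches(const struct demo_watch *w,
                              const struct demo_entry *e)
{
    return w->ino == e->ino;
}
```

A scheme that re-looks entries up by name after invalidation would instead
see the renamed entry as a brand new object, and the watch would be lost.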

> I have just spent a little bit of time thinking it through. I had
> previously thought that, since sysfs only allows VFS reads, we could
> fix our backwards lock ordering by optimizing the read side with rcu.
> Unfortunately the VFS still takes locks on rename and similar paths
> despite the fact that sysfs does not implement those paths' functions.
> Therefore whatever we do has to handle all VFS operations even if we
> don't support them. Weird, but true.
>
> We may need to delay dentry unhashing until revalidate. I think I see
> some issues if we don't do that.

Ah... okay. It shouldn't be difficult, right?
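
Roughly what I picture, as a userspace sketch under loose assumptions:
removal only flags the backing node, and the cached entry is dropped the
next time it is revalidated. The names are made up and the real dentry
lifetime rules are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical backing node. */
struct demo_sd {
    bool removed;
};

/* Hypothetical cached entry pointing at a backing node. */
struct demo_dentry {
    struct demo_sd *sd;
    bool hashed;    /* still findable through the cache */
};

/* Removal only flags the backing node; the cache is left alone. */
static void demo_remove_node(struct demo_sd *sd)
{
    sd->removed = true;
}

/*
 * Called on the next access. Returns true if the cached entry is still
 * valid; otherwise unhashes it and returns false.
 */
static bool demo_revalidate(struct demo_dentry *d)
{
    if (d->sd->removed) {
        d->hashed = false;
        return false;
    }
    return true;
}
```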

Thanks.

--
tejun