Re: [PATCHSET] FUSE: extend FUSE to support more operations

From: Tejun Heo
Date: Thu Nov 13 2008 - 10:11:17 EST


Hello,

Miklos Szeredi wrote:
>> I kind of like the original implementation tho. The f_ops->poll
>> interface is designed to be used like ->poll returning events if
>> available immediately and queue for later notification as necessary.
>> Notification is asynchronous and can be spurious (this actually comes
>> pretty handy for low level implementation). When notified, upper layer
>> queries the same way using ->poll. This is quite convenient for low
>> level implementation as the actual logic of poll can live in ->poll
>> proper while notifications can be scattered around places where events
>> can occur.
>
> Yes, that kind of interface is nice for f_ops->poll, and for libfuse.
>
> But for the kernel interface it's inefficient. A wake up event is 3
> context switches instead of one. And that's inherent in the interface
> itself for no good reason.

The performance problem with event notification is usually scalability,
not the cost of each notification. It's nice to optimize the latter
too, but I don't think it weighs much, especially for FUSE. Doing it
the request/reply way could raise scalability concerns; please see below.

> Also there's again the question of userspace filesystem messing with
> the caller: your original implementation allows the userspace
> filesystem to block f_ops->poll() forever, which really isn't what
> poll/select is about.

That would simply be a broken poll implementation, just as an
O_NONBLOCK read can block in ->read forever.

> So I'd still argue for the simple POLL-request/POLL-notify protocol on
> the kernel API, and possibly have the async notification similar to
> the kernel interface on the library API.
>
> Implementation wise I don't care all that much, but I'd actually
> prefer if it was implemented using the traditional request/reply thing
> and optimized (possibly later) to find requests in a more efficient
> way than searching the linear list, which would benefit not just poll
> but all requests.

Given that the number of in-flight requests is not too high, I think
linear search is fine for now, but switching it to a b-tree shouldn't
be difficult.
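
To make the cost concrete, here is a minimal sketch of matching a reply
to its request by linear search. It is a userspace model, not the
kernel's code: struct req, req_add() and req_find() are hypothetical
names, and the only assumption carried over from the discussion is that
each in-flight request has a unique ID which the reply echoes back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-flight request; not the kernel's request structure. */
struct req {
	uint64_t unique;	/* ID echoed back in the reply */
	struct req *next;	/* singly linked list of in-flight requests */
};

/* Push a request onto the front of the in-flight list. */
static void req_add(struct req **head, struct req *r)
{
	assert(head != NULL && r != NULL);
	r->next = *head;
	*head = r;
}

/* Linear search: O(n) in the number of in-flight requests. */
static struct req *req_find(struct req *head, uint64_t unique)
{
	for (struct req *r = head; r != NULL; r = r->next)
		if (r->unique == unique)
			return r;
	return NULL;
}
```

With only a handful of requests in flight the walk is short. If every
outstanding poll became a long-lived request on this list, n would grow
with the number of polled files, which is where an indexed structure
would start to matter.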

So, pros of the req/reply approach:

* Fewer context switches per event notification.

* No need for separate async notification mechanism.

Cons.

* More interface impedance matching from libfuse.

* Higher overhead when poll/select finishes. Either all outstanding
  requests need to be cancelled using INTERRUPT whenever poll/select
  returns, or the kernel needs to keep a persistent list of outstanding
  polls so that a later poll/select can reuse them. The problem here is
  that the kernel doesn't know when or whether they'll be reused. We
  could put in LRU-based heuristics, but that's getting too complex.
  Note that this is different from the userland server keeping track.
  The same problem exists with userland-based tracking, but for many
  servers it would be just a bit in an existing structure, and we can
  be much more lax in userland, i.e. files backed by actual storage
  usually don't need notification at all as data is always available.
  So the amount of overhead is limited in most cases, but we can't
  assume things like that for the kernel.

Overall, I think being lazy about cancellation and letting userland
notify asynchronously would be better both performance- and
simplicity-wise. What do you think?
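
As a rough illustration of the lazy approach, the sketch below models a
userland server that tracks poll interest with a single bit per file.
It is not the FUSE wire protocol or the libfuse API: struct srv_file,
srv_poll(), srv_event(), srv_consume() and the EV_* masks are all
hypothetical, and the notification message is stood in for by a counter.

```c
#include <assert.h>
#include <stdbool.h>

#define EV_IN  0x1u	/* hypothetical "readable" event bit */
#define EV_OUT 0x4u	/* hypothetical "writable" event bit */

/* Hypothetical per-file state kept by the userland server. */
struct srv_file {
	unsigned ready;		/* events available right now */
	bool notify_armed;	/* the one bit: somebody has polled */
	unsigned notifies_sent;	/* stands in for async notify messages */
};

/* Poll query: report current events and arm a one-shot notification. */
static unsigned srv_poll(struct srv_file *f)
{
	f->notify_armed = true;
	return f->ready;
}

/* Called from wherever an event can occur in the server. */
static void srv_event(struct srv_file *f, unsigned ev)
{
	assert(ev != 0);
	f->ready |= ev;
	if (f->notify_armed) {
		f->notify_armed = false;	/* one-shot: disarm */
		f->notifies_sent++;		/* the async notification */
	}
}

/* Events consumed, e.g. a read drained the available data. */
static void srv_consume(struct srv_file *f, unsigned ev)
{
	f->ready &= ~ev;
}
```

Nothing is sent to the server when the poller goes away. The bit stays
armed, and the worst case is one spurious notification, which the upper
layer handles by simply querying again.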

Thanks.

--
tejun