Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

From: Evgeniy Polyakov
Date: Sat Mar 03 2007 - 15:37:49 EST


On Sat, Mar 03, 2007 at 10:46:59AM -0800, Davide Libenzi (davidel@xxxxxxxxxxxxxxx) wrote:
> On Sat, 3 Mar 2007, Evgeniy Polyakov wrote:
>
> > > You've to excuse me if my memory is bad, but IIRC the whole discussion
> > > and loong benchmark feast born with you throwing a benchmark at Ingo
> > > (with kevent showing a 1.9x performance boost WRT epoll), not with you
> > > making any other point.
> >
> > So, how does it sound?
> > "Threadlets are bad for IO because kevent is 2 times faster than epoll?"
> >
> > I said threadlets are bad for IO (and we agreed that both approaches
> > shouldbe usedfor the maximum performance) because of rescheduling overhead -
> > tasks are quite heavy structuresa to move around - even pt_regs copy
> > takes more than event structure, but not because there is something in other
> > galaxy which might work faster than another something in another galaxy.
> > That was stupid even to think about that.
>
> Evgeny, other folks on this thread read what you said, so let's not drag
> this over.

Sure, I was wrong to start this again, but try to get my position - I
really tired from trying to prove that I'm not a camel just because we
had some misunderstanding at the start.

I do think that threadlets are relly cool solution and are indeed very
good approach for majority of the parallel processing, but my point is
still that it is not a perfect solution for all tasks.

Just to draw a line: kevent example is extrapolation of what can be
achieved with event-driven model, but that does not mean that it must be
_only_ used for AIO model - threadlets _and_ event driven model (yes, I
accepted Ingo's point about its declining) is the best solution.

> > > And if you really feel raw about the single O(nready) loop that epoll
> > > currently does, a new epoll_wait2 (or whatever) API could be used to
> > > deliver the event directly into a userspace buffer [1], directly from the
> > > poll callback, w/out extra delivery loops (IRQ/event->epoll_callback->event_buffer).
> >
> > > [1] From the epoll callback, we cannot sleep, so it's gonna be either an
> > > mlocked userspace buffer, or some kernel pages mapped to userspace.
> >
> > Callbacks never sleep - they add event into list just like current
> > implementation (maybe some lock must be changed from mutex to spinlock,
> > I do not rememeber), main problem is binding to the file structure,
> > which is heavy.
>
> I was referring to dropping an event directly to a userspace buffer, from
> the poll callback. If pages are not there, you might sleep, and you can't
> since the wakeup function holds a spinlock on the waitqueue head while
> looping through the waiters to issue the wakeup. Also, you don't know from
> where the poll wakeup is called.

Ugh, no, that is very limited solution - memory must be either pinned
(which leads to dos and limited ring buffer), or callback must sleep.
Actually in any way there _must_ exist a queue - if ring buffer is full
event is not allowed to be dropped - it must be stored in some other
place, for example in queue from where entries will be read (copied)
which ring buffer will have entries (that is how it is implemented in
kevent at least).

> File binding heavy? The first, and by *far* biggest, source of events
> inside an event collector, of someone that cares about scalability, are
> sockets. And those are already files. Second would be AIO, and those (if
> performance figures agrees) can be hosted inside syslets/threadlets.
> Then you fall into the no-care category, where the extra 100 bytes do not
> make a case against the ability of using it with an existing POSIX
> infrastructure (poll/select).

Well, sockets are the files indeed, and sockets already are perfectly
handled by epoll - but there are other users of petential interace - and
it must be designed to scale in _any_ situation very well.
Even if we right now do not have problems with some types of events, we
must scale with any new one.

> BTW, Linus made a signalfd sketch code time ago, to deliver signals to an
> fd. Code remained there and nobody cared. Question: Was it because
> 1) it had file bindings or 2) because nobody really cared to deliver
> signals to an event collector?
> And *if* later requirements come, you don't need to change the API by
> adding an XXEVENT_SIGNAL_ADD or XXEVENT_TIMER_ADD, or creating a new
> XXEVENT-only submission structure. You create an API that automatically
> makes that new abstraction work with POSIX poll/select, and you get epoll
> support for free. Without even changing a bit in the epoll API.

Well, we get epoll support for free, but we need to create tons of other
interfaces and infrastructure for kernel users, and we need to change
userspace anyway.
But epoll support requires to have quite heavy bindings to file
structure, so why don't we want to design new interface (since we need
to change userspace anyway) so that it could allow to scale and be very
memory optimized from the beginning?

> - Davide
>

--
Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/