Re: [Fwd: Re: connector is missing in 2.6.12-rc2-mm1]

From: Evgeniy Polyakov
Date: Thu Apr 07 2005 - 05:08:18 EST


On Thu, 2005-04-07 at 01:32 -0700, Andrew Morton wrote:
> Evgeniy Polyakov <johnpol@xxxxxxxxxxx> wrote:
> >
> > >
> > > Plus, I'm still quite unsettled about the whole object lifecycle
> > > management, refcounting and locking in there. The fact that the code is
> > > littered with peculiar barriers says "something weird is happening here",
> > > and it remains unobvious to me why such a very common kernel pattern was
> > > implemented in such an unusual manner.
> > >
> > > So. I'd like to see the whole thing reexplained and resubmitted so we can
> > > think about it all again.
> >
> > All those barriers can be replaced with atomic_dec_and_test(),
>
> What a shame you didn't say atomic_dec_and_lock()...

Oops...

> > i.e. with something that returns the value.
> > Methods that return value requires explicit barriers.
> >
> > Actually there are quite many places where we have:
> >
> > cpu0                                  cpu1
> > use object
> > atomic_dec()
> >                                       if atomic_read/atomic_dec_and_test == 0
> >                                               free object.
>
> Yes, but those places normally also use locking to prevent the obvious race.

In the connector's case, all free/remove/destroy paths can be
reached only after the object has been unlinked, so no new work can
be assigned to it [only work that has already incremented the
counter remains].
It was a design goal [the same is true for w1, superio, acrypto]
to create the following lifecycle:
	object is created;
	work is assigned to it with atomic refcnt incrementing;
	object is "scheduled"[1] to be removed;
	object is destroyed when it is no longer used [i.e. its refcnt is zero].

[1]: "scheduled" means that work [removal in our case] can be done not
now, but after some condition, like refcnt zeroing.
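
Something like this simplified sketch [cn_object, cn_object_get/put
and cn_list_lock are made-up names for illustration, not the actual
connector code]:

#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/slab.h>
#include <asm/atomic.h>

static DEFINE_SPINLOCK(cn_list_lock);	/* protects the object list */

struct cn_object {
	atomic_t		refcnt;
	struct list_head	entry;
	/* ... payload ... */
};

/* Taken by whoever assigns work to the object. */
static inline void cn_object_get(struct cn_object *obj)
{
	atomic_inc(&obj->refcnt);
}

/* Dropped when a piece of work completes.  atomic_dec_and_test()
 * returns true for exactly one caller, so the object is freed once
 * and no explicit barriers are needed. */
static inline void cn_object_put(struct cn_object *obj)
{
	if (atomic_dec_and_test(&obj->refcnt))
		kfree(obj);
}

/* Removal: unlink the object first [under the list lock], so no new
 * work can find it and take a reference, then drop the initial
 * reference.  Whoever drops the last reference frees the object. */
static void cn_object_remove(struct cn_object *obj)
{
	spin_lock(&cn_list_lock);
	list_del(&obj->entry);
	spin_unlock(&cn_list_lock);

	cn_object_put(obj);
}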

> Yes, atomic_dec_and_test() and barrier removal would be better. Especially
> if it's associated with code commentary which explains why the whole thing
> isn't racy (ie: explains why no other CPU can look this object up).

atomic_dec_and_test() is more expensive than 2 barriers +
atomic_dec(), but in the connector's case I think the price is not
too high.

Should questionable design decisions like this be described in
in-source documentation, or is the patch description enough?

> > > Which comments were not addressed?
> >
> > CBUS code comments [I did not get ack on CBUS itself], and two below
> > issues.
>
> I continue to not see any point in cbus. It moves work from one place to
> another while increasing the amount of code and quite probably increasing
> the net amount of work too.

Ok.

Cache utilisation and the ability to send events from any context,
even under locks, are significant issues which CBUS solves, but they
are not the most important reasons.

The ability not to slow down fast paths - that is the main reason.

Consider a situation where one wants a notification for each new
write system call. Say that without such notification it takes
about 1 second to write one page from userspace; with notification
sending, which is not so fast, it would take 1.5 seconds. With CBUS,
write() still costs 1 second, and later, when we do not care about
write performance and the scheduler decides to run the CBUS thread,
delivering those notifications takes the additional 0.5 seconds.

But if one needs an almost immediate event rather than a delayed
notification, one can still use the connector's direct methods.

So the main use case for CBUS is situations where we do not want to
slow a process down just to send a notification about it; instead we
create the notification and deliver it later.

It is a design decision - queue events in fast paths, in locked
contexts and in irq [soft irq] contexts, and safely send them later.

It also provides a better control mechanism over the messages to be
sent: by limiting the number of messages sent at a time, CBUS
smoothes out the peaks created when a huge number of notifications
is generated at once.
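
In a simplified form the idea looks like this [cbus_event,
cbus_insert, cbus_thread and deliver_event are made-up names for
illustration, not the actual CBUS code]:

#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/wait.h>
#include <linux/kthread.h>

#define CBUS_MAX_PER_RUN	10	/* delivery limit per iteration */

struct cbus_event {
	struct list_head	entry;
	/* ... message payload ... */
};

static LIST_HEAD(cbus_queue);
static DEFINE_SPINLOCK(cbus_lock);
static DECLARE_WAIT_QUEUE_HEAD(cbus_wait);

/* Placeholder for the real [slow] delivery through the connector. */
void deliver_event(struct cbus_event *ev);

/* Fast path: callable from any context, including under locks and
 * from [soft]irq handlers; it only links the event into the queue. */
void cbus_insert(struct cbus_event *ev)
{
	unsigned long flags;

	spin_lock_irqsave(&cbus_lock, flags);
	list_add_tail(&ev->entry, &cbus_queue);
	spin_unlock_irqrestore(&cbus_lock, flags);

	wake_up(&cbus_wait);
}

/* CBUS thread: drains at most CBUS_MAX_PER_RUN events per iteration,
 * which smoothes delivery peaks, and does the real work later. */
static int cbus_thread(void *unused)
{
	while (!kthread_should_stop()) {
		struct cbus_event *ev;
		unsigned long flags;
		int i;

		wait_event_interruptible(cbus_wait,
				!list_empty(&cbus_queue) ||
				kthread_should_stop());

		for (i = 0; i < CBUS_MAX_PER_RUN; ++i) {
			ev = NULL;

			spin_lock_irqsave(&cbus_lock, flags);
			if (!list_empty(&cbus_queue)) {
				ev = list_entry(cbus_queue.next,
						struct cbus_event, entry);
				list_del(&ev->entry);
			}
			spin_unlock_irqrestore(&cbus_lock, flags);

			if (!ev)
				break;

			deliver_event(ev);
		}
	}
	return 0;
}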

> IOW it looks like a net loss which happens to provide gains in one rather
> uninteresting microbenchmark.

No messages were lost during the tests, since there was no OOM
condition [so atomic allocations did not fail] and the userspace
application read its socket queue promptly, so the socket queue, and
thus the connector, was not starved by userspace.

That benchmark showed that CBUS/connector does not slow down the
fast fork() path while delivering messages, and that all fork
notifications were delivered.

From your point of view it is better to complete the notification
work before the event itself has finished, but you definitely want
block requests to be queued before they reach the low-level device
driver, and skbs to be queued rather than processed in the receive
IRQ handler.

Those two mechanisms exist exactly so that the fast path is not
slowed down: other fast events [new interrupts] can occur, and all
of them [and the real work] can be processed later.

> I'll gleefully admit that I'm wrong, but I don't think that has been
> demonstrated yet.
>
--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski
