Re: cn_queue.c

From: Evgeniy Polyakov
Date: Fri Apr 01 2005 - 06:07:07 EST


On Fri, 2005-04-01 at 02:43 -0800, Andrew Morton wrote:
> Evgeniy Polyakov <johnpol@xxxxxxxxxxx> wrote:
> >
> > On Fri, 2005-04-01 at 01:50 -0800, Andrew Morton wrote:
> > > Evgeniy Polyakov <johnpol@xxxxxxxxxxx> wrote:
> > > >
> > > > cn_queue_free_dev() will wait until dev->refcnt hits zero
> > > > before freeing any resources,
> > > > but it can happen only after cn_queue_del_callback() does
> > > > it's work on given callback device [actually when all callbacks
> > > > are removed].
> > > > When new callback is added into device, it's refcnt is incremented
> > > > [before adition btw, if addition fails in the middle, reference is
> > > > decremented], when callbak is removed, device's reference counter
> > > > is decremented aromically after all work is finished.
> > >
> > > hm.
> > >
> > > How come cn_queue_del_callback() uses all those barriers if no other CPU
> > > can grab new references against cbq->cb->refcnt?
> >
> > The work may be already assigned to that callback device,
> > new work cant, barriers are there to ensure that
> > reference counters are updated in proper places, but not
> > before.
>
> What are the "proper places"? What other control paths could be inspecting
> the refcount at this time? (That's the problem with barriers - you can't
> tell what they are barriering against).

Device's refcnt must be updated after work is done, not before.
Callback's refcnt must be incremented before work is done, and
decremented after.

Without barriers all above cases can be mixed.

> > It would be a bug to update dev->refcnt before assigned work is finished
> > and callback removed.
> >
> > > cn_queue_free_callback() forgot to do flush_workqueue(), so
> > > cn_queue_wrapper() can still be running while cn_queue_free_callback()
> > > frees up the cn_callback_entry, I think.
> >
> > cn_queue_wrapper() atomically increments cbq->cb->refcnt if runs, so it
> > will
> > be caught in
> > while (atomic_read(&cbq->cb->refcnt))
> > msleep(1000);
> > in cn_queue_free_callback().
> > If it does not run, then all will be ok.
>
> But there's a time window on entry to cn_queue_wrapper() where the recsount
> hasn't been incremented yet, and there's no locking. If
> cn_queue_free_callback() inspects the refcount in that window it will free
> the cn_callback_entry() while cn_queue_wrapper() is playing with it?

If we already run cn_queue_wrapper() [even before refcnt incrementing,
probably it is not even needed there], then cancel_delayed_work() will
sleep,
since appropriate timer will be deleted in del_timer_sync(), which
will wait untill it is finished on the different CPU.

> > Btw, it looks like comments for del_timer_sync() and cancel_delayed_work
> > ()
> > are controversial - del_timer_sync() says that pending timer
> > can not run on different CPU after returning,
> > but cancel_delayed_work() says, that work to be cancelled still
> > can run after returning.
>
> Not controversial - the timer can have expired and have been successfully
> deleted but the work_struct which the timer handler scheduled is still
> pending, or has just started to run.

Ugh, I see now.
There are two levels of work deferring - into cpu_workqueue_struct,
and, in case of delayed work, into work->timer.


Yes, I need to place flush_workqueue() in cn_queue_free_callback();

--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski

Attachment: signature.asc
Description: This is a digitally signed message part