Re: cn_queue.c

From: Andrew Morton
Date: Fri Apr 01 2005 - 05:45:03 EST


Evgeniy Polyakov <johnpol@xxxxxxxxxxx> wrote:
>
> On Fri, 2005-04-01 at 01:50 -0800, Andrew Morton wrote:
> > Evgeniy Polyakov <johnpol@xxxxxxxxxxx> wrote:
> > >
> > > cn_queue_free_dev() will wait until dev->refcnt hits zero
> > > before freeing any resources,
> > > but that can happen only after cn_queue_del_callback() does
> > > its work on the given callback device [actually when all callbacks
> > > are removed].
> > > When a new callback is added to a device, its refcnt is incremented
> > > [before the addition, btw; if the addition fails in the middle, the
> > > reference is decremented]; when a callback is removed, the device's
> > > reference counter is decremented atomically after all work is finished.
> >
> > hm.
> >
> > How come cn_queue_del_callback() uses all those barriers if no other CPU
> > can grab new references against cbq->cb->refcnt?
>
> The work may already be assigned to that callback device;
> new work can't be. The barriers are there to ensure that the
> reference counters are updated in the proper places, but not
> before.

What are the "proper places"? What other control paths could be inspecting
the refcount at this time? (That's the problem with barriers - you can't
tell what they are barriering against).

> It would be a bug to update dev->refcnt before assigned work is finished
> and callback removed.
>
> > cn_queue_free_callback() forgot to do flush_workqueue(), so
> > cn_queue_wrapper() can still be running while cn_queue_free_callback()
> > frees up the cn_callback_entry, I think.
>
> cn_queue_wrapper() atomically increments cbq->cb->refcnt if it runs, so
> it will be caught in
> 	while (atomic_read(&cbq->cb->refcnt))
> 		msleep(1000);
> in cn_queue_free_callback().
> If it does not run, then all will be ok.

But there's a time window on entry to cn_queue_wrapper() where the refcount
hasn't been incremented yet, and there's no locking. If
cn_queue_free_callback() inspects the refcount in that window it will free
the cn_callback_entry while cn_queue_wrapper() is playing with it?
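
To make that window concrete, here is a minimal userspace model of the
interleaving. It is only a sketch, not the connector code: struct
entry_model and every function name below are made up for illustration,
the "free" just sets a flag, and the two paths are stepped by hand in a
fixed order rather than run on real CPUs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a callback entry; NOT the real kernel struct. */
struct entry_model {
	int refcnt;	/* stands in for cbq->cb->refcnt */
	bool freed;	/* set instead of really freeing memory */
};

/* Free path: releases the entry only if it observes a zero refcount. */
bool free_path_try_free(struct entry_model *e)
{
	assert(e->refcnt >= 0);
	if (e->refcnt != 0)
		return false;	/* the real loop would msleep() and retry */
	e->freed = true;
	return true;
}

/* Wrapper: takes its reference. Returns true if the entry was already freed. */
bool wrapper_take_ref(struct entry_model *e)
{
	bool touched_freed = e->freed;

	e->refcnt++;
	return touched_freed;
}

/* Wrapper has been entered but has not taken its reference yet. */
int simulate_bad_interleaving(void)
{
	struct entry_model e = { .refcnt = 0, .freed = false };

	free_path_try_free(&e);			/* sees 0, frees */
	return wrapper_take_ref(&e) ? 1 : 0;	/* increments freed entry */
}

/* Wrapper takes its reference before the free path looks. */
int simulate_safe_interleaving(void)
{
	struct entry_model e = { .refcnt = 0, .freed = false };
	bool touched = wrapper_take_ref(&e);
	bool freed = free_path_try_free(&e);	/* sees 1, backs off */

	return (touched || freed) ? 1 : 0;
}
```

In this model simulate_bad_interleaving() reports a use-after-free and
simulate_safe_interleaving() does not. The only difference is which side
got there first, and nothing in the model forces the safe order.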

> Btw, it looks like the comments for del_timer_sync() and
> cancel_delayed_work() are controversial - del_timer_sync() says that
> a pending timer can not run on a different CPU after returning,
> but cancel_delayed_work() says that the work to be cancelled can still
> run after returning.

Not controversial - the timer can have expired and have been successfully
deleted but the work_struct which the timer handler scheduled is still
pending, or has just started to run.
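
A rough userspace model of those two stages, in case it helps. This is a
sketch under assumptions, not the workqueue implementation: the dw_*
names and states are hypothetical, and "queued" here also covers "has
just started to run".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states of one piece of delayed work. */
enum dw_state { DW_IDLE, DW_TIMER_PENDING, DW_WORK_QUEUED };

struct dw_model {
	enum dw_state state;
	int runs;	/* completed runs of the work function */
};

/* Timer expiry: the timer handler hands the work to the workqueue. */
void dw_timer_fire(struct dw_model *d)
{
	if (d->state == DW_TIMER_PENDING)
		d->state = DW_WORK_QUEUED;
}

/* Models the cancel: it can only remove a timer that is still pending. */
bool dw_cancel(struct dw_model *d)
{
	if (d->state == DW_TIMER_PENDING) {
		d->state = DW_IDLE;
		return true;
	}
	return false;	/* timer already gone; work may still be queued */
}

/* Models a flush: lets queued work run to completion. */
void dw_flush(struct dw_model *d)
{
	if (d->state == DW_WORK_QUEUED) {
		d->runs++;
		d->state = DW_IDLE;
	}
}
```

If the timer fires first, dw_cancel() returns false and the model is
still in DW_WORK_QUEUED, so only the flush step brings it back to idle.
Both comments can be true at once because they describe different
stages.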
