Re: [PATCH 1/2] resubmit cciss: kernel thread to detect changes on MSA2012

From: James Bottomley
Date: Sat Mar 07 2009 - 15:37:15 EST


On Fri, 2009-03-06 at 15:56 -0800, Andrew Morton wrote:
> On Fri, 6 Mar 2009 17:29:18 -0600
> "Mike Miller (OS Dev)" <mikem@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> > On Fri, Mar 06, 2009 at 12:24:27PM -0600, James Bottomley wrote:
> > > On Fri, 2009-03-06 at 12:16 -0600, Mike Miller wrote:
> > > > Patch 1 of 2
> > > >
> > > > This is a resubmission of yesterday's patch to detect changes on the MSA2012.
> > > > I hope I've addressed all concerns. This patch rearranges some of the code
> > > > so we also have coverage in the sg and the ioctl paths as well as the main
> > > > data path.
> > > >
> > > > The MSA2012 cannot inform the driver of configuration changes since all
> > > > management is out of band. This is a departure from any storage we have
> > > > supported in the past. We need some way to detect changes on the topology so
> > > > we implement this kernel thread. In some instances there's nothing we can do
> > > > from the driver (like LUN failure) so just print out a message. In the case
> > > > where logical volumes are added or deleted we call rebuild_lun_table to
> > > > refresh the driver's view of the world.
> > > >
> > > > Please consider this for inclusion.
> > >
> > > I still don't quite see how the thread stops on module removal ... there
> > > needs to be an explicit kthread_stop() somewhere in the clean up path.
> > >
> > > James
> > >
> > >
> > This time I make a call to kthread_stop in cciss_remove_one. The driver can
> > be unloaded and the thread gets cleaned up.
>
> Please include a complete (and suitably updated) copy of the changelog
> with each iteration of a patch.
>
>
> > KNOWN BUG: it seems the timeout must expire before kthread_stop actually
> > stops the thread. This causes the driver to hang and wait during rmmod. I've
> > played around with several things but haven't found the correct way to
> > address the problem. Looking at other drivers hasn't been much help. Any
> > advice is greatly appreciated.
>
> Well, wait_for_completion_timeout() is only going to return when the
> timeout timed out, or someone ran complete().
>
> > +static int scan_thread(ctlr_info_t *h)
> > +{
> > +	int rc;
> > +	DECLARE_COMPLETION_ONSTACK(wait);
> > +	h->rescan_wait = &wait;
> > +
> > +	while (!kthread_should_stop()) {
> > +		rc = wait_for_completion_timeout(&wait, 300 * HZ);
> > +		if (!rc)
> > +			continue;
> > +		else
> > +			rebuild_lun_table(h, 0);
> > +	}
> > +	return 0;
> > +}
>
> So.. we shouldn't need the timeout here at all - just use
> wait_for_completion().
>
> static int scan_thread(ctlr_info_t *h)
> {
> 	DECLARE_COMPLETION_ONSTACK(wait);
>
> 	h->rescan_wait = &wait;
> 	for ( ; ; ) {
> 		wait_for_completion(&wait);
> 		if (kthread_should_stop())
> 			break;
> 		rebuild_lun_table(h, 0);
> 	}
> 	return 0;
> }
>
> And on the teardown path, do
>
> complete(...);
> kthread_stop(...);

This is racy ... although I think the race would only show up in a
preemptible kernel: complete() can cause the thread to run immediately,
preempting us. It then goes around the loop, finds that
kthread_should_stop() is still false, and is back in
wait_for_completion() before we get a chance to call kthread_stop().

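The only way to avoid this seems to be to use wait queues and wake up:
kthread_stop() does an automatic wake-up of the thread, but a thread
sleeping in wait_for_completion() ignores that wake-up and goes back to
sleep, because the completion itself has not been signalled.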
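
To make the shape of the fix concrete, here is a rough userspace analogue
using pthreads. It is a sketch under assumptions, not kernel code and not
the cciss patch: struct scanner, request_rescan(), stop_scanner() and
run_demo() are hypothetical names, the condition variable stands in for a
wait queue, and the should_stop flag stands in for kthread_should_stop().
The point it illustrates is that the sleeper re-checks "work pending or
stop requested" after every wake-up, so a stop request cannot be lost the
way it can be with a bare completion.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the per-controller state. */
struct scanner {
	pthread_mutex_t lock;
	pthread_cond_t wq;	/* plays the role of a wait queue */
	bool rescan_pending;	/* set by whoever wants a rescan */
	bool should_stop;	/* plays the role of kthread_should_stop() */
	int rescans;		/* counts simulated rebuild_lun_table() calls */
};

static void *scan_thread(void *arg)
{
	struct scanner *s = arg;

	pthread_mutex_lock(&s->lock);
	for (;;) {
		/* Sleep until there is work or a stop request. The condition
		 * is re-checked after every wake-up, so a stop request that
		 * arrives at any point is always noticed. */
		while (!s->rescan_pending && !s->should_stop)
			pthread_cond_wait(&s->wq, &s->lock);
		if (s->should_stop)
			break;
		s->rescan_pending = false;
		s->rescans++;	/* stand-in for rebuild_lun_table(h, 0) */
		/* Let a waiting requester know the request was consumed. */
		pthread_cond_broadcast(&s->wq);
	}
	pthread_mutex_unlock(&s->lock);
	return NULL;
}

/* Ask for a rescan and wait until the scan thread has picked it up.
 * Waiting here only keeps the demo deterministic. */
static void request_rescan(struct scanner *s)
{
	pthread_mutex_lock(&s->lock);
	s->rescan_pending = true;
	pthread_cond_broadcast(&s->wq);
	while (s->rescan_pending)
		pthread_cond_wait(&s->wq, &s->lock);
	pthread_mutex_unlock(&s->lock);
}

/* Teardown: set the stop flag and wake the thread in one locked step,
 * then wait for it to exit. */
static void stop_scanner(struct scanner *s, pthread_t tid)
{
	pthread_mutex_lock(&s->lock);
	s->should_stop = true;
	pthread_cond_broadcast(&s->wq);
	pthread_mutex_unlock(&s->lock);
	pthread_join(tid, NULL);
}

/* Run n rescans, stop the thread, and return how many rescans ran
 * (or -1 if the thread could not be created). */
int run_demo(int n)
{
	struct scanner s = { .rescan_pending = false, .should_stop = false,
			     .rescans = 0 };
	pthread_t tid;
	int i;

	pthread_mutex_init(&s.lock, NULL);
	pthread_cond_init(&s.wq, NULL);
	if (pthread_create(&tid, NULL, scan_thread, &s) != 0)
		return -1;
	for (i = 0; i < n; i++)
		request_rescan(&s);
	stop_scanner(&s, tid);
	pthread_cond_destroy(&s.wq);
	pthread_mutex_destroy(&s.lock);
	return s.rescans;
}
```

In the driver the analogous shape would presumably be a wait queue plus
wait_event_interruptible() with a condition that includes
kthread_should_stop(), so that the wake-up issued by kthread_stop() is
enough to get the thread out of its sleep.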

James

