Re: workqueues and percpu (was: [PATCH] dm: remake of the veritytarget)

From: Mandeep Singh Baines
Date: Fri Mar 09 2012 - 16:15:26 EST


Tejun Heo (tj@xxxxxxxxxx) wrote:
> Hello,
>
> On Thu, Mar 08, 2012 at 02:39:09PM -0800, Andrew Morton wrote:
> > > I looked at it --- and using percpu variables in workqueues isn't safe
> > > because the workqueue can change CPU if the CPU is hot-unplugged.
>
> Generally, I don't think removing preemption enable/disable around
> percpu variable access is a worthwhile optimization unles it's on
> really really really hot path. We'll eventually add debug annotations
> to percpu accessors and the ones used outside proper preemption
> protections would need to be updated anyway.
>

In this case, I need the per-cpu data for the duration of calculating
a cryptographics hash on a 4K page of data. That's a long time to disable
pre-emption.

I could fix the bug temporarily by adding get/put for the per_cpu data
but would that be acceptable? I'm not sure what the OK limit is for how
long one can disable preemption. An alternative fix would be not allow
CONFIG_VERITY when CONFIG_HOTPLUG_CPU. Once workqueues are fixed, I could
remove that restriction.

Thoughts?

> > > dm-crypt has the same bug --- it also uses workqueue with per-cpu
> > > variables and assumes that the CPU doesn't change for a single work item.
> > >
> > > This program shows that work executed in a workqueue can be switched to a
> > > different CPU.
> > >
> > > I'm wondering how much other kernel code assumes that workqueues are bound
> > > to a specific CPU, which isn't true if we unplug that CPU.
> >
> > ugh.
> >
> > We really don't want to have to avoid using workqueues because of some
> > daft issue with CPU hot-unplug.
>
> Using or not using wq is orthogonal tho. Using kthreads directly
> requires hooking into CPU hotplug callbacks and one might as well call
> flush_work_sync() from there instead of shutting down kthread.
>
> > And yes, there are assumptions in various work handlers that they
> > will be pinned to a single CPU. Finding and fixing those
> > assumptions would be painful.
> >
> > Heck, even debug_smp_processor_id() can be wrong in the presence of the
> > cpu-unplug thing.
>
> Yeah, that's a generic problem with cpu unplug.
>
> > I'm not sure what we can do about it really, apart from blocking unplug
> > until all the target CPU's workqueues have been cleared. And/or refusing
> > to unplug a CPU until all pinned-to-that-cpu kernel threads have been
> > shut down or pinned elsewhere (which is the same thing, only more
> > general).
> >
> > Tejun, is this new behaviour? I do recall that a long time ago we
> > wrestled with unplug-vs-worker-threads and I ended up OK with the
> > result, but I forget what it was. IIRC Rusty was involved.
>
> Unfortunately, yes, this is a new behavior. Before, we could have
> unbound delays during unplug from work items. Now, we have CPU
> affinity assumption breakage. The behavior change was primarily to
> allow long running work items to use regular workqueues without
> worrying about inducing delay across cpu hotplug operations, which is
> important as it's also used on suspend / hibernation, especially on
> mobile platforms.
>
> During the cmwq conversion, I ended up auditing a lot of (I think I
> went through most of them) workqueue users and IIRC there weren't too
> many which required stable affinity.
>
> > That being said, I don't think it's worth compromising the DM code
> > because of this workqueue wart: lots of other code has the same wart,
> > and we should find a centralised fix for it.
>
> Probably the best way to solve this is introducing pinned attribute to
> workqueues and have them drained automatically on cpu hotplug events.
> It'll require auditing workqueue users but I guess we'll just have to
> do it given that we need to actually distinguish the ones need to be
> pinned. Or maybe we can use explicit queue_work_on() to distinguish
> the ones which require pinning.
>
> Another approach would be requiring all workqueues to be drained on
> cpu offlining and requiring any work item which may stall to use
> unbound wq. IMHO, picking out the ones which may stall would be much
> less obvious than the ones which require cpu pinning.
>
> Better ideas?
>
> Thanks.
>
> --
> tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/