Re: [PATCH v3 5/5] cpusets, suspend: Save and restore cpusets during suspend/resume

From: David Rientjes
Date: Tue May 15 2012 - 17:49:48 EST


On Wed, 16 May 2012, Srivatsa S. Bhat wrote:

> What you are suggesting was precisely the v1 of this patchset, which went
> upstream as commit 8f2f748b06562 (CPU hotplug, cpusets, suspend: Don't touch
> cpusets during suspend/resume).
>
> It got reverted due to a nasty suspend hang in some corner case, where the
> sched domains not being up-to-date got the scheduler confused.
> Here is the thread with that discussion:
> http://thread.gmane.org/gmane.linux.kernel/1262802/focus=1286289
>
> As Peter suggested, I'll try to fix the issues at the 2 places that I found
> where the scheduler gets confused despite the cpu_active mask being up-to-date.
>
> But, I really want to avoid that scheduler fix and this cpuset fix from
> being tied together, for the fear that until we root-cause and fix all
> scheduler bugs related to cpu_active mask, we can never get cpusets fixed
> once and for all for suspend/resume. So, this patchset does an explicit
> save and restore to be sure, and so that we don't depend on some other/unknown
> factors to make this work reliably.
>

Ok, so it seems like this is papering over an existing cpusets issue or an
interaction with the scheduler that is buggy. There's no reason why a
cpuset.cpus that is a superset of cpu_active_mask should cause an issue
since that's exactly what the root cpuset has. I know root is special
cased all over the cpuset code, but I think the real fix here is to figure
out why it can't be left as a superset and then we end up doing nothing
for s/r.

For cpu hotplug, I don't have a preference on whether cpuset.cpus = 1-3
remains 1-3 when cpu 2 is offlined; I think it could be argued both ways.
But I disagree with saving the cpumask, removing all suspended cpus, and
then reinstating it for no reason.
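
To make the superset argument concrete, here is a minimal toy model, not
kernel code. It treats CPU masks as plain integer bitmasks and assumes
that the set of CPUs a cpuset's tasks can use is always cpuset.cpus
intersected with the active mask, and that only cpu 0 stays active across
suspend. Every name in it (cpus, usable, leave_alone, save_and_restore) is
hypothetical. Under those assumptions the two approaches give the same
usable CPUs both during suspend and after resume:

```python
# Toy model only: masks are ints, bit n set means CPU n is in the mask.
# None of these helpers exist in the kernel; they are illustrative names.

def cpus(*ids):
    """Build a bitmask from a list of CPU ids."""
    mask = 0
    for cpu in ids:
        mask |= 1 << cpu
    return mask


def usable(cpuset_cpus, cpu_active):
    """Assumed rule: tasks may only run on CPUs that are in both masks."""
    return cpuset_cpus & cpu_active


def leave_alone(cpuset_cpus, active_during_suspend):
    """cpuset.cpus is never touched, so it may be a superset of the
    active mask while suspended (as the root cpuset already is)."""
    during = cpuset_cpus
    after = cpuset_cpus
    return during, after


def save_and_restore(cpuset_cpus, active_during_suspend):
    """Save the mask, strip the offlined CPUs, reinstate it on resume."""
    saved = cpuset_cpus
    during = cpuset_cpus & active_during_suspend
    after = saved
    return during, after


if __name__ == "__main__":
    all_cpus = cpus(0, 1, 2, 3)
    boot_only = cpus(0)  # assumption: only cpu 0 is active while suspended
    for name, mask in (("root", all_cpus), ("cpus=1-3", cpus(1, 2, 3))):
        for strategy in (leave_alone, save_and_restore):
            during, after = strategy(mask, boot_only)
            print(name, strategy.__name__,
                  bin(usable(during, boot_only)),
                  bin(usable(after, all_cpus)))
```

The model only holds if every consumer really does intersect with the
active mask. The two places in the scheduler mentioned in the quoted text
are exactly where that assumption is reported to break.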