Re: [PATCH v3 5/5] cpusets, suspend: Save and restore cpusets during suspend/resume

From: Srivatsa S. Bhat
Date: Tue May 15 2012 - 18:17:53 EST


On 05/16/2012 03:19 AM, David Rientjes wrote:

> On Wed, 16 May 2012, Srivatsa S. Bhat wrote:
>
>> What you are suggesting was precisely the v1 of this patchset, which went
>> upstream as commit 8f2f748b06562 (CPU hotplug, cpusets, suspend: Don't touch
>> cpusets during suspend/resume).
>>
>> It got reverted due to a nasty suspend hang in a corner case where the
>> out-of-date sched domains got the scheduler confused.
>> Here is the thread with that discussion:
>> http://thread.gmane.org/gmane.linux.kernel/1262802/focus=1286289
>>
>> As Peter suggested, I'll try to fix the issues at the 2 places that I found
>> where the scheduler gets confused despite the cpu_active mask being up-to-date.
>>
>> But I really want to avoid tying that scheduler fix and this cpuset fix
>> together, for fear that until we root-cause and fix all the scheduler
>> bugs related to cpu_active_mask, we can never get cpusets fixed once and
>> for all for suspend/resume. So this patchset does an explicit save and
>> restore to be sure, so that we don't depend on some other/unknown
>> factors to make this work reliably.
>>
>
> Ok, so it seems like this is papering over an existing cpusets issue or an
> interaction with the scheduler that is buggy.


Well, I don't think so. See below.

> There's no reason why a
> cpuset.cpus that is a superset of cpu_active_mask should cause an issue
> since that's exactly what the root cpuset has.


No, even the root cpuset is modified during hotplug (and hence during
suspend/resume). If you look at the code, the root cpuset's cpuset.cpus is
always kept exactly equal to cpu_active_mask, and it is updated on every
hotplug operation, both online and offline.
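For reference, here is a condensed sketch of what that hotplug path in
kernel/cpuset.c effectively does (simplified from the code of this era,
so treat the details as illustrative rather than literal):

/*
 * Condensed, illustrative sketch of the cpuset hotplug update
 * (kernel/cpuset.c) -- not the literal upstream code.
 */
static void cpuset_handle_hotplug(void)
{
	/* The root cpuset always mirrors cpu_active_mask exactly. */
	cpumask_copy(top_cpuset.cpus_allowed, cpu_active_mask);

	/* Move tasks out of any child cpusets left with no CPUs. */
	scan_for_empty_cpusets(&top_cpuset);

	/* Rebuild the sched domains to match the new configuration. */
	rebuild_sched_domains();
}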

> I know root is special
> cased all over the cpuset code, but I think the real fix here is to figure
> out why it can't be left as a superset and then we end up doing nothing
> for s/r.
>
> I don't have a preference for cpu hotplug and whether cpuset.cpus = 1-3
> remains 1-3 when cpu 2 is offlined or not, I think it could be argued both
> ways, but I disagree with saving the cpumask, removing all suspended cpus,
> and then reinstating it for no reason.
>


I think there is a valid reason behind doing that.

Cpusets translate to sched domains in scheduler terms. So whenever you
update cpusets, the sched domains are updated. IOW, if you don't touch
cpusets during hotplug (the suspend/resume case), you allow them to retain
offline cpus, which means you allow the sched domains to contain offline
cpus. Hence the sched domains are rendered stale.
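To make the danger concrete, consider a purely hypothetical walk over a
stale domain (illustrative kernel-style C, not actual upstream code):

/*
 * Hypothetical illustration, not upstream code: walking a stale
 * domain whose span still contains an offlined CPU.  Every user
 * of the span must then remember to filter on cpu_active() --
 * forget it once, and work gets handed to a dead CPU.
 */
static int find_balance_target(struct sched_domain *sd)
{
	int cpu;

	for_each_cpu(cpu, sched_domain_span(sd)) {
		if (!cpu_active(cpu))
			continue;	/* the only line protecting us */
		return cpu;	/* candidate load-balancing target */
	}
	return -1;
}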

Now, from my point of view, keeping the sched domains stale and expecting
the scheduler to work flawlessly doesn't seem right.

However, cpu_active_mask was introduced to handle situations where a
hotplug transition is still in progress and the scheduler needs to make
correct decisions even while some of its data structures are in an
inconsistent/stale state. But once the hotplug operation has completed,
the scheduler shouldn't have to depend on cpu_active_mask.
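For instance, something like the fallback logic below is what
cpu_active_mask buys us mid-transition (a condensed sketch of the idea,
not the literal scheduler code; the function name is made up):

/*
 * Condensed sketch of the idea behind the scheduler's fallback
 * path during a hotplug transition -- illustrative only.
 */
static int pick_fallback_cpu(struct task_struct *p)
{
	int cpu;

	/* Prefer a CPU the task is allowed on that is still active. */
	for_each_cpu(cpu, tsk_cpus_allowed(p))
		if (cpu_active(cpu))
			return cpu;

	/* Otherwise, any active CPU at all. */
	return cpumask_any(cpu_active_mask);
}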

Now, coming to the suspend/resume/cpuset case, what you are suggesting
would essentially boil down to keeping the sched domains stale not just
_during_ hotplug transitions, but also after them, in fact across multiple
hotplug operations in succession! So, from my point of view, the real bug
is keeping the sched domains stale for that long, across hotplug
operations. (And on those lines, making the scheduler work correctly even
in such cases is only a nice-to-have robustness measure, not a "bugfix".)

So to avoid keeping the sched domains stale during suspend/resume, we need
to update cpusets for every hotplug as necessary, and this means we need to do
an explicit save and restore.
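The save/restore itself is conceptually simple. A rough sketch of the
idea (the helper names here are made up for illustration; the actual
patch operates on every cpuset in the hierarchy):

/* Hypothetical sketch of the save/restore idea -- not the patch. */
static cpumask_var_t saved_cpus;	/* per-cpuset in the real patch */

static void cpuset_save_before_suspend(struct cpuset *cs)
{
	/* Remember the user's configured mask before CPUs go away. */
	cpumask_copy(saved_cpus, cs->cpus_allowed);
}

static void cpuset_restore_after_resume(struct cpuset *cs)
{
	/*
	 * Restore the saved mask, clamped to the CPUs that actually
	 * came back, so the rebuilt sched domains are never stale.
	 */
	cpumask_and(cs->cpus_allowed, saved_cpus, cpu_active_mask);
}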

Regards,
Srivatsa S. Bhat
