Re: [RFC 0/3] Extend type checking macros

From: Srivatsa S. Bhat
Date: Sat Apr 14 2012 - 18:32:20 EST


On 04/15/2012 02:48 AM, Srivatsa S. Bhat wrote:

> On 04/15/2012 03:44 AM, Sasha Levin wrote:
>
>> Commit e3831ed ("sched: Fix incorrect usage of for_each_cpu_mask() in
>> select_fallback_rq()") fixes a very non obvious bug in select_fallback_rq()
>> which was caused by passing 'struct cpumask' instead of 'struct cpumask *'
>> to a macro in include/linux/cpumask.h
>>
>
>
> Good heavens! I just found out that *each* *and* *every* *one* of the
> existing 12 users of for_each_cpu_mask() are wrong!! Unbelievable!
>


Never mind.. I think I was blind! Gah, having 2 helpers to do the same
thing is confusing, no doubt!

So, looking again, for_each_cpu_mask() does expect the second value to
be struct cpumask, and not struct cpumask *. Which means, we saw the
warning (http://thread.gmane.org/gmane.linux.kernel/1274579/) for an
entirely different reason than what the changelog of commit e3831ed says!

Looking further, the problem is fairly simple:
for_each_cpu() and friends (eg: cpumask_next) cap at nr_cpu_ids.
for_each_cpu_mask() and friends (eg: __next_cpu) cap at NR_CPUS.

And cpumask_test_cpu() that was used in select_fallback_rq() essentially
boils down to cpumask_check(), which uses nr_cpumask_bits as the cap,
which can be either nr_cpu_ids or NR_CPUS depending on whether we have
CONFIG_CPUMASK_OFFSTACK set or unset.

IOW, we got the warning because we mixed up NR_CPUS and nr_cpu_ids.
And commit e3831ed fixed it because it properly matched the functions
used, so that they cap at, and check for the same value; and not because
of a mismatch between the type of arguments expected vs sent to
for_each_cpu_mask(), as its changelog states.
(No wonder Peter's code compiled without problems...)

One more thing: the old interface decides between nr_cpu_ids vs NR_CPUS
depending on whether NR_CPUS > 64, whereas the new interface decides
based on the config option - CPUMASK_OFFSTACK.
All in all, lots of confusion and incompatibility.

So I think we need to standardize the values - use nr_cpumask_bits
everywhere (since it can become nr_cpu_ids or NR_CPUS depending on our
config options), and be done with it. Then we wouldn't have such
mismatches and bugs.

Don't know how many more places have got these mixed up.. All the more
reason to get rid of for_each_cpu_mask(), if you ask me..

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/