Re: cpumask: fix compat getaffinity

From: KOSAKI Motohiro
Date: Mon May 17 2010 - 02:04:56 EST


> On Wed, 12 May 2010 06:00:45 pm Milton Miller wrote:
> >
> > At least for parsing, we need to allocate and parse NR_CPUS until
> > all places like arch/powerpc/platforms/pseries/xics.c that compare a
> > user-supplied mask to CPUMASK_ALL are eliminated.
>
> Good point. Anton will want to fix those anyway for CONFIG_CPUMASK_OFFSTACK,
> too, but that's the reason the parsing uses nr_cpumask_bits.
>
> > > Would it make sense to use my initial patch for -stable, which reverts
> > > the ABI back to before the change that caused the problem, but apply
> > > the correct fix (changing the ABI throughout) for future releases?
> >
> > This would definitly be the conservative fix.
>
> Instead of changing back to NR_CPUS which will break libnuma for
> CPUMASK_OFFSTACK, how about changing it to nr_cpumask_bits and having an
> explicit comment above it:

Yes and No.

1) sched_getaffinity syscall is used from glibc and libnuma.
2) glibc doesn't use the return value almostly. glibc emulate it as NR_CPUS=1024.
3) Now, both sched_getaffinity() and compat_sys_sched_getaffinity() have nr_cpu_ids thing.
4) But only compat_sys_sched_getaffinity() hit libnuma problem.

I think It mean compat_sys_sched_getaffinity() should behave as sched_getaffinity().
IOW, libnuma assume compat_sys_sched_getaffinity() return len args or NR_CPUS.
then, following patch do it. I confirmed the patch works with or without CPUMASK_OFFSTACK.

So, My proposal is
1) merge both mine and yours to linus tree
2) but backport only mine


How do you think?


==========================================================
From: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>
Subject: [PATCH] cpumask: fix compat getaffinity

Commit a45185d2d "cpumask: convert kernel/compat.c" broke
libnuma, which abuses sched_getaffinity to find out NR_CPUS
in order to parse /sys/devices/system/node/node*/cpumap.

On NUMA systems with less than 32 possibly CPUs, the
current compat_sys_sched_getaffinity now returns '4'
instead of the actual NR_CPUS/8, which makes libnuma
bail out when parsing the cpumap.

The libnuma call sched_getaffinity(0, bitmap, 4096)
at first. It mean the libnuma expect the return value of
sched_getaffinity() is either len argument or NR_CPUS.
But it doesn't expect to return nr_cpu_ids.

Strictly speaking, userland requirement are

1) Glibc assume the return value mean the lengh of initialized
of mask argument. E.g. if sched_getaffinity(1024) return 128,
glibc make zero fill rest 896 byte.
2) Libnuma assume the return value can be used to guess NR_CPUS
in kernel. It assume len-arg<NR_CPUS makes -EINVAL. But
it try len=4096 at first and 4096 is always bigger than
NR_CPUS. Then, if we remove strange min_length normalization,
we never hit -EINVAL case.

sched_getaffinity() already solved this issue. This patch
adapt compat_sys_sched_getaffinity() as it.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>
---
kernel/compat.c | 25 +++++++++++--------------
1 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/kernel/compat.c b/kernel/compat.c
index 7f40e92..5adab05 100644
--- a/kernel/compat.c
+++ b/kernel/compat.c
@@ -495,29 +495,26 @@ asmlinkage long compat_sys_sched_getaffinity(compat_pid_t pid, unsigned int len,
{
int ret;
cpumask_var_t mask;
- unsigned long *k;
- unsigned int min_length = cpumask_size();
-
- if (nr_cpu_ids <= BITS_PER_COMPAT_LONG)
- min_length = sizeof(compat_ulong_t);

- if (len < min_length)
+ if ((len * BITS_PER_BYTE) < nr_cpu_ids)
+ return -EINVAL;
+ if (len & (sizeof(compat_ulong_t)-1))
return -EINVAL;

if (!alloc_cpumask_var(&mask, GFP_KERNEL))
return -ENOMEM;

ret = sched_getaffinity(pid, mask);
- if (ret < 0)
- goto out;
+ if (ret == 0) {
+ size_t retlen = min_t(size_t, len, cpumask_size());

- k = cpumask_bits(mask);
- ret = compat_put_bitmap(user_mask_ptr, k, min_length * 8);
- if (ret == 0)
- ret = min_length;
-
-out:
+ if (compat_put_bitmap(user_mask_ptr, cpumask_bits(mask), retlen * 8))
+ ret = -EFAULT;
+ else
+ ret = retlen;
+ }
free_cpumask_var(mask);
+
return ret;
}

--
1.6.5.2







--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/