Re: [PATCH 2/3] Linux: Use rseq in sched_getcpu if available (v9)

From: Florian Weimer
Date: Mon Jul 06 2020 - 09:59:50 EST


* Mathieu Desnoyers:

> When available, use the cpu_id field from __rseq_abi on Linux to
> implement sched_getcpu(). Fall-back on the vgetcpu vDSO if
> unavailable.

I've pushed this to glibc master, but unfortunately it looks like this
exposes a kernel bug related to affinity mask changes.

After building and testing glibc, this

for x in {1..2000} ; do posix/tst-affinity-static & done

produces some “error:” lines for me:

error: Unexpected CPU 2, expected 0
error: Unexpected CPU 2, expected 0
error: Unexpected CPU 2, expected 0
error: Unexpected CPU 2, expected 0
error: Unexpected CPU 138, expected 0
error: Unexpected CPU 138, expected 0
error: Unexpected CPU 138, expected 0
error: Unexpected CPU 138, expected 0

“expected 0” is a result of how the test has been written: it bails out
on the first failure, which happens with CPU ID 0.

Smaller systems can use a smaller count than 2000 to reproduce this. It
also happens sporadically when running the glibc test suite itself
(which is why it took further testing to reveal this issue).

I can reproduce this with the Debian 4.19.118-2+deb10u1 kernel, the
Fedora 5.6.19-300.fc32 kernel, and the Red Hat Enterprise Linux kernel
4.18.0-193.el8 (all x86_64).

As to the cause, I'd guess that the exit path in the sched_setaffinity
system call fails to update the rseq area, so that userspace can observe
the outdated CPU ID there.
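
For illustration, the kind of check that trips here can be sketched as
below. This is not the code of posix/tst-affinity-static; the function
name count_cpu_mismatches is hypothetical, and unlike the real test it
counts every mismatch instead of bailing out on the first one. It
assumes a glibc that provides sched_getcpu and a machine with at most
CPU_SETSIZE CPUs.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper, not the glibc test itself: pin the calling thread
   to each CPU in its original affinity mask in turn, then compare the
   result of sched_getcpu against the CPU that was just requested.
   Returns the number of mismatches, or -1 if the initial mask cannot be
   read.  */
int
count_cpu_mismatches (void)
{
  cpu_set_t initial;
  if (sched_getaffinity (0, sizeof (initial), &initial) != 0)
    return -1;

  int mismatches = 0;
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
    {
      if (!CPU_ISSET (cpu, &initial))
        continue;

      cpu_set_t one;
      CPU_ZERO (&one);
      CPU_SET (cpu, &one);
      if (sched_setaffinity (0, sizeof (one), &one) != 0)
        continue;  /* CPU is offline or not permitted; skip it.  */

      /* After a successful sched_setaffinity, the thread should already
         be running on the requested CPU.  */
      int current = sched_getcpu ();
      if (current != cpu)
        {
          printf ("error: Unexpected CPU %d, expected %d\n", current, cpu);
          ++mismatches;
        }
    }

  /* Restore the original mask; a failure here does not change the count.  */
  (void) sched_setaffinity (0, sizeof (initial), &initial);
  return mismatches;
}
```

Calling count_cpu_mismatches from many concurrently started processes,
as in the shell loop above, should return 0 in each of them if the CPU
ID reported after sched_setaffinity is always current.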

Thanks,
Florian