NMI reason 2d when running perf

From: Jens Axboe
Date: Fri Mar 17 2023 - 10:25:58 EST


Hi,

When running perf against a running process on my Dell R7525, I get a ton of:

[ 504.234782] Dazed and confused, but trying to continue
[ 504.267843] Uhhuh. NMI received for unknown reason 2d on CPU 48.
[ 504.267846] Dazed and confused, but trying to continue
[ 504.335975] Uhhuh. NMI received for unknown reason 2d on CPU 48.
[ 504.335977] Dazed and confused, but trying to continue
[ 504.368031] Uhhuh. NMI received for unknown reason 2d on CPU 48.
[ 504.368033] Dazed and confused, but trying to continue
[ 504.371037] Uhhuh. NMI received for unknown reason 2d on CPU 48.
[ 504.371038] Dazed and confused, but trying to continue
[ 504.439165] Uhhuh. NMI received for unknown reason 2d on CPU 48.
[ 504.439167] Dazed and confused, but trying to continue

spew in dmesg. The box has 2x 7763 CPUs. This seems to be a recent
regression; I've been using this box for a while and haven't seen this
before. The test being traced is pinned to CPU 48. The box is currently
running:

commit 6015b1aca1a233379625385feb01dd014aca60b5 (origin/master, origin/HEAD)
Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Tue Mar 14 19:32:38 2023 -0700

    sched_getaffinity: don't assume 'cpumask_size()' is fully initialized

with the pending block/io_uring branches merged in for testing.
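
In case it helps with quantifying the spew, here is a rough Python sketch that tallies these messages per CPU and reason code. The regex is inferred only from the dmesg lines quoted above, and the function name count_unknown_nmis is made up for illustration, so treat it as an assumption rather than anything the kernel guarantees about the message format.

```python
import re
from collections import Counter

# Pattern inferred from the dmesg lines quoted in this report (an assumption,
# not a stable kernel interface): "[ ts] Uhhuh. NMI received for unknown
# reason <hex> on CPU <n>."
NMI_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+Uhhuh\. NMI received for unknown reason "
    r"(?P<reason>[0-9a-fA-F]+) on CPU (?P<cpu>\d+)\."
)


def count_unknown_nmis(lines):
    """Return a Counter keyed by (cpu, reason) for unknown-NMI log lines."""
    counts = Counter()
    for line in lines:
        m = NMI_RE.match(line.strip())
        if m:
            counts[(int(m.group("cpu")), m.group("reason").lower())] += 1
    return counts


# Two of the lines from the report, plus one non-matching line.
sample = [
    "[ 504.267843] Uhhuh. NMI received for unknown reason 2d on CPU 48.",
    "[ 504.267846] Dazed and confused, but trying to continue",
    "[ 504.335975] Uhhuh. NMI received for unknown reason 2d on CPU 48.",
]

for (cpu, reason), n in sorted(count_unknown_nmis(sample).items()):
    print(f"CPU {cpu} reason {reason}: {n}")
```

To run it against a live box, the same function can be fed the lines of captured dmesg output instead of the sample list.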

--
Jens Axboe