Re: NMI hardlock stacktrace deadlock [was Re: Linux 5.2-rc5]

From: Chris Wilson
Date: Wed Jun 19 2019 - 15:24:55 EST


Quoting Linus Torvalds (2019-06-19 19:49:37)
> On Wed, Jun 19, 2019 at 5:40 AM Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > I haven't bisected this, but with the merge of rc5 into our CI we
> > started hitting an issue that resulted in an oops and the NMI watchdog
> > firing as we dumped the ftrace.
>
> Do you have the oops itself at all?

An example is at:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6310/fi-kbl-x1275/dmesg0.log
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6310/fi-kbl-x1275/boot0.log

The bug causing the oops is clearly a driver problem. The rc5 fallout
just seems to be because of some shrinker changes affecting some object
reaping that were unfortunately still active. What perturbed the CI
team was that the machine failed to panic and reboot.
-Chris