Re: [dm-devel] [PATCH v5] fault-injection: introduce kvmalloc fallback options

From: Mikulas Patocka
Date: Thu Apr 26 2018 - 15:36:27 EST




On Thu, 26 Apr 2018, Michael S. Tsirkin wrote:

> On Thu, Apr 26, 2018 at 02:54:26PM -0400, Mikulas Patocka wrote:
> >
> >
> > On Thu, 26 Apr 2018, Michael S. Tsirkin wrote:
> >
> > > On Thu, Apr 26, 2018 at 12:07:25PM -0400, Mikulas Patocka wrote:
> > > > > IIUC debug kernels mainly exist so people who experience e.g. memory
> > > > > corruption can try and debug the failure. In this case, CONFIG_DEBUG_SG
> > > > > will *already* catch a failure early. Nothing special needs to be done.
> > > >
> > > > The patch helps people debug such memory coprruptions (such as using DMA
> > > > API on the result of kvmalloc).
> > >
> > > That's my point. I don't think your patch helps debug any memory
> > > corruptions. With CONFIG_DEBUG_SG using DMA API already causes a
> > > BUG_ON, that's before any memory can get corrupted.
> >
> > The patch turns a hard-to-reproduce bug into an easy-to-reproduce bug.
>
> It's still not a memory corruption. It's a BUG_ON the source of which -
> should it trigger - can be typically found using grep.
>
> > Obviously we don't want this in production kernels, but in the debug
> > kernels it should be done.
> >
> > Mikulas
>
> I'm not so sure. debug kernels should make debugging easier,
> definitely.
>
> Unfortunately they are already slower so some races don't trigger.
>
> If they also start crashing more because we are injecting
> memory allocation errors, people are even less likely to
> be able to use them.

I've actually already pushed this patch to RHEL-7 (just before 7.5 was
released) and it found out some powerpc issues. See the commit
ea376cc55bc3 in the RHEL-7 git. It was reverted just before RHEL-7.5 was
released with the intention that it will be reinstated just after RHEL-7.5
release, so that these issues could be found and eliminated in the
7.5->7.6 development cycle. Jeff Moyer asked me to put it upstream because
they want to follow upstream and they don't like RHEL-specific patches.
There's clear incentive to put this patch to RHEL-7, that's why I'm
posting it here.

> Just add a comment near the BUG_ON within DMA API telling people how
> they can inject this error some more if the bug does not
> reproduce, and leave it at that.

But the problem is that the powerpc bug only triggers with this patch. It
doesn't trigger without it. So, we have a potential random-crashing bug in
the codebase (and perhaps more others) and we want to eliminate them -
that's why we need the patch.

People on this list argue "this should be a kernel parameter". But the
testers won't enable the kernel parameter, the crashes won't happen
without the kernel parameter and the bugs will stay unreported and
uncorrected. That's why it needs to be the default.

Mikulas