Re: Report 2 in ext4 and journal based on v5.17-rc1

From: Theodore Ts'o
Date: Fri Mar 04 2022 - 22:40:47 EST


On Fri, Mar 04, 2022 at 12:20:02PM +0900, Byungchul Park wrote:
>
> I found a point that the two wait channels don't lead a deadlock in
> some cases thanks to Jan Kara. I will fix it so that Dept won't
> complain it.

I sent my last (admittedly cranky) message before you sent this. I'm
glad you finally understood Jan's explanation. I was trying to tell
you the same thing, but apparently I failed to communicate in a
sufficiently clear manner. In any case, what Jan described is a
fundamental part of how wait queues work, and I'm kind of amazed that
you were able to implement DEPT without understanding it. (But maybe
that is why some of the DEPT reports were completely incomprehensible
to me; I couldn't interpret why in the world DEPT was saying there was
a problem.)

In any case, the thing I would ask is a little humility. We regularly
use lockdep, and we run a huge number of stress tests, throughout each
development cycle.

So if DEPT is issuing lots of reports about apparently circular
dependencies, please try to be open to the thought that the fault is
in DEPT. Don't try to argue with maintainers that their code MUST be
buggy, and that because you don't understand our code and DEPT must
be theoretically perfect, it is up to the maintainers to prove to you
that their code is correct.

I am going to gently suggest that it is at least as likely, if not
more likely, that the failure is in DEPT or your understanding of how
kernel wait channels and locking work. After all, why would it be
that we haven't found these problems via our other QA practices?

Cheers,

- Ted