Re: About the try to remove cross-release feature entirely by Ingo

From: Byungchul Park
Date: Fri Dec 29 2017 - 04:46:56 EST


On 12/29/2017 5:09 PM, Amir Goldstein wrote:
On Fri, Dec 29, 2017 at 3:47 AM, Byungchul Park <byungchul.park@xxxxxxx> wrote:
On Wed, Dec 13, 2017 at 03:24:29PM +0900, Byungchul Park wrote:
Lockdep works, based on the following:

(1) Classifying locks properly
(2) Checking relationship between the classes

If (1) is not good or (2) is not good, then we
might get false positives.

For (1), we don't have to classify locks 100%
properly but need as enough as lockdep works.

For (2), we should have a mechanism w/o
logical defects.

Cross-release added an additional capacity to
(2) and requires (1) to get more precisely classified.

Since the current classification level is too low for
cross-release to work, false positives are being
reported frequently with enabling cross-release.
Yes. It's a obvious problem. It needs to be off by
default until the classification is done by the level
that cross-release requires.

But, the logic (2) is valid and logically true. Please
keep the code, mechanism, and logic.

I admit the cross-release feature had introduced several false positives
about 4 times(?), maybe. And I suggested roughly 3 ways to solve it. I
should have explained each in more detail. The lack might have led some
to misunderstand.

(1) The best way: To classify all waiters correctly.

Ultimately the problems should be solved in this way. But it
takes a lot of time so it's not easy to use the way right away.
And I need helps from experts of other sub-systems.

While talking about this way, I made a trouble.. I still believe
that each sub-system expert knows how to solve dependency problems
most, since each has own dependency rule, but it was not about
responsibility. I've never wanted to charge someone else it but me.

(2) The 2nd way: To make cross-release off by default.

At the beginning, I proposed cross-release being off by default.
Honestly, I was happy and did it when Ingo suggested it on by
default once lockdep on. But I shouldn't have done that but kept
it off by default. Cross-release can make some happy but some
unhappy until problems go away through (1) or (2).

(3) The 3rd way: To invalidate waiters making trouble.

Of course, this is not the best. Now that you have already spent
a lot of time to fix original lockdep's problems since lockdep was
introduced in 2006, we don't need to use this way for typical
locks except a few special cases. Lockdep is fairly robust by now.

And I understand you don't want to spend more time to fix
additional problems again. Now that the situation is different
from the time, 2006, it's not too bad to use this way to handle
the issues.


Purely logically, aren't you missing a 4th option:

(4) The 4th way: To validate specific waiters.


Hello,

Thanks for your opinion. I will add my opinion on you.

Is it not an option for a subsystem to opt-in for cross-release validation
of specific locks/waiters? This may be a much preferred route for cross-

Yes. I think it can be a good option.

I think we have to choose a better one between (3) and (4) depending
on the following:

In case that there are few waiters making trouble, it would be
better to choose (3).

In case that there are a lot of waiter making trouble, it would be
better to chosse (4).

I think (3) is better for now because there's only one or two cases
making us hard to handle it. However, if you don't agree, I also
think (4) can be an available option.

release. I remember seeing a post from a graphic driver developer that
found cross-release useful for finding bugs in his code.

For example, many waiters in kernel can be waiting for userspace code,
so does that mean the cross-release is going to free the world from
userspace deadlocks as well?? Possibly I am missing something.

I don't see what you are saying exactly.. but cross-release can be
used if we know (a) the spot waiting for an event and (3) the other
spot triggering the event. Please explain it more if I miss something.

In any way, it seem logical to me that some waiters should particpate
in lock chain dependencies, while other waiters should break the chain
to avoid false positives and to avoid protecting against user configurable
deadlocks (like loop mount over file inside the loop mounted fs).

For example, when we had cross-release enabled, the following chain
was built and false positives were produced:

link 1: ext4 spin lock class A (in a lower fs) ->
waiter class B (in submit_bio_wait())

link 2: waiter class B (in submit_bio_wait()) ->
ext4 spin lock class A (in an upper fs)

Even though conceptually it should have been "class A in lower fs
!= class A in upper fs", current code registers these two as class A.

So we need to correct the chain like, using (1):

link 1: ext4 spin lock class A (in a lower fs) ->
waiter class B (in submit_bio_wait())

link 2: waiter class B (in submit_bio_wait()) ->
ext4 spin lock class *C* (in an upper fs)

Or using (3) or (4):

no link (because waiter class B does not exist anymore)

And if you agree that this logic claim is correct, than surely, an inclusive
approach is the best way forward.

I'm also curious about other opinions..

Cheers,
2> Amir.


--
Thanks,
Byungchul