Re: BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!

From: Waiman Long
Date: Thu Jan 26 2023 - 13:31:37 EST


On 1/26/23 12:38, Boqun Feng wrote:
[Cc lock folks]

On Thu, Jan 26, 2023 at 02:47:42PM +0500, Mikhail Gavrilov wrote:
On Wed, Jan 25, 2023 at 10:21 PM David Sterba <dsterba@xxxxxxx> wrote:
On Wed, Jan 25, 2023 at 01:27:48AM +0500, Mikhail Gavrilov wrote:
On Tue, Jul 26, 2022 at 9:47 PM David Sterba <dsterba@xxxxxxx> wrote:
On Tue, Jul 26, 2022 at 05:32:54PM +0500, Mikhail Gavrilov wrote:
Hi guys.
Whenever I write intensively to a btrfs volume, the message "BUG:
MAX_LOCKDEP_CHAIN_HLOCKS too low!" appears in the kernel logs.
Increase the config value of LOCKDEP_CHAINS_BITS; the default is 16,
and 18 tends to work.
Hi,
Today I was able to get the message "BUG: MAX_LOCKDEP_CHAIN_HLOCKS too
low!" again even with LOCKDEP_CHAINS_BITS=18 and kernel 6.2-rc5.

❯ cat /boot/config-`uname -r` | grep LOCKDEP_CHAINS_BITS
CONFIG_LOCKDEP_CHAINS_BITS=18

[88685.088099] BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!
[88685.088124] turning off the locking correctness validator.
[88685.088133] Please attach the output of /proc/lock_stat to the bug report
[88685.088142] CPU: 14 PID: 1749746 Comm: mv Tainted: G W L
------- --- 6.2.0-0.rc5.20230123git2475bf0250de.38.fc38.x86_64 #1
[88685.088154] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 4408 10/28/2022

What's next? Increase this value to 19?
Yes, though increasing the value is a workaround so you may see the
warning again.
Is there any point to this WARNING if we just ignore it and increase
the threshold value every time?
Lockdep uses a statically allocated array to track lock holding chains,
to avoid dynamic memory allocation in its own code. So if you see the
warning, it means your test has more combinations of lock holdings than
the array can record. In other words, you have reached a resource limit,
and in that sense it makes sense to just ignore it and increase the
value: you want to give lockdep enough resources to work, right?

Maybe set it to 99 right away? Or remove the check altogether?
That requires 2^99 * 5 * sizeof(u16) bytes of memory for the lock
holding chains array.

Note that every increment of LOCKDEP_CHAINS_BITS doubles the storage space. With 99, that would vastly exceed the total amount of memory you have in your system.

Boqun, where does the 5 figure come from? It is just a simple u16 array of size MAX_LOCKDEP_CHAIN_HLOCKS.

The chain_hlocks array stores the lock chains that show up in the lockdep splats and in the /proc/lockdep* files. Each chain is of variable size. As we add a new lock to a chain, we have to repeatedly deallocate the old chain buffer and allocate a larger one. That causes fragmentation in chain_hlocks[]. So if we have a very long lock chain, the allocation may fail because the largest free block is smaller than the requested chain length. There may be enough free space in chain_hlocks[] in total, but it is just too fragmented to be useful.

Maybe we should figure out a better way to handle this fragmentation. In the meantime, the easiest way forward is just to increase LOCKDEP_CHAINS_BITS by 1.


However, a few other options we can try in lockdep are:

* warn but do not turn off lockdep: the lock holding chain is
only a cache of the lock holding combinations lockdep has ever
seen; we also record the dependencies in the graph. Without the
lock holding chain, lockdep can still work, just slower.

* allow dynamic memory allocation in lockdep: I think this might
be OK since we have lockdep_recursion to avoid a lockdep code ->
mm code -> lockdep code -> mm code ... deadlock. But maybe I'm
missing something. And even if we allow it, the memory usage
doesn't change: you will still need that amount of memory to
track lock holding chains.

It is not just the issue of calling the memory allocator. There is also the issue of copying data from the old chain_hlocks to the new one: the old one may be updated during the copy unless we can freeze everything else.

Cheers,
Longman