Re: [PATCH] locking/lockdep: Report comm/pid/timestamp information

From: Eugeniu Rosca
Date: Mon Jul 09 2018 - 08:25:42 EST


On Mon, Jul 09, 2018 at 10:31:18AM +0200, Peter Zijlstra wrote:
> On Mon, Jul 09, 2018 at 02:57:25AM +0200, Eugeniu Rosca wrote:
> > diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> > index 6fc77d4dbdcd..eeed7ea2e198 100644
> > --- a/include/linux/lockdep.h
> > +++ b/include/linux/lockdep.h
> > @@ -186,6 +186,10 @@ struct lock_list {
> >  	struct list_head entry;
> >  	struct lock_class *class;
> >  	struct stack_trace trace;
> > +	unsigned long long ts;
> > +	char comm[16];
> > +	pid_t pid;
> > +	int cpu;
> >  	int distance;
> >
> >  	/*
>
> Yeah, not going to happen. You grow that structure from 64 bytes to 96
> bytes and with that grow the static footprint of the kernel by 512k (in
> the very best case)

I confirm that on x86_64 the .bss size increases by ~1M [1] with the
standard v4.18-rc4 x86_64_defconfig + CONFIG_LOCKDEP.
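
For reference, the 64 -> 96 byte growth and the ~1M figure can be
cross-checked with a small userspace sketch. Everything below is an
assumption made for illustration: the mock types only mimic the x86_64
(LP64) sizes of the v4.18 kernel structures, the trailing 'parent'
pointer is not visible in the quoted hunk, and 32768 is the assumed
value of MAX_LOCKDEP_ENTRIES without CONFIG_LOCKDEP_SMALL.

```c
#include <stddef.h>

/* Stand-ins for the kernel types, sized as on x86_64 (LP64). */
struct list_head_mock { void *next, *prev; };

struct stack_trace_mock {
	unsigned int nr_entries, max_entries;
	unsigned long *entries;
	int skip;
};

/* Assumed layout of struct lock_list before the patch. */
struct lock_list_before {
	struct list_head_mock entry;
	void *class_ptr;		/* struct lock_class *class in the kernel */
	struct stack_trace_mock trace;
	int distance;
	void *parent;			/* assumed: struct lock_list *parent */
};

/* Same layout with the four fields added by the patch. */
struct lock_list_after {
	struct list_head_mock entry;
	void *class_ptr;
	struct stack_trace_mock trace;
	unsigned long long ts;
	char comm[16];
	int pid;			/* pid_t is an int on Linux */
	int cpu;
	int distance;
	void *parent;
};

/* Assumed array length; not taken from the thread. */
#define ASSUMED_MAX_LOCKDEP_ENTRIES 32768UL

/* Growth in bytes of a static array holding 'entries' elements. */
static unsigned long footprint_growth(size_t before, size_t after,
				      unsigned long entries)
{
	return (unsigned long)(after - before) * entries;
}
```

Under these assumptions, footprint_growth(sizeof(struct lock_list_before),
sizeof(struct lock_list_after), ASSUMED_MAX_LOCKDEP_ENTRIES) gives
32 * 32768 bytes = 1024Ki, in line with the measurement in [1]. With
half as many entries the same calculation gives 512Ki.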

> possibly again breaking things like sparc (which
> have a strict limit on the kernel image size).

For sparc there is a dedicated CONFIG_LOCKDEP_SMALL, which seems to
downsize the lockdep implementation anyway.

> And all that for data that I've never needed and never even considered
> useful when looking at lockdep output.

That is likely because you infer certain aspects which are not
clearly stated in the deadlock report. As an example, the original report
doesn't say that the process which holds 'cpu_hotplug_lock.rw_sem'
is different from the process which holds the other locks. On the
contrary, it tells the user that all the locks are held by the
same task, which seems to be wrong.

You likely also infer the order in which the locks are acquired from
the contents of the stack dump associated with each lock. Without doing
some mental diffs between the backtraces, it's not possible to see the
chronological order of acquisition. Actually, this only works for
backtraces with common history, i.e. there is no clue as to when
'cpu_hotplug_lock.rw_sem' is acquired relative to the other locks.
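
To make the point concrete: with a timestamp and pid stored per lock,
neither the ordering nor the "same task" question would require reading
backtraces. A minimal userspace sketch follows; the record type and the
helper names are hypothetical and only mirror the fields proposed in the
patch, they are not lockdep code.

```c
#include <stdlib.h>

/* Hypothetical per-lock record; fields mirror the proposed patch. */
struct lock_record {
	const char *name;
	unsigned long long ts;	/* acquisition timestamp */
	char comm[16];
	int pid;
	int cpu;
};

/* qsort() comparator: ascending by timestamp, without overflow. */
static int cmp_by_ts(const void *a, const void *b)
{
	const struct lock_record *x = a, *y = b;

	return (x->ts > y->ts) - (x->ts < y->ts);
}

/* Reorder the records into chronological order of acquisition. */
static void sort_chronologically(struct lock_record *recs, size_t n)
{
	qsort(recs, n, sizeof(*recs), cmp_by_ts);
}

/* Return 1 if every record belongs to the same pid, 0 otherwise. */
static int same_task(const struct lock_record *recs, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (recs[i].pid != recs[0].pid)
			return 0;
	return 1;
}
```

Sorting works regardless of whether the backtraces share any history,
and same_task() answers directly what the current report leaves the
reader to guess.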

The patch mostly shares my personal experience of trying to make sense
of lockdep output. It's OK if it doesn't reach mainline.

I still hope that I can get some feedback from the community regarding
the actual cpufreq-related issue pointed out in the splat. I can also
reproduce it on v4.14, so it appears to have been in the kernel for
quite some time.

Thank you in advance.

Best regards,
Eugeniu.

[1] BSS size increase after applying the patch
$ bloaty -s vm vmlinux.after -- vmlinux.before
     VM SIZE                     FILE SIZE
 --------------               --------------
  +8.2% +1024Ki .bss                 0  [ = ]
  ----snip----
  +2.6% +1024Ki TOTAL          +3.36Ki  +0.0%