Re: [BUG] fs: super: possible ABBA deadlocks in do_thaw_all_callback() and freeze_bdev()

From: Jia-Ju Bai
Date: Mon Dec 27 2021 - 21:41:23 EST




On 2021/12/27 21:43, Theodore Ts'o wrote:
On Mon, Dec 27, 2021 at 05:32:09PM +0800, Jia-Ju Bai wrote:
Thanks for your reply and suggestions.
I will try to trigger this possible deadlock by enabling lockdep and using
the workloads that you suggested.
In my opinion, static analysis can conveniently cover some code that is hard
to cover at runtime, and thus it is useful for detecting some
infrequently triggered bugs.
However, it is true that static analysis sometimes has many false positives,
which is unsatisfactory :(
I am doing some work to alleviate this problem in kernel-code analysis.
I can understand that the related code is not frequently executed, but I
think that finding and fixing bugs should always be useful in practice :)
The thing about the sysrq commands is that they are almost always used
in emergency situations when the system administrator with physical
access to the console sends a sysrq command (e.g., by sending a BREAK
to the serial console). This is usually done when the system has
*already* locked up for some reason, such as getting livelocked due to
an out of memory condition, or maybe even a deadlock. So if sysrq-j
could potentially cause a deadlock, so what? Sysrq-j would only be
used when the system was in a really bad state due to a bug in any
case. In over 10 years of kernel development, I can't remember a
single time when I've needed to use sysrq-j.

So it might be that the better way to handle this would be to make
sure all of the emergency sysrq code in fs/super.c is under the
CONFIG_MAGIC_SYSRQ #ifdef --- and then do the static analysis without
CONFIG_MAGIC_SYSRQ defined.

Thanks for the explanation.
In fact, I did not know about the sysrq commands before finding this bug and seeing your explanation.


As I said, I agree it's a bug, and if I had infinite resources, I'd
certainly ask an engineer to completely rework the emergency sysrq-j
code path to address the potential ABBA deadlock. The problem is I do
*not* have infinite resources, which means I have to prioritize which
bugs get attention, and how much time engineers on my team spend
working on new features or performance enhancements that can justify
their salaries and ensure that they get good performance ratings ---
since leadership, technical difficulty and business impact is how
engineers get judged at my company.

I can understand the prioritization of bug fixing, given the constraints on resources and time.
My static analysis tool just provides a small message that there is a possible bug :)


Unfortunately, judging business impact is one of those things that is
unfair to expect a static analyzer to do.

Thanks for your understanding :)
Before seeing your explanation, I had no idea of business impact.
But it is indeed practical to consider business impact and resource assignment in kernel development.

And after all, if we have
infinite resources, why should an OS bother with a VM? We can just
pin all process text/data segments in memory, if money (and DRAM
availability in the supply chain) is no object. :-)

Haha, interesting idea :)


Thanks a lot,
Jia-Ju Bai