Re: [BUG] fs: super: possible ABBA deadlocks in do_thaw_all_callback() and freeze_bdev()

From: Theodore Ts'o
Date: Mon Dec 27 2021 - 08:43:36 EST


On Mon, Dec 27, 2021 at 05:32:09PM +0800, Jia-Ju Bai wrote:
> Thanks for your reply and suggestions.
> I will try to trigger this possible deadlock by enabling lockdep and using
> the workloads that you suggested.
> In my opinion, static analysis can conveniently cover some code that is hard
> to be covered at runtime, and thus it is useful to detecting some
> infrequently-triggered bugs.
> However, it is true that static analysis sometimes has many false positives,
> which is unsatisfactory :(
> I am trying some works to relieve this problem in kernel-code analysis.
> I can understand that the related code is not frequently executed, but I
> think that finding and fixing bugs should be always useful in practice :)

The thing about the sysrq commands is that they are almost always used
in emergency situations when the system administrator with physical
access to the console sends a sysrq command (e.g., by sending a BREAK
to the serial console). This is usually done when the system has
*already* locked up for some reason, such as getting livelocked due to
an out of memory condition, or maybe even a deadlock. So if sysrq-j
could potentially cause a deadlock, so what? Sysrq-j would only be
used when the system was in a really bad state due to a bug in any
case. In over 10 years of kernel development, I can't remember a
single time when I've needed to use sysrq-j.

So it might be that the better way to handle this would be to make
sure all of the emergency sysrq code in fs/super.c is under the
CONFIG_MAGIC_SYSRQ #ifdef --- and then do the static analysis without
CONFIG_MAGIC_SYSRQ defined.

As I said, I agree it's a bug, and if I had infinite resources, I'd
certainly ask an engineer to completely rework the emergency sysrq-j
code path to address the potential ABBA deadlock. The problem is I do
*not* have infinite resources, which means I have to prioritize which
bugs get attention, and how much time engineers on my team spend
working on new features or performance enhacements that can justify
their salaries and ensure that they get good performance ratings ---
since leadership, technical difficulty and business impact is how
engineers get judged at my company.

Unfortunately, judging business impact is one of those things that is
unfair to expect a static analyzer to do. And after all, if we have
infinite resources, why should an OS bother with a VM? We can just
pin all process text/data segments in memory, if money (and DRAM
availability in the supply chain) is no object. :-)

Cheers,

- Ted