Re: possible deadlock in sk_clone_lock

From: Paul E. McKenney
Date: Wed Mar 03 2021 - 14:07:16 EST


On Wed, Mar 03, 2021 at 09:03:27AM +0100, Michal Hocko wrote:
> [Add Paul]
>
> On Tue 02-03-21 13:19:34, Mike Kravetz wrote:
> > On 3/2/21 6:29 AM, Michal Hocko wrote:
> > > On Tue 02-03-21 06:11:51, Shakeel Butt wrote:
> > >> On Tue, Mar 2, 2021 at 1:44 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
> > >>>
> > >>> On Mon 01-03-21 17:16:29, Mike Kravetz wrote:
> > >>>> On 3/1/21 9:23 AM, Michal Hocko wrote:
> > >>>>> On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
> > >>>>>> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
> > >>>>> [...]
> > >>>>>>> Then how come this can ever be a problem? in_task() should exclude soft
> > >>>>>>> irq context unless I am mistaken.
> > >>>>>>>
> > >>>>>>
> > >>>>>> If I take the following example of syzbot's deadlock scenario then
> > >>>>>> CPU1 is the one freeing the hugetlb pages. It is in the process
> > >>>>>> context but has disabled softirqs (see __tcp_close()).
> > >>>>>>
> > >>>>>> CPU0 CPU1
> > >>>>>> ---- ----
> > >>>>>> lock(hugetlb_lock);
> > >>>>>> local_irq_disable();
> > >>>>>> lock(slock-AF_INET);
> > >>>>>> lock(hugetlb_lock);
> > >>>>>> <Interrupt>
> > >>>>>> lock(slock-AF_INET);
> > >>>>>>
> > > [...]
> > >>> Wouldn't something like this help? It is quite ugly but it would be
> > >>> simple enough and backportable while we come up with a more rigorous
> > >>> solution. What do you think?
> > >>>
> > >>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > >>> index 4bdb58ab14cb..c9a8b39f678d 100644
> > >>> --- a/mm/hugetlb.c
> > >>> +++ b/mm/hugetlb.c
> > >>> @@ -1495,9 +1495,11 @@ static DECLARE_WORK(free_hpage_work, free_hpage_workfn);
> > >>> void free_huge_page(struct page *page)
> > >>> {
> > >>> /*
> > >>> - * Defer freeing if in non-task context to avoid hugetlb_lock deadlock.
> > >>> + * Defer freeing if in non-task context or when put_page is called
> > >>> + * with IRQ disabled (e.g from via TCP slock dependency chain) to
> > >>> + * avoid hugetlb_lock deadlock.
> > >>> */
> > >>> - if (!in_task()) {
> > >>> + if (!in_task() || irqs_disabled()) {
> > >>
> > >> Does irqs_disabled() also check softirqs?
> > >
> > > Nope it doesn't AFAICS. I was referring to the above lockdep splat which
> > > claims irq disabled to be the trigger. But now that you are mentioning
> > > that it would be better to replace in_task() along the way. We have
> > > discussed that in another email thread and I was suggesting to use
> > > in_atomic() which should catch also bh disabled situation. The big IF is
> > > that this needs preempt count to be enabled unconditionally. There are
> > > changes in the RCU tree heading that direction.
> >
> > I have not been following developments in preemption and the RCU tree.
> > The comment for in_atomic() says:
> >
> > /*
> > * Are we running in atomic context? WARNING: this macro cannot
> > * always detect atomic context; in particular, it cannot know about
> > * held spinlocks in non-preemptible kernels. Thus it should not be
> > * used in the general case to determine whether sleeping is possible.
> > * Do not use in_atomic() in driver code.
> > */
> >
> > That does seem to be the case. I verified in_atomic can detect softirq
> > context even in non-preemptible kernels. But, as the comment says it
> > will not detect a held spinlock in non-preemptible kernels. So, I think
> > in_atomic would be better than the current check for !in_task. That
> > would handle this syzbot issue, but we could still have issues if the
> > hugetlb put_page path is called while someone is holding a spinlock with
> > all interrupts enabled. Looks like there is no way to detect this
> > today in non-preemptible kernels. in_atomic does detect spinlocks held
> > in preemptible kernels.
>
> Paul what is the current plan with in_atomic to be usable for !PREEMPT
> configurations?

Ah, thank you for the reminder! I have rebased that series on top of
v5.12-rc1 on -rcu branch tglx-pc.2021.03.03a.

The current state is that Linus is not convinced that this change is
worthwhile given that only RCU and printk() want it. (BPF would use
it if it were available, but is willing to live without it, at least in
the short term.)

But maybe Linus will be convinced given your additional use case.
Here is hoping!

Thanx, Paul