Re: BUG: sleeping function called from invalid context in rxe_alloc_nl

From: Jason Gunthorpe
Date: Tue Jan 19 2021 - 16:26:48 EST


On Tue, Jan 19, 2021 at 09:39:19AM -0800, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: b4bb878f Add linux-next specific files for 20210119
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=12d34e9f500000
> kernel config: https://syzkaller.appspot.com/x/.config?x=7b1ca623d7cc5ca3
> dashboard link: https://syzkaller.appspot.com/bug?extid=ec2fd72374785d0e558e
> compiler: gcc (GCC) 10.1.0-syz 20200507
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=148035af500000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=10eb8494d00000
>
> The issue was bisected to:
>
> commit 3853c35e243d56238159e8365b6aca410bdd4576
> Author: Bob Pearson <rpearsonhpe@xxxxxxxxx>
> Date: Wed Dec 16 23:15:49 2020 +0000
>
> RDMA/rxe: Add unlocked versions of pool APIs
>
> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=126612d0d00000
> final oops: https://syzkaller.appspot.com/x/report.txt?x=116612d0d00000
> console output: https://syzkaller.appspot.com/x/log.txt?x=166612d0d00000
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+ec2fd72374785d0e558e@xxxxxxxxxxxxxxxxxxxxxxxxx
> Fixes: 3853c35e243d ("RDMA/rxe: Add unlocked versions of pool APIs")
>
> netdevsim netdevsim0 netdevsim1: set [1, 0] type 2 family 0 port 6081 - 0
> netdevsim netdevsim0 netdevsim2: set [1, 0] type 2 family 0 port 6081 - 0
> netdevsim netdevsim0 netdevsim3: set [1, 0] type 2 family 0 port 6081 - 0
> infiniband syz2: set active
> infiniband syz2: added bond_slave_0
> BUG: sleeping function called from invalid context at drivers/infiniband/sw/rxe/rxe_pool.c:346
> in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 8459, name: syz-executor401
> 6 locks held by syz-executor401/8459:
> #0: ffffffff8fc4a418 (&rdma_nl_types[idx].sem){.+.+}-{3:3}, at: rdma_nl_rcv_msg+0x161/0x690 drivers/infiniband/core/netlink.c:164
> #1: ffffffff8c78ced0 (link_ops_rwsem){++++}-{3:3}, at: nldev_newlink+0x261/0x540 drivers/infiniband/core/nldev.c:1545
> #2: ffffffff8c77c470 (devices_rwsem){++++}-{3:3}, at: enable_device_and_get+0xfc/0x3b0 drivers/infiniband/core/device.c:1307
> #3: ffffffff8c77c330 (clients_rwsem){++++}-{3:3}, at: enable_device_and_get+0x15b/0x3b0 drivers/infiniband/core/device.c:1315
> #4: ffff88802adc8598 (&device->client_data_rwsem){++++}-{3:3}, at: add_client_context+0x3d0/0x5e0 drivers/infiniband/core/device.c:715
> #5: ffff88802adc9640 (&pool->pool_lock){....}-{2:2}, at: rxe_alloc+0x1b/0x40 drivers/infiniband/sw/rxe/rxe_pool.c:384

Bob, yes, this is busted up

read_lock_irqsave(&pool->pool_lock, flags);
obj = rxe_alloc_nl(pool);
read_unlock_irqrestore(&pool->pool_lock, flags);

Those are spinlocks (pool_lock is an rwlock_t taken with irqs disabled),
so nothing called under them may sleep

void *rxe_alloc_nl(struct rxe_pool *pool)
{
	obj = kzalloc(info->size, (pool->flags & RXE_POOL_ATOMIC) ?
		      GFP_ATOMIC : GFP_KERNEL);

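And that ends up calling kzalloc() with GFP_KERNEL inside the spinlock
for any pool that does not set RXE_POOL_ATOMIC, which is what the
might_sleep() splat above is complaining about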
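
To make the failure mode concrete, here is a rough userspace model of the
pattern above. It is only a sketch: every model_* name is hypothetical and
none of this is rxe or kernel code. It assumes that "atomic context" can be
modelled as a simple counter and that a GFP_KERNEL-style allocation counts
as a bug whenever that counter is nonzero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for GFP_ATOMIC / GFP_KERNEL. */
enum model_gfp { MODEL_GFP_ATOMIC, MODEL_GFP_KERNEL };

/* Nonzero while a modelled spinlock is held (in_atomic()/irqs_disabled()). */
static int model_atomic_depth;
/* Number of may-sleep allocations attempted in atomic context. */
static int model_sleep_bugs;

static void model_lock_irqsave(void)
{
	model_atomic_depth++;
}

static void model_unlock_irqrestore(void)
{
	assert(model_atomic_depth > 0);	/* unbalanced unlock */
	model_atomic_depth--;
}

/* Models kzalloc(): a MODEL_GFP_KERNEL request is allowed to sleep. */
static void *model_kzalloc(size_t size, enum model_gfp gfp)
{
	if (gfp == MODEL_GFP_KERNEL && model_atomic_depth > 0)
		model_sleep_bugs++;	/* the kernel would print the BUG here */
	return calloc(1, size);
}

/* Same flag selection as the quoted rxe_alloc_nl() excerpt. */
static void *model_alloc_nl(size_t size, bool pool_atomic)
{
	return model_kzalloc(size, pool_atomic ? MODEL_GFP_ATOMIC
					       : MODEL_GFP_KERNEL);
}

/* Same shape as the quoted locked wrapper: allocate with the lock held. */
static void *model_alloc(size_t size, bool pool_atomic)
{
	void *obj;

	model_lock_irqsave();
	obj = model_alloc_nl(size, pool_atomic);
	model_unlock_irqrestore();
	return obj;
}
```

In this model, model_alloc(16, false) bumps model_sleep_bugs because the
may-sleep allocation runs with the lock held, while model_alloc(16, true)
does not. It says nothing about what the right fix in rxe is.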

No idea how this should be fixed

Jason