Re: [linux-next] BUG triggered in ptraceme

From: Jann Horn
Date: Wed Sep 19 2018 - 10:17:21 EST


Adding FS people to figure out whether GFP_KERNEL allocations with
i_rwsem's held for writing are okay.

On Wed, Sep 19, 2018 at 9:10 AM Cyrill Gorcunov <gorcunov@xxxxxxxxx> wrote:
> On Wed, Sep 19, 2018 at 10:07:37AM +0300, Cyrill Gorcunov wrote:
> > Hi Oleg! While testing criu with linux-next we've triggered a BUG.
> > https://api.travis-ci.org/v3/job/430308998/log.txt
> >
> > [ 2.461618] BUG: sleeping function called from invalid context at security/apparmor/include/cred.h:154
> > [ 2.461794] in_atomic(): 1, irqs_disabled(): 1, pid: 152, name: init
> > [ 2.461890] 1 lock held by init/152:
> > [ 2.461981] #0: 00000000f30c3fda (tasklist_lock){.+.+}, at: ptrace_traceme+0x1c/0x70
> > [ 2.462114] irq event stamp: 2524
> > [ 2.462242] hardirqs last enabled at (2523): [<ffffffff98002922>] do_syscall_64+0x12/0x190
> > [ 2.462363] hardirqs last disabled at (2524): [<ffffffff98b8b02f>] _raw_write_lock_irq+0xf/0x40
> > [ 2.462476] softirqs last enabled at (1904): [<ffffffff98ac79ef>] unix_sock_destructor+0x4f/0xc0
> > [ 2.462586] softirqs last disabled at (1902): [<ffffffff98ac79ef>] unix_sock_destructor+0x4f/0xc0
> > [ 2.462697] CPU: 1 PID: 152 Comm: init Not tainted 4.19.0-rc4-next-20180918+ #1
> >
> > Which is due to commit
> >
> > commit 4b105cbbaf7c06e01c27391957dc3c446328d087
> > Author: Oleg Nesterov <oleg@xxxxxxxxxx>
> > Date: Wed Jun 17 16:27:33 2009 -0700
> >
> > ptrace: do not use task_lock() for attach
> >
> > because now after write_lock_irq(&tasklist_lock); apparmor calls for
> > traceme and
> >
> > static inline struct aa_label *begin_current_label_crit_section(void)
> > {
> >         struct aa_label *label = aa_current_raw_label();
> >
> > -->     might_sleep();
> >
> > Take a look please, once time permits.
>
> Heh, actually not :) It is due to commit
>
> commit 1f8266ff58840d698a1e96d2274189de1bdf7969
> Author: Jann Horn <jannh@xxxxxxxxxx>
> Date: Thu Sep 13 18:12:09 2018 +0200
>
> which introduced might_sleep(). Seems it is a bad idea to send a bug report
> without having a cup of coffee in the morning :)

Yeah, I fixed one sleep-in-atomic bug and figured I'd throw a
might_sleep() in there for good measure... sigh.
I guess now I have to go through all the callers of
begin_current_label_crit_section() to see what else looks wrong...

apparmor_ptrace_traceme() is wrong, as reported...
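
To make the failure mode concrete, here is a toy userspace model of it. This
is only a sketch and not kernel code: every model_* name is made up for
illustration, and the "lock" is just a counter standing in for the atomic
context that write_lock_irq(&tasklist_lock) creates. The second helper exists
only for contrast and is not a proposed fix.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins; none of these exist in the kernel. */
static int atomic_depth;      /* > 0 while a modelled spinning lock is held */
static int sleep_violations;  /* times might_sleep() fired in atomic context */

static void model_write_lock_irq(void)
{
    atomic_depth++;
}

static void model_write_unlock_irq(void)
{
    assert(atomic_depth > 0);  /* unlock without a matching lock */
    atomic_depth--;
}

/* Rough analogue of might_sleep(): complain if we could not legally sleep. */
static void model_might_sleep(const char *where)
{
    if (atomic_depth > 0) {
        sleep_violations++;
        fprintf(stderr,
                "BUG: sleeping function called from invalid context at %s\n",
                where);
    }
}

/* Stands in for begin_current_label_crit_section() with the new check. */
static void model_begin_label_crit_section(void)
{
    model_might_sleep("model_begin_label_crit_section");
}

/* Shape of the reported path: the hook runs with the lock already held.
 * Returns the number of violations this call triggered. */
static int model_traceme_hook_under_lock(void)
{
    int before = sleep_violations;

    model_write_lock_irq();
    model_begin_label_crit_section();
    model_write_unlock_irq();
    return sleep_violations - before;
}

/* Contrast case: the same check made before any lock is taken. */
static int model_traceme_hook_before_lock(void)
{
    int before = sleep_violations;

    model_begin_label_crit_section();
    model_write_lock_irq();
    model_write_unlock_irq();
    return sleep_violations - before;
}
```

Calling model_traceme_hook_under_lock() reports one violation, while
model_traceme_hook_before_lock() reports none; the check itself is identical,
only the context it runs in differs.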

apparmor_path_link() looks icky, but I'm not sure - from what I can
tell, it's called with an i_rwsem held for writing, and that probably
makes calling back into filesystem context from there a bad idea?
OTOH, it's just the i_rwsem of a newly-created path, so I don't know
whether that's actually an issue...

security_path_rename() is called with two i_rwsem's held, but again,
I'm not sure whether that's a problem.