Re: [PATCH net-next v3 1/2] bpf: Add bpf_copy_to_user BPF helper to be called in tracers (kprobes)

From: Daniel Borkmann
Date: Thu Jul 21 2016 - 07:30:37 EST


On 07/21/2016 12:47 PM, Sargun Dhillon wrote:
On Thu, Jul 21, 2016 at 01:00:51AM +0200, Daniel Borkmann wrote:
[...]
I don't really like couple of things, your ifdef CONFIG_MMU might not be
needed I think, couple of these checks seem redundant, (I'm not yet sure
about the task->mm != task->active_mm thingy), the helper should definitely
be gpl_only and ARG_PTR_TO_RAW_STACK is just buggy. Also, this should be
a bit analogue to bpf_probe_read we have. How about something roughly along
the lines of below diff (obviously needs extensive testing ...)? This
can still do all kind of ugly crap to the user process, but limited to
the cap_sys_admin to shoot himself in the foot.
* You're right about CONFIG_MMU. We don't need it, all of the nommu platforms
properly deal with it from my research.

The segment_eq() test should generically catch this from what I see.

It was always ARG_PTR_TO_STACK? Or are you saying ARG_PTR_TO_STACK is buggy and
we should make it ARG_PTR_TO_RAW_STACK?

No, in your patch, you had '+ .arg2_type = ARG_PTR_TO_RAW_STACK,',
which is not correct as it means you don't need to initialize the memory
you pass in for your *src pointer. I believe you took this over from
probe_read(), but there it's correct. ARG_PTR_TO_STACK means the verifier
checks that it's initialized with data.

I originally named the function bpf_probe_write. Upon further thought I don't
think that makes sense. The reason is because bpf_probe_read is analogous to
probe_kernel_read. If we had bpf_probe_write, I think people might reason it to
be equivalent to probe_kernel_write, and be confused when they can't write to
kernel space.

I still think that bpf_probe_write is the more appropriate name, and that
the -EPERM are also better than -EINVAL. For user space, you'll have the
bpf_probe_read() / bpf_probe_write() pair you can use, which is the more
intuitive complement, also since people might already use bpf_probe_read(),
so they kind of can infer its meaning. It's just that the kernel doesn't
give you _permission_ to mess with kernel memory, hence due to not being
allowed -EPERM to make this absolutely clear to the user that this is illegal.
-EINVAL usually means one of the function arguments might be wrong, I think
-EPERM is a better / more clear fit in this case, imho.

I tried to make the external facing documentaion close to copy_to_user. That's
how people should use it, not like _write. Therefor I think it makes sense to
keep that the name.

But still, you *probe* to write somewhere to the process' address space,
so it can still fail with -EFAULT. Other than that, see comment above.

I added a check for (!task) -- It seems to be spattered throughou the eBPF
helper code. Alexei mentioned earlier that it can be null, but I'm not sure of
the case

Well, the test of unlikely(in_interrupt() || (task->flags & PF_KTHREAD))
will cover these cases. It makes sure that you're neither in hardirq (NULL
here) nor softirq and that you're not in a kthread.

RE: task->mm != task->active_mm -- There are a couple scenarios where kthreads
do this, and the only user call that should hit this is execve. There's only a
brief period where this can be true and I don't think it's worth dealing with
that case -- I'm not really sure you can plant a kprobe at the right site either

But if kthreads do this, wouldn't this already be covered by task->flags &
PF_KTHREAD test for such case?

Did some minimal testing with tracex7 and others.

Ok, great!

I was able to copy memory into other process's space while certain
syscalls were going on. I don't think that there are a reasonable set of
protections.

Right, it doesn't prevent that syscalls going on in parallel.

I'll do more testing. Any suggestions of what might break? I've looked at
writing to unitialized memory, Memory out of bounds, etc... Do you know of any
"weak spots"?

Well, you could write into text/data, stack, etc for the user process so
quite a bit. ;) Or do you mean wrt kernel space? If someone runs some binary
installing such a proglet, I think it would also make sense to print a message
(rate-limited) to the klog with current->comm, task_pid_nr(current) for the
process installing this, from verifier side I mean. Maybe makes more sense
than the print-once from the helper side.

Thanks,
Daniel