Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault

From: Alexei Starovoitov
Date: Fri Feb 22 2019 - 17:51:10 EST


On Fri, Feb 22, 2019 at 01:59:10PM -0800, Linus Torvalds wrote:
> On Fri, Feb 22, 2019 at 1:38 PM David Miller <davem@xxxxxxxxxxxxx> wrote:
> >
> > Don't be surprised if we see more separation like this in the future too.
>
> Yes, with the whole meltdown fiasco, there's actually more pressure to
> add more support for separation of kernel/user address spaces. As Andy
> pointed out, it's been discussed as a future wish-list for x86-64 too.
>
> But yeah, right now the *common* architectures all distinguish kernel
> and user space by pointers (ie x86-64, arm64 and powerpc).

That's all fine. I'm missing rationale for making probe_kernel_read()
fail on user addresses.
What is fundamentally wrong with a function probe_any_address_read() ?

For context, typical bpf kprobe program looks like this:
#define probe_read(P) \
({typeof(P) val = 0; bpf_probe_read(&val, sizeof(val), &P); val;})
SEC("kprobe/__set_task_comm")
int bpf_prog(struct pt_regs *ctx)
{
struct signal_struct *signal;
struct task_struct *tsk;
char oldcomm[16] = {};
char newcomm[16] = {};
u16 oom_score_adj;
u32 pid;

tsk = (void *)PT_REGS_PARM1(ctx);
pid = probe_read(tsk->pid);
bpf_probe_read(oldcomm, sizeof(oldcomm), &tsk->comm);
bpf_probe_read(newcomm, sizeof(newcomm), (void *)PT_REGS_PARM2(ctx));
signal = probe_read(tsk->signal);
oom_score_adj = probe_read(signal->oom_score_adj);
...
}

where PT_REGS_PARMx macros are defined per architecture.
On x86 it's #define PT_REGS_PARM1(x) ((x)->di)

The program writer has to know the meaning of function arguments.
In this example they need to know that __set_task_comm is defined as
void __set_task_comm(struct task_struct *tsk, const char *buf, bool exec) in the kernel.

Right now these programs just call bpf_probe_read() on whatever data
they need to access and not differentiating whether it's user or kernel.

One idea we discussed is to split bpf_probe_read() into kernel_read and user_read
helpers, but in the BPF verifier we cannot determine which address space
the program wants to access. The prog writer needs to manually analyze the program
to use correct one. But mistakes are possible and cannot be fatal.
On the kernel side we have to be safe.
Both probe_kernel_read and probe_user_read must not panic if a pointer
from wrong address space was passed.

Hence my preference is to keep probe_kernel_read as "try read any address".
The function can be renamed to indicate so.