Re: Linux needs flexible security

Pjotr Kourzanoff (pjotr@duticai.twi.tudelft.nl)
Sun, 21 Nov 1999 03:00:58 +0100 (CET)


On 20 Nov 1999, Mike Coleman wrote:

> I think this idea is very attractive in several ways. I've been thinking
> about something like this for a long time as part of an implementation for a
> security tool like MEC described earlier.
>
> The device you're talking about could go in /proc/<n>, for example. The
> tracing process could read a line like this from it for each system call

Why whould it go to proc? I thought bd's and cd's on /proc weren't supposed
work. We could try devfs, better that than limit the queue size to 4k.


> write 2, "this is data", 12
>
> (or perhaps a packed version for efficiency). The tracer could then react by
> writing a line like
>
> skip-fail EIO
> or
> skip-ok 12
>
> to make the call "fail" or "succeed" (returning 12) without even really being
> seen by the kernel. Alternatively, the tracer could write
>
> write 4, "this is data", 12

I think this becomes very tricky very soon. Here you need to access
the same request queue from both kernel and the tracer. And if you
allow for multiple tracers you may get infinite recursion! The
way to eliviate this is to have something like syscall batch
queue: all level 2 (safe:untrusted) sys_ entries just append the requests in
there, tracers then read and put replies into another queue which is
served by level 1 (unsafe:trusted) sys_ entries which do the real work.
This redirection can obviously done from syscall table even at runtime,
if necessary.

> [*]

> You would also want to be able to step on memory before returning (e.g., to
> alter the result of gettimeofday). It would be nice if the protocol would

Do you have some programs that require this?

> provide commands like
>
> set 1 "some data"
> set 2 "other data"
>
> but this might be problematic. Just using /proc/<n>/mem is also a
> possibility.

Now, do you want to change the tracee's memory?

> Similar commands would be provided for intercepting and handling signals.

Some would even say that it can capture an interrupt.

> The beauty of an approach like this is that it could be relatively generic.
> Right now, in order to do this sort of thing with ptrace, you have to know
> endless ugly details about exactly how to parse the stack, etc. This device,
> though, could have a generic protocol.

It actually has one :-) If you mean the format of the data, then I
would disagree. You do design this for a specific purpose - flexible
security, why not optimise for that?

> It's also fairly powerful. I think you could (portably) implement tools like
> strace or MEC's trace-and-replay debugger or the security tool we're
> discussing using just this interface. And by eliminating ptrace, you
> eliminate its problems (altered parent/child semantics).

Indeed. read/write rules doesn't it ?

> This device could also be reentrant, able to be opened by more than one
> tracer. Then each call would possibly be screened/altered by several
> processes, in a nested fashion. This would be difficult or impossible with
> ptrace.

Yeah, sure. Tracer just reads from the queue, does its job and
writes to another queue. This gives a chance to other tracers to do
their job while this one does his, since reads can be concurrent.
Often, tracer's actions would be to syscall (trusted one), meaning
that actions of a group of tracers can be interleaved with the
tracees. On SMP, this gives speedup.

>
> As far as performance, though, it's not clear at all that this would be an
> improvement over ptrace. In the SP case, you still end up suspending the
> calling process, waking up the tracer, which is presumably doing a select on
> the device, and then switching back to the tracee. I'm not very knowledgeable
> about MP, but I don't see why this would be a win in the MP case either.

OK. Two reasons:

1. syscall authentication requests (or whatever you want your
"tracer" to do :-) can be batched. So, while the calling
process is suspended, others (other than the "tracer")
can be resumed, once they are authenticated, all by the same
tracer and provided that the tracer is somehow faster than the
tracees.

2. 1. + MP = definite speedup.

> Another problem you run into is dealing with system call arguments. Suppose
> the call is a write. Are the contents of the buffer to be written also to be
> passed through this syscall device, so that the tracer can examine them? That
> would be nice, but how to you do it efficiently?

Voila. the same read()! for the sake simpliness: take readv(), make
a special iovec with two elements: pointer to the syscall meta-data,
and pointer to the syscall arguments that points to wherever you what your
trustee's data to appear inside tracer. Pass this to the kernel, and it can
fill syscall meta-data, and do an mmap of the arguments if that's
necessary, or just copy_page() it. What you get is just one copy of
the meta-data (less than 1k, right) and (as you like) one or 0 copy
of the argument values. That's not much, isn't it?

> A naive way to do it would be to simply copy the arguments out of user space.
> But that's really expensive if you're doing 10MB writes. You also have to
> know exactly what data each system call will possibly read or write from user
> memory, which MEC has previously pointed out is a rat's nest for ioctl, etc.

ioctl's have no play here. I am not sure what are you talking here
about. What you do is just initialize this queueing externally "once" and never
touch anything else but read() and write().

> Another way you could do it would be to fake the copy by having reads on the
> syscall device transparently pull data out of memory, depending on the
> semantics of the particular system call, values of other arguments, etc. This
> still has the ioctl problem, though, and if you don't copy the data, you have
> to worry about locking it down (making sure it doesn't get overwritten) if
> shared memory is involved.

Right. You know what you want to trace. From that function you can
give a hint to the queue implementation what should be copied and
what should be mapped. Immediately after the
tracer gets to the point of read()'ing its syscall, you can use this in your
do_read() dual and copy the data, as requested. What's a ioctl
problem? In some cases you would like to get a snapshot of the
tracee's activities, to gather some insight in it's guts, and you're
exactly interested what other guys doing with shared memory while
you control it.

In case you mean that it will be a nightmare to
configure the queue implementation at run-time, for each separate
syscall...that's right. But I remember someone proposed a
Interceptor patch to this list (couple of months ago), you could
take that and add a stack of specific queue implementations. That's
not much: one for strace-like beast (tracee readonly), one for
the debugger (r/w) and one for security application (tracee
readonly, multiple security levels on syscalls). Do you need more?

> I'm not trying to discourage you. I think some variation of this is a real
> winner, and I was planning to try my hand at it myself.

Nice. Do you have something working then?

> --Mike
>
>
> --
> Any sufficiently adverse technology is indistinguishable from Microsoft.
>

pk.
/*****************************************************************
in a world without walls and borders who needs windows and gates ?
*****************************************************************/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/