Running Untrusted Code in a Restricted Process

From: jesse hammons (jhammons@bigteam.org)
Date: Fri Jun 09 2000 - 11:10:13 EST


I'm looking for feedback and help on this feature. I've got an idea that
is basically implemented except I need some help with a couple of lines of
assembly in entry.S (my version attached below). I'm working off of a
2.2.12 kernel.

First, let me explain what I'm doing. I want to be able to run untrusted code
in a separate process. An example would be downloading a binary plugin. I
believe there is a way to create a Linux personality that restricts which
system calls a process is allowed to make. This actually sparked a debate with
an engineering friend of mine. He claims that there is probably a way to
execute a sequence of instructions that somehow leaves the processor in a bad
state. I disagree! If that were true, anyone could crash the Linux (or any
x86 Unix) kernel.

Anyway here is the idea. I added a new task flag PF_RESTRICTED. This bit
is set by setting yourself to the PER_RESTRICTED personality like so:

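/* in host process */
switch (fork()) {
  case 0:
    /* child: read the plugin while system calls are still allowed */
    read_file_into_ram("libuntrusted.so");
    personality(PER_RESTRICTED);
    /* from here on the child is restricted */
    dlopen_from_ram("libuntrusted.so");
    /* find entry point symbol and execute it */
    break;
  case -1:
    /* error handling */
    break;
  default:
    /* parent: continues as the trusted host */
    break;
}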

This idea is so simple that I'm surprised that I couldn't find any
implementations after searching the web for a day. The capabilities stuff
is related, but seems to be focused on providing finer granularity for
root privileges. What I'm talking about is providing finer granularity for
what system calls a process can make. A process that can't make system calls
cannot delete files, make network connections, or mount DoS attacks on RAM,
CPU, or other system resources. Every action is monitored by the host process,
which is trusted code. The user can modify settings in the host process to
allow as much or as little resource usage as they want. For example the
untrusted code could be restricted from using more than 60% of CPU, or from
using more than 20 MB of RAM, etc.

Note above that a slightly modified version of dlopen() that doesn't make
system calls is required. In fact a lot of user space code is required.
I believe the following system calls are sufficient to implement this
architecture. The client is allowed to make the following system calls:
  msgsnd() /* Restricted exec_domain validates message */
  msgrcv() /* to get replies from the host */
  shmat() /* a more restricted version which only allows the client to
             attach shm that the host process has explicitly allowed */
  exit() /* this is sometimes convenient! */
  brk() /* a more restricted version so that the client can allocate
         memory in its own process instead of always using shared mem
         provided by the host. This is a more tightly controlled version
         however, with the host process having full control over how
         much RAM is allowed to be allocated */
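
To make the restricted brk() idea concrete, here is a minimal user-space
sketch of the policy check only. The struct heap_budget type and the
brk_request_allowed() function are hypothetical names invented for this
illustration, not part of the patch, and the check assumes the host records
the client's break at the moment it becomes restricted.

```c
#include <stdbool.h>

/* Hypothetical per-client heap policy kept on behalf of the host. */
struct heap_budget {
    unsigned long start_brk;  /* break when the client became restricted */
    unsigned long max_bytes;  /* most heap allowed above start_brk */
};

/* Return true if moving the break to new_brk stays within the budget. */
bool brk_request_allowed(const struct heap_budget *b, unsigned long new_brk)
{
    if (new_brk < b->start_brk)
        return false;  /* refuse to move below the original break */
    /* new_brk >= start_brk here, so the subtraction cannot wrap */
    return new_brk - b->start_brk <= b->max_bytes;
}
```

With max_bytes set to 20 MB, a request for one byte past the limit is
refused while a request for exactly 20 MB is granted.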
         

I will be publishing this patch when I get everything working. It adds
about 10 lines of code to the kernel (for setting and testing the
PF_RESTRICTED flag). The rest is implemented as a kernel module. Obviously,
to make anything useful (e.g. multimedia plugins) there will also have to be
a significant user space library. The idea is that the host and untrusted
client process will communicate using IPC messages, and the client will write
into IPC shared memory buffers provided by the host. This way the host can
keep track of all pointers and verify them, etc. Of course the host must
still assume that the client has written complete garbage into the buffers,
but that is handled in user space. My idea is to have the host process
provide audio and video buffers which are filled in by the client. If
necessary the host validates the buffers and finally makes the real system
call to send the data to the hardware.
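
As a rough illustration of the host-side validation, the sketch below checks
that a region the client claims to have filled really lies inside the shared
buffer. The struct buf_request layout and the request_in_bounds() name are
assumptions made for this example; the actual message format is not defined
yet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request the client sends over the IPC message queue. */
struct buf_request {
    uint32_t offset;  /* start of the filled region in the shared buffer */
    uint32_t length;  /* number of bytes the client claims to have written */
};

/* True if [offset, offset + length) lies inside a buffer of shm_size bytes. */
bool request_in_bounds(const struct buf_request *req, size_t shm_size)
{
    if (req->offset > shm_size)
        return false;
    /* Subtract instead of adding so a huge length cannot wrap around. */
    return req->length <= shm_size - req->offset;
}
```

The host works with offsets rather than raw pointers here, so nothing the
client sends is ever dereferenced before it has been range-checked. The
contents of the region still have to be treated as garbage.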

It is conceivable that being able to enable/disable any of the 190 system
calls through an ioctl() might also be useful.
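
One possible shape for such a per-call switch is a bitmap with one bit per
system call, sketched here in plain user-space C. The names
(struct syscall_mask, mask_set(), mask_allows()) are made up for the
illustration, and the ioctl() plumbing that would feed it is left out.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define NR_SYSCALLS_ASSUMED 190  /* count taken from the text above */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS ((NR_SYSCALLS_ASSUMED + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per system call; a set bit means the call is allowed. */
struct syscall_mask {
    unsigned long bits[MASK_WORDS];
};

/* Start from "nothing allowed" and enable calls one by one. */
void mask_deny_all(struct syscall_mask *m)
{
    memset(m->bits, 0, sizeof m->bits);
}

/* Enable or disable one call; returns false for an out-of-range number. */
bool mask_set(struct syscall_mask *m, unsigned int nr, bool allowed)
{
    if (nr >= NR_SYSCALLS_ASSUMED)
        return false;
    unsigned long bit = 1UL << (nr % BITS_PER_WORD);
    if (allowed)
        m->bits[nr / BITS_PER_WORD] |= bit;
    else
        m->bits[nr / BITS_PER_WORD] &= ~bit;
    return true;
}

/* Out-of-range numbers are always refused. */
bool mask_allows(const struct syscall_mask *m, unsigned int nr)
{
    if (nr >= NR_SYSCALLS_ASSUMED)
        return false;
    return (m->bits[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}
```

Defaulting to deny means a system call added to the kernel later stays
blocked until someone enables it explicitly.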

I think this could be a win in terms of having a non-Java Virtual
Machine. These plugins would run at the full speed of the processor
and use existing development tools. They could be optimized to
use MMX or whatever processor features. I also think it's possible
for other operating systems to implement this feature (but I won't
talk about that here :-). Linux could be the first to deliver it.

Anyway, I'm no operating systems expert, so I would like some
feedback on whether people think this is a useful feature or not. All I know
is the entire world is tired of "Click OK to install plugin which can
potentially trash your machine." At least I am. Also I couldn't find any
related work on this (using Linux or any other OS). If anyone has some
references (online or books) that would be great.
  
Here is the relevant piece from my entry.S file. I am abusing the lcall7()
handler from struct exec_domain. I want the arguments to my handler to
look like this:
void my_fake_lcall7(int syscall_num, struct pt_regs *regs);

In other words I want the number of the system call that the process tried to
make to be in %eax, and I want a pointer to the rest of the arguments to the
system call in %ebx. The label I use is "sandbox" for historical reasons :-)
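
For reference, the C side of such a handler might look roughly like the
sketch below. It is a user-space mock-up, not kernel code: struct fake_regs
stands in for struct pt_regs and models only %ebx, and restricted_check() is
a hypothetical name. It relies on i386 numbering, where exit is 1, brk is 45,
and the System V message and shared-memory calls are all multiplexed through
ipc (117), with the sub-call number passed as the first argument in %ebx.

```c
#include <errno.h>

/* i386 system call numbers */
#define NR_EXIT 1
#define NR_BRK  45
#define NR_IPC  117

/* Sub-call numbers passed as the first argument of ipc on i386 */
#define IPC_MSGSND 11
#define IPC_MSGRCV 12
#define IPC_SHMAT  21

/* Stand-in for struct pt_regs; only the first argument register is modeled. */
struct fake_regs {
    long ebx;
};

/* Return 0 if the call may proceed, or -EPERM to refuse it. */
int restricted_check(int syscall_num, const struct fake_regs *regs)
{
    switch (syscall_num) {
    case NR_EXIT:
    case NR_BRK:
        return 0;
    case NR_IPC:
        /* Compare only the low 16 bits; this assumes the upper half of the
           ipc call argument carries a version field, as in sys_ipc. */
        switch (regs->ebx & 0xffff) {
        case IPC_MSGSND:
        case IPC_MSGRCV:
        case IPC_SHMAT:
            return 0;
        default:
            return -EPERM;
        }
    default:
        return -EPERM;
    }
}
```

This only filters by number. The tighter brk and shmat rules described
earlier would still need their own argument checks before the real call is
made.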

ENTRY(system_call)
  pushl %eax # save orig_eax
  SAVE_ALL
  GET_CURRENT(%ebx)
  cmpl $(NR_syscalls),%eax
  jae badsys
  testl $0x20000000,flags(%ebx) # PF_RESTRICTED
  jne sandbox
  testb $0x20,flags(%ebx) # PF_TRACESYS
  jne tracesys
  call *SYMBOL_NAME(sys_call_table)(,%eax,4)
  movl %eax,EAX(%esp) # save the return value
  ALIGN
  .globl ret_from_sys_call
  .globl ret_from_intr
ret_from_sys_call:
  movl SYMBOL_NAME(bh_mask),%eax
  andl SYMBOL_NAME(bh_active),%eax
  jne handle_bottom_half
ret_with_reschedule:
  cmpl $0,need_resched(%ebx)
  jne reschedule
  cmpl $0,sigpending(%ebx)
  jne signal_return
restore_all:
  RESTORE_ALL

  ALIGN
signal_return:
  sti # we can get here from an interrupt handler
  testl $(VM_MASK),EFLAGS(%esp)
  movl %esp,%eax
  jne v86_signal_return
  xorl %edx,%edx
  call SYMBOL_NAME(do_signal)
  jmp restore_all

  ALIGN
v86_signal_return:
  call SYMBOL_NAME(save_v86_state)
  movl %eax,%esp
  xorl %edx,%edx
  call SYMBOL_NAME(do_signal)
  jmp restore_all

  ALIGN
tracesys:
  movl $-ENOSYS,EAX(%esp)
  call SYMBOL_NAME(syscall_trace)
  movl ORIG_EAX(%esp),%eax
  call *SYMBOL_NAME(sys_call_table)(,%eax,4)
  movl %eax,EAX(%esp) # save the return value
  call SYMBOL_NAME(syscall_trace)
  jmp ret_from_sys_call
badsys:
  movl $-ENOSYS,EAX(%esp)
  jmp ret_from_sys_call
sandbox:
  movl exec_domain(%ebx),%edx # Get the execution domain
  movl 4(%edx),%edx # Get the lcall7 handler for the domain
  call *%edx
  movl %eax, EAX(%esp)
  jmp ret_from_sys_call

[ rest of entry.S truncated ]

My x86 assembly knowledge is awful, please help :-)

Also, should I be using vm86() instead? (I thought that was for doing
16-bit stuff.) Finally, I realize that this adds two instructions to
the overhead of ALL system calls! Can anyone think of a way to avoid this?
Someone told me that Linux has the fastest null system call of any OS, so I'm
sure some of you out there are grumbling about this patch, but I can't
think of any other way to do it. Maybe we can also set the PF_TRACE flag,
and require that restricted processes also be traced? But I don't know
much about how tracing works...

Thanks,
-Jesse Hammons (first post!)



