Re: [C/R ARM][PATCH 2/3] ARM: Add the eclone system call

From: Sukadev Bhattiprolu
Date: Wed Mar 24 2010 - 14:13:42 EST


Russell King - ARM Linux [linux@xxxxxxxxxxxxxxxx] wrote:
| > +asmlinkage int sys_eclone(unsigned flags_low, struct clone_args __user *uca,
| > + int args_size, pid_t __user *pids,
| > + struct pt_regs *regs)
| > +{
| > + int rc;
| > + struct clone_args kca;
| > + unsigned long flags;
| > + int __user *parent_tidp;
| > + int __user *child_tidp;
| > + unsigned long __user stack;
|
| __user on an integer type doesn't make any sense; integer types do not
| have address spaces.

Ah, will fix that for x86 32/64 bit implementations.

|
| > + unsigned long stack_size;
| > +
| > + rc = fetch_clone_args_from_user(uca, args_size, &kca);
| > + if (rc)
| > + return rc;
| > +
| > + /*
| > + * TODO: Convert 'clone-flags' to 64-bits on all architectures.
| > + * TODO: When ->clone_flags_high is non-zero, copy it in to the
| > + * higher word(s) of 'flags':
| > + *
| > + * flags = (kca.clone_flags_high << 32) | flags_low;
| > + */
| > + flags = flags_low;
| > + parent_tidp = (int *)(unsigned long)kca.parent_tid_ptr;
| > + child_tidp = (int *)(unsigned long)kca.child_tid_ptr;
|
| This will produce sparse errors. Is there a reason why 'clone_args'
| tid pointers aren't already pointers marked with __user ?

Making them pointers would make them 32-bit on some and 64-bit on other
architectures. We wanted the fields to be the same size on all architectures
for easier portability/extensibility. Given that we are copying-in a structure
from user-space, copying in a few extra bytes on the 32-bit architectures
would not be significant.

|
| > +
| > + stack_size = (unsigned long)kca.child_stack_size;
|
| Shouldn't this already be of integer type?
|
| > + if (stack_size)
| > + return -EINVAL;
|
| So the stack must have a zero size? Is this missing a '!' ?

Some architectures (IA64 ?) use the stack-size field. Those that don't
need it should error out. Again, its because we are trying to keep the
interface common across architectures.

|
| > +
| > + stack = (unsigned long)kca.child_stack;
| > + if (!stack)
| > + stack = regs->ARM_sp;
| > +
| > + return do_fork_with_pids(flags, stack, regs, stack_size, parent_tidp,
| > + child_tidp, kca.nr_pids, pids);
|
| Hmm, so let me get this syscall interface right. We have some arguments
| passed in registers and others via a (variable sized?) structure. It seems
| really weird to have, eg, a pointer to the pids and the number of pids
| passed in two separate ways.
|
| The grouping between what's passed in registers and via this clone_args
| structure seems to be random. Can it be sanitized?

:-) Well, we went through a lot of discussions on this. Here is one pointer
http://lkml.org/lkml/2009/9/11/92 (and that was version 6 of 13+ :-).

We wanted the first parameter of eclone(), flags, to remain 32-bit value to
avoid confusing user-space. (i.e if they accidentally pass a 64-bit flags
value to the clone() system call which takes 32-bit flags, the higher-32
bits would silently be dropped).

By sticking the higher clone-flags in 'struct clone_args', all architectures
would have to do the same extra work to set the higher flags so less
chance of error.

Re: passing the number of pids, nr_pids in the structure, I think it was
just that by passing it in the structure, we could avoid using an extra
register for the system call parameter.

'pid_t *' would be of different size on different architecutres. Passing
it as a separate parameter would avoid the pointer conversion. Note
that unlike the tid pointers above which are a few individual fields,
the pid_t array could in theory be large.

Hope that helps. Thanks the review comments.

Sukadev
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/