Re: [GIT PULL] Namespace file descriptors for 2.6.40

From: Geert Uytterhoeven
Date: Wed May 25 2011 - 04:36:11 EST


On Wed, May 25, 2011 at 10:25, Ingo Molnar <mingo@xxxxxxx> wrote:
> * Valdis.Kletnieks@xxxxxx <Valdis.Kletnieks@xxxxxx> wrote:
>
>> On Tue, 24 May 2011 09:16:28 +0200, Ingo Molnar said:
>> > * Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>> > > My gut feel says we should really implement an
>> > > include/asm-generic/unistd-common.h to include all new system calls.
>> > >
>> > > That way there would be only one file to touch instead of 50. Certainly it
>> > > works for include/asm-generic/unistd.h for the architectures that use it.
>> > > And all we really need is just a little abstraction on that concept.
>> >
>> > I suppose that could be tried, although in practice it would probably be
>> > somewhat complex due to the various compat syscall handling differences.
>>
>> Can somebody fill us newcomers in on the arch-aeology of why some syscalls have
>> different numbers on different archs? I know it's partially because some simply
>> didn't implement some syscalls so there were numbering mismatches, but would it
>> have been *that* hard to wire all of those skipped syscalls up to one stub
>> 'return -ENOSYS'?
>
> It was done so for hysterical raisons mostly, and once a bad ABI is done it's
> very hard to undo it: beyond pushing the 'good ABI' you'd also still have to
> deal with the bad ABI for a decade or more.
>
> So the background is that most architectures start out as quick concept
> prototypes, doing:
>
>     cp -a arch/existingarch arch/newarch
>
> where 'existingarch' used to be arch/i386/ in the early days. Now i386 had a
> fair amount of x86 specific syscalls that were naturally removed from
> 'newarch'. Those created 'holes' in the numbers, which were then filled in with
> new syscalls - a nice idea in itself!
>
> Also sometimes 'newarch' did a 'clean', compressed list of syscall numbers
> straight away, reordering syscalls. Once the 'quick prototype' hack starts
> working on real hardware, once the syscall numbers get into the C library and
> binutils it's very hard to ever transition away: you'd break the world!
>
> An added source of noise is that architectures tend to add new syscalls in a
> different order: some are more interesting to them - some less.
>
> So these syscall table hacks done very early during an arch's lifetime stick
> around and create wild numbering noise in 20+ syscall tables:
>
>                    [ slightly edited for readability ]
>
>  arch/alpha/include/asm/unistd.h:      #define __NR_perf_event_open 493
>  arch/arm/include/asm/unistd.h:        #define __NR_perf_event_open 364
>  arch/blackfin/include/asm/unistd.h:   #define __NR_perf_event_open 369
>  arch/frv/include/asm/unistd.h:        #define __NR_perf_event_open 336
>  arch/m68k/include/asm/unistd.h:       #define __NR_perf_event_open 332
>  arch/microblaze/include/asm/unistd.h: #define __NR_perf_event_open 366
>  arch/mips/include/asm/unistd.h:       #define __NR_perf_event_open 333
>  arch/mips/include/asm/unistd.h:       #define __NR_perf_event_open 292
>  arch/mips/include/asm/unistd.h:       #define __NR_perf_event_open 296
>  arch/mn10300/include/asm/unistd.h:    #define __NR_perf_event_open 337
>  arch/parisc/include/asm/unistd.h:     #define __NR_perf_event_open 318
>  arch/powerpc/include/asm/unistd.h:    #define __NR_perf_event_open 319
>  arch/s390/include/asm/unistd.h:       #define __NR_perf_event_open 331
>  arch/sh/include/asm/unistd_32.h:      #define __NR_perf_event_open 336
>  arch/sh/include/asm/unistd_64.h:      #define __NR_perf_event_open 364
>  arch/sparc/include/asm/unistd.h:      #define __NR_perf_event_open 327
>  arch/x86/include/asm/unistd_32.h:     #define __NR_perf_event_open 336
>  arch/x86/include/asm/unistd_64.h:     #define __NR_perf_event_open 298
>
> To fix this we'd create a new, clean offset defined by each architecture, and a
> generic enumeration of new syscalls.
>
> This would make it much easier to add new, generic syscalls to all
> architectures indeed.
>
> It would still leave compat syscall wrappers unaddressed though: those are
> often numbered differently and sometimes need arch specific wrapper entry
> functions, which then call the real generic syscall.
>
> But at least the primary, 'native' syscall table of every arch could be kept
> rather fresh via generic enumeration.

So we can start all over at offset 501 (alpha just started using 500) with a
unified, clean, and compressed list of syscalls? Or do we have some more
other-os-compat syscalls around in this range?
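
To make the "per-arch offset plus generic enumeration" idea concrete, here is
a minimal sketch. Every name in it (ARCH_GENERIC_SYSCALL_BASE, the
GENERIC_SYS_* entries, generic_syscall_nr) is hypothetical and does not exist
in the tree, and the base of 501 is only the value floated above, not an
agreed number:

```c
#include <assert.h>

/*
 * Hypothetical: each architecture would define its own base, i.e. the
 * first number that is free in its existing syscall table. 501 is an
 * assumed value for illustration only.
 */
#ifndef ARCH_GENERIC_SYSCALL_BASE
#define ARCH_GENERIC_SYSCALL_BASE 501
#endif

/*
 * Hypothetical shared enumeration: a new generic syscall is appended
 * here once instead of being numbered by hand in every arch header.
 * The entries are placeholders, not real syscalls.
 */
enum generic_syscall {
	GENERIC_SYS_example_a,
	GENERIC_SYS_example_b,
	GENERIC_SYS_COUNT
};

/* Map a generic index to the number this architecture would expose. */
static inline int generic_syscall_nr(int idx)
{
	assert(idx >= 0 && idx < GENERIC_SYS_COUNT);
	return ARCH_GENERIC_SYSCALL_BASE + idx;
}
```

With the assumed base of 501, generic_syscall_nr(GENERIC_SYS_example_b) gives
502; an architecture with a different free range would only change the base.
Compat tables are deliberately left out of this sketch.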

Gr{oetje,eeting}s,

            Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                -- Linus Torvalds