Re: disabling group leader perf_event

From: Ingo Molnar
Date: Tue Sep 07 2010 - 00:07:07 EST



* Pekka Enberg <penberg@xxxxxxxxxx> wrote:

> Hi Ingo,
>
> On Mon, Sep 6, 2010 at 6:47 PM, Ingo Molnar <mingo@xxxxxxx> wrote:
> >> The actual language doesn't really matter.
> >
> > There are 3 basic categories:
> >
> >  1- Most specific (least abstract) code: a block of bytecode in the form
> >    of a simplified, executable, kernel-checked x86 machine code block -
> >    this is also the fastest form. [yes, this is actually possible.]
> >
> >  2- Least specific (most abstract) code: A subset/sideset of C - as it's
> >    the most kernel-developer-trustable/debuggable form.
> >
> >  3- Everything else is little more than a dot on the spectrum between
> >    the first two points.
> >
> > I lean towards #2 - but #1 looks interesting too. #3 is distinctly
> > uninteresting as it cannot be as fast as #1 and cannot be as convenient
> > as #2.
>
> It's a question of where you want to push the complexity of parsing the
> language and verifying the executed code. I'd imagine it's easier to
> evolve an ABI if we use an intermediate form ("bytecode") on the
> kernel side. Supporting multiple versions of a C-like language is
> probably going to be painful. [...]

Not really, as it's only extended. So there's really just one version to
support for every kernel - it's just that user-space will initially only
use 'older' elements of the language.

> [...] You also probably don't want to put heavy-weight compiler
> optimization passes in the kernel so with an intermediate form, you
> can do much of that in user-space.

The question of what can and cannot be done in the kernel is overrated.
We sure can put a C compiler into the kernel - 10 years down the line we
won't understand what the fuss was all about.

I still remember all the silly 'graphics code should never be in the
kernel, it's way too complex and fragile' arguments from 1996.

What matters is that it's a hugely flexible and hugely useful feature.
All our ad-hoc script engines in the kernel (trace-filter, selinux,
netfilter, etc.) could be implemented via it.

And it would allow fantastic features beyond existing code.

For example a new category of filesystem could be created: with a
'self-defining layout' - by storing the C code of the filesystem data
structures _on-disk_.

A filesystem could have a new, more optimal layout by simply having new
format routines defined in C, stored on disk (in the superblock, or in a
block referred to by inodes). Old filesystem layouts would be compatible
forever: the C code is on-disk and never lost as long as the data is
there - etc.

New filesystem features could be created in a very flexible way, without
risking old data.

Mixed mode filesystems would be possible: new files get the new logic,
old files the old logic. This would allow the gradual migration to a new
filesystem layout for example, without a reinstall.
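
To make the mixed-mode idea a bit more concrete, here is a minimal
user-space sketch. It is heavily simplified: the per-layout format
routines are ordinary C functions in a table, standing in for code that
would really be loaded from disk, and the record format (struct
file_rec, the layout ids, the field positions) is entirely hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk file record: a layout id followed by raw bytes. */
struct file_rec {
	uint8_t layout;   /* selects the format routine that understands 'raw' */
	uint8_t raw[8];
};

/* Layout 0 (old): file size is a 32-bit little-endian value in raw[0..3]. */
static uint64_t size_v0(const uint8_t *raw)
{
	return (uint64_t)raw[0] | (uint64_t)raw[1] << 8 |
	       (uint64_t)raw[2] << 16 | (uint64_t)raw[3] << 24;
}

/* Layout 1 (new): file size is a 64-bit little-endian value in raw[0..7]. */
static uint64_t size_v1(const uint8_t *raw)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = v << 8 | raw[i];
	return v;
}

typedef uint64_t (*size_fn)(const uint8_t *raw);

/* Stand-in for format routines that would be stored on disk. */
static const size_fn format_routines[] = { size_v0, size_v1 };

/* Decode the size of one file; returns 0 on success, -1 on unknown layout. */
int file_size(const struct file_rec *f, uint64_t *out)
{
	if (f->layout >= sizeof(format_routines) / sizeof(format_routines[0]))
		return -1;
	*out = format_routines[f->layout](f->raw);
	return 0;
}
```

Old and new records can sit side by side: each one names the routine
that decodes it, so adding layout 2 later would not touch existing data.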

etc.

The key is to have a kernel that can execute code as data and to embed
that code in data structures.

> I'm guessing this thing is expected to work on all architectures? If
> that's true, I'd forget about JIT'ing for the time being and write an
> interpreter first because it's much easier to port. There are
> techniques in making an interpreter pretty fast too. Google for
> "inlining interpreter" if you're interested.

Yeah, i don't think speed is a primary concern - if overhead matters it
will be clearly measurable and people can iterate the optimizations ...

> As for the intermediate form, you might want to take a look at Dalvik:
>
> http://www.netmite.com/android/mydroid/dalvik/docs/dalvik-bytecode.html
>
> and probably ParrotVM bytecode too. The thing to avoid is stack-based
> instructions like in Java bytecode because although it's easy to write
> interpreters for them, it makes JIT'ing harder (which needs to convert
> stack-based representation to register-based) and probably doesn't
> lend itself well to stack-constrained kernel code.

_If_ we pass in any sort of machine code to the kernel (which bytecode
really is), then we should do the right thing and pass in raw x86
bytecode, and verify it in the kernel.

That way the compiler can be kept out of the kernel, and performance of
the thing will be phenomenal from day 1 on.

For non-x86 in most cases we can use a simple translator that runs
during the verification run - or of course they could have their own
native 'assembly bytecode' verifier and their user-space could compile
to those.
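
The verify-then-execute pattern itself can be sketched independently of
x86. The following is only an illustration, assuming a tiny made-up
register-based instruction set (the opcodes, struct insn and NREGS are
all hypothetical, and this is not any existing kernel interface). The
verifier rejects unknown opcodes and out-of-range registers up front, so
the execution loop needs no checks of its own:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register-based instruction set, invented for this sketch. */
enum op { OP_LOADI, OP_ADD, OP_MUL, OP_RET };

#define NREGS 4

struct insn {
	uint8_t op;    /* one of enum op */
	uint8_t dst;   /* destination register */
	uint8_t a;     /* first source register */
	uint8_t b;     /* second source register */
	uint32_t imm;  /* immediate, used by OP_LOADI only */
};

/*
 * Check a program before it is ever run: only known opcodes, only
 * in-range registers, and the last instruction must be OP_RET. There are
 * no jumps, so an accepted program runs for at most n steps.
 * Returns 0 if the program is acceptable, -1 otherwise.
 */
int verify(const struct insn *prog, size_t n)
{
	size_t i;

	if (n == 0 || prog[n - 1].op != OP_RET)
		return -1;
	for (i = 0; i < n; i++) {
		const struct insn *in = &prog[i];

		if (in->op > OP_RET)
			return -1;
		if (in->dst >= NREGS || in->a >= NREGS || in->b >= NREGS)
			return -1;
	}
	return 0;
}

/* Run a program that verify() has accepted; OP_RET returns register 'a'. */
uint64_t run(const struct insn *prog, size_t n)
{
	uint64_t reg[NREGS] = { 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		const struct insn *in = &prog[i];

		switch (in->op) {
		case OP_LOADI:
			reg[in->dst] = in->imm;
			break;
		case OP_ADD:
			reg[in->dst] = reg[in->a] + reg[in->b];
			break;
		case OP_MUL:
			reg[in->dst] = reg[in->a] * reg[in->b];
			break;
		case OP_RET:
			return reg[in->a];
		}
	}
	return 0; /* not reached for verified programs */
}
```

A caller would only ever invoke run() after verify() returned 0. A
verifier for real machine code would have to check far more (memory
accesses, control flow, privileged instructions), which this sketch does
not attempt.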

But i'd prefer C code really, as it's really 'abstract data' in the most
generic sense. That's why the trace filter engine started with a subset
of C.
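
As a rough illustration of what "a subset of C" can mean for filters,
here is a minimal user-space sketch of an evaluator for expressions such
as "pid == 42 && len > 100". It is not the kernel's trace filter code:
the event fields (pid, len), the operator set and the filter_match()
name are assumptions made for this example, and it evaluates while
parsing instead of building a predicate tree.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical event record; the field names are made up for this sketch. */
struct event {
	long pid;
	long len;
};

struct parser {
	const char *p;           /* current position in the filter string */
	const struct event *ev;  /* event being tested */
	int err;                 /* set to 1 on any syntax error */
};

static void skip_ws(struct parser *ps)
{
	while (isspace((unsigned char)*ps->p))
		ps->p++;
}

/* If the input continues with 'tok', consume it and return 1. */
static int eat(struct parser *ps, const char *tok)
{
	size_t n = strlen(tok);

	skip_ws(ps);
	if (strncmp(ps->p, tok, n) != 0)
		return 0;
	ps->p += n;
	return 1;
}

/* cmp := ("pid" | "len") ("==" | "!=" | ">" | "<") number */
static int parse_cmp(struct parser *ps)
{
	long lhs, rhs;
	char *end;
	char op;

	if (eat(ps, "pid"))
		lhs = ps->ev->pid;
	else if (eat(ps, "len"))
		lhs = ps->ev->len;
	else
		goto bad;

	if (eat(ps, "=="))
		op = '=';
	else if (eat(ps, "!="))
		op = '!';
	else if (eat(ps, ">"))
		op = '>';
	else if (eat(ps, "<"))
		op = '<';
	else
		goto bad;

	rhs = strtol(ps->p, &end, 10);
	if (end == ps->p)
		goto bad;
	ps->p = end;

	switch (op) {
	case '=': return lhs == rhs;
	case '!': return lhs != rhs;
	case '>': return lhs > rhs;
	default:  return lhs < rhs;
	}
bad:
	ps->err = 1;
	return 0;
}

/* and := cmp ("&&" cmp)*   (binds tighter than "||", as in C) */
static int parse_and(struct parser *ps)
{
	int v = parse_cmp(ps);

	while (!ps->err && eat(ps, "&&")) {
		int r = parse_cmp(ps);
		v = v && r;
	}
	return v;
}

/* or := and ("||" and)* */
static int parse_or(struct parser *ps)
{
	int v = parse_and(ps);

	while (!ps->err && eat(ps, "||")) {
		int r = parse_and(ps);
		v = v || r;
	}
	return v;
}

/* Returns 1 if the event matches, 0 if it does not, -1 on a syntax error. */
int filter_match(const char *expr, const struct event *ev)
{
	struct parser ps = { expr, ev, 0 };
	int v = parse_or(&ps);

	skip_ws(&ps);
	if (ps.err || *ps.p != '\0')
		return -1;
	return v;
}
```

Because the grammar is only ever extended (new fields, new operators),
old filter strings keep parsing the same way on newer versions.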

Thanks,

Ingo