Re: [Qemu-devel] [RFC] Next gen kvm api
From: Michael Ellerman
Date: Wed Feb 15 2012 - 20:04:35 EST
On Wed, 2012-02-15 at 22:21 +0000, Arnd Bergmann wrote:
> On Tuesday 07 February 2012, Alexander Graf wrote:
> > On 07.02.2012, at 07:58, Michael Ellerman wrote:
> >
> > > On Mon, 2012-02-06 at 13:46 -0600, Scott Wood wrote:
> > >> You're exposing a large, complex kernel subsystem that does very
> > >> low-level things with the hardware. It's a potential source of exploits
> > >> (from bugs in KVM or in hardware). I can see people wanting to be
> > >> selective with access because of that.
> > >
> > > Exactly.
> > >
> > > In a perfect world I'd agree with Anthony, but in reality I think
> > > sysadmins are quite happy that they can prevent some users from using
> > > KVM.
> > >
> > > You could presumably achieve something similar with capabilities or
> > > whatever, but a node in /dev is much simpler.
> >
> > Well, you could still keep the /dev/kvm node and then have syscalls operate on the fd.
> >
> > But again, I don't see the problem with the ioctl interface. It's nice, extensible and works great for us.
> >
>
> ioctl is good for hardware devices and stuff that you want to enumerate
> and/or control permissions on. For something like KVM that is really a
> core kernel service, a syscall makes much more sense.

Yeah, maybe. That distinction is at least in part just historical.

The first problem I see with using a syscall is that you don't need one
syscall for KVM, you need ~90. OK, so you wouldn't do that; you'd use a
multiplexed syscall like epoll_ctl(), or probably several
(vm/vcpu/etc).

Secondly, you still need a handle/context for those syscalls, and I think
the sanest thing to use for that is an fd.

At that point you've basically reinvented ioctl :)
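
To make the "reinvented ioctl" point concrete, here is a minimal user-space sketch of the shape a multiplexed entry point tends to take. Everything in it is hypothetical (vm_ctl(), the VMCTL_* opcodes, struct vm_ctx); it is not an existing or proposed kernel interface, and a plain struct pointer stands in for the fd.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical operation codes; the real KVM interface has roughly 90. */
enum vmctl_op {
    VMCTL_GET_VERSION,
    VMCTL_CREATE_VCPU,
    VMCTL_NR_OPS
};

/* Hypothetical per-VM context; in a real design an fd would refer to this. */
struct vm_ctx {
    int nr_vcpus;
};

typedef long (*vmctl_handler)(struct vm_ctx *ctx, void *arg);

static long do_get_version(struct vm_ctx *ctx, void *arg)
{
    (void)ctx;
    (void)arg;
    return 1; /* placeholder version number, not a real KVM value */
}

static long do_create_vcpu(struct vm_ctx *ctx, void *arg)
{
    (void)arg;
    return ctx->nr_vcpus++; /* index of the newly created vcpu */
}

/* One handler per opcode, indexed by the opcode itself. */
static const vmctl_handler vmctl_table[VMCTL_NR_OPS] = {
    [VMCTL_GET_VERSION] = do_get_version,
    [VMCTL_CREATE_VCPU] = do_create_vcpu,
};

/*
 * Hypothetical multiplexed entry point: a handle, an opcode and an
 * opaque argument pointer. Unknown opcodes are rejected with -EINVAL.
 */
long vm_ctl(struct vm_ctx *ctx, unsigned int op, void *arg)
{
    if (ctx == NULL || op >= VMCTL_NR_OPS)
        return -EINVAL;
    return vmctl_table[op](ctx, arg);
}
```

The resulting signature (handle, opcode, argument pointer) has the same shape as ioctl(fd, request, arg), which is the point being made.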

I also think it is an advantage that you have a node in /dev for
permissions. I know other "core kernel" interfaces don't use a /dev
node, but arguably that is their loss.

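
As a small illustration of the permissions point: because the interface is a file, ordinary owner/group/mode bits decide who may use it, and a program can check that with access(2). The helper name can_open_rw() is made up for this sketch.

```c
#include <unistd.h>

/*
 * Returns 1 if the calling user may open `path` for reading and writing,
 * 0 otherwise (including when the node does not exist).
 * Note that access(2) checks against the real UID/GID, not the effective ones.
 */
int can_open_rw(const char *path)
{
    return access(path, R_OK | W_OK) == 0;
}
```

Calling can_open_rw("/dev/kvm") then reflects whatever policy the sysadmin set on the node, for example by restricting it to one group.
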
> I would certainly never mix the two concepts: If you use a chardev to get
> a file descriptor, use ioctl to do operations on it, and if you use a
> syscall to get the file descriptor then use other syscalls to do operations
> on it.

Sure, we use a syscall to get the fd (open) and then other syscalls to
do operations on it, ioctl and kvm_vcpu_run. ;)

But seriously, I guess that makes sense. Though it's a bit of a pity,
because if you want a syscall for any of it, e.g. vcpu_run(), then you
basically have to reinvent ioctl for all the other little operations.

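
For reference, the existing open-then-ioctl pattern discussed in this thread looks roughly like the sketch below. KVM_GET_API_VERSION and <linux/kvm.h> are the real interface; the wrapper kvm_api_version() and its path parameter are assumptions made for the example, and it needs Linux kernel headers to build.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/*
 * Opens the KVM device node at `path` (normally "/dev/kvm") and queries
 * the API version via ioctl. Returns the version reported by the kernel,
 * or -1 if the node cannot be opened or the ioctl fails.
 */
int kvm_api_version(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1; /* no such node, or no permission to use it */

    int version = ioctl(fd, KVM_GET_API_VERSION, 0);
    close(fd);
    return version;
}
```

The open() call is where the /dev permissions are enforced; every later operation is an ioctl on the returned fd.
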
cheers