Re: [RFC PATCH v9 12/16] Add mp(mediate passthru) device.

From: Michael S. Tsirkin
Date: Wed Sep 15 2010 - 07:34:32 EST


On Wed, Sep 15, 2010 at 11:13:44AM +0800, Xin, Xiaohui wrote:
> >From: Michael S. Tsirkin [mailto:mst@xxxxxxxxxx]
> >Sent: Sunday, September 12, 2010 9:37 PM
> >To: Xin, Xiaohui
> >Cc: netdev@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> >mingo@xxxxxxx; davem@xxxxxxxxxxxxx; herbert@xxxxxxxxxxxxxxxxxxxx;
> >jdike@xxxxxxxxxxxxxxx
> >Subject: Re: [RFC PATCH v9 12/16] Add mp(mediate passthru) device.
> >
> >On Sat, Sep 11, 2010 at 03:41:14PM +0800, Xin, Xiaohui wrote:
> >> >>Playing with rlimit on data path, transparently to the application in this way
> >> >>looks strange to me, I suspect this has unexpected security implications.
> >> >>Further, applications may have other uses for locked memory
> >> >>besides mpassthru - you should not just take it because it's there.
> >> >>
> >> >>Can we have an ioctl that lets userspace configure how much
> >> >>memory to lock? This ioctl will decrement the rlimit and store
> >> >>the data in the device structure so we can do accounting
> >> >>internally. Put it back on close or on another ioctl.
> >> >Yes, we can decrement the rlimit once in the ioctl to keep it
> >> >off the data path.
> >> >
> >> >>Need to be careful for when this operation gets called
> >> >>again with 0 or another small value while we have locked memory -
> >> >>maybe just fail with EBUSY? or wait until it gets unlocked?
> >> >>Maybe 0 can be special-cased to deactivate zero-copy?
> >> >>
> >>
> >> How about we don't use a new ioctl, but just check the rlimit
> >> in the MPASSTHRU_BINDDEV ioctl? If we find that the mp device
> >> exceeds the rlimit, then we fail the bind ioctl, and thus can't do
> >> zero copy any more.
> >
> >Yes, and not just check, but decrement as well.
> >I think we should give userspace control over
> >how much memory we can lock and subtract from the rlimit.
> >It's OK to add this as a parameter to MPASSTHRU_BINDDEV.
> >Then increment the rlimit back on unbind and on close?
> >
> >This opens up an interesting condition: process 1
> >calls bind, process 2 calls unbind or close.
> >This will increment rlimit for process 2.
> >Not sure how to fix this properly.
> >
> I can't either. Can we do any synchronized operations on the rlimit?
> I doubt it.
>
> >--
> >MST

Here's what infiniband does: simply pass the amount of memory userspace
wants you to lock in an ioctl, and verify that either you have
CAP_IPC_LOCK or this number does not exceed the current rlimit, but do
not decrement the rlimit. (The check must be on ioctl, not on open, as
we likely want the fd passed around between processes.) Use this number
as the limit for the following operations. Be careful if it can be
changed while operations are in progress.

This does mean that the effective amount of memory that userspace can
lock is doubled, but at least it is not unlimited, and we sidestep all
other issues such as userspace running out of lockable memory simply by
virtue of using the driver.
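
To make the accounting concrete, here is a rough user-space model of the
scheme, not infiniband's code and not code from the patch. All names
(mp_lock_ctx, mp_set_lock_limit, mp_account_lock, mp_account_unlock) are
hypothetical, the rlimit and capability are passed in as plain arguments
instead of being read from the task, and the error codes are my
assumption. A real driver would also need locking around these fields.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device accounting state; not taken from the patch. */
struct mp_lock_ctx {
    size_t lock_limit; /* bytes userspace asked for in the bind ioctl */
    size_t locked;     /* bytes the device currently has locked */
};

/*
 * Model of the ioctl-time check: accept the request if the caller has
 * CAP_IPC_LOCK or the request fits within RLIMIT_MEMLOCK. The rlimit
 * itself is never modified. Shrinking the limit below what is already
 * locked fails with -EBUSY (one of the options raised in the thread).
 */
static int mp_set_lock_limit(struct mp_lock_ctx *ctx, size_t requested,
                             size_t rlimit_memlock, bool has_cap_ipc_lock)
{
    if (!has_cap_ipc_lock && requested > rlimit_memlock)
        return -ENOMEM;
    if (requested < ctx->locked)
        return -EBUSY;
    ctx->lock_limit = requested;
    return 0;
}

/* Data path: charge against the device's own limit, not the rlimit. */
static int mp_account_lock(struct mp_lock_ctx *ctx, size_t bytes)
{
    /* locked <= lock_limit always holds, so this cannot underflow. */
    if (bytes > ctx->lock_limit - ctx->locked)
        return -ENOMEM;
    ctx->locked += bytes;
    return 0;
}

/* Data path: return previously charged bytes to the device's budget. */
static void mp_account_unlock(struct mp_lock_ctx *ctx, size_t bytes)
{
    assert(bytes <= ctx->locked);
    ctx->locked -= bytes;
}
```

Since the device keeps its own counter and never touches the process
rlimit, a process could lock up to the rlimit through mlock and up to
the same amount again through the device, which is the doubling
mentioned above. It also means unbind or close from another process has
nothing to give back.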

--
MST