Re: [PATCH] sg_io: allow UNMAP and WRITE SAME without CAP_SYS_RAWIO
From: Paolo Bonzini
Date: Tue Sep 11 2012 - 17:50:38 EST
[Al: you can jump down to "One problem:"]
Il 11/09/2012 22:01, Tejun Heo ha scritto:
> Hello, Paolo.
> On Tue, Sep 11, 2012 at 09:24:32PM +0200, Paolo Bonzini wrote:
>>> Couldn't it intercept some of them - e.g. RWs and discards?
>>> What's the benefit / use case of doing pure bypass?
>> Basically, using the same storage technology for bare metal and
>> virtualized systems. IMHO losing sense data is a no-no, but the above
>> solution could be feasible too.
> Either way, with or without virtualization, making detailed error
> information available to userland is a valid goal. I *think* we're finally
> getting there after years of talking via structured printk. I don't
> know much about the details but heard about exposing sense data via
Wait wait, there is already a perfectly 1:1 solution for this, and it's
I think error processing falls roughly in two categories: "I need each
command's precise state" and "I need to know if/when something bad
happens". Luckily, I/O also falls roughly in the same two categories:
"I need precise control of each command" and "I just care about getting
this to disk". The former can use SG_IO, the latter can use logs.
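To make the first category concrete, here is a minimal sketch of one command issued through SG_IO: a TEST UNIT READY whose status and sense bytes come back in the same sg_io_hdr_t. The helper names prepare_tur and issue_tur, the 5-second timeout and the return convention are illustrative assumptions, not code from any patch in this thread.

```c
#include <scsi/sg.h>
#include <string.h>
#include <sys/ioctl.h>

/* Fill an SG_IO header for a 6-byte TEST UNIT READY (opcode 0x00).
 * The caller owns cdb and sense; both must stay valid during the ioctl. */
static void prepare_tur(sg_io_hdr_t *hdr, unsigned char cdb[6],
                        unsigned char *sense, unsigned char sense_len)
{
    memset(hdr, 0, sizeof(*hdr));
    memset(cdb, 0, 6);              /* all-zero CDB is TEST UNIT READY */
    memset(sense, 0, sense_len);

    hdr->interface_id = 'S';        /* required by the sg v3 interface */
    hdr->dxfer_direction = SG_DXFER_NONE;
    hdr->cmd_len = 6;
    hdr->cmdp = cdb;
    hdr->sbp = sense;
    hdr->mx_sb_len = sense_len;
    hdr->timeout = 5000;            /* milliseconds; arbitrary choice */
}

/* Returns -1 if the ioctl itself failed, 0 if the command completed
 * cleanly, 1 if it completed with a non-zero status. In the last case
 * *sense_written says how many sense bytes the kernel filled in. */
static int issue_tur(int fd, unsigned char *sense, unsigned char sense_len,
                     unsigned char *sense_written)
{
    sg_io_hdr_t hdr;
    unsigned char cdb[6];

    prepare_tur(&hdr, cdb, sense, sense_len);
    if (ioctl(fd, SG_IO, &hdr) < 0)
        return -1;

    *sense_written = hdr.sb_len_wr;
    return (hdr.status || hdr.host_status || hdr.driver_status) ? 1 : 0;
}
```

The point of the sketch is only that the per-command state (status plus sense buffer) is returned to the caller that issued the command, which a log-based interface cannot offer.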
So, let's not complicate the problem further. We have a perfectly sane
API that (with different names) is even provided by almost every
operating system in existence. There's just this little detail of
filtering that is done for unprivileged processes; I hoped to fix 50% of
the problem with this 3-line patch but it's not the end of the world if
it's rejected constructively.
The solution I outlined in my previous email:
>> Enabling/disabling the filters from a privileged
>> program and passing the unfiltered fd via SCM_RIGHTS would be enough.
would entail some userland coding, but nothing major at all (and
closer to my usual territory :)). And we would have to do it anyway for
the reservations case.
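For reference, the SCM_RIGHTS half of that plan is ordinary userland code. A minimal sketch, assuming a connected AF_UNIX socket between the privileged helper and the unprivileged program; send_fd and recv_fd are hypothetical helper names, and the step that would switch the filter off before sending is not shown because that interface does not exist yet.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send one open file descriptor over a connected AF_UNIX socket.
 * Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
    char byte = 0;  /* one byte of ordinary data accompanies the control message */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {         /* union guarantees cmsghdr alignment for the buffer */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor sent by send_fd().
 * Returns the new descriptor, or -1 on error. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor referring to the same open file, so any per-file state set by the privileged sender would travel with it.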
Basically it would be an ioctl(fd, SG_SET_FILTER_ENABLED, arg) where arg is:
-1 for "enable/disable based on CAP_SYS_RAWIO" (default)
0 for always enable filter
1 for always disable filter
And also a dual ioctl(fd, SG_GET_FILTER_ENABLED, arg).
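The three values can be sketched as a small decision function. This is only an illustration of the proposed semantics: the enum and function names are hypothetical, SG_SET_FILTER_ENABLED is not an existing ioctl, and falling back to the capability check for unknown values is my assumption (a real implementation would more likely reject them).

```c
#include <stdbool.h>

/* Hypothetical names; the numeric values follow the list above. */
enum sg_filter_mode {
    SG_FILTER_BY_CAP = -1,  /* default: filter unless caller has CAP_SYS_RAWIO */
    SG_FILTER_ALWAYS =  0,  /* filter is always applied */
    SG_FILTER_NEVER  =  1   /* filter is never applied */
};

/* True if mode is one of the three proposed values. */
static bool sg_filter_mode_valid(int mode)
{
    return mode >= SG_FILTER_BY_CAP && mode <= SG_FILTER_NEVER;
}

/* Decide whether the command filter applies to a request made on an fd
 * with the given mode, by a caller that does or does not hold
 * CAP_SYS_RAWIO. Unknown modes fall back to the capability check
 * (assumption, see the lead-in). */
static bool sg_filter_applies(int mode, bool has_cap_sys_rawio)
{
    switch (mode) {
    case SG_FILTER_ALWAYS:
        return true;
    case SG_FILTER_NEVER:
        return false;
    case SG_FILTER_BY_CAP:
    default:
        return !has_cap_sys_rawio;
    }
}
```

With the default mode the behaviour is what the kernel does today: only the capability decides.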
One problem: to do this, I need to access some "struct file" member in
SG_IO, and thus change the ioctl member from block_device/fmode to
block_device/file. This would partially undo the 2007 switch from
inode/file by Al Viro. He was already asked about it in
https://lkml.org/lkml/2012/6/12/414, let's try again here.
>>> Can't you make use of the existing disk events mechanism for that?
>>> Block layer already knows how to watch readiness of a device and tell
>>> the userland about it via uevent.
>> How? But anyway I don't want to divert the discussion from the actual
> Disk events mechanism is there to watch (either via async notification
> or polling) media change and device readiness and generates the usual
> uevents when it detects them. For sd devices, it basically issues TUR
> periodically, so it's already doing pretty much what you need.
Ah, no, we can't do that because the device should be opened with
O_EXCL. It is not right now, but it's a bug. It's not very different
from burning a CD (in fact, it's absolutely the same if you burn a CD
inside a guest :)).
> I guess the repeating question is whether to solve the problem within
> the framework the underlying OS is providing or having direct access
> to the raw hardware. I don't know the answer.
> Accessing the "raw" hardware does have its advantages but managing
> multiple users
In this case, the constraints pretty much guarantee that you have only
one user. To stick with everyday hardware, if you pass your CD drive to
a guest you can well expect that the host will not be able to use it.
Or, if you have more than one user, that they know what they are doing
> I personally hope "raw" to be strictly confined to specific areas
> where performance impact of having kernel in between is prohibitive but
> that's just me hoping.
Well, it's not just about performance but also about precision sometimes.