Re: [PATCH 2/2] block: create ioctl to discard-or-zeroout a range of blocks

From: Linus Torvalds
Date: Thu Mar 03 2016 - 12:55:45 EST


On Thu, Mar 3, 2016 at 9:02 AM, Theodore Ts'o <tytso@xxxxxxx> wrote:
>
> There is a massive bug in the SATA specs about trim, which is that it
> is considered advisory. So the storage device can throw it away
> whenever it feels like it. (In practice, when it's too busy doing
> other things).

Ugh.

But that essentially says that we shouldn't expose this interface at
all (unless we trust our white-lists - I'm sure they are getting
better, but if nobody has ever really _relied_ on the zeroing behavior
of trim, then I guess there could be tons of bugs lurking).

Or maybe we should expose it, but not call it BLKZEROOUT, and make it
*much* more generic.

That migth actually put some of my complaints to rest: if this is more
a general "manage this range of blocks" model, then the flags make
more sense to me.

So what are people actually wanting to do?

If they don't care horribly about the zeroing, they might then say
"using trim is ok". But why wouldn't they use the BLKDISCARD ioctl
then?

So just looking at this more, that "trim is ok" flag still doesn't
make much sense to me.

I see two cases: either we guarantee zero-out behavior with discard
set to true (and we trust our whitelists), or we don't. Can anybody
see a third alternative?

And if we don't guarantee zero-out behavior from
blkdev_issue_zeroout() with "discard" set to true, then why would we
expose such a random interface to user space? No sane user space could
*possibly* use it: if they care about zeroing, it's the wrong thing to
do, and if they *don't* care about zeroing it's still the wrong thing
to do.

In other words, I still don't see how that flag can possibly make
sense in any possible scenario.

Put succinctly:

"Either we trust trim and and our whitelists (in which case _not_
using trim makes no sense), or we do (in which case exposing a random
untrustworthy user interface is pointless, since any user would be
fundamentally broken and should just have used BLKDISCARD)"

See where I'm coming from?

Now, the reason I think a more generic model that *isn't* hung up
about zeroing the buffer migth be ok is that maybe it would be a good
thing to have a more unified itnerface for doing all those things
people do want to do:

- flush caches

- discard (our current BLKDISCARD doesn't flush caches either, so
together with flushing caches this is something new)

- zero out

- synchronous/asynchronous

- other things?

So I do see a case for passing in multiple flags, but a lot of that
case ends up depending on the zeroing out *not* being the most central
feature.

I very much could see wanting "discard these blocks and flush caches".
And I could see just "flush caches", with or without zeroing. But I do
*not* see the point of "discard blocks and zero" for the reasons
outlined above.

Linus