Re: [PATCH, 3.7-rc7, RESEND] fs: revert commit bbdd6808 to fallocate UAPI

From: Ric Wheeler
Date: Fri Dec 07 2012 - 16:42:04 EST


On 12/07/2012 04:09 PM, Chris Mason wrote:
On Fri, Dec 07, 2012 at 01:43:25PM -0700, Theodore Ts'o wrote:
On Fri, Dec 07, 2012 at 02:03:06PM -0500, Chris Mason wrote:
That's not what happened though, and the right way forward from here is
to give the bit to the feature, maybe with a generic name like
FALLOCATE_WITHOUT_BEING_HORRIBLY_SLOW.
I don't think that's a good idea, because the current name explicitly
calls out the fact that we are making a tradeoff between what
***might*** be a security exposure in some cases (but which might be
perfectly fine in others) for performance. Using the generic name
would hide the fact that this tradeoff is being made, and the
argument (which I've never seen backed up with a specific design) is
that it's possible to speed up random writes into preallocated space
on a flash device without making any kind of tradeoff that might imply
a security tradeoff.
Grin, we're really good at debating names. But I do see what you mean.
I'd hope that whatever generic facility we put in doesn't have the
security implications.

I would suggest a name like "let me see other people's data, pronto".

If indeed it is possible to speed up this particular workload without
making any kind of no-hide-stale tradeoff, then we won't need the bit
--- writes into fallocated space will just get faster, with or without
the bit.

I am sure it will be possible to do this in some cases (for example if
you have a device that supports persistent trim which can quickly
zeroize the blocks in question), but I would be very surprised if it's
possible to completely eliminate the performance degradation for all
devices and workloads. (Not all storage devices support persistent
trim, just for starters.)
Persistent trim is what I had in mind, but there are other ideas that do
imply a change in behavior as well. Can we safely assume this feature
won't matter on spinning media? New features like persistent
trim do make it much easier to solve securely, and using a bit for it
means we can toss back an error to the app if the underlying storage
isn't safe.

If Google wants to have a block device patch that pretends to do
persistent trim on devices that can't, great.

The other thing I think we should try is converting larger chunks at a time, as we discussed on the list back in the summer: just because the user writes 4KB does not mean that we cannot flip over 1MB and zero that.
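
To make the larger-chunk idea concrete, here is a minimal sketch of the arithmetic only, not of any filesystem's actual code. The 1MB chunk size is the figure from the example above; the function name, its parameters, and the clamping to the preallocated extent are assumptions made for illustration.

```python
CHUNK = 1 << 20  # 1MB conversion granularity, the figure used in the example above


def conversion_range(write_off, write_len, extent_off, extent_len, chunk=CHUNK):
    """Return (start, length) of the region to zero and mark as written.

    Hypothetical helper: the write is widened to chunk boundaries, then
    clamped so it never reaches outside the preallocated (unwritten) extent.
    All values are byte offsets and byte lengths.
    """
    if write_len <= 0:
        raise ValueError("write_len must be positive")
    write_end = write_off + write_len
    extent_end = extent_off + extent_len
    if write_off < extent_off or write_end > extent_end:
        raise ValueError("write must fall inside the preallocated extent")

    start = (write_off // chunk) * chunk   # round the start down to a chunk boundary
    end = -(-write_end // chunk) * chunk   # round the end up to a chunk boundary

    # Never convert blocks that were not part of the preallocation.
    start = max(start, extent_off)
    end = min(end, extent_end)
    return start, end - start
```

With these assumptions, a 4KB write at offset 12KB into a 16MB preallocation gives conversion_range(12288, 4096, 0, 16 << 20) == (0, 1048576): the whole first megabyte is zeroed and flipped at once, so later writes landing in that megabyte would find the extent already converted.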


In answer to Linus's question, the reason why people are
hyperventilating so badly about this is that in some circles,
revealing stale data is so horrible that anyone who even tries to
suggest this should be excommunicated. The mere existence of the
code, or that people are using it, horribly offends those people.
So I've always said this was a real performance problem and that it
isn't just limited to ext4. But can we please move past this part? I
don't think it is completely accurate.

The thing that bothers me is that no one wants to use this "feature" to see the stale data, just to benefit from a coincidental performance bump.

Most features need to have a defined use case as opposed to a side effect as their motivation.

Let's focus on fixing the performance in a way that would be useful to a broader swath of users. To be clear, I certainly would never ship this in a distro I was involved in.

With or without the bit, we need to fix this properly if it is a meaningful workload.

ric


