Re: [PATCH, 3.7-rc7, RESEND] fs: revert commit bbdd6808 to fallocate UAPI

From: Theodore Ts'o
Date: Mon Dec 10 2012 - 13:20:47 EST


A sentence or two got chopped out during an editing pass. Let me try
that again so it's a bit clearer what I was trying to say....

Sure, but if the block device supports WRITE_SAME or persistent
discard, then presumably fallocate() should do this automatically all
the time, and not require a flag to request this behavior. The only
reason you might not want to is if WRITE_SAME is more costly, that is,
when a seek plus writing 1MB takes noticeably more disk time than a
seek plus writing 4k or 32k.
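To make that trade-off concrete, here is a minimal sketch of a simplified cost model, assuming every I/O costs one seek plus a sequential transfer at a fixed bandwidth. The function names and all parameter values are hypothetical and purely illustrative; real devices (SSDs, RAID arrays) behave less simply.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified disk cost model: time in microseconds for one
 * seek followed by a sequential transfer of `bytes` at `bytes_per_us`.
 * The seek time and bandwidth are assumed inputs, not measurements.
 */
static double io_cost_us(double seek_us, double bytes_per_us, uint64_t bytes)
{
	assert(bytes_per_us > 0.0);
	return seek_us + (double)bytes / bytes_per_us;
}

/*
 * Ratio of the cost of a large write (e.g. a 1MB WRITE_SAME) to the cost
 * of a small write (e.g. 4k or 32k).  A ratio close to 1.0 means the large
 * write costs little more disk time than the small one.
 */
static double write_cost_ratio(double seek_us, double bytes_per_us,
			       uint64_t big_bytes, uint64_t small_bytes)
{
	return io_cost_us(seek_us, bytes_per_us, big_bytes) /
	       io_cost_us(seek_us, bytes_per_us, small_bytes);
}
```

With an assumed 8 ms seek at 100 bytes/us (about 100 MB/s), a seek plus 1MB costs roughly 2.2 times a seek plus 32k. With an assumed 100 us seek at 500 bytes/us, the ratio rises above 10, so under this model the same 1MB zero-out is relatively far more expensive on a fast-seeking device.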

Ext4 currently uses a threshold of 32k for this break point (below
that, we will use sb_issue_zeroout; above that, we will break apart an
uninitialized extent when writing into a preallocated region). It may
be that 32k is too low, especially for certain types of devices (e.g.,
SSDs versus RAID 5, where it should be aligned on a RAID stripe,
etc.). A bigger issue might be that there will be some disagreement
about whether people want the system to automatically tune for
average throughput or for 99.9th percentile latency.
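The break point described above amounts to a simple size comparison. This is a minimal sketch, assuming the quantity compared is the size of the uninitialized region that would have to be zeroed; the enum, the helper name, and the inclusive boundary are hypothetical and are not ext4's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative break point, matching the 32k threshold described above. */
#define ZEROOUT_THRESHOLD_BYTES (32u * 1024u)

enum unwritten_write_action {
	ACTION_ZEROOUT,      /* zero the whole region (sb_issue_zeroout in ext4) */
	ACTION_SPLIT_EXTENT, /* break apart the uninitialized extent instead */
};

/*
 * Hypothetical helper: decide how to handle a write into a preallocated
 * (uninitialized) region of `region_bytes` bytes.  Treating a region of
 * exactly `threshold_bytes` as "small" is an assumption of this sketch.
 */
static enum unwritten_write_action
choose_unwritten_action(uint64_t region_bytes, uint64_t threshold_bytes)
{
	assert(threshold_bytes > 0);
	return region_bytes <= threshold_bytes ? ACTION_ZEROOUT
					       : ACTION_SPLIT_EXTENT;
}
```

Passing the threshold in as a parameter, rather than hard-coding it, is what would let a per-device or automatically tuned value replace the fixed 32k.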

Regardless, this is actually something which I think the file system
should try to do automatically if at all possible, via some kind of
auto-tuning heuristic, instead of using an explicit fallocate(2) flag.
(See, I don't propose using a new fallocate flag for everything. :-)
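One possible shape for such a heuristic, sketched under the same simplified seek-plus-transfer cost model: choose the largest zero-out size whose estimated cost stays within a tolerance factor of a small write. Everything here (the function, the tolerance knob, the block rounding) is hypothetical. A tolerance close to 1.0 favors tail latency, while a larger one favors average throughput.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical auto-tuning heuristic: pick the largest zero-out size T
 * whose cost (one seek plus transfer) stays within `tolerance` times the
 * cost of a seek plus a `small_bytes` write.  Solving
 *   seek + T/bw <= tolerance * (seek + small/bw)
 * for T gives
 *   T <= tolerance * small + (tolerance - 1) * seek * bw.
 * The result is rounded down to a multiple of `block_bytes`.
 */
static uint64_t auto_zeroout_threshold(double seek_us, double bytes_per_us,
				       uint64_t small_bytes, double tolerance,
				       uint64_t block_bytes)
{
	assert(tolerance >= 1.0 && bytes_per_us > 0.0 && block_bytes > 0);
	double t = tolerance * (double)small_bytes +
		   (tolerance - 1.0) * seek_us * bytes_per_us;
	uint64_t bytes = (uint64_t)t;

	return bytes - bytes % block_bytes;
}
```

With an assumed 8 ms seek at 100 bytes/us, 4k small writes, and a 10% tolerance, this yields 81920 bytes (80k); with an assumed 100 us seek at 500 bytes/us it yields only 8192 bytes, illustrating how one fixed threshold may not suit every device.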

- Ted
