Re: [Lsf-pc] [LSF/MM TOPIC] really large storage sectors - going beyond 4096 bytes

From: Ric Wheeler
Date: Wed Jan 22 2014 - 11:45:53 EST


On 01/22/2014 11:03 AM, James Bottomley wrote:
On Wed, 2014-01-22 at 15:14 +0000, Chris Mason wrote:
On Wed, 2014-01-22 at 09:34 +0000, Mel Gorman wrote:
On Tue, Jan 21, 2014 at 10:04:29PM -0500, Ric Wheeler wrote:
One topic that has been lurking forever at the edges is the current
4k limitation for file system block sizes. Some devices in
production today and others coming soon have larger sectors and it
would be interesting to see if it is time to poke at this topic
again.

Large block support was proposed years ago by Christoph Lameter
(http://lwn.net/Articles/232757/). I think I was just getting started
in the community at the time so I do not recall any of the details. I do
believe it motivated an alternative by Nick Piggin called fsblock though
(http://lwn.net/Articles/321390/). At the very least it would be nice to
know why neither was ever merged, for those of us who were not around
at the time and who may not have the chance to dive through mailing list
archives between now and March.

FWIW, I would expect that a show-stopper for any proposal is requiring
high-order allocations to succeed for the system to behave correctly.

My memory is that Nick's work just didn't have the momentum to get
pushed in. It all seemed very reasonable, though; I think our hatred of
buffer heads just wasn't yet bigger than the fear of moving away.

But, the bigger question is how big are the blocks going to be? At some
point (64K?) we might as well just make a log structured dm target and
have a single setup for both shingled and large sector drives.
There is no real point. Even with 4k drives today using 4k sectors in
the filesystem, we still get 512 byte writes because of journalling and
the buffer cache.

I think that you are wrong here, James. Even with 512 byte drives, the IOs we send down tend to be 4k or larger. Do you have traces and details that show this?


The question is what would we need to do to support these devices and
the answer is "try to send IO in x byte multiples, x byte aligned". This
really becomes an ioscheduler problem, not a supporting large page
problem.

James


Not that simple.

The requirement of some of these devices is that you *never* send down a partial write or an unaligned write.
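
To make the constraint concrete, here is a rough sketch in plain userspace C of what "no partial, no unaligned writes" would mean for a given block size. The helper names are made up for illustration and are not kernel interfaces; the sketch only assumes that the device has a single fixed block size.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): returns true only when a write
 * starts on a block boundary and covers a whole number of blocks.
 */
bool write_is_whole_blocks(uint64_t offset, uint64_t len, uint32_t block_size)
{
    if (block_size == 0 || len == 0)
        return false;
    return (offset % block_size == 0) && (len % block_size == 0);
}

/*
 * Hypothetical helper: widen a byte range outward to block boundaries.
 * This is the range an upper layer would have to read, modify and write
 * back if the device refuses anything smaller than a full block.
 * block_size must be non-zero.
 */
void expand_to_blocks(uint64_t offset, uint64_t len, uint32_t block_size,
                      uint64_t *out_offset, uint64_t *out_len)
{
    uint64_t start = offset - (offset % block_size);
    uint64_t end = offset + len;
    uint64_t rem = end % block_size;

    if (rem != 0)
        end += block_size - rem;  /* round the end up to the next boundary */

    *out_offset = start;
    *out_len = end - start;
}
```

With a 16k block, for example, a 4k write at offset 4k fails the check and would have to be widened to the full first block (offset 0, length 16k) by whatever layer sits above the device.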

Also keep in mind that larger block sizes allow us to track larger files with smaller amounts of metadata, which is a second win.
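
As a back-of-the-envelope illustration of the metadata point, the sketch below counts how many blocks a file occupies at a given block size. It assumes the simplest possible model, one mapping entry per block; extent-based file systems need far fewer entries for contiguous files, so treat the numbers as an upper bound. The function name is made up for the example.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of blocks needed to hold file_size bytes,
 * rounding a partial tail block up. Under the one-entry-per-block
 * assumption this is also the number of mapping entries to track.
 */
uint64_t blocks_needed(uint64_t file_size, uint32_t block_size)
{
    if (block_size == 0)
        return 0;
    return file_size / block_size + (file_size % block_size != 0);
}
```

Under that model a 1 GiB file needs 262144 entries with 4k blocks and 16384 entries with 64k blocks, a factor of 16 fewer.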

Ric
