Re: Implementing NVMHCI...

From: Robert Hancock
Date: Sun Apr 12 2009 - 14:36:16 EST


Linus Torvalds wrote:
> IOW, when you allocate a new 32kB cluster, you will have to allocate 8 pages to do IO on it (since you'll have to initialize the diskspace), but you can still literally treat those pages as _individual_ pages, and you can write them out in any order, and you can free them (and then look them up) one at a time.
>
> Notice? The cluster size really only ends up being a disk-space allocation issue, not an issue for actually caching the end result or for the actual size of the IO.

Right. I didn't realize we were actually that smart (not writing out the entire cluster when dirtying one page), but I guess it makes sense.


> The hardware sector size is very different. If you have a 32kB hardware sector size, that implies that _all_ IO has to be done with that granularity. Now you can no longer treat the eight pages as individual pages - you _have_ to write them out and read them in as one entity. If you dirty one page, you effectively dirty them all. You can not drop and re-allocate pages one at a time any more.
>
> Linus

I suspect that in this case, trying to gang together multiple pages inside the VM to handle it this way all the way through would be insanity. My guess is that the only sane way to do it is a read-modify-write approach when writing out the data (in the block layer, maybe?), where the read can be optimized away if the pages for the entire hardware sector are already in cache or if the write is large enough to replace the entire sector. I assume we already do this somewhere in the md code for cases like software RAID 5 with a stripe size of >4KB.
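
To make the read-modify-write idea concrete, here is a rough userspace sketch in Python. It is not kernel code, and it says nothing about how the block layer or md actually behaves. Everything in it (the Device class, write_pages, the dict used as a page cache) is hypothetical and made up for illustration, and it assumes 4kB pages on a device with a 32kB hardware sector. The point is only to show the two cases where the read can be skipped: the whole sector is being rewritten, or every page that is not being rewritten is already cached.

```python
PAGE_SIZE = 4096
SECTOR_SIZE = 32 * 1024                      # assumed 32kB hardware sector
PAGES_PER_SECTOR = SECTOR_SIZE // PAGE_SIZE  # 8 pages per sector


class Device:
    """Toy device (hypothetical) that only does whole-sector I/O."""

    def __init__(self, num_sectors):
        self.data = bytearray(num_sectors * SECTOR_SIZE)
        self.reads = 0
        self.writes = 0

    def read_sector(self, sector):
        self.reads += 1
        start = sector * SECTOR_SIZE
        return bytes(self.data[start:start + SECTOR_SIZE])

    def write_sector(self, sector, buf):
        assert len(buf) == SECTOR_SIZE
        self.writes += 1
        start = sector * SECTOR_SIZE
        self.data[start:start + SECTOR_SIZE] = buf


def write_pages(dev, cache, dirty):
    """Write out dirty pages with sector-granularity read-modify-write.

    cache and dirty both map page index -> PAGE_SIZE bytes.
    """
    # Group the dirty pages by the hardware sector they fall in.
    by_sector = {}
    for page, buf in dirty.items():
        by_sector.setdefault(page // PAGES_PER_SECTOR, {})[page] = buf

    for sector, pages in sorted(by_sector.items()):
        first = sector * PAGES_PER_SECTOR
        old = None  # on-disk contents, read only if actually needed
        out = bytearray()
        for p in range(first, first + PAGES_PER_SECTOR):
            if p in pages:          # page is being rewritten
                out += pages[p]
            elif p in cache:        # clean copy already in memory
                out += cache[p]
            else:                   # must fall back to reading the sector
                if old is None:
                    old = dev.read_sector(sector)
                off = (p - first) * PAGE_SIZE
                out += old[off:off + PAGE_SIZE]
        dev.write_sector(sector, bytes(out))
        cache.update(pages)         # written pages are now clean and cached
```

With this sketch, dirtying a single uncached page costs one sector read plus one sector write, while rewriting all eight pages of a sector, or one page of a fully cached sector, costs only the write.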

That would obviously have some performance drawbacks compared to a smaller sector size. But if the device is bound and determined to use bigger sectors internally one way or the other, and the alternative is that the drive does R-M-W internally to emulate smaller sectors (which seems to be the case for some devices), then maybe it makes more sense to do it in the kernel, if we have more information that lets us do it more efficiently. (Though, at least on the normal ATA disk side of things, 4K is the biggest number I've heard tossed about for a future expanded sector size; flash devices like this may be another story.)