Re: [00/17] Large Blocksize Support V3

From: Eric W. Biederman
Date: Thu Apr 26 2007 - 06:55:44 EST

Next message: Johannes Berg: "Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2:hang in atomic copy)"
Previous message: Matthias Kaehlcke: "[PATCH] use mutex instead of semaphore in CAPI 2.0 interface"
In reply to: Mel Gorman: "Re: [00/17] Large Blocksize Support V3"
Next in thread: Mel Gorman: "Re: [00/17] Large Blocksize Support V3"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

mel@xxxxxxxxx (Mel Gorman) writes:

> On (26/04/07 18:55), Nick Piggin didst pronounce:
>> Mel Gorman wrote:
>> >On (26/04/07 16:50), Nick Piggin didst pronounce:
>>
>> >>Fragmentation is the problem. The anti-frag patches don't actually
>> >>guarantee anything about fragmentation, and even if they did, then
>> >
>> >
>> >The grouping pages by mobility do not guarantee anything but the memory
>> >partition (kernelcore= boot parameter) does give hard guarantees about
>> >the amount of memory that is "movable". Of course, the partition requires
>> >configuration at boot-time so it's less than ideal but it does give hard
>> >guarantees.
>>
>> For the hugepages people, I can understand that's a solution.
>
> I think that's the closest you have ever come to saying fragmentation
> avoidance is not a terrible idea :)
>
>> But that's
>> the last thing you want to do on a system with a limited amount of memory,
>> or a regular Joe's desktop/server.
>>
>
> Regular Joe is not going to be creating a filesystem with large blocks or
> overly concerned with saturating all the disks hanging off his RAID array.
> At most, he'll care about faster DVD writing and considering the number and
> duration of those allocations, the system will be able to handle it. So I
> don't think Regular Joe will generally care.

The practical question is if this a special purpose hack, or is
this general infrastructure that we expect filesystems to seriously
use.

If this is a special purpose hack then the larger pages and
fragmentation are minor issues, but it's very existence is a problem.

If this is not a special purpose hack and people do use this a lot
then the fragmentation is a problem.

Your reply indicates that fragmentation is a concern, as does the
initial posting and the more positive threads. Therefore either
this is a special purpose hack in which case it's existence is
questionable or it fails a general solution and it's existence
is questionable.

>> >Indeed but then you have to deal with internal fragmentation
>> >for pages-larger-than-TLB-page. I'm not saying it's wrong but it does
>> >come with it's own set of issues.
>>
>> None of them is perfect (the ways to increase the size of pagecache pages,
>> that is).
>>
>> I think in the long term, TLB page sizes will probably increase a little
>> bit... but if a given page size is "good enough" for a CPU, they really
>> should be good enough for other hardware. I mean, come on, the CPU's TLB
>> has to have a good hit ratio and handle several lookups per cycle with a
>> 3-cycle latency on 3GHz+ hardware... surely a an IO controller's
>> scatter-gather engine or IOMMU that has to do a few lookups per disk IO
>> is nowhere near so critical as a CPU's datapath: just add a few more
>> entries to it, they've already got hundreds of megs of cache, so that
>> isn't an issue either.
>>
>
> I cannot speak with authority on what IO controllers are really capable
> of so maybe someone else will comment on this more.

Well here is some reality. Using an FPGA (slow by definition)
connected to a hypertransport interface I have exceeded a gigabyte a
second, without even setting up scatter gather.

All of the I/O on all of the peripheral busses happens in sub
1 page chunks.

At the rates we are talking for disks the issue is really how to
bury the disk latency. For fast arrays how do you submit enough
request to bury the latency.

I can't possibly believe any of this is about the cost of processing
a request, but rather the problem that some devices don't have a large
enough pool of requests to keep them busy if you submit the requests
in a 4K page sizes.

This all sounds like a device design issue rather than anything more
significant, and it doesn't sound like a long term trend. Market
pressure should fix the hardware.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Johannes Berg: "Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2:hang in atomic copy)"
Previous message: Matthias Kaehlcke: "[PATCH] use mutex instead of semaphore in CAPI 2.0 interface"
In reply to: Mel Gorman: "Re: [00/17] Large Blocksize Support V3"
Next in thread: Mel Gorman: "Re: [00/17] Large Blocksize Support V3"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]