Re: [problem] raid performance loss with 2.6.26-rc8 on 32-bit x86(bisected)

From: Dan Williams
Date: Tue Jul 01 2008 - 16:29:52 EST



On Tue, 2008-07-01 at 12:07 -0700, Mel Gorman wrote:
> On (01/07/08 18:58), Andy Whitcroft didst pronounce:
> > > > Neil suggested CONFIG_NOHIGHMEM=y, I will give that a shot tomorrow.
> > > > Other suggestions / experiments?
> > > >
> >
> > Looking at the commit in question (54a6eb5c) there is one slight anomoly
> > in the conversion. When nr_free_zone_pages() was converted to the new
> > iterators it started using the offset parameter to limit the zones
> > traversed; which is not unreasonable as that appears to be the
> > parameters purpose. However, if we look at the original implementation
> > of this function (reproduced below) we can see it actually did nothing
> > with this parameter:
> >
> > static unsigned int nr_free_zone_pages(int offset)
> > {
> > /* Just pick one node, since fallback list is circular */
> > unsigned int sum = 0;
> >
> > struct zonelist *zonelist = node_zonelist(numa_node_id(), GFP_KERNEL);
> > struct zone **zonep = zonelist->zones;
> > struct zone *zone;
> >
> > for (zone = *zonep++; zone; zone = *zonep++) {
> > unsigned long size = zone->present_pages;
> > unsigned long high = zone->pages_high;
> > if (size > high)
> > sum += size - high;
> > }
> >
> > return sum;
> > }
> >
>
> This looks kinda promising and depends heavily on how this patch was
> tested in isolation. Dan, can you post the patch you use on 2.6.25
> because the commit in question should not have applied cleanly please?
>
> To be clear, 2.6.25 used the offset parameter correctly to get a zonelist with
> the right zones in it. However, with two-zonelist, there is only one that
> gets filtered so using GFP_KERNEL to find a zone is equivilant as it gets
> filtered based on offset. However, if this patch was tested in isolation,
> it could result in bogus values of vm_total_pages. Dan, can you confirm
> in your dmesg logs that the line like the following has similar values
> please?
>
> Built 1 zonelists in Zone order, mobility grouping on. Total pages: 258544

The system is booted with mem=1024M on the kernel command line and with
or without Andy's patch this reports:

Built 1 zonelists in Zone order, mobility grouping on. Total pages: 227584

Performance is still sporadic with the change. Moreover this condition
is reproducing even with CONFIG_NOHIGHMEM=y.

Let us take commit 8b3e6cdc out of the equation and just look at raid0
performance:

revision 2.6.25.8-fc8 54a6eb5c 54a6eb5c-nohighmem 2.6.26-rc8
279 278 273 277
281 278 275 277
281 113 68.7 66.8
279 69.2 277 73.7
278 75.6 62.5 80.3
MB/s (avg) 280 163 191 155
% change 0% -42% -32% -45%
result base bad bad bad

These numbers are taken from the results of:
for i in `seq 1 5`; do dd if=/dev/zero of=/dev/md0 bs=1024k count=2048; done

Where md0 is created by:
mdadm --create /dev/md0 /dev/sd[b-e] -n 4 -l 0

I will try your debug patch next Mel, and then try to collect more data
with blktrace.

--
Dan





--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/