Re: [patch] aio: Don't zero out the pages array inside struct dio

From: Andrew Morton
Date: Fri Oct 30 2009 - 17:18:37 EST


On Fri, 30 Oct 2009 09:39:55 -0400
Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:

> Hi,
>
> Intel reported a performance regression caused by the following commit:
>
> commit 848c4dd5153c7a0de55470ce99a8e13a63b4703f
> Author: Zach Brown <zach.brown@xxxxxxxxxx>
> Date: Mon Aug 20 17:12:01 2007 -0700
>
> dio: zero struct dio with kzalloc instead of manually
>
> This patch uses kzalloc to zero all of struct dio rather than
> manually trying to track which fields we rely on being zero. It
> passed aio+dio stress testing and some bug regression testing on
> ext3.
>
> This patch was introduced by Linus in the conversation that lead up
> to Badari's minimal fix to manually zero .map_bh.b_state in commit:
>
> 6a648fa72161d1f6468dabd96c5d3c0db04f598a
>
> It makes the code a bit smaller. Maybe a couple fewer cachelines to
> load, if we're lucky:
>
> text data bss dec hex filename
> 3285925 568506 1304616 5159047 4eb887 vmlinux
> 3285797 568506 1304616 5158919 4eb807 vmlinux.patched
>
> I was unable to measure a stable difference in the number of cpu
> cycles spent in blockdev_direct_IO() when pushing aio+dio 256K reads
> at ~340MB/s.
>
> So the resulting intent of the patch isn't a performance gain but to
> avoid exposing ourselves to the risk of finding another field like
> .map_bh.b_state where we rely on zeroing but don't enforce it in the
> code.
>
> Zach surmised that zeroing out the page array was what caused most of
> the problem, and suggested the approach taken in the attached patch for
> resolving the issue. Intel re-tested with this patch and saw a 0.6%
> performance gain (the original regression was 0.5%).
>
> Comments, as always, are appreciated.
>

You forgot something:

--- a/fs/direct-io.c~aio-dont-zero-out-the-pages-array-inside-struct-dio-fix
+++ a/fs/direct-io.c
@@ -130,6 +130,12 @@ struct dio {
unsigned head; /* next page to process */
unsigned tail; /* last valid page + 1 */
int page_errors; /* errno from get_user_pages() */
+
+ /*
+ * pages[] (and any fields placed after it) are not zeroed out at
+ * allocation time. Don't add new fields after pages[] unless you
+ * wish that they not be zeroed.
+ */
struct page *pages[DIO_PAGES]; /* page buffer */
};

_

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/