Re: Disabling in-memory write cache for x86-64 in Linux II
From: Artem S. Tashkinov
Date: Fri Oct 25 2013 - 04:31:03 EST
Oct 25, 2013 02:18:50 PM, Linus Torvalds wrote:
On Fri, Oct 25, 2013 at 8:25 AM, Artem S. Tashkinov wrote:
>> On my x86-64 PC (Intel Core i5 2500, 16GB RAM), I have the same 3.11 kernel
>> built for the i686 (with PAE) and x86-64 architectures. What's really troubling me
>> is that the x86-64 kernel has the following problem:
>> When I copy large files to any storage device, be it my HDD with ext4 partitions
>> or flash drive with FAT32 partitions, the kernel first caches them in memory entirely
>> then flushes them some time later (quite unpredictably though) or immediately upon
>> invoking "sync".
>Yeah, I think we default to a 10% "dirty background memory" (and
>allows up to 20% dirty), so on your 16GB machine, we allow up to 1.6GB
>of dirty memory for writeout before we even start writing, and twice
>that before we start *waiting* for it.
>On 32-bit x86, we only count the memory in the low 1GB (really
>actually up to about 890MB), so "10% dirty" really means just about
>90MB of buffering (and a "hard limit" of ~180MB of dirty).
>And that "up to 3.2GB of dirty memory" is just crazy. Our defaults
>come from the old days of less memory (and perhaps servers that don't
>much care), and the fact that x86-32 ends up having much lower limits
>even if you end up having more memory.
>You can easily tune it:
> echo $((16*1024*1024)) > /proc/sys/vm/dirty_background_bytes
> echo $((48*1024*1024)) > /proc/sys/vm/dirty_bytes
>or similar. But you're right, we need to make the defaults much saner.
>Wu? Andrew? Comments?
My feeling is that vm.dirty_ratio/vm.dirty_background_ratio should _not_ be
percentage based, 'cause for PCs/servers with a lot of memory (say 64GB or
more) this value becomes unrealistic (13GB) and I've already had some
unpleasant effects due to it.
I.e. when I dump a large MySQL database (its dump weighs around 10GB)
- it appears on the disk almost immediately, but then, later, when the kernel
decides to flush it to the disk, the server almost stalls and other IO requests
take a lot more time to complete even though mysqldump is run with ionice -c3,
so the use of ionice has no real effect.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/