Re: attempting to format brd device results in OOM kills

From: Jens Axboe
Date: Sun Jun 18 2017 - 18:28:09 EST


On 06/18/2017 04:21 PM, Jens Axboe wrote:
> On 06/18/2017 10:30 AM, Jeff Layton wrote:
>> I've run across a regression from v4.11. If I boot a v4.12-rc1 or later
>> kernel, make a large brd device and try to format it, it quickly slows
>> down to a crawl and then the OOM killer kicks in.
>>
>> I ran a bisect and it landed here:
>>
>> commit f09a06a193d942a12c1a33c153388b3962222006 (HEAD, refs/bisect/bad)
>> Author: Christoph Hellwig <hch@xxxxxx>
>> Date: Wed Apr 5 19:21:16 2017 +0200
>>
>> brd: remove discard support
>>
>> It's just a in-driver reimplementation of writing zeroes to the pages,
>> which fails if the discards aren't page aligned.
>>
>> Signed-off-by: Christoph Hellwig <hch@xxxxxx>
>> Reviewed-by: Hannes Reinecke <hare@xxxxxxxx>
>> Signed-off-by: Jens Axboe <axboe@xxxxxx>
>>
>>
>> I've been reproducing it in a VM with ~8G allocated to it:
>>
>> I have a modprobe.d file with this in it:
>>
>> options brd rd_nr=1 rd_size=1073741824
>>
>> I then just:
>>
>> # modprobe brd
>> # mkfs -t ext2 /dev/ram0
>>
>> It keels over pretty quickly after that.
>
> Just checked, and creating a 1TB ram disk and then running mkfs.ext2 on it
> writes 16851MiB of data. I can't say I'm surprised you OOM, if you run that
> in a 8G VM, as you're about 8G short.
>
> I'm puzzled as to why the discard change would make any difference, however.

Reverted the patch, and I see identical behavior. The only difference is that
the whole device is trimmed first, as expected. But it still writes ~16G
afterwards.

Are you sure this commit is what broke things for you? Honestly, I don't see
how it could ever work with a 1TB ram disk, 8G of RAM, and 16G of data written.
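
For anyone checking the numbers, here is a rough sketch of the arithmetic. It
assumes rd_size is given in KiB (as the brd module parameter is documented),
reads "~8G" as exactly 8 GiB, and ignores whatever the guest kernel and
userspace need for themselves. The variable names are illustrative only.

```python
# Back-of-the-envelope check of the sizes quoted in this thread.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
TIB = 1024 * GIB

rd_size_kib = 1073741824          # from the modprobe.d line; assumed to be in KiB
device_bytes = rd_size_kib * KIB  # nominal size of /dev/ram0

mkfs_written_mib = 16851          # amount mkfs.ext2 was observed to write
vm_ram_gib = 8                    # assumption: "~8G" taken as exactly 8 GiB

written_bytes = mkfs_written_mib * MIB
shortfall_gib = (written_bytes - vm_ram_gib * GIB) / GIB

print(f"device size : {device_bytes / TIB:.0f} TiB")
print(f"mkfs writes : {written_bytes / GIB:.2f} GiB")
print(f"shortfall   : {shortfall_gib:.2f} GiB")
```

Under those assumptions the device is 1 TiB, mkfs writes about 16.46 GiB, and
the VM comes up roughly 8.46 GiB short even before counting the memory the
rest of the system needs.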

--
Jens Axboe