Re: [PATCH] Describe race of direct read and fork for unaligned buffers

From: Michael Kerrisk (man-pages)
Date: Sat May 05 2012 - 07:28:33 EST


On Thu, May 3, 2012 at 7:25 AM, KOSAKI Motohiro
<kosaki.motohiro@xxxxxxxxx> wrote:
> On Wed, May 2, 2012 at 3:23 PM, Jan Kara <jack@xxxxxxx> wrote:
>> On Wed 02-05-12 15:14:33, KOSAKI Motohiro wrote:
>>> Hello,
>>>
>>> >> I see what you mean.
>>> >>
>>> >> I'm not sure, though. For most apps it's bad practice I think. If you get into
>>> >> realm of sophisticated, performance critical IO/storage managers, it would
>>> >> not surprise me if such concurrent buffer modifications could be allowed.
>>> >> We allow exactly such a thing in our pagecache layer. Although probably
>>> >> those would be using shared mmaps for their buffer cache.
>>> >>
>>> >> I think it is safest to make a default policy of asking for IOs against private
>>> >> cow-able mappings to be quiesced before fork, so there are no surprises
>>> >> or reliance on COW details in the mm. Do you think?
>>> >    Yes, I agree that (and MADV_DONTFORK) is probably the best thing to have
>>> > in documentation. Otherwise it's a bit too hairy...
>>>
>>> I neglected this issue for years because Linus asked who need this and
>>> I couldn't find real world usecase.
>>>
>>> Ah, no, not exactly correct. Fujitsu proprietary database had such
>>> usecase. But they quickly fixed it. Then I couldn't find alternative usecase.
>>  One of our customers hit this bug recently which is why I started to look
>> at this. But they also modified their application not to hit the problem.
>>
>>> I'm not sure why you say "hairy". Do you mean you have any use case of this?
>>  I meant that if we should describe conditions like "if you have page
>> aligned buffer and you don't write to it while the IO is running, the
>> problem also won't occur", then it's already too detailed and might
>> easily change in future kernels...

So, am I correct to assume that the right text to add to the page is as below?

Nick, can you clarify what you mean by "quiesced"?

[[
O_DIRECT IOs should never be run concurrently with the fork(2) system call,
when the memory buffer is anonymous memory, or comes from mmap(2)
with MAP_PRIVATE.

Any such IOs, whether submitted with an asynchronous IO interface or from
another thread in the process, should be quiesced before fork(2) is called.
Failure to do so can result in data corruption and undefined behavior in
parent and child processes.

This restriction does not apply when the memory buffer for the O_DIRECT
IOs comes from mmap(2) with MAP_SHARED or from shmat(2).
Nor does this restriction apply when the memory buffer has been advised
as MADV_DONTFORK with madvise(2), ensuring that it will not be available
to the child after fork(2).
]]
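
To make the MADV_DONTFORK exemption concrete, here is a minimal sketch
(not proposed for the page itself). It allocates a private anonymous
buffer, which mmap(2) returns page-aligned, marks it MADV_DONTFORK, and
then forks to check that the child has no mapping at that address. The
helper names dio_buffer_alloc(), dio_buffer_free() and
dio_buffer_hidden_from_child() are hypothetical, invented for this
illustration. Using msync(2) to probe for the mapping is just one
convenient choice; it fails with ENOMEM on an unmapped range.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: allocate a private anonymous buffer (page-aligned,
 * as mmap(2) guarantees) and mark it MADV_DONTFORK so that a child created
 * by fork(2) does not inherit the mapping. Returns NULL on failure.
 */
static void *dio_buffer_alloc(size_t len)
{
    assert(len > 0);

    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    if (madvise(buf, len, MADV_DONTFORK) != 0) {
        munmap(buf, len);
        return NULL;
    }
    return buf;
}

/* Hypothetical helper: release a buffer obtained from dio_buffer_alloc(). */
static void dio_buffer_free(void *buf, size_t len)
{
    munmap(buf, len);
}

/*
 * Hypothetical check: fork, and let the child probe the range with
 * msync(2), which fails with ENOMEM if the range is not mapped.
 * Returns 1 if the child does NOT have the mapping, 0 if it does,
 * and -1 if fork(2) or waitpid(2) failed.
 */
static int dio_buffer_hidden_from_child(void *buf, size_t len)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: exit status 0 means "range is unmapped here". */
        int rc = msync(buf, len, MS_ASYNC);
        _exit((rc == -1 && errno == ENOMEM) ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : 0;
}
```

A caller would pass such a buffer to read(2) or write(2) on a file
descriptor opened with O_DIRECT. Since the child never receives the
mapping, the buffer pages are not subject to copy-on-write at fork(2).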

Thanks,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/