Re: [PATCH AUTOSEL 5.5 542/542] pipe: use exclusive waits when reading or writing

From: Linus Torvalds
Date: Thu Mar 05 2020 - 13:41:14 EST


On Thu, Mar 5, 2020 at 12:20 PM Andrei Vagin <avagin@xxxxxxxxx> wrote:
>
> After this change, one more criu test became flaky. This is due to one
> of corner cases, so I am not sure that we need to fix something in the
> kernel. I have fixed this issue in the test. I am not sure that this
> will affect any real applications.

It's an interesting test-case, but it's really not doing anything you
should rely on.

The code basically expects a pipe write() to be "atomic" for something
bigger than PIPE_BUF. We've never really guaranteed that (and POSIX
doesn't), but we had this special case where readers would continue to
read as long as there was an active writer, which kind of approximated
that for some cases (and your test-case in particular).

A reader that wants to read everything should do multiple read() calls
until it gets an EOF (or gets the expected buffer size). A regular
read() on a pipe can simply always return a partial buffer (it will
always do so in the case of signals, but what you're seeing is that it
will also now do it if the kernel buffers emptied even when there was
a writer that was ready to fill them again).

And I suspect it wasn't actually the commit you point to that changes
the behavior, I think it's actually the "pipe: remove
'waiting_writers' merging logic" that changed behavior for your test

But because it's then timing-dependent on whether the reader gets all
the data or not, it might bisect to any commit after that point.
Particularly since some of the other commits change timing too..

We could re-introduce the "continue reading while there are active
writers" logic, but if this is the only test that triggers that, I'd
prefer to wait until some real user notices...

But if CRIU itself depends on this behavior (rather than just a test),
then I guess we need to.

So is it just a test-case, or does CRIU itself depend on that "reads
get full buffers"? As mentioned, that really _is_ fundamentally broken
if there is any chance of signals..

Linus