Re: Report 2 in ext4 and journal based on v5.17-rc1

From: Byungchul Park
Date: Thu Mar 03 2022 - 20:57:22 EST


On Thu, Mar 03, 2022 at 10:54:56AM +0100, Jan Kara wrote:
> On Thu 03-03-22 10:00:33, Byungchul Park wrote:
> > Unfortunately, it's neither perfect nor safe without another wakeup
> > source - rescue wakeup source.
> >
> > consumer producer
> >
> > lock L
> > (too much work queued == true)
> > unlock L
> > --- preempted
> > lock L
> > unlock L
> > do work
> > lock L
> > unlock L
> > do work
> > ...
> > (no work == true)
> > sleep
> > --- scheduled in
> > sleep
> >
> > This code leads a deadlock without another wakeup source, say, not safe.
>
> So the scenario you describe above is indeed possible. But the trick is
> that the wakeup from 'consumer' as is doing work will remove 'producer'
> from the wait queue and change the 'producer' process state to
> 'TASK_RUNNING'. So when 'producer' calls sleep (in fact schedule()), the
> scheduler will just treat this as another preemption point and the
> 'producer' will immediately or soon continue to run. So indeed we can think
> of this as "another wakeup source" but the source is in the CPU scheduler
> itself. This is the standard way how waitqueues are used in the kernel...

Nice! Thanks for the explanation. I will take it into account if needed.

> > Lastly, just for your information, I need to explain how Dept works a
> > little more for you not to misunderstand Dept.
> >
> > Assuming the consumer and producer guarantee not to lead a deadlock like
> > the following, Dept won't report it a problem:
> >
> > consumer producer
> >
> > sleep
> > wakeup work_done
> > queue work
> > sleep
> > wakeup work_queued
> > do work
> > sleep
> > wakeup work_done
> > queue work
> > sleep
> > wakeup work_queued
> > do work
> > sleep
> > ... ...
> >
> > Dept does not consider all waits preceeding an event but only waits that
> > might lead a deadlock. In this case, Dept works with each region
> > independently.
> >
> > consumer producer
> >
> > sleep <- initiates region 1
> > --- region 1 starts
> > ... ...
> > --- region 1 ends
> > wakeup work_done
> > ... ...
> > queue work
> > ... ...
> > sleep <- initiates region 2
> > --- region 2 starts
> > ... ...
> > --- region 2 ends
> > wakeup work_queued
> > ... ...
> > do work
> > ... ...
> > sleep <- initiates region 3
> > --- region 3 starts
> > ... ...
> > --- region 3 ends
> > wakeup work_done
> > ... ...
> > queue work
> > ... ...
> > sleep <- initiates region 4
> > --- region 4 starts
> > ... ...
> > --- region 4 ends
> > wakeup work_queued
> > ... ...
> > do work
> > ... ...
> >
> > That is, Dept does not build dependencies across different regions. So
> > you don't have to worry about unreasonable false positives that much.
> >
> > Thoughts?
>
> Thanks for explanation! And what exactly defines the 'regions'? When some
> process goes to sleep on some waitqueue, this defines a start of a region
> at the place where all the other processes are at that moment and wakeup of
> the waitqueue is an end of the region?

Yes. Let me explain it more for better understanding.
(I copied it from the talk I did with Matthew..)


ideal view
-----------
context X context Y

request event E ...
write REQUESTEVENT when (notice REQUESTEVENT written)
... notice the request from X [S]

--- ideally region 1 starts here
wait for the event ...
sleep if (can see REQUESTEVENT written)
it's on the way to the event
...
...
--- ideally region 1 ends here

finally the event [E]

Dept basically works with the above view with regard to wait and event.
But it's very hard to identify the ideal [S] point in practice. So Dept
instead identifies [S] point by checking WAITSTART with memory barriers
like the following, which would make Dept work conservatively.


Dept's view
------------
context X context Y

request event E ...
write REQUESTEVENT when (notice REQUESTEVENT written)
... notice the request from X

--- region 2 Dept gives up starts
wait for the event ...
write barrier
write WAITSTART read barrier
sleep when (notice WAITSTART written)
ensure the request has come [S]

--- region 2 Dept gives up ends
--- region 3 starts here
...
if (can see WAITSTART written)
it's on the way to the event
...
...
--- region 3 ends here

finally the event [E]

In short, Dept works with region 3.

Thanks,
Byungchul