On Thu, 2010-01-21 at 11:18 -0800, David Daney wrote:Steven Rostedt wrote:Peter Zijlstra and I were doing a look over of places that assign[...]
current->state = TASK_*INTERRUPTIBLE, by simply looking at places with:
$ git grep -A1 'state[[:space:]]*=[[:space:]]*TASK_[^R]'
and it seems there are quite a few places that looks like bugs. To be on
the safe side, everything outside of a run queue lock that sets the
current state to something other than TASK_RUNNING (or dead) should be
using set_current_state().
current->state = TASK_INTERRUPTIBLE;
schedule();
is probably OK, but it would not hurt to be consistent. Here's a few
examples of likely bugs:
This may be a bit off topic, but exactly which type of barrier should set_current_state() be implying?
On MIPS, set_mb() (which is used by set_current_state()) has a full mb().
Some MIPS based processors have a much lighter weight wmb(). Could wmb() be used in place of mb() here?
Nope, wmb() is not enough. Below is an explanation.
If not, an explanation of the required memory ordering semantics here would be appreciated.
I know the documentation says:
set_current_state() includes a barrier so that the write of
current->state is correctly serialised wrt the caller's subsequent
test of whether to actually sleep:
set_current_state(TASK_UNINTERRUPTIBLE);
if (do_i_need_to_sleep())
schedule();
Since the current CPU sees the memory accesses in order, what can be happening on other CPUs that would require a full mb()?
Lets look at a hypothetical situation with:
add_wait_queue();
current->state = TASK_UNINTERRUPTIBLE;
smp_wmb();
if (!x)
schedule();
Then somewhere we probably have:
x = 1;
smp_wmb();
wake_up(queue);
CPU 0 CPU 1
------------ -----------
add_wait_queue();
(cpu pipeline sees a load
of x ahead, and preloads it)
x = 1;
smp_wmb();
wake_up(queue);
(task on CPU 0 is still at
TASK_RUNNING);
current->state = TASK_INTERRUPTIBLE;
smp_wmb(); <<-- does not prevent early loading of x
if (!x) <<-- returns true
schedule();
Now the task on CPU 0 missed the wake up.
Note, places that call schedule() are not fast paths, and probably not
called often. Adding the overhead of smp_mb() to ensure correctness is a
small price to pay compared to search for why you have a stuck task that
was never woken up.
Read Documentation/memory-barriers.txt, it will be worth the time you
spend doing so.