Crash in schedule path after worker thread dies

From: Todd Poynor
Date: Tue Jul 19 2011 - 20:07:50 EST


After a worker thread died due to a bug in a work function, a NULL
dereference was seen when schedule() called wq_worker_sleeping(),
which in turn called kthread_data():

    return to_kthread(task)->data;

mm_release() has apparently already set the task's vfork_done to NULL,
causing to_kthread() to return a bad address (seen on 3.0-rc7 on ARM).
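
A minimal userspace sketch of why a NULL vfork_done would produce a bad
address, assuming to_kthread() is a container_of() on task->vfork_done
with the completion embedded in struct kthread (as I understand
kernel/kthread.c of that era). The struct completion and struct task
below are simplified stand-ins, not the kernel definitions, and
to_kthread_addr() and check_live_thread() are hypothetical helpers for
illustration only:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's struct completion; the real layout differs. */
struct completion { unsigned int done; };

/* Modeled on struct kthread in kernel/kthread.c around 3.0 (assumption). */
struct kthread {
	int should_stop;
	void *data;
	struct completion exited;
};

/* Hypothetical, reduced task_struct: only the field involved here. */
struct task { struct completion *vfork_done; };

/*
 * Address that a container_of()-style to_kthread() would yield,
 * computed with integers so that nothing is dereferenced.
 */
static uintptr_t to_kthread_addr(const struct task *t)
{
	return (uintptr_t)t->vfork_done - offsetof(struct kthread, exited);
}

/* Live thread: the computed address is the enclosing struct kthread. */
static void check_live_thread(void)
{
	struct kthread self = { 0, NULL, { 0 } };
	struct task t = { &self.exited };

	assert(to_kthread_addr(&t) == (uintptr_t)&self);
}
```

With vfork_done == NULL the same computation gives 0 minus the offset
of the exited member, that is, a small negative (wrapped) address
rather than NULL, so reading ->data from it faults. This only models
the pointer arithmetic; it says nothing about whether guarding this
case would be a sufficient fix.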

I haven't tried a fix because I'm not sure if avoiding this case is
enough to properly recover from death of a worker thread, or if this
has already been discussed and rejected in the past. I searched
around a little and found some mentions of problems in worker
functions that were probably followed by the kthread_data crash,
but didn't turn up any specific discussion of this crash. So I
thought I'd start by mentioning this here, and can help fix or test if
needed.


Todd