Re: Possible Oprofile crash/race when stopping

From: Robert Richter
Date: Wed Jul 28 2010 - 08:21:34 EST


On 22.07.10 01:14:40, Benjamin Herrenschmidt wrote:
> Hi folks !
>
> We've hit a strange crash internally, that we -think- we have tracked
> down to an oprofile bug. It's hard to hit, so I can't guarantee yet that
> we have fully smashed it but I'd like to share our findings in case you
> guys have a better idea.
>
> So the initial observation is a spinlock bad magic followed by a crash
> in the spinlock debug code:

Benjamin,

thanks for reporting this. I have been trying to reproduce this with
various loads and scenarios, but without success so far. Can you give
me a hint about the load you have (number of processes running, CPU
load, whether you switch off oprofile while many processes are still
running)? Are you able to trigger it regularly?

> I think the right sequence however requires breaking up end_sync. Ie, we
> need to do in that order:
>
> - cancel the workqueues
> - unregister the notifier
> - process the mortuary
>
> What do you think ?

This could potentially fix it, but I will have to look deeper into the
code. I will try to do this next week.
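A minimal user-space sketch of why the proposed ordering matters. It
assumes a simplified model in which both the sync work and the
task-exit notifier can still queue dying tasks onto the mortuary until
they are stopped. All names (struct sync_state, late_event(),
cancel_work(), unregister_notifier(), process_mortuary()) are
hypothetical illustrations, not the actual oprofile symbols, and the
real end_sync() teardown is more involved.

```c
#include <stdbool.h>

/* Hypothetical model of the shutdown state; names are illustrative,
 * not the real drivers/oprofile symbols. */
struct sync_state {
	bool work_active;          /* sync work still scheduled */
	bool notifier_registered;  /* task-exit notifier still hooked */
	int  mortuary_len;         /* dying tasks awaiting release */
};

/* A late event: any producer that is still live may queue a task. */
static void late_event(struct sync_state *s)
{
	if (s->work_active || s->notifier_registered)
		s->mortuary_len++;
}

static void cancel_work(struct sync_state *s)
{
	s->work_active = false;
}

static void unregister_notifier(struct sync_state *s)
{
	s->notifier_registered = false;
}

/* Drain (free) everything currently queued on the mortuary. */
static void process_mortuary(struct sync_state *s)
{
	s->mortuary_len = 0;
}

/* Proposed order: stop every producer first, then drain. */
int shutdown_proposed(void)
{
	struct sync_state s = { true, true, 0 };

	cancel_work(&s);
	late_event(&s);          /* notifier still live: queued, drained below */
	unregister_notifier(&s);
	late_event(&s);          /* no producer left: ignored */
	process_mortuary(&s);
	late_event(&s);          /* still ignored */
	return s.mortuary_len;   /* number of entries left behind */
}

/* Problematic order: drain while producers can still queue work. */
int shutdown_drain_first(void)
{
	struct sync_state s = { true, true, 0 };

	process_mortuary(&s);
	late_event(&s);          /* queued after the drain, never freed */
	cancel_work(&s);
	unregister_notifier(&s);
	return s.mortuary_len;
}
```

In this model shutdown_proposed() leaves the mortuary empty, while
shutdown_drain_first() leaves one entry behind. That leftover entry
only stands in for state that could be touched after teardown; whether
this is the actual cause of the reported crash still needs to be
confirmed against the real code.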

Thanks,

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center
