Re: divide error: bdi_dirty_limit+0x5a/0x9e

From: Srivatsa S. Bhat
Date: Mon Sep 24 2012 - 14:55:13 EST


On 09/24/2012 06:26 PM, Borislav Petkov wrote:
> On Mon, Sep 24, 2012 at 08:29:00PM +0800, Fengguang Wu wrote:
>> On Mon, Sep 24, 2012 at 02:20:53PM +0200, Borislav Petkov wrote:
>>> On Mon, Sep 24, 2012 at 07:34:47PM +0800, Fengguang Wu wrote:
>>>> Will you test such a line? At least the generic do_div() only uses the
>>>> lower 32 bits of the divisor for division.
>>>>
>>>> WARN_ON(!(den & 0xffffffff));
>>>
>>> But, but, the asm output says:
>>>
>>> 28: 48 89 c8 mov %rcx,%rax
>>> 2b:* 48 f7 f7 div %rdi <-- trapping instruction
>>> 2e: 31 d2 xor %edx,%edx
>>>
>>> and this version of DIV does an unsigned division of RDX:RAX by the
>>> contents of a *64-bit register* ... in our case %rdi.
>>>
>>> Srivatsa's oops shows the same:
>>>
>>> 28: 48 89 f0 mov %rsi,%rax
>>> 2b:* 48 f7 f7 div %rdi <-- trapping instruction
>>> 2e: 41 8b 94 24 74 02 00 mov 0x274(%r12),%edx
>>>
>>> Right?
>>
>> Right, that's why I said "at least". As for x86, I'm as clueless as you.
>
> Right, both oopses are on x86 so I don't think it is the bitness of the
> division.
>
> Another thing those two have in common is that both happen when a CPU
> comes online. Srivatsa's is when CPU9 comes online (oops is detected on
> CPU9) and in our case CPU4 comes online but the oops says CPU0.
>

I had posted another dump from one of my tests. That one triggers while
offlining a CPU (CPU 9).

https://lkml.org/lkml/2012/9/14/235

> So it has to be hotplug-related.

Regards,
Srivatsa S. Bhat

