Re: frequent lockups in 3.18rc4

From: John Stultz
Date: Mon Jan 05 2015 - 20:18:04 EST


On Sun, Jan 4, 2015 at 11:46 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Fri, Jan 2, 2015 at 4:27 PM, John Stultz <john.stultz@xxxxxxxxxx> wrote:
>>
>> So I sent out a first step validation check to warn us if we end up
>> with idle periods that are larger then we expect.
>
> .. not having tested it, this is just from reading the patch, but it
> would *seem* that it doesn't actually validate the clock reading much
> at all.
>
> Why? Because most of the time, for crap clocks like HPET, the real
> limitation will be not the multiplication overflow, but the "mask",
> which is just 32-bit (or worse - I think the ACPI PM timer might be
> just 24 bits).
>
> So then you effectively "validate" that the timer difference value
> fits in mask, but that isn't any validation at all - it's just a
> truism. Since we by definition mask the difference to just the valid
> bitmask.
>
> So I really think that the maximum valid clock needs to be narrowed
> down from the "technically, this clock can count to X".
>
> But maybe I'm wrong, and the multiplication overflow is actually often
> the real limit. What are the actual values for real timer sources?


As you point out, for clocksources where the mask is 32 bits or under,
we shouldn't have any risk of multiplication overflow, since mult is a
32-bit value. So yes, max_cycles on those probably should be the same
as the mask, and on those clocksources it isn't useful to test if we
run over (though warning if we're within the 12% margin could be
useful). But for clocksources that have larger masks, it could still
be a useful check (at 2GHz, TSCs overflow 32 bits in about 2 seconds),
although the mult value is chosen to target a mult overflow at ~10
minutes, so it's less likely that we really hit it.
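
To make that concrete, here's a rough userspace sketch - made-up helper
name and made-up (but plausible) mult values, not the kernel code - of
how the cap ends up being whichever is smaller, the counter mask or the
delta that keeps the 64-bit multiplication from overflowing:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not the kernel code): the largest cycle delta
 * we can safely turn into nanoseconds is the smaller of the counter
 * mask and the delta that keeps delta * mult within 64 bits. */
static uint64_t max_safe_cycles(uint32_t mult, uint64_t mask)
{
	uint64_t max_from_mult = UINT64_MAX / mult;

	return max_from_mult < mask ? max_from_mult : mask;
}

int main(void)
{
	/* Made-up mult values, just to show which limit binds: a
	 * 32-bit-mask clocksource is bound by its mask, while a wide
	 * counter like the TSC is bound by the multiplication overflow
	 * (~2^40 cycles with this mult, i.e. roughly 10 minutes at 2GHz). */
	printf("32-bit mask: %llu cycles\n",
	       (unsigned long long)max_safe_cycles(0x11800000, 0xffffffffULL));
	printf("64-bit mask: %llu cycles\n",
	       (unsigned long long)max_safe_cycles(0x01000000, ~0ULL));
	return 0;
}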

However, it turns out the calculations we use are a little more
conservative, treating the result as signed, and so they avoid
multiplications that could run into the sign bit. This looks like an
error to me (the code is also used for clockevents, so I haven't run
through to see if the requirements are different there), but a
conservative one, which results in maximum idle intervals that are
roughly half of what they could be (and we add yet another 12% margin
on top of that - so we probably need to just pick one or the other,
not both).
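
Roughly (if I have the margins right - illustrative arithmetic only,
not a patch), the effect looks like this:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t mult = 0x01000000;	/* made-up mult, roughly a 2GHz TSC */

	/* Keeping delta * mult below 2^63 (never touching the sign bit)
	 * allows only about half the delta an unsigned limit would... */
	uint64_t max_signed   = (1ULL << 63) / mult;
	uint64_t max_unsigned = UINT64_MAX / mult;

	/* ...and the extra ~12% margin takes another eighth off the top. */
	uint64_t with_margin  = max_signed - (max_signed >> 3);

	printf("signed-safe limit: %llu cycles\n", (unsigned long long)max_signed);
	printf("unsigned limit:    %llu cycles\n", (unsigned long long)max_unsigned);
	printf("after 12%% margin:  %llu cycles\n", (unsigned long long)with_margin);
	return 0;
}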

So even on 32-bit masks, max_cycles in my patch is smaller than 32
bits. That's why I was able to hit both warnings in my testing with
the HPET by sending SIGSTOP to qemu.

Anyway, it may be worth keeping the 50% margin (and dropping the 12%
reduction to simplify things), since I've not heard recent complaints
about timekeeping limiting idle lengths (but I could be wrong here).
That would give you something closer to the 1/8th of the mask that you
were using in your patch. And on larger-mask clocksources we already
cap the interval at 10 minutes, so most really crazy values would be
caught for clocksources like the TSC - and maybe we can make that cap
more configurable so we can shorten it, as done in your patch, to try
to debug things.
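
For comparison (again, just illustrative arithmetic), on a 32-bit
counter the two caps would land in the same general neighborhood:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t mask = 0xffffffffULL;	/* a 32-bit counter like the HPET */

	uint64_t cap_half   = mask >> 1;	/* keep only the 50% margin       */
	uint64_t cap_eighth = mask >> 3;	/* the 1/8th-of-the-mask approach */

	printf("50%% margin cap:    %llu cycles\n", (unsigned long long)cap_half);
	printf("1/8th-of-mask cap: %llu cycles\n", (unsigned long long)cap_eighth);
	return 0;
}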

I've also got a capping patch that I'm testing that keeps time reads
from going past that interval. The only thing I'm really cautious
about with that change is that we have to make sure the hrtimer that
triggers update_wall_time is always set to expire within that cap (I
need to review it again), or else we'll hang ourselves.
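
The core of it is just clamping the cycle delta in the read path -
something along these lines (made-up names and a userspace mock-up,
not the actual patch):

#include <stdint.h>
#include <stdio.h>

/* Made-up stand-ins for the clocksource fields involved. */
struct clocksource_sketch {
	uint64_t mask;
	uint64_t max_cycles;	/* largest delta we can convert safely */
};

/* Clamp the observed cycle delta so a too-long idle (or a SIGSTOPped
 * guest) can't feed an overflowing value into the cycles->ns
 * conversion. Illustrative only. */
static uint64_t clamped_delta(const struct clocksource_sketch *cs,
			      uint64_t now, uint64_t last)
{
	uint64_t delta = (now - last) & cs->mask;

	if (delta > cs->max_cycles) {
		fprintf(stderr, "capping cycle delta %llu -> %llu\n",
			(unsigned long long)delta,
			(unsigned long long)cs->max_cycles);
		delta = cs->max_cycles;
	}
	return delta;
}

int main(void)
{
	struct clocksource_sketch cs = { .mask = 0xffffffffULL,
					 .max_cycles = 0x7fffffffULL };

	/* Pretend the guest was stopped long enough to blow past the cap. */
	printf("delta = %llu\n",
	       (unsigned long long)clamped_delta(&cs, 0xf0000000ULL, 0x10000000ULL));
	return 0;
}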

thanks
-john