Re: [PATCH] timekeeping: Cap array access in timekeeping_debug to protect against invalid sleep times

From: Rafael J. Wysocki
Date: Wed Aug 17 2016 - 21:04:20 EST


On Wednesday, August 17, 2016 12:18:50 PM John Stultz wrote:
> It was reported that hibernation could fail on the 2nd attempt,
> where the system hangs at hibernate() -> syscore_resume() ->
> i8237A_resume() -> claim_dma_lock(), because the lock has
> already been taken.
>
> However there is actually no other process would like to grab
> this lock on that problematic platform.
>
> Further investigation showed that the problem is triggered by
> setting /sys/power/pm_trace to 1 before the 1st hibernation.
>
> Since once pm_trace is enabled, the rtc becomes unmeaningful
> after suspend, and meanwhile some BIOSes would like to adjust
> the 'invalid' tsc(e.g, smaller than 1970) to the release date
> of that motherboard during POST stage, thus after resumed, it
> may seem that the system had a significant long sleep time might
> due to meaningless tsc or RTC delta.
>
> Then in timekeeping_resume -> tk_debug_account_sleep_time, if
> the bit31 of the sleep time happened to be set to 1, the fls
> returns 32 and then we add 1 to sleep_time_bin[32], which
> caused a memory overwritten.
>
> As depicted by System.map:
> ffffffff81c9d080 b sleep_time_bin
> ffffffff81c9d100 B dma_spin_lock
> the dma_spin_lock.val is set to 1, which caused this problem.
>
> This patch adds a sanity check in tk_debug_account_sleep_time()
> to ensure we don't index past the sleep_time_bin array.
>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Rafael J. Wysocki <rjw@xxxxxxxxxxxxx>
> Cc: Janek Kozicki <cosurgi@xxxxxxxxx>
> Cc: Chen Yu <yu.c.chen@xxxxxxxxx>
> Cc: Xunlei Pang <xpang@xxxxxxxxxx>
> Cc: Zhang Rui <rui.zhang@xxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Cc: linux-pm@xxxxxxxxxxxxxxx
> Reported-by: Janek Kozicki <cosurgi@xxxxxxxxx>
> Reported-by: Chen Yu <yu.c.chen@xxxxxxxxx>
> [jstultz: Problem diagnosed and original patch by Chen Yu, I've
> solved the issue slightly differently, but borrowed his excelent
> explanation of of the issue here.]
> Signed-off-by: John Stultz <john.stultz@xxxxxxxxxx>
> ---
> kernel/time/timekeeping_debug.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/time/timekeeping_debug.c b/kernel/time/timekeeping_debug.c
> index f6bd652..107310a6 100644
> --- a/kernel/time/timekeeping_debug.c
> +++ b/kernel/time/timekeeping_debug.c
> @@ -23,7 +23,9 @@
>
> #include "timekeeping_internal.h"
>
> -static unsigned int sleep_time_bin[32] = {0};
> +#define NUM_BINS 32
> +
> +static unsigned int sleep_time_bin[NUM_BINS] = {0};
>
> static int tk_debug_show_sleep_time(struct seq_file *s, void *data)
> {
> @@ -69,6 +71,9 @@ late_initcall(tk_debug_sleep_time_init);
>
> void tk_debug_account_sleep_time(struct timespec64 *t)
> {
> - sleep_time_bin[fls(t->tv_sec)]++;
> + /* Cap bin index so we don't overflow the array */
> + int bin = min(fls(t->tv_sec), NUM_BINS-1);
> +
> + sleep_time_bin[bin]++;
> }
>
>


If pm_trace_enabled is set, we can (or maybe even should) just skip
timekeeping_inject_sleeptime() entirely in rtc_resume() at least, because
sleep_time is almost certainly bogus in that case, even if it doesn't
overflow.

Of course, the above is still needed then.

Thanks,
Rafael