tg_load_down NULL pointer dereference

From: AMIT NAGAL
Date: Mon Jun 20 2016 - 02:15:13 EST


Hi
I am using Linux kernel version 3.10.28 on an ARM platform.
I am getting a NULL pointer dereference in tg_load_down().
At the time of the error, tg->parent->cfs_rq is 0 and tg->se is 0x00000400 (see the register dump in 4)).

1)
The problematic statement is line 5814 in tg_load_down():
line 5814 :: load = tg->parent->cfs_rq[cpu]->h_load;
tg->parent->cfs_rq is 0 (register r2), which causes the NULL dereference.

line 5815 :: load *= tg->se[cpu]->load.weight;
The tg->se pointer value is 0x00000400 (register r3). (tg->se is loaded early; see the disassembly below.)

PC = 0xc00741b0 at the time of the NULL dereference.
(gdb) list *(0xc00741b0)
0xc00741b0 is in tg_load_down (kernel/sched/fair.c:5814).
5809		long cpu = (long)data;
5810
5811		if (!tg->parent) {
5812			load = cpu_rq(cpu)->load.weight;
5813		} else {
5814			load = tg->parent->cfs_rq[cpu]->h_load;
5815			load *= tg->se[cpu]->load.weight;
5816			load /= tg->parent->cfs_rq[cpu]->load.weight + 1;
5817		}

(gdb) disas tg_load_down
Dump of assembler code for function tg_load_down:
0xc007418c <+0>: mov r12, sp
0xc0074190 <+4>: push {r11, r12, lr, pc}
0xc0074194 <+8>: sub r11, r12, #4
0xc0074198 <+12>: ldr r3, [r0, #80] ; 0x50
0xc007419c <+16>: cmp r3, #0
0xc00741a0 <+20>: beq 0xc00741e8 <tg_load_down+92>
0xc00741a4 <+24>: ldr r2, [r3, #36] ; 0x24
0xc00741a8 <+28>: lsl lr, r1, #2
0xc00741ac <+32>: ldr r3, [r0, #32]
0xc00741b0 <+36>: ldr r12, [r2, r1, lsl #2]

2)
The first argument of tg_load_down() (struct task_group *tg) is passed in r0.
Line 5811 : if (!tg->parent) {
So first, tg->parent is loaded into r3.
c0074198: e5903050 ldr r3, [r0, #80] ; 0x50

Then tg->parent is checked for NULL.
c007419c: e3530000 cmp r3, #0

Line 5814 : load = tg->parent->cfs_rq[cpu]
c00741a4: e5932024 ldr r2, [r3, #36] ; 0x24
Here tg->parent->cfs_rq is loaded into r2.

Line 5815 : load *= tg->se[cpu]
c00741ac: e5903020 ldr r3, [r0, #32]
Here tg->se is loaded into r3.

Both tg->parent->cfs_rq and tg->se are double pointers (arrays of per-CPU pointers).
However, the register dump at the time of the crash (see 4) below) shows these values for r2 (tg->parent->cfs_rq) and r3 (tg->se):
r2 = 00000000 r3 = 00000400

After this, ldr r12, [r2, r1, lsl #2] is executed, which crashes the kernel with a NULL pointer dereference because r2 is 0.
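
As a sanity check on the faulting access, the address used by that ldr can be worked out by hand. The small user-space sketch below only models the ARM register-offset addressing mode; the helper name arm_scaled_addr is made up for illustration and is not a kernel function.

```c
#include <stdint.h>

/*
 * Effective address of an ARM register-offset load of the form
 *   ldr Rt, [Rn, Rm, lsl #shift]
 * which is Rn + (Rm << shift).
 * Hypothetical helper, for illustration only.
 */
static uint32_t arm_scaled_addr(uint32_t rn, uint32_t rm, unsigned int shift)
{
	return rn + (rm << shift);
}
```

With r2 = 0x00000000 and r1 = 0x00000002 (cpu 2) from the register dump, arm_scaled_addr(0, 2, 2) gives 0x00000008, i.e. the slot for cfs_rq[2] computed from a NULL base. This is consistent with lr = 00000008 in the dump, since lr was set by "lsl lr, r1, #2" at tg_load_down+28.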

3)
The RCU read lock is already held while tg_load_down() is executing.

static void update_h_load(long cpu)
{
	struct rq *rq = cpu_rq(cpu);
	unsigned long now = jiffies;

	if (rq->h_load_throttle == now)
		return;

	rq->h_load_throttle = now;

	rcu_read_lock();
	walk_tg_tree(tg_load_down, tg_nop, (void *)cpu);
	rcu_read_unlock();
}
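
To narrow down when the pointers go bad, one option might be a temporary check before the dereference in tg_load_down(). The following is only a user-space sketch of that idea: the structs are simplified stand-ins with just the fields used here (not the real kernel definitions), and tg_h_load_checked and its root_weight parameter are hypothetical names introduced for illustration.

```c
#include <stddef.h>

/* Simplified stand-ins; NOT the real kernel definitions. */
struct load_weight { unsigned long weight; };
struct sched_entity { struct load_weight load; };
struct cfs_rq { struct load_weight load; unsigned long h_load; };
struct task_group {
	struct sched_entity **se;	/* per-CPU array */
	struct cfs_rq **cfs_rq;		/* per-CPU array */
	struct task_group *parent;
};

/*
 * Hypothetical checked variant of the computation on lines 5811-5817.
 * Returns 0 and stores the load in *out, or -1 if a pointer on the
 * path is NULL. root_weight stands in for cpu_rq(cpu)->load.weight.
 */
static int tg_h_load_checked(const struct task_group *tg, long cpu,
			     unsigned long root_weight, unsigned long *out)
{
	unsigned long load;

	if (!tg->parent) {
		*out = root_weight;
		return 0;
	}
	/* The crash case: the parent's cfs_rq array pointer is NULL. */
	if (!tg->parent->cfs_rq || !tg->parent->cfs_rq[cpu])
		return -1;
	if (!tg->se || !tg->se[cpu])
		return -1;

	load = tg->parent->cfs_rq[cpu]->h_load;
	load *= tg->se[cpu]->load.weight;
	load /= tg->parent->cfs_rq[cpu]->load.weight + 1;
	*out = load;
	return 0;
}
```

A check like this would catch the NULL tg->parent->cfs_rq seen in r2, but not a non-NULL garbage value such as the 0x00000400 seen in r3 for tg->se, so it can only help locate the problem, not explain it.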

4) The relevant backtrace is as follows:
pc : [<c00741b0> ( tg_load_down + 36 )] lr : [<00000008>] psr: a0070093
ip : dc4c3d08 fp : dc4c3d04
r10: c047e418 r9 : c007418c r8 : 00000000
r7 : c006f44c r6 : c069cb28 r5 : 00000002 r4 : d5d53a08
r3 : 00000400 r2 : 00000000 r1 : 00000002 r0 : d5d53a08

Function entered at [<c007418c>](tg_load_down) from [<c006f3c0>](walk_tg_tree_from + 48)
Function entered at [<c006f390>](walk_tg_tree_from) from [<c007a694>](load_balance + 668)

5) Is there any scenario in which tg->parent->cfs_rq can be 0 (r2) and tg->se can get corrupted to 0x00000400 (r3)?

Regards
Amit Nagal