Re: Fw: Badness around put_cred()

From: Robert ÅwiÄcki
Date: Thu Jan 26 2012 - 18:18:38 EST


On Thu, Jan 26, 2012 at 5:28 PM, Serge Hallyn
<serge.hallyn@xxxxxxxxxxxxx> wrote:
> Thanks, Andrew, I missed this in the list. Âcc:ing David as he seems
> the most likely to know offhand what's going on.
>
> Robert, (please feel free to add lkml back to cc:) can you tell any
> more about exactly what/how you were fuzzing?

I'm fuzzing with http://code.google.com/p/iknowthis/ - it's a general
syscall fuzzer.

So, the problerm is that the counter reaches -1 in the cred structure.
I added code there (in put_cred()) which BUG_ON in the following code.

312 static inline void put_cred(const struct cred *_cred)
313 {
314 struct cred *cred = (struct cred *) _cred;
315
316 int r = atomic_dec_if_positive(&(cred)->usage);
...
... my_code_for_stack_dumping
...
342 BUG_ON(r == -1);
343
344 if (r == 0)
345 __put_cred(cred);
346 }

I also added a code which registers stacktrace every counter
increas/decrease. I reserved 127 slots there (for PUT/GET) to find out
why number of get_cred()'s doesn't match number of put_cred()'s and
when it crashes all those slots are used, I must add them to find out
what's going on really. All in all, the stack dump is here:

http://alt.swiecki.net/linux_kernel/stacktrace_3.2-cred.txt

under GET there are stacktraces of all places where the cred counter
is increased, in PUT all places where it decreased. Unfortunately all
slots (128) are used, so it's not a full dump (just first 127 counter
increases/decreases).

Short kgdb session:
(gdb) target remote /dev/ttyS0
Remote debugging using /dev/ttyS0
put_cred (_cred=0xffff880054bb2180) at include/linux/cred.h:342
342 BUG_ON(r == -1);

(gdb) bt
#0 put_cred (_cred=0xffff880054bb2180) at include/linux/cred.h:342
#1 0xffffffff810c31c6 in exit_creds (tsk=0xffff88006b8ca700) at
kernel/cred.c:199
#2 0xffffffff8109dad6 in __put_task_struct (tsk=0xffff88006b8ca700)
at kernel/fork.c:192
#3 0xffffffff810a0214 in put_task_struct (t=0x286) at
include/linux/sched.h:1753
#4 0xffffffff810a245d in delayed_put_task_struct (rhp=<optimized
out>) at kernel/exit.c:162
#5 0xffffffff810ffca6 in __rcu_reclaim (rn=<optimized out>,
head=<optimized out>) at kernel/rcu.h:81
#6 rcu_do_batch (rdp=<optimized out>, rsp=<optimized out>) at
kernel/rcutree.c:1292
#7 invoke_rcu_callbacks (rdp=<optimized out>, rsp=<optimized out>) at
kernel/rcutree.c:1577
#8 __rcu_process_callbacks (rsp=0xffffffff82c3c500,
rdp=0xffff88012bc4e960) at kernel/rcutree.c:1549
#9 0xffffffff810ffe89 in rcu_process_callbacks (unused=<optimized
out>) at kernel/rcutree.c:1558
#10 0xffffffff810a4624 in __do_softirq () at kernel/softirq.c:238
#11 0xffffffff8206283c in ?? () at arch/x86/kernel/entry_64.S:1205
#12 0xffffffff8104c2e3 in do_softirq () at arch/x86/kernel/irq_64.c:83
#13 0xffffffff810a42f8 in invoke_softirq () at kernel/softirq.c:329
#14 irq_exit () at kernel/softirq.c:348
#15 0xffffffff820631ef in smp_apic_timer_interrupt (regs=<optimized
out>) at arch/x86/kernel/apic/apic.c:882
#16 0xffffffff820610b3 in ?? () at arch/x86/kernel/entry_64.S:973


(gdb) p *(struct cred*)$rbx
$6 = {usage = {counter = 0}, uid = 1003, gid = 1003, suid = 1003, sgid
= 1003, euid = 1003, egid = 1003, fsuid = 1003, fsgid = 1003,
securebits = 0, cap_inheritable = {cap = {0, 0}}, cap_permitted = {cap
= {0, 0}}, cap_effective = {cap = {0, 0}}, cap_bset = {cap =
{4294967295,
4294967295}}, jit_keyring = 0 '\000', thread_keyring = 0x0,
request_key_auth = 0x0, tgcred = 0xffff88004333e940, security =
0xffffffff83164a10, user = 0xffff8801214aa140, user_ns =
0xffffffff82c262f0, group_info = 0xffff8801097355c0, rcu = {next =
0x0,
func = 0xffffffff810c3590 <put_cred_rcu>}

The counter is 0, cause I used
atomic_dec_if_positive(&(cred)->usage); - after that it would be -1.

[1]kdb> summary
sysname Linux
release 3.2.0
version #10 SMP Wed Jan 25 15:15:41 CET 2012
machine x86_64
nodename ise-test
domainname (none)
ccversion CCVERSION
date 2012-01-26 22:51:34 tz_minuteswest -60
uptime 1 day 07:41
load avg 0.65 0.22 0.14

MemTotal: 992228 kB
MemFree: 652233 kB
Buffers: 34076 kB


I'll increase the number of slots for stack saving in the cred
structure, and will try to repeat it tomorrow.

> thanks,
> -serge
>
> Quoting Andrew Morton (akpm@xxxxxxxxxxxxxxxxxxxx):
>>
>> your code broke ;)
>>
>>
>> Begin forwarded message:
>>
>> Date: Tue, 17 Jan 2012 16:55:41 +0100
>> From: Robert ÅwiÄcki <robert@xxxxxxxxxxx>
>> To: linux-kernel@xxxxxxxxxxxxxxx
>> Subject: Badness around put_cred()
>>
>>
>> Hi,
>>
>> I was fuzzing linux kernel for some time, and there seems to be a bug,
>> which kicks in relatively quickly (a few hours at most), which ends up
>> with warn() or panic() - depending on options compiled in
>> (CONFIG_DEBUG_CREDENTIALS, preemption mode). I was looking briefly
>> through kernel code, and I think it might be related to the
>> include/linux/cred.h::
>>
>> static inline void put_cred(const struct cred *_cred)
>> {
>> Â Â Â Â struct cred *cred = (struct cred *) _cred;
>>
>> Â Â Â Â validate_creds(cred);
>> Â Â Â Â if (atomic_dec_and_test(&(cred)->usage))
>> Â Â Â Â Â Â Â Â __put_cred(cred);
>> }
>>
>> which checks whether the usage counter is different than 0, and maybe
>> it should be checking whether it is >0.
>>
>> All in all, I don't understand the whole cred/rcu code yet, so just
>> dumping the data, in case somebody else can spot the problem quicker.
>> The kernel versions are 2.6.39 and 3.2
>>
>> Config: CONFIG_DEBUG_CREDENTIALS=n, CONFIG_PREEMPT_RCU=n,
>> CONFIG_PREEMPT=n, kernel=3.2
>>
>> (gdb) bt
>> #0 Âkgdb_breakpoint () at kernel/debug/debug_core.c:960
>> #1 Âkgdb_panic_event (self=0x286, val=0, data=0x0) at
>> kernel/debug/debug_core.c:766
>> #2 Â0xffffffff810c1eff in notifier_call_chain (nl=<optimized out>,
>> val=0, v=0xffffffff8301a9b0, nr_to_call=-1, nr_calls=<optimized out>)
>> at kernel/notifier.c:93
>> #3 Â0xffffffff810c1f57 in __atomic_notifier_call_chain
>> (nr_calls=<optimized out>, nr_to_call=<optimized out>, v=<optimized
>> out>, val=<optimized out>, nh=<optimized out>) at
>> kernel/notifier.c:190
>> #4 Âatomic_notifier_call_chain (nh=<optimized out>, val=0, v=0x0) at
>> kernel/notifier.c:191
>> #5 Â0xffffffff8205abc5 in panic (fmt=0xffffffff829c20f5 "CRED:
>> put_cred_rcu() sees %p with usage %d\n") at kernel/panic.c:100
>> #6 Â0xffffffff810c3365 in put_cred_rcu (rcu=0xffff880118552b88) at
>> kernel/cred.c:139
>> #7 Â0xffffffff810ffa52 in __rcu_reclaim (rn=<optimized out>,
>> head=<optimized out>) at kernel/rcu.h:81
>> #8 Ârcu_do_batch (rdp=<optimized out>, rsp=<optimized out>) at
>> kernel/rcutree.c:1292
>> #9 Âinvoke_rcu_callbacks (rdp=<optimized out>, rsp=<optimized out>) at
>> kernel/rcutree.c:1577
>> #10 __rcu_process_callbacks (rsp=0xffffffff82c34500,
>> rdp=0xffff88012bc4e960) at kernel/rcutree.c:1549
>> #11 0xffffffff810ffc35 in rcu_process_callbacks (unused=<optimized
>> out>) at kernel/rcutree.c:1558
>> #12 0xffffffff810a4630 in __do_softirq () at kernel/softirq.c:238
>> #13 0xffffffff820602fc in ?? () at arch/x86/kernel/entry_64.S:1205
>> #14 0xffffffff8104c2e3 in do_softirq () at arch/x86/kernel/irq_64.c:83
>> #15 0xffffffff810a4304 in invoke_softirq () at kernel/softirq.c:329
>> #16 irq_exit () at kernel/softirq.c:348
>> #17 0xffffffff82060caf in smp_apic_timer_interrupt (regs=<optimized
>> out>) at arch/x86/kernel/apic/apic.c:882
>> #18 0xffffffff8205eb73 in ?? () at arch/x86/kernel/entry_64.S:973
>> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>> (gdb) up
>> #1 Âkgdb_panic_event (self=0x286, val=0, data=0x0) at
>> kernel/debug/debug_core.c:766
>> 766 Â Â Â Â Â kgdb_breakpoint();
>> (gdb) up
>> #2 Â0xffffffff810c1eff in notifier_call_chain (nl=<optimized out>,
>> val=0, v=0xffffffff8301a9b0, nr_to_call=-1, nr_calls=<optimized out>)
>> at kernel/notifier.c:93
>> 93 Â Â Â Â Â Â Â Â Â Âret = nb->notifier_call(nb, val, v);
>> (gdb) up
>> #3 Â0xffffffff810c1f57 in __atomic_notifier_call_chain
>> (nr_calls=<optimized out>, nr_to_call=<optimized out>, v=<optimized
>> out>, val=<optimized out>, nh=<optimized out>) at
>> kernel/notifier.c:190
>> 190 Â {
>> (gdb) up
>> #4 Âatomic_notifier_call_chain (nh=<optimized out>, val=0, v=0x0) at
>> kernel/notifier.c:191
>> 191 Â Â Â Â Â return __atomic_notifier_call_chain(nh, val, v, -1, NULL);
>> (gdb)
>> #5 Â0xffffffff8205abc5 in panic (fmt=0xffffffff829c20f5 "CRED:
>> put_cred_rcu() sees %p with usage %d\n") at kernel/panic.c:100
>> 100 Â Â Â Â Â atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>> (gdb)
>> #6 Â0xffffffff810c3365 in put_cred_rcu (rcu=0xffff880118552b88) at
>> kernel/cred.c:139
>> 139 Â Â Â Â Â Â Â Â Â panic("CRED: put_cred_rcu() sees %p with usage %d\n",
>> (gdb) list
>> 134 Â Â Â Â Â Â Â Â Â Â Â Â cred, cred->magic, cred->put_addr,
>> 135 Â Â Â Â Â Â Â Â Â Â Â Â atomic_read(&cred->usage),
>> 136 Â Â Â Â Â Â Â Â Â Â Â Â read_cred_subscribers(cred));
>> 137 Â #else
>> 138 Â Â Â Â Â if (atomic_read(&cred->usage) != 0)
>> 139 Â Â Â Â Â Â Â Â Â panic("CRED: put_cred_rcu() sees %p with usage %d\n",
>> 140 Â Â Â Â Â Â Â Â Â Â Â Â cred, atomic_read(&cred->usage));
>> 141 Â #endif
>> 142
>> 143 Â Â Â Â Â security_cred_free(cred);
>> (gdb) p cred
>> $1 = (struct cred *) 0xffff880118552b00
>> (gdb) p *cred
>> $2 = {usage = {counter = -1}, uid = 1001, gid = 1001, suid = 1001,
>> sgid = 1001, euid = 1001, egid = 1001, fsuid = 1001, fsgid = 1001,
>> securebits = 0, cap_inheritable = {cap = {0, 0}}, cap_permitted = {cap
>> = {0, 0}}, cap_effective = {
>> Â Â cap = {0, 0}}, cap_bset = {cap = {4294967295, 4294967295}},
>> jit_keyring = 0 '\000', thread_keyring = 0x0, request_key_auth = 0x0,
>> tgcred = 0xffff880116a3ed80, security = 0xffffffff8315ca10, user =
>> 0xffff880122242a40,
>> Â user_ns = 0xffffffff82c262f0, group_info = 0xffff88011a5d9240, rcu =
>> {next = 0xffff880111a4e348, func = 0xffffffff810c332e <put_cred_rcu>}}
>>
>>
>>
>>
>>
>>
>>
>> Config: CONFIG_DEBUG_CREDENTIALS=y, CONFIG_PREEMPT_RCU=y,
>> CONFIG_PREEMPT=y, kernel=2.6.39
>>
>> =====================================================================
>> KDB
>> =====================================================================
>>
>> <3>[49754.391126] CRED: Invalid credentials
>> <3>[49754.394811] CRED: At include/linux/cred.h:260
>> <3>[49754.399181] CRED: Specified credentials: ffff8801156c8b00
>> <3>[49754.404676] CRED: ->magic=44656144, put_addr=ffffffff81166dc5
>> <3>[49754.410430] CRED: ->usage=0, subscr=0
>> <3>[49754.414102] CRED: ->*uid = { 65534,65534,65534,65534 }
>> <3>[49754.419250] CRED: ->*gid = { 65534,65534,65534,65534 }
>> <3>[49754.424397] CRED: ->security is ffff8801113069c0
>> <3>[49754.429021] CRED: ->security {1, 1}
>>
>> [1]kdb> bt
>> Stack traceback for pid 16009
>> 0xffff880115819770 Â Â16009 Â Â 8897 Â1 Â Â1 Â R Â0xffff880115819bf0 *iknowthis2
>> <c> ffff88012bc43e00<c> 0000000000000000<c> ffff880100000104<c>
>> ffffffff8267436f<c>
>> <c> ffffffff81b611cd<c> ffff8801156c8b00<c> ffff88012bc43e40<c>
>> ffffffff81166d8e<c>
>> <c> ffff8801156c8b00<c> ffff880121e13540<c> ffff88012bc43e60<c>
>> ffffffff81166db3<c>
>> Call Trace:
>> Â<IRQ> Â[<ffffffff81b611cd>] ? wq_free_rcu+0x12/0x14
>> Â[<ffffffff81166d8e>] ? __validate_creds.clone.9+0x2d/0x32
>> Â[<ffffffff81166db3>] ? file_free_rcu+0x20/0x46
>> Â[<ffffffff810eeb82>] ? __rcu_process_callbacks+0x18d/0x2af
>> Â[<ffffffff810eed24>] ? rcu_process_callbacks+0x80/0x87
>> Â[<ffffffff8109c013>] ? __do_softirq+0xeb/0x1cc
>> Â[<ffffffff81044977>] ? native_sched_clock+0x35/0x37
>> Â[<ffffffff810b67dd>] ? sched_clock_local+0x12/0x75
>> Â[<ffffffff81edbd9c>] ? call_softirq+0x1c/0x30
>> Â[<ffffffff81040510>] ? do_softirq+0x4b/0x9f
>> Â[<ffffffff8109c380>] ? irq_exit+0x5f/0xb6
>> Â[<ffffffff81edc6d1>] ? smp_apic_timer_interrupt+0x7d/0x8b
>> Â[<ffffffff81edb553>] ? apic_timer_interrupt+0x13/0x20
>> Â<EOI> Â[<ffffffff81093f75>] ? dup_mm+0x1f2/0x468
>> Â[<ffffffff81151cdb>] ? arch_local_irq_restore+0x6/0xd
>> Â[<ffffffff81154ceb>] ? __slab_alloc.clone.36+0xf0/0x38b
>> Â[<ffffffff81093f75>] ? dup_mm+0x1f2/0x468
>> Â[<ffffffff81155143>] ? kmem_cache_alloc+0x4a/0xe7
>> Â[<ffffffff8135b0ce>] ? selinux_vm_enough_memory+0x48/0x4d
>> Â[<ffffffff81093f75>] ? dup_mm+0x1f2/0x468
>> Â[<ffffffff81094c56>] ? copy_process+0xa3e/0x1230
>> Â[<ffffffff81095592>] ? do_fork+0x10f/0x29d
>> Â[<ffffffff813d8dba>] ? trace_hardirqs_off_thunk+0x3a/0x6c
>> Â[<ffffffff8107e5b9>] ? sys32_clone+0x26/0x28
>> Â[<ffffffff81edc585>] ? ia32_ptregs_common+0x25/0x4b
>>
>> [dumpcommon]kdb> Â -summary
>>
>> sysname  ÂLinux
>> release  Â2.6.39
>> version  Â#3 SMP PREEMPT Fri May 27 15:27:03 CEST 2011
>> machine  Âx86_64
>> nodename  ise-test
>> domainname (none)
>> ccversion ÂCCVERSION
>> date    2011-05-28 03:20:03 tz_minuteswest -120
>> uptime   13:49
>> load avg  19.38 20.17 22.96
>>
>> MemTotal: Â Â Â Â 993059 kB
>> MemFree: Â Â Â Â Â458493 kB
>> Buffers: Â Â Â Â Â 23981 kB
>>
>>
>> =====================================================================
>> KGDB
>> =====================================================================
>>
>> (gdb) bt
>> #0 Â__invalid_creds (cred=0xffff8801156c8b00, file=<value optimized
>> out>, line=<value optimized out>)
>> Â Â at kernel/cred.c:812
>> #1 Â0xffffffff81166d8e in __validate_creds (cred=0xffff8801156c8b00, line=260,
>> Â Â file=0xffffffff8267436f "include/linux/cred.h") at include/linux/cred.h:186
>> #2 Â0xffffffff81166db3 in put_cred (head=<value optimized out>) at
>> include/linux/cred.h:260
>> #3 Âfile_free_rcu (head=<value optimized out>) at fs/file_table.c:49
>> #4 Â0xffffffff810eeb82 in rcu_do_batch (rsp=0xffffffff82a2f500,
>> rdp=0xffff88012bc502f0) at kernel/rcutree.c:1146
>> #5 Â__rcu_process_callbacks (rsp=0xffffffff82a2f500,
>> rdp=0xffff88012bc502f0) at kernel/rcutree.c:1386
>> #6 Â0xffffffff810eed24 in rcu_preempt_process_callbacks (unused=<value
>> optimized out>) at kernel/rcutree_plugin.h:544
>> #7 Ârcu_process_callbacks (unused=<value optimized out>) at
>> kernel/rcutree.c:1404
>> #8 Â0xffffffff8109c013 in __do_softirq () at kernel/softirq.c:238
>> #9 Â0xffffffff81edbd9c in ?? () at arch/x86/kernel/entry_64.S:1210
>> #10 0xffffffff81040510 in do_softirq () at arch/x86/kernel/irq_64.c:80
>> #11 0xffffffff8109c380 in invoke_softirq () at kernel/softirq.c:325
>> #12 irq_exit () at kernel/softirq.c:340
>> #13 0xffffffff81edc6d1 in smp_apic_timer_interrupt (regs=<value
>> optimized out>) at arch/x86/kernel/apic/apic.c:861
>> #14 <signal handler called>
>> #15 0x00cf9b000000ffff in __brk_reservation_fn_dmi_alloc__ ()
>> Cannot access memory at address 0xcffb000000ffff
>>
>> Cannot access memory at address 0xcffb000000ffff
>> (gdb) up
>> #1 Â0xffffffff81166d8e in __validate_creds (cred=0xffff8801156c8b00, line=260,
>> Â Â file=0xffffffff8267436f "include/linux/cred.h") at include/linux/cred.h:186
>> 186 Â Â Â Â Â Â Â Â Â __invalid_creds(cred, file, line);
>> (gdb) p *cred
>> $1 = {usage = {counter = 0}, subscribers = {counter = 0}, put_addr =
>> 0xffffffff81166dc5, magic = 1147494724,
>> Â uid = 65534, gid = 65534, suid = 65534, sgid = 65534, euid = 65534,
>> egid = 65534, fsuid = 65534, fsgid = 65534,
>> Â securebits = 0, cap_inheritable = {cap = {0, 0}}, cap_permitted =
>> {cap = {0, 0}}, cap_effective = {cap = {0, 0}},
>> Â cap_bset = {cap = {4294967295, 4294967295}}, jit_keyring = 0 '\000',
>> thread_keyring = 0x0, request_key_auth = 0x0,
>> Â tgcred = 0xffff88011492b088, security = 0xffff8801113069c0, user =
>> 0xffff880121c4b000, user_ns = 0xffffffff82a21a80,
>> Â group_info = 0xffff880104cec420, rcu = {next = 0x0, func =
>> 0xffffffff810b6c97 <put_cred_rcu>}}
>>
>> (gdb) p (char[4])cred->magic
>> $8 = "DaeD"
>>
>>
>>
>> --
>> Robert ÅwiÄcki
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at Âhttp://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at Âhttp://www.tux.org/lkml/



--
Robert ÅwiÄcki
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/