Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01

From: Peter Zijlstra
Date: Sun Jul 31 2005 - 03:06:50 EST


Hi Ingo,

Still on -02;

I'm now running the thing and having some trouble fixing it.
Attached is a patch to the new CFQ iosched that I had lying about in an
old -mm port of the RT patch.

This trace is giving me a headache to fix:

BUG: scheduling with irqs disabled: IRQ 14/0x20000000/772
caller is __down_mutex+0x446/0x600
[<c0104693>] dump_stack+0x23/0x30 (20)
[<c036defc>] schedule+0xac/0x120 (28)
[<c036f086>] __down_mutex+0x446/0x600 (120)
[<c0370a48>] _spin_lock_irqsave+0x28/0x60 (28)
[<c0121220>] __wake_up+0x20/0x70 (44)
[<c013fab9>] __wake_up_bit+0x39/0x40 (24)
[<c013fae4>] wake_up_bit+0x24/0x30 (16)
[<c01790c6>] unlock_buffer+0x16/0x20 (8)
[<c0179d19>] end_buffer_async_write+0x69/0x200 (68)
[<c017d234>] end_bio_bh_io_sync+0x34/0x70 (20)
[<c017eccd>] bio_endio+0x5d/0x90 (32)
[<c02890f7>] __end_that_request_first+0xd7/0x250 (52)
[<c0289297>] end_that_request_first+0x27/0x30 (20)
[<c029f9ca>] __ide_end_request+0x6a/0x1b0 (36)
[<c029fb9a>] ide_end_request+0x8a/0xb0 (40)
[<c02a9ad0>] ide_dma_intr+0xa0/0xf0 (32)
[<c02a1ba5>] ide_intr+0xb5/0x160 (36)
[<c01544b6>] handle_IRQ_event+0x76/0x110 (52)
[<c0154d09>] do_hardirq+0x59/0x110 (44)
[<c0154e87>] do_irqd+0xc7/0x1e0 (36)
[<c013f426>] kthread+0xb6/0xf0 (48)
[<c01015f9>] kernel_thread_helper+0x5/0xc (271781916)
---------------------------
| preempt count: 20000001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<c0149d2c>] .... add_preempt_count+0x1c/0x20
.....[<c014b0d7>] .. ( <= print_traces+0x17/0x60)

------------------------------
| showing all locks held by: | (IRQ 14/772 [efef4660, 53]):
------------------------------


because end_buffer_async_read/write use bit_spin_(un)lock and I do not
know how those interact with -RT.

=-=-=-=-=-=-=-=-=-=-=-=

Also (just for fun) I enabled CONFIG_HOTPLUG_CPU and tried to offline a
cpu. This is what happened:

kstopmachine/14814[CPU#1]: BUG in __activate_idle_task at kernel/sched.c:871
[<c0104693>] dump_stack+0x23/0x30 (20)
[<c0128473>] __WARN_ON+0x63/0x80 (44)
[<c01230bd>] sched_idle_next+0xfd/0x100 (40)
[<c014c276>] take_cpu_down+0x16/0x20 (8)
[<c0153e5e>] do_stop+0x6e/0x80 (20)
[<c013f426>] kthread+0xb6/0xf0 (48)
[<c01015f9>] kernel_thread_helper+0x5/0xc (288305180)
---------------------------
| preempt count: 20000003 ]
| 3-level deep critical section nesting:
----------------------------------------
.. [<c0149d2c>] .... add_preempt_count+0x1c/0x20
.....[<c012300d>] .. ( <= sched_idle_next+0x4d/0x100)
.. [<c0149d2c>] .... add_preempt_count+0x1c/0x20
.....[<c0128427>] .. ( <= __WARN_ON+0x17/0x80)
.. [<c0149d2c>] .... add_preempt_count+0x1c/0x20
.....[<c014b0d7>] .. ( <= print_traces+0x17/0x60)

------------------------------
| showing all locks held by: | (kstopmachine/14814 [ee9226a0, 0]):
------------------------------

BUG: scheduling with irqs disabled: kstopmachine/0x00000000/14814
caller is do_stop+0x3c/0x80
[<c0104693>] dump_stack+0x23/0x30 (20)
[<c036defc>] schedule+0xac/0x120 (28)
[<c0153e2c>] do_stop+0x3c/0x80 (20)
[<c013f426>] kthread+0xb6/0xf0 (48)
[<c01015f9>] kernel_thread_helper+0x5/0xc (288305180)
---------------------------
| preempt count: 00000001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<c0149d2c>] .... add_preempt_count+0x1c/0x20
.....[<c014b0d7>] .. ( <= print_traces+0x17/0x60)

------------------------------
| showing all locks held by: | (kstopmachine/14814 [ee9226a0, 0]):
------------------------------

CPU 1 is now offline
bash/14800[CPU#0]: BUG in dec_rt_tasks at kernel/sched.c:647
[<c0104693>] dump_stack+0x23/0x30 (20)
[<c0128473>] __WARN_ON+0x63/0x80 (44)
[<c011e01d>] dequeue_task+0x8d/0xa0 (28)
[<c011e4c4>] deactivate_task+0x24/0x40 (20)
[<c0123441>] migration_call+0xe1/0x2e0 (44)
[<c0136ad5>] notifier_call_chain+0x25/0x40 (32)
[<c014c40c>] cpu_down+0x18c/0x2d0 (52)
[<c028194f>] store_online+0x3f/0xa0 (24)
[<c027e533>] sysdev_store+0x33/0x40 (20)
[<c01b5a67>] flush_write_buffer+0x47/0x50 (32)
[<c01b5ac9>] sysfs_write_file+0x59/0x80 (32)
[<c0177d52>] vfs_write+0xe2/0x1b0 (40)
[<c0177ef0>] sys_write+0x50/0x80 (44)
[<c0103698>] sysenter_past_esp+0x61/0x89 (-4020)
---------------------------
| preempt count: 00000003 ]
| 3-level deep critical section nesting:
----------------------------------------
.. [<c0149d2c>] .... add_preempt_count+0x1c/0x20
.....[<c0154632>] .. ( <= __do_IRQ+0xe2/0x170)
.. [<c0149d2c>] .... add_preempt_count+0x1c/0x20
.....[<c0119c26>] .. ( <= unmask_IO_APIC_irq+0x16/0x50)
.. [<c0149d2c>] .... add_preempt_count+0x1c/0x20
.....[<c014b0d7>] .. ( <= print_traces+0x17/0x60)

------------------------------
| showing all locks held by: | (bash/14800 [ee044660, 115]):
------------------------------

#001: [eaff1078] {(struct semaphore *)(&buffer->sem)}
... acquired at: sysfs_write_file+0x27/0x80

#002: [c03d0204] {cpucontrol.lock}
... acquired at: cpu_down+0x21/0x2d0


and a shitload of smp_processor_id in preemptible section thingies.
And as expected getting the cpu back online didn't work :-)

I haven't looked at these traces yet; but since I am writing this email
anyway I might as well include them.

Kind regards,

Peter Zijlstra
kernel: BUG: scheduling with irqs disabled: uname/0x20000000/983
kernel: caller is __down_mutex+0x446/0x600
kernel: [<c0104693>] dump_stack+0x23/0x30 (20)
kernel: [<c036defc>] schedule+0xac/0x120 (28)
kernel: [<c036f086>] __down_mutex+0x446/0x600 (120)
kernel: [<c0370888>] _spin_lock+0x28/0x50 (28)
kernel: [<c0292df4>] cfq_exit_single_io_context+0x34/0xc0 (32)
kernel: [<c0292ebc>] cfq_exit_io_context+0x3c/0x50 (24)
kernel: [<c028975c>] exit_io_context+0x8c/0xb0 (24)
kernel: [<c012ac41>] do_exit+0x411/0x460 (44)
kernel: [<c012ad2e>] do_group_exit+0x3e/0xd0 (40)
kernel: [<c012adda>] sys_exit_group+0x1a/0x20 (12)
kernel: [<c0103698>] sysenter_past_esp+0x61/0x89 (-4020)
kernel: ---------------------------
kernel: | preempt count: 20000001 ]
kernel: | 1-level deep critical section nesting:
kernel: ----------------------------------------
kernel: .. [<c0149d2c>] .... add_preempt_count+0x1c/0x20
kernel: .....[<c014b0d7>] .. ( <= print_traces+0x17/0x60)
kernel:
kernel: ------------------------------
kernel: | showing all locks held by: | (uname/983 [ee9fa720, 114]):
kernel: ------------------------------
kernel:

--- linux-2.6.13-rc4-RT-V0.7.52-02/drivers/block/cfq-iosched.c~ 2005-07-30 20:45:57.000000000 +0200
+++ linux-2.6.13-rc4-RT-V0.7.52-02/drivers/block/cfq-iosched.c 2005-07-31 09:38:35.000000000 +0200
@@ -1376,7 +1376,7 @@
struct cfq_data *cfqd = cic->cfqq->cfqd;
request_queue_t *q = cfqd->queue;

- WARN_ON(!irqs_disabled());
+ WARN_ON_NONRT(!irqs_disabled());

spin_lock(q->queue_lock);

@@ -1400,7 +1400,7 @@
struct list_head *entry;
unsigned long flags;

- local_irq_save(flags);
+ local_irq_save_nort(flags);

/*
* put the reference this task is holding to the various queues
@@ -1411,7 +1411,7 @@
}

cfq_exit_single_io_context(cic);
- local_irq_restore(flags);
+ local_irq_restore_nort(flags);
}

static struct cfq_io_context *