On 3/18/08, Eric Dumazet <dada1@xxxxxxxxxxxxx> wrote:Hum, you misunderstood the point.
You are right Peter, that fs/file.c contains some leftover from previous
implementation of defer queue,
that was using a timer.
So we can probably provide a patch that :
- Use spin_lock() & spin_unlock() instead of spin_lock_bh() &
spin_unlock_bh() in free_fdtable_work()
since we dont anymore use a softirq (timer) to reschedule the workqueue.
( this timer was deleted by the following patch :
http://readlist.com/lists/vger.kernel.org/linux-kernel/50/251040.html
But, you cannot avoid use of spin_lock()/spin_unlock() because
schedule_work() makes no garantee that the work will be done by this cpu.
Ah.....u have hit the nail....and combine with Johannes Weiner's
explanation, I have pieced together the full scenario:
First, the following is possible:
fddef = &get_cpu_var(fdtable_defer_list);
spin_lock(&fddef->lock);
fdt->next = fddef->next;
fddef->next = fdt;==============>executing at CPU A
/* vmallocs are handled from the workqueue context */
schedule_work(&fddef->wq);
spin_unlock(&fddef->lock);==============>executing at CPU B
put_cpu_var(fdtable_defer_list);
where the execution can switch CPU after the schedule_work() API, then
LOGICALLY u definitely need the spin_lock(), and the per_cpu data is
really not necessary.
But without the per_cpu structure, then the following "dedicated
chunk" can only execute on one processor, with the possibility of
switching to another processor after schedule_work():
So then we introduce the per_cpu structure - so that the "dedicatedschedule_work() cannot sleep. It only queues a work to be done later by a special thread.
chunk" can be executing on multiple processor ALL AT THE SAME TIME,
without interferring each other, as fddef are per-cpu (rightfully
owned only before schedule_work() is called, but after schedule_work()
is called, an arbitrary CPU will be executing this fddef).
spin_lock() is necessary because of the possibility of CPU switch
(schedule_work()).
and per_cpu is so that the same chunk of code can be executing at
multiple CPUs all at the same time.
Now the key issue rises up - as I have just asked before u answered my question:
http://mail.nl.linux.org/kernelnewbies/2008-03/msg00236.html
can schedule_work() sleep? (just like schedule(), whcih can sleep right?)
schedule_work() is guaranteed to execute the work queue at least once,
and so this thread may or may not sleep. correct? Or wrong?
Problem is when u sleep and never wake up, then the spin_lock becomeNot exactly :) , but please continue to learn :)
permanently locked, and when later the same CPU (have to be the same
fddef CPU) is being reschedule to execute the get_cpu_var() again, it
will spin_lock() infinitely, resulting in 100% CPU utilization error.
To prevent these types of error, spin_lock are always not to be used
with to wrap around functions that can sleep, and can only containing
short routines between lock and unlock.
Is my analysis correct?