pagecache 2.3.7

Andrea Arcangeli (andrea@suse.de)
Sat, 19 Jun 1999 19:39:53 +0200 (CEST)


I am experiencing deadlocks due to a race in ___wait_on_page():

void ___wait_on_page(struct page *page)
{
	struct task_struct *tsk = current;
	DECLARE_WAITQUEUE(wait, tsk);

	add_wait_queue(&page->wait, &wait);
	tsk->state = TASK_UNINTERRUPTIBLE;
	run_task_queue(&tq_disk);
	if (PageLocked(page)) {
		do {
			tsk->state = TASK_UNINTERRUPTIBLE;
			run_task_queue(&tq_disk);
			/*
			 * *** RACE: here we must check whether the page is
			 * still locked, after setting the state to
			 * TASK_UNINTERRUPTIBLE and before going to sleep.
			 * If the page is unlocked (and wake_up() runs) between
			 * the PageLocked() test and the assignment above, the
			 * assignment overwrites the wakeup and schedule()
			 * sleeps forever. ***
			 */
			schedule();
		} while (PageLocked(page));
	}
	tsk->state = TASK_RUNNING;
	remove_wait_queue(&page->wait, &wait);
}

Here is a fix for the race (against pre-9):

--- linux/mm/filemap.c Sat Jun 19 16:06:22 1999
+++ /tmp/filemap.c Sat Jun 19 19:15:13 1999
@@ -522,15 +522,13 @@
 	DECLARE_WAITQUEUE(wait, tsk);
 
 	add_wait_queue(&page->wait, &wait);
-	tsk->state = TASK_UNINTERRUPTIBLE;
-	run_task_queue(&tq_disk);
-	if (PageLocked(page)) {
-		do {
-			tsk->state = TASK_UNINTERRUPTIBLE;
-			run_task_queue(&tq_disk);
-			schedule();
-		} while (PageLocked(page));
-	}
+	do {
+		run_task_queue(&tq_disk);
+		tsk->state = TASK_UNINTERRUPTIBLE;
+		if (!PageLocked(page))
+			break;
+		schedule();
+	} while (PageLocked(page));
 	tsk->state = TASK_RUNNING;
 	remove_wait_queue(&page->wait, &wait);
 }

I also ran into a different problem: I got this message on the
test machine while logging out of an rsh session:

hm, no brw_page(%p) because IO already started.

After that, as expected, every task that tried to access the page
deadlocked (since the I/O was never started and the page stayed locked).

So I changed the code to:

	[..]
	if (page->buffers)
		goto just_read;
	[..]
	return 0;

 just_read:
	printk("hm, no brw_page(%p) because IO already started.\n", page);
	BUG();
	return 1;

Now I'll try to get a stack trace (note: if I had kdb compiled into
the kernel, I wouldn't have to recompile, reboot, and try to reproduce the
problem just to get a stack trace). But don't take this as a bug report
against pre-2.3.7-9, since I have so many page-cache differences in my
tree that I may have missed something myself. If you know that the
problem is specific to pre-2.3.7-9, though, I'd like to hear about it :).
Thanks.

Andrea Arcangeli

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/