Re: IPC on 2.0.x -- kernel bug and fix

Bernd Schmidt (crux@pool.informatik.rwth-aachen.de)
Mon, 23 Sep 1996 09:57:29 +0200 (MET DST)


On 16 Sep 1996, Dragon Slayer wrote:
> Is anyone else seeing the following messages:
> shm_swap: bad pgmid! id=2 start=401e5000 idx=2130

I've written a small test program that uses shared memory heavily, and after
a while these messages start to appear. This is on a 2.0.20 kernel, and it is
completely reproducible (I can post the program if there's interest).

I started digging in old patches, compiled a few kernels and found out that
this started to happen in kernel 1.3.26. After staring at the patch for a
while, I came up with the following fix. This seems to eliminate the problem
completely:

--- linux/kernel/fork.c~ Sat Sep 21 16:45:52 1996
+++ linux/kernel/fork.c Sat Sep 21 17:20:36 1996
@@ -100,12 +100,12 @@
 			mpnt->vm_next_share = tmp;
 			tmp->vm_prev_share = mpnt;
 		}
-		if (tmp->vm_ops && tmp->vm_ops->open)
-			tmp->vm_ops->open(tmp);
 		if (copy_page_range(mm, current->mm, tmp)) {
 			exit_mmap(mm);
 			return -ENOMEM;
 		}
+		if (tmp->vm_ops && tmp->vm_ops->open)
+			tmp->vm_ops->open(tmp);
 		*p = tmp;
 		p = &tmp->vm_next;
 	}

(The function is dup_mmap(), called when forking.) Here's what seems to happen:
The old code (1.3.25) copied the pages first, then called the vm_ops->open
routine. This order was changed in 1.3.26. Now open is called first, and if
the area belongs to the shm code, it is put on a list that shm_swap() reads to
determine which pages can be swapped out. The problem is that before the area
is completely copied, copy_page_range() may need to allocate new pages; if
memory is tight, that allocation can start swapping, so shm_swap() gets
called. It finds the entry that was inserted by the open call, but it can't
find the pages because they haven't been copied yet.
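
To make the ordering problem concrete, here is a small user-space toy model.
It is not kernel code: toy_area, toy_open(), toy_swap_scan(),
toy_copy_pages() and toy_dup_area() are all made-up names that only stand in
for the vma, vm_ops->open, shm_swap(), copy_page_range() and one iteration of
the dup_mmap() loop. The "memory pressure" halfway through the copy is
hard-wired, which is an assumption made purely for illustration.

```c
#include <stdio.h>

#define NPAGES       4
#define MAX_ATTACHES 8

/* Toy stand-in for a vma: just a tiny per-area "page table". */
struct toy_area {
    int pte[NPAGES];            /* 0 = not copied yet, nonzero = present */
};

/* Toy stand-in for the shm attach list that the open routine fills in. */
static struct toy_area *attaches[MAX_ATTACHES];
static int n_attaches;
static int bad_lookups;         /* scans that hit an area with missing entries */

static void toy_reset(void)
{
    n_attaches = 0;
    bad_lookups = 0;
}

/* Plays the role of vm_ops->open: makes the area visible to the swapper. */
static void toy_open(struct toy_area *a)
{
    if (n_attaches < MAX_ATTACHES)
        attaches[n_attaches++] = a;
}

/* Plays the role of shm_swap(): walks every registered area and expects
 * all of its page table entries to be there. */
static void toy_swap_scan(void)
{
    for (int i = 0; i < n_attaches; i++) {
        for (int j = 0; j < NPAGES; j++) {
            if (attaches[i]->pte[j] == 0) {
                bad_lookups++;
                printf("toy_swap_scan: missing entry, area=%d idx=%d\n", i, j);
                break;          /* count each incomplete area once per scan */
            }
        }
    }
}

/* Plays the role of copy_page_range(): copies entries one by one.
 * Halfway through we pretend an allocation ran short of memory and
 * kicked off the swapper (hard-wired for the sake of the example). */
static int toy_copy_pages(struct toy_area *dst, const struct toy_area *src)
{
    for (int j = 0; j < NPAGES; j++) {
        if (j == NPAGES / 2)
            toy_swap_scan();
        dst->pte[j] = src->pte[j];
    }
    return 0;
}

/* One iteration of the duplication loop. Returns how many bad lookups
 * the swapper saw. open_first != 0 models the 1.3.26 ordering,
 * open_first == 0 models the 1.3.25 ordering (and the patch). */
static int toy_dup_area(int open_first)
{
    struct toy_area parent = { {1, 1, 1, 1} };
    struct toy_area child  = { {0} };

    toy_reset();
    toy_open(&parent);          /* the parent is already attached */

    if (open_first) {
        toy_open(&child);       /* registered before its entries exist */
        toy_copy_pages(&child, &parent);
    } else {
        toy_copy_pages(&child, &parent);
        toy_open(&child);       /* registered only once it is complete */
    }

    toy_swap_scan();            /* a later scan is clean in both cases */
    return bad_lookups;
}
```

Called from a main(), toy_dup_area(1) reports one bad lookup in this model
and toy_dup_area(0) reports none. The only difference between the two paths
is whether the area is registered before or after its entries are copied,
which is the same reordering the patch makes.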

As far as I can tell, the bug is harmless and shouldn't result in any real
problems, but it still ought to get fixed.

Bernd