Re: downgrade_write replacement in remap_file_pages

From: Andrea Arcangeli
Date: Tue Jun 08 2004 - 17:36:12 EST


On Tue, Jun 08, 2004 at 06:05:08PM +0100, David Howells wrote:
>
> > Apparently downgrade_write deadlocks the kernel in the mmap_sem
> > under load.
>
> Which implementation of rwsems is your kernel using? The spinlock-based one or
> the XADD based one? Have you tried the other version?

stock 2.6 rwsem implementation compiled for PII (I still have tons of
patches to forward port from 2.4-aa, if I would port my rwsem to 2.6 I
would have never noticed this race)

> Have you more than 32767 processes?

no, there should be around 10k processes, sure not more than 20k.

> Do you have any stack traces?

yes:

strace:

open("/proc/6022/stat", O_RDONLY) = 6
read(6,

SYSRQ+T

[<c01fa365>] rwsem_down_read_failed+0x85/0x121
[<c01a0da9>] .text.lock.array+0x49/0xd0
[<c01e4412>] avc_has_perm+0x62/0x78
[<c01e5ba3>] inode_has_perm+0x53/0x90
[<c019d3b1>] proc_info_read+0x51/0x150
[<c016ada1>] vfs_read+0xe1/0x130
[<c016b001>] sys_read+0x91/0xf0

it's the down_read in proc_pid_stat. the workload running at the same
time is heavy remap_file_pages. They're processes so the only race
happens against the /proc filesystem and that's why it hangs there.
Somehow downgrade_write in remap_file_pages races with down_read in
/proc. My patch workarounds the deadlock by not calling downgrade_write,
but I posted it to l-k because my code is better anyways since there's
no good reason to ever call down_write in the fast path (and if we don't
start down_write we don't need downgrade_write anymore). The only thing
bitten by downgrade_write left is xfs.

You can imagine which is the critical apps that triggers this deadlock,
not many apps are using remap_file_pages in production yet, and very few
are going to call it in a flood.

I agree with Andrew the limit of 32k processes needs fixing, but nobody
noticed yet with any real app, so it's a low prio matter.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/