ncpfs 2.2.x/2.3.x lockup [PATCH]

Petr Vandrovec Ing. VTEI (VANDROVE@vc.cvut.cz)
Fri, 14 May 1999 20:51:24 MET-1


Hi,
yesterday Raul Miller reported a deadlock problem with ncpfs.
I tracked it down to a deadlock between mmap and read/write. You can download
the patch I sent to Linus from
ftp://platan.vc.cvut.cz/pub/linux/ncpfs/latest/ncpfs-2.2.0.15-kernel-?.?.?.gz,
where ?.?.? is 2.2.9 or 2.3.1.
Apart from the problems described below, I also see starvation of other tasks
when some tasks run 'dd if=<ncpfs_file> of=/dev/null bs=4M' (or some other
large block size). The problem is that ncpfs does
down, read 1KB, up, copy_to_user, down, read another 1KB, up, copy_to_user, ...
When another task wants to use the ncp connection, its 'down' does not succeed.
Because 'dd' spends 99% of its time sleeping in 'read 1KB' with the
semaphore held, other tasks have almost no chance to acquire the lock.
Is there some primitive I overlooked which (for example) schedules
another task immediately in 'up'? I'm writing 'ncprpc', which can queue
requests to the server in a real queue, but I do not think it will reach
production quality before September :-( (and there is another question:
whether the network gurus will ever approve it, since it plays dirty games
with the TCP, UDP and IPX stacks to save unneeded data moves).
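
To make the question concrete, here is a small user-space model of a lock whose 'up' hands ownership directly to the oldest waiter, so a task that immediately calls 'down' again has to queue behind it. This is only a sketch of the requested behavior, not an existing kernel primitive: the names handoff_lock, hl_init, hl_down and hl_up are hypothetical, tasks are plain integers, and there is no real sleeping or scheduling.

```c
#include <stddef.h>

#define HL_MAX_WAITERS 8
#define HL_FREE (-1)

/* Hypothetical model: owner is a task id, waiters form a FIFO ring. */
struct handoff_lock {
    int owner;
    int waiters[HL_MAX_WAITERS];
    size_t head;
    size_t count;
};

static void hl_init(struct handoff_lock *l)
{
    l->owner = HL_FREE;
    l->head = 0;
    l->count = 0;
}

/* Returns 1 if 'task' got the lock, 0 if it was queued, -1 if the queue is full. */
static int hl_down(struct handoff_lock *l, int task)
{
    if (l->owner == HL_FREE) {
        l->owner = task;
        return 1;
    }
    if (l->count == HL_MAX_WAITERS)
        return -1;
    l->waiters[(l->head + l->count) % HL_MAX_WAITERS] = task;
    l->count++;
    return 0;
}

/* Release: ownership passes to the oldest waiter, if there is one. */
static void hl_up(struct handoff_lock *l)
{
    if (l->count == 0) {
        l->owner = HL_FREE;
        return;
    }
    l->owner = l->waiters[l->head];
    l->head = (l->head + 1) % HL_MAX_WAITERS;
    l->count--;
}
```

In this model, if task 1 ('dd') holds the lock and task 2 is queued, hl_up() makes task 2 the owner before task 1 can take the lock again, which is the fairness property asked about above.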
Thanks,
Petr Vandrovec
vandrove@vc.cvut.cz

------- Forwarded Message Follows -------

From: vandrove@vc.cvut.cz
To: torvalds@transmeta.com
Subject: ncpfs 2.2.x/2.3.x lockup [PATCH]
Copies to: rdm@test.legislate.com
Date sent: Fri, 14 May 1999 20:35:34 MET-1

[FYI, I'm sending it to linux-kernel@vger.rutgers.edu without patches]
Hi Linus,
Raul Miller <rdm@test.legislate.com> reported yesterday that a task (egrep)
accessing ncpfs sometimes locks up in 'D' state during a large egrep. I've
found that in ncpfs 'read' and 'write' call copy_{to|from}_user
while the connection to the server is locked. Thus if you are
reading into or writing from an mmaped memory region of an ncpfs file, a page
fault invokes mmap_nopage, and when that tries to read the contents of the
underlying page into memory, a deadlock occurs, because the connection is
already locked by 'read' or 'write'.
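
The cycle described above can be modelled in user space. The sketch below is an illustration only, not the code from the attached patch: struct conn, conn_lock, fault_in_page and the two read variants are hypothetical names, the lock is a plain flag, and a spot where the real kernel would sleep forever is just counted in 'deadlocks'. The second variant shows one possible way to break the cycle, by copying the reply into a kernel-side bounce buffer and calling the user copy only after the connection is unlocked.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the per-connection lock. */
struct conn {
    bool locked;
    int deadlocks;   /* counts places where the kernel would hang */
};

/* Returns false where a real, non-recursive lock would block forever. */
static bool conn_lock(struct conn *c)
{
    if (c->locked) {
        c->deadlocks++;
        return false;
    }
    c->locked = true;
    return true;
}

static void conn_unlock(struct conn *c)
{
    c->locked = false;
}

/* Models the nopage handler: it needs the connection to fetch the page. */
static bool fault_in_page(struct conn *c)
{
    if (!conn_lock(c))
        return false;
    conn_unlock(c);
    return true;
}

/* Models copy_to_user into a buffer that is an mmaped ncpfs file. */
static bool copy_to_mmaped_user(struct conn *c, char *dst, const char *src, size_t n)
{
    if (!fault_in_page(c))
        return false;
    memcpy(dst, src, n);
    return true;
}

/* Problematic ordering: user copy while the connection is locked. */
static bool read_copy_under_lock(struct conn *c, char *ubuf, const char *reply, size_t n)
{
    bool ok;

    if (!conn_lock(c))
        return false;
    ok = copy_to_mmaped_user(c, ubuf, reply, n);
    conn_unlock(c);
    return ok;
}

/* Possible fix: copy to a bounce buffer, unlock, then copy to user. */
static bool read_copy_after_unlock(struct conn *c, char *ubuf, const char *reply, size_t n)
{
    char bounce[1024];   /* matches the 1KB request size used as an example */

    if (n > sizeof(bounce))
        n = sizeof(bounce);
    if (!conn_lock(c))
        return false;
    memcpy(bounce, reply, n);
    conn_unlock(c);
    return copy_to_mmaped_user(c, ubuf, bounce, n);
}
```

The price of the second ordering in this model is one extra copy of the data per request.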
I'll fix it in 2.3 by switching to the page cache (it looks like
ncpfs is almost the only filesystem not using it yet... Unfortunately, as we
use it with record locking, I have to investigate how to prevent
interstation deadlocks if one station locks some part of a page and
another station tries to lock and read another part of that page. I hope
that I'll find some nice solution.)
Besides this absolutely necessary fix (the bug has been there since ncpfs
gained mmap support, for at least two years...), I also:
+ changed the connection lock from waitqueue+variable to a mutex. The old
'while (lock) sleep_on(&queue); lock=1;' and 'lock=0; wake_up(&queue);'
was not OK for SMP (I thought that the problem was there...).
+ moved NCP_{MIN,MAX}_SYMLINK_SIZE from a userspace-visible place to an
internal header, as userspace is not interested in these values. They were
public only in 2.2.7-2.2.9, so it should not cause any headaches.
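
Why the old 'while (lock) ...; lock=1;' pattern is unsafe on SMP can be shown with a deterministic user-space model. The test of 'lock' and the store of 1 are separate steps, so two CPUs can both see the lock free before either sets it. The sketch below replays that interleaving by hand and compares it with an atomic test-and-set. All names are hypothetical, and C11 atomic_flag is used only as a stand-in for an atomic lock; it is not what the patch uses.

```c
#include <stdbool.h>
#include <stdatomic.h>

/* Old scheme: a plain variable, checked and set in two separate steps. */
struct old_lock {
    int lock;
};

static bool old_check_free(const struct old_lock *l)
{
    return l->lock == 0;          /* the 'while (lock)' test */
}

static void old_take(struct old_lock *l)
{
    l->lock = 1;                  /* the later 'lock=1;' store */
}

/* Atomic stand-in: test and set happen as one indivisible step. */
static bool new_trylock(atomic_flag *f)
{
    return !atomic_flag_test_and_set(f);
}

/* Both "CPUs" run the check before either runs the store. */
static int old_owners_when_interleaved(void)
{
    struct old_lock l = { 0 };
    bool a_saw_free = old_check_free(&l);
    bool b_saw_free = old_check_free(&l);
    int owners = 0;

    if (a_saw_free) {
        old_take(&l);
        owners++;
    }
    if (b_saw_free) {
        old_take(&l);
        owners++;
    }
    return owners;
}

/* With the atomic primitive only the first caller can win. */
static int new_owners(void)
{
    atomic_flag f = ATOMIC_FLAG_INIT;
    int owners = 0;

    if (new_trylock(&f))
        owners++;
    if (new_trylock(&f))
        owners++;
    return owners;
}
```

In the interleaved case both callers of the old scheme end up believing they own the connection, while the atomic version admits exactly one.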
The 2.2.9 patch is fully tested; the 2.3.1 one boots, passed the testcase and
worked for about an hour (I have not prepared a 2.3.x development machine yet),
so I think it works too.
Best regards,
Petr Vandrovec
vandrove@vc.cvut.cz

Attached files: ncp229.gz = patch for 2.2.9, gzipped unified diff
                ncp231.gz = patch for 2.3.1, gzipped unified diff

Attachments removed for linux-kernel@vger.rutgers.edu
