mandatory file locking

Christian Ehrhardt (ehrhardt@mathematik.uni-ulm.de)
Wed, 1 Sep 1999 14:13:32 +0200 (MET DST)


Hi,
there are some problems related to file locking, especially
mandatory file locking. I invested some thoughts in a fix but I
couldn't come up with a clean and elegant solution, yet.
I hope this is not well known, I didn't find anything in the archives.
I'd be interested in any ideas how this should be fixed.

Code from 2.2.12: sys_write ():
ret = locks_verify_area(FLOCK_VERIFY_WRITE, inode, file,
file->f_pos, count);
if (ret)
goto out;
ret = -EINVAL;
if (!file->f_op || !(write = file->f_op->write))
goto out;
down(&inode->i_sem);
ret = write(file, buf, count, &file->f_pos);
up(&inode->i_sem);

The first Problem is the use of file->f_pos in the call to
locks_verify_area. The value may be changed by another
process sharing the file pointer while we sleep in locks_verify_area.
write will then use the modified value of file->f_pos and the
area we write to is not verified. I could successfully write data
to any mandatory locked area if there is any unlocked area in the file
(tested on 2.2.10).
The implementation of the O_APPEND flag has a similar problem.
A possible fix would be to use a local copy of file->f_pos (breaks
pipe_read and friends) and handle O_APPEND before locks_verify_area in the
VFS layer (there are good reasons not to do in the VFS layer).

The second (potential) Problem is that locks_verify_area only
guarantees that the area is unlocked until we sleep for the first
time but this may already happen in down(). Thus another process might
get a mandatory lock for the area before we even start writing to it.
It is probably hard to trigger this problem because the inode semaphore
is only held for relativly short times.
I think this can be fixed if locks_mandatory_area adds a special
posix lock for the affected area that is is removed after the call
to f_op->write.

Other things I noticed wrt file locking (don't know if these are real
problems):
* I can't see a spin lock for the file_lock_table
* posix deadlock detection might not work properly if CLONE_PID
and CLONE_FILES were used together on a call to sys_clone.
(I know that the man page says that CLONE_PID is not recommended).

-Christian

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/