Re: [PATCH] VFS: lseek(fd, 0, SEEK_CUR) race condition

From: Andrew Morton
Date: Mon Nov 10 2008 - 20:08:58 EST


On Thu, 6 Nov 2008 21:21:12 +0100
Alain Knaff <alain@xxxxxxxx> wrote:

> This patch fixes a race condition in lseek. While it is expected that
> unpredictable behaviour may result while repositioning the offset of a
> file descriptor concurrently with reading/writing to the same file
> descriptor, this should not happen when merely *reading* the file
> descriptor's offset.
>
> Unfortunately, the only portable way in Unix to read a file
> descriptor's offset is lseek(fd, 0, SEEK_CUR); however executing this
> concurrently with read/write may mess up the position, as shown by the
> testcase below:
>
> #include <sys/types.h>
> #include <stdio.h>
> #include <pthread.h>
> #include <unistd.h>
> #include <errno.h>
> #include <string.h>
> #include <pthread.h>
>
>
> void *loop(void *ptr)
> {
> fprintf(stderr, "Starting seek thread\n");
> while(1) {
> if(lseek(0, 0LL, SEEK_CUR) < 0LL)
> perror("seek");
> }
> }
>
> int main(int argc, char **argv) {
> long long l=0;
> int r;
> char buf[4096];
>
> pthread_t thread;
> pthread_create(&thread, 0, loop, 0);
>
> for(r=0; 1 ; r++) {
> int n = read(0, buf, 4096);
> if(n == 0)
> break;
> if(n < 4096) {
> fprintf(stderr, "Short read %d %s\n", n, strerror(errno));
> }
> l+= n;
> }
> fprintf(stderr, "Read %lld bytes\n", l);
>
> return 0;
> }
>
> Compile this and run it on a multi-processor machine as
> ./a.out <bigFile
>
> where bigFile is a 1 Gigabyte file. It should print 1073741824.
> However, on a buggy kernel, it usually produces a bigger number. The
> problem only happens on a multiprocessor machine. This is because an
> lseek(fd, 0, SEEK_CUR) running concurrently with a read() or write()
> will reset the position back to what it used to be when the read()
> started.
>
> This behavior was observed "in the wild" when using udpcast which uses
> lseek to monitor progress of reading/writing the uncompressed data.
>
> The patch below fixes the issue by "special-casing" the lseek(fd, 0,
> SEEK_CUR) pattern.
>
> Apparently, an attempt was already made to fix the issue by the
> following code:
>
> if (offset != file->f_pos) {
> file->f_pos = offset;
> file->f_version = 0;
> }
>
> However, this doesn't work if file->f_pos was changed (by read() or
> write()) between the time offset was computed, and the time where it
> considers writing it back.
>
> Signed-off-by: Alain Knaff <alain@xxxxxxxx>
>
> ---
>
> diff -pur kernel.orig/fs/read_write.c kernel/fs/read_write.c
> --- kernel.orig/fs/read_write.c 2008-10-11 14:12:07.000000000 +0200
> +++ kernel/fs/read_write.c 2008-11-06 19:55:59.000000000 +0100
> @@ -42,6 +42,8 @@ generic_file_llseek_unlocked(struct file
> offset += inode->i_size;
> break;
> case SEEK_CUR:
> + if(offset == 0)
> + return file->f_pos;
> offset += file->f_pos;
> }
> retval = -EINVAL;

OK, I think that a concurrent lseek(fd, 0, SEEK_CUR) is a sufficiently
sane operation that this is worth doing. As you point out, there is no
other way of userspace doing what is effectively a read-only operation
- userspace would be entitled to wonder "ytf did the kernel rewrite the
file offset for that?".


Do the below additions look OK?

From: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>

- fix coding-style
- fix default_llseek() as well
- add comments

Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: Alain Knaff <alain@xxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

fs/read_write.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)

diff -puN fs/read_write.c~vfs-lseekfd-0-seek_cur-race-condition-fix fs/read_write.c
--- a/fs/read_write.c~vfs-lseekfd-0-seek_cur-race-condition-fix
+++ a/fs/read_write.c
@@ -50,6 +50,14 @@ generic_file_llseek_unlocked(struct file
offset += inode->i_size;
break;
case SEEK_CUR:
+ /*
+ * Here we special-case the lseek(fd, 0, SEEK_CUR)
+ * position-querying operation. Avoid rewriting the "same"
+ * f_pos value back to the file because a concurrent read(),
+ * write() or lseek() might have altered it
+ */
+ if (offset == 0)
+ return file->f_pos;
offset += file->f_pos;
break;
}
@@ -105,8 +113,14 @@ loff_t default_llseek(struct file *file,
offset += i_size_read(file->f_path.dentry->d_inode);
break;
case SEEK_CUR:
- if(offset == 0)
- return file->f_pos;
+ /*
+ * See SEEK_CUR description in
+ * generic_file_llseek_unlocked()
+ */
+ if (offset == 0) {
+ retval = file->f_pos;
+ goto out;
+ }
offset += file->f_pos;
}
retval = -EINVAL;
@@ -117,6 +131,7 @@ loff_t default_llseek(struct file *file,
}
retval = offset;
}
+out:
unlock_kernel();
return retval;
}
_

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/