Re: [PATCH 2.5.64 2/2] i_size atomic access

From: Andrea Arcangeli (andrea@suse.de)
Date: Mon Mar 10 2003 - 08:16:11 EST


On Fri, Mar 07, 2003 at 08:33:40PM -0800, Andrew Morton wrote:
> Jan Harkes <jaharkes@cs.cmu.edu> wrote:
> >
> > On Fri, Mar 07, 2003 at 05:26:31PM -0800, Daniel McNeil wrote:
> > > On Fri, 2003-03-07 at 16:30, Andrew Morton wrote:
> > > > Daniel McNeil <daniel@osdl.org> wrote:
> > > > > This adds i_seqcnt to inode structure and then uses i_size_read() and
> > > > > i_size_write() to provide atomic access to i_size.
> > > >
> > > > Ho hum. Everybody absolutely hates this, but I guess we should do it :(
> >
> > I am really curious whether this patch is really all that useful, has
> > anyone ever noticed enough lock contention on inode semaphore caused by
> > accessing i_size? Whenever i_size changes it needs to be locked down
> > either way because mappings have to be extended or truncated.
>
> The problem is not lock contention. The problem is that the read() paths are
> performing nonatomic reads of a 64-bit value. If a writer is updating i_size
> at the same time the reader can see grossly incorrect values.
>
> It's such a remote problem that nobody really has the heart to do anything
> about it. But it's there...

well really this is fixed in my tree and in some distribution kernels
for half an year, it's true only the major fs are been taken care of,
but definitely somebody had the heart to do something about it 8)

> > A quick grep shows that there are 619 references to ->i_size in the
> > various filesystem subdirs.
>
> Most of these are not inode->i_size. Yes, there are i_size references in
> filesystems, but not many. And the infrastructure is there to mop those up.
>
> If we choose to. I'm still not sure I want to do this :(

There is no other way, some cpu can't even do it atomically (hence the
need of the sequence number approch).

Also note that the atomicity isn't needed everywhere, for example if you
read i_size in the write paths you don't need to use i_size_read, but
you can read with inode->i_size as usual, which is faster and in turn
recommended.

I described the locking rules here:

        http://groups.google.com/groups?q=i_size_read&hl=en&lr=&ie=UTF-8&selm=20020717225504.GA994%40dualathlon.random&rnum=2

          The rules are: 1) i_size_write must be used for all i_size
          updates (at least when there can be potential parallel readers
          outside the i_sem), 2) i_size_read must be used for all lockless
          reads when an i_size change can happen from under us.
          
Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sat Mar 15 2003 - 22:00:21 EST