Re: Are there u32 atomic bitops? (or dealing w/ i_flags)

From: Andy Lutomirski
Date: Tue Dec 18 2012 - 17:20:18 EST


On Tue, Dec 18, 2012 at 1:30 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Mon, Dec 17, 2012 at 06:42:44PM -0800, Andy Lutomirski wrote:
>> On Mon, Dec 17, 2012 at 5:57 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>> > On Mon, Dec 17, 2012 at 05:10:21PM -0800, Andy Lutomirski wrote:
>> >> I want to change inode->i_flags access to be atomic -- there are some
>> >> locking oddities right now, I think, and I want to use a new inode
>> >> flag to signal mtime updates from page_mkwrite. The problem is that
>> >> i_flags is an unsigned int, and making it an unsigned long seems like
>> >> a waste, but there aren't any u32 atomic bitops.
>> >
>> > ... and atomic accesses cost more. A lot more on some architectures.
>> > FWIW, atomic_t *is* 32bit on 32bit architectures, which still doesn't
>> > make it a good idea.
>>
>> Are atomic_set_mask and atomic_clear_mask as fast as set_bit and
>> friends on all archs?
>>
>> In any case, i_flags looks like it's rarely written, so I find it a
>> bit hard to believe that making it atomic would hurt. Isn't
>> atomic_read equivalent to non-atomic reads everywhere?
>>
>> I want page_mkwrite to set a flag (without taking i_mutex) but *not*
>> call file_update_time and then to have the writeback paths update the
>> inode time.
>
> Deadlocks ahoy!
>
> We don't currently take the i_mutex anywhere in the writeback path
> as the writeback path is nested inside the i_mutex. Hence this seems
> like an extremely dangerous thing to do...

Hmm.

This can all be made fs-specific: page_mkclean_file sets a bit in the
inode flags or inode state or address_space_flags. (The latter might
be nice -- it's already atomic and I think there are plenty of bits
free.) Then a filesystem could stop calling file_update_time in
->page_mkwrite and instead test-clear that bit in ->writepage.

At that point, maybe the fs knows it's safe to mark the inode dirty
and update times. (The actual time fields don't seem to be uniformly
protected by any lock.) Can ext4_mark_inode_dirty be called from
->writepage?

Filesystems that haven't been converted can will continue to update
times in ->page_mkwrite.

>
>> (This, along with stable pages, is the major cause of
>> long sleeps in my application.) OTOH, maybe I should just use i_state
>> and i_lock for this.
>
> Or, perhaps, export O_CMTIME to fcntl and/or open?

That would work, but it would be kind of nice for the time stamps on
my files to get updated correctly. I'd argue that it's more correct
for the timestamp to be after the update (at page cleaning time,
probably) than at page_mkwrite time.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/