Re: [PATCH 0/4] change sb_writers to use percpu_rw_semaphore

From: Dave Chinner
Date: Mon Aug 10 2015 - 18:42:02 EST

Next message: Guenter Roeck: "Re: [PATCH] hwmon:change sht15_reverse()"
Previous message: Bjorn Andersson: "Re: [PATCH] input: gpio_keys: Don't report events on gpio failure"
In reply to: Jan Kara: "Re: [PATCH 0/4] change sb_writers to use percpu_rw_semaphore"
Next in thread: Oleg Nesterov: "Re: [PATCH 0/4] change sb_writers to use percpu_rw_semaphore"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, Aug 10, 2015 at 04:59:42PM +0200, Jan Kara wrote:
> On Fri 07-08-15 21:55:52, Oleg Nesterov wrote:
> > I'll try to re-check/re-test, but so far I think that this and the
> > previous series are correct. However 3/4 from the previous series
> > (attached at the end just in case) uncovers (I think) some problems
> > in xfs locking.
....
>
> Hum, I had a look at the lockdep report below and it's one of the
> peculiarities of the freeze protection. For record let me repeat the full
> argument for why I don't think there's a possibility for a real deadlock.
> Feel free to skip to the end if you get bored.
>
> One would like to construct the lock chain as:
>
> CPU0 (chown foo dir) CPU1 (readdir dir) CPU2 (page fault)
> process Y process X, thread 0 process X, thread 1
>
> get ILOCK for dir
> gets freeze protection
> starts transaction in xfs_setattr_nonsize
> waits to get ILOCK on 'dir'
> get mmap_sem for X
> wait for mmap_sem for process X
> in filldir()
> wait for freeze protection in
> xfs_page_mkwrite
>
> and CPU3 then being in freeze_super() blocking CPU2 and waiting for CPU0 to
> finish it's freeze-protected section. But this cannot happen. The reason is
> that we block writers level-by-level and thus while there are writers at
> level X, we do not block writers at level X+1. So in this particular case
> freeze_super() will block waiting for CPU0 to finish its freeze protected
> section while CPU2 is free to continue.
>
> In general we have a chain like
>
> freeze L0 -> freeze L1 -> freeze L2 -> ILOCK -> mmap_sem --\
> A |
> \------------------------------------------/
>
> But since ILOCK is always acquired with freeze protection at L0 and we can
> block at L1 only after there are no writers at L0, this loop can never
> happen.
>
> Note that if we use the property of freezing that lock at level X+1 cannot
> block when we hold lock at level X, we can as well simplify the dependency
> graph and track in it only the lowest level of freeze lock that is
> currently acquired (since the levels above it cannot block and do not in
> any way influence blocking of other processes either and thus are
> irrelevant for the purpose of deadlock detection). Then the dependency
> graph we'd get would be:
>
> freeze L0 -> ILOCK -> mmap_sem -> freeze L1
>
> and we have a nice acyclic graph we like to see... So probably we have to
> hack the lockdep instrumentation some more and just don't tell lockdep
> about freeze locks at higher levels if we already hold a lock at lower
> level. Thoughts?

The XFS directory ilock->filldir->might_fault locking path has been
generating false positives in quite a lot of places because of
things we do on one side of the mmap_sem in filesystem paths vs
thigs we do on the other side of the mmap_sem in the page fault
path.

I'm getting sick of these stupid lockdep false positives. I think I
need to rework the XFS readdir locking patch I wrote a while back:

http://oss.sgi.com/archives/xfs/2014-03/msg00146.html

At the time it wasn't clear what the best way forward was. Right now
I think the method I originally used (IOLOCK for directory data and
hence readdir, ILOCK for everything else) will be sufficient; if we
need to do anything smarter that can be addressed further down the
road.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Guenter Roeck: "Re: [PATCH] hwmon:change sht15_reverse()"
Previous message: Bjorn Andersson: "Re: [PATCH] input: gpio_keys: Don't report events on gpio failure"
In reply to: Jan Kara: "Re: [PATCH 0/4] change sb_writers to use percpu_rw_semaphore"
Next in thread: Oleg Nesterov: "Re: [PATCH 0/4] change sb_writers to use percpu_rw_semaphore"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]