Re: [RFC/PATCH] mm/futex: Fix futex writes on archs with SW tracking of dirty & young

From: Shan Hai
Date: Thu Jul 21 2011 - 21:56:42 EST


On 07/22/2011 06:59 AM, Andrew Morton wrote:
On Fri, 22 Jul 2011 08:52:06 +1000
Benjamin Herrenschmidt<benh@xxxxxxxxxxxxxxxxxxx> wrote:

On Thu, 2011-07-21 at 15:36 -0700, Andrew Morton wrote:
On Tue, 19 Jul 2011 14:29:22 +1000
Benjamin Herrenschmidt<benh@xxxxxxxxxxxxxxxxxxx> wrote:

The futex code currently attempts to write to user memory within
a pagefault disabled section, and if that fails, tries to fix it
up using get_user_pages().

This doesn't work on archs where the dirty and young bits are
maintained by software, since they will gate access permission
in the TLB, and will not be updated by gup().

In addition, there's an expectation on some archs that a
spurious write fault triggers a local TLB flush, and that is
missing from the picture as well.

I decided that adding those "features" to gup() would be too much
for this already too complex function, and instead added a new
simpler fixup_user_fault() which is essentially a wrapper around
handle_mm_fault() which the futex code can call.

Signed-off-by: Benjamin Herrenschmidt<benh@xxxxxxxxxxxxxxxxxxx>
---

Shan, can you test this ? It might not fix the problem
um, what problem. There's no description here of the user-visible
effects of the bug hence it's hard to work out what kernel version(s)
should receive this patch.
Shan could give you an actual example (it was in the previous thread),
but basically, livelock as the kernel keeps trying and trying the
in_atomic op and never resolves it.

What kernel version(s) should receive this patch?
I haven't dug. Probably anything it applies on as far as we did that
trick of atomic + gup() for futex.
You're not understanding me.

I need a good reason to merge this into 3.0.

The -stable maintainers need even better reasons to merge this into
earlier kernels.

Please provide those reasons!


Summary:
- I encountered a 100% CPU (system time) usage problem with a pthread_mutex allocated
in a shared memory region. The problem occurs only when PRIORITY_INHERITANCE is set
on the pthread_mutex.
- The ftrace output shows that an infinite loop in futex_lock_pi causes the high CPU usage.
- The powerpc e500 is affected but x86 is not.
I have not tested other archs, so I am not sure whether they are affected
by the problem.
- Tested on 2.6.34 and 3.0-rc7; both are affected. Earlier versions might be affected as well.

Please refer to the threads "[PATCH 0/1] Fixup write permission of TLB on powerpc e500 core"
and "[PATCH 1/1] Fixup write permission of TLB on powerpc e500 core" for the whole story.
The test case code is provided in [PATCH 0/1].

Thanks
Shan Hai

(Documentation/stable_kernel_rules.txt, 4th bullet)

(And it's not just me and -stable maintainers. Distro maintainers will
also look at this patch and wonder whether they should merge it)
