Re: [PATCH v3] futex: Fix regression with read only mappings

From: Shawn Bohrer
Date: Wed Jun 29 2011 - 10:57:05 EST


On Tue, Jun 28, 2011 at 04:55:59PM -0700, Darren Hart wrote:
>
>
> On 06/28/2011 10:38 AM, Shawn Bohrer wrote:
> > On Tue, Jun 28, 2011 at 07:52:06AM -0700, Darren Hart wrote:
> >>
> >>
> >> On 06/28/2011 03:54 AM, Peter Zijlstra wrote:
> >>> On Mon, 2011-06-27 at 17:22 -0500, Shawn Bohrer wrote:
> >>>> commit 7485d0d3758e8e6491a5c9468114e74dc050785d (futexes: Remove rw
> >>>> parameter from get_futex_key()) in 2.6.33 introduced a user-mode
> >>>> regression in that it additionally prevented futex operations on a
> >>>> region within a read only memory mapped file. For example this breaks
> >>>> workloads that have one or more reader processes doing a FUTEX_WAIT on a
> >>>> futex within a read only shared mapping, and a writer processes that has
> >>>> a writable mapping issuing the FUTEX_WAKE.
> >>>>
> >>>> This fixes the regression for futex operations that should be valid on
> >>>> RO mappings by trying a RO get_user_pages_fast() when the RW
> >>>> get_user_pages_fast() fails so as not to slow down the common path of
> >>>> writable anonymous maps and bailing when we used the RO path on
> >>>> anonymous memory.
> >>>>
> >>>> While fixing the regression this patch opens up two possible bad
> >>>> scenarios as identified by KOSAKI Motohiro:
> >>>>
> >>>> 1) This patch also allows FUTEX_WAIT on RO private mappings which have
> >>>> the following corner case.
> >>>>
> >>>> Thread-A: call futex(FUTEX_WAIT, memory-region-A).
> >>>> get_futex_key() return inode based key.
> >>>> sleep on the key
> >>>> Thread-B: call mprotect(PROT_READ|PROT_WRITE, memory-region-A)
> >>>> Thread-B: write memory-region-A.
> >>>> COW happen. This process's memory-region-A become related
> >>>> to new COWed private (ie PageAnon=1) page.
> >>>> Thread-B: call futex(FUETX_WAKE, memory-region-A).
> >>>> get_futex_key() return mm based key.
> >>>> IOW, we fail to wake up Thread-A.
> >>>>
> >>>> 2) Current futex code doesn't handle zero page properly.
> >>>>
> >>>> Read mode get_user_pages() can return zero page, but current futex
> >>>> code doesn't handle it at all. Then, zero page makes infinite loop
> >>>> internally.
> >>>>
> >>>> This Patch is based on Peter Zijlstra's initial patch with modifications to
> >>>> only allow RO mappings for futex operations that need VERIFY_READ access.
> >>>>
> >>>> Reported-by: David Oliver <david@xxxxxxxxxxxxxxx>
> >>>> Signed-off-by: Shawn Bohrer <sbohrer@xxxxxxxxxxxxxxx>
> >>>
> >>> Acked-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> >>
> >> I think we need to address #2 above first.
> >
> > I believe the following contrived case triggers the zero page problem:
> >
> > #include <errno.h>
> > #include <fcntl.h>
> > #include <stdint.h>
> > #include <linux/futex.h>
> > #include <sys/mman.h>
> > #include <sys/syscall.h>
> > #include <unistd.h>
> > #include <stdio.h>
> >
> >
> > int main(int argc, char *argv[])
> > {
> > int fd, *futex, rc, val = 42;
> > struct timespec ts = {.tv_sec = 2, .tv_nsec = 0 };
> >
> > fd = open("/tmp/futex_test", O_RDWR|O_CREAT, 0644);
> > write(fd, &val, 4);
> > futex = (int *)mmap(0, sizeof(int), PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
> > rc = syscall(SYS_futex, futex, FUTEX_WAIT, val, &ts, 0, 0);
> > printf("rc=%d errno=%d\n", rc, errno);
> > }
> >
> > Running the program above with my patch applied will spin in the
> > kernel at 100% CPU usage.
> >
> >> 1) Is the loop killable?
> >
> > Yes, SIGINT causes the program to return.
>
> OK, so while it is killable... I'd really rather not introduce a busy loop in the
> kernel. Especially not after we removed it with:
>
> http://lkml.org/lkml/2010/1/13/136
>
> How about adding something like this. I _think_ the only way to get a ZERO_PAGE
> here now is with the contrived testcase below.
>
> $ git diff
> diff --git a/kernel/futex.c b/kernel/futex.c
> index 1737a66..a5417c2 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -316,6 +316,13 @@ again:
> if (!page_head->mapping) {
> unlock_page(page_head);
> put_page(page_head);
> + /*
> + * ZERO_PAGE pages don't have a mapping. Avoid a busy loop
> + * trying to find one. RW mapping would have COW'd (and thus
> + * have a mapping) so this page is RO and won't ever change.
> + */
> + if ((page_head == ZERO_PAGE(address)))
> + return -EFAULT;
> goto again;
> }
>
>
> This returns EFAULT for the following testcase which spun in get_futex_key with
> your V3 patch.

I've tested and confirmed this as well. I can send a v4 patch with
this included.

>
> #include <errno.h>
> #include <fcntl.h>
> #include <stdint.h>
> #include <linux/futex.h>
> #include <sys/mman.h>
> #include <sys/syscall.h>
> #include <unistd.h>
> #include <stdio.h>
>
>
> int main(int argc, char *argv[])
> {
> int fd, *futex, rc;
> struct timespec ts = {.tv_sec = 2, .tv_nsec = 0 };
>
> futex = (int *)mmap(0, sizeof(int), PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
> printf("futex @ %p\n", futex);
> rc = syscall(SYS_futex, futex, FUTEX_WAIT, 0, &ts, 0, 0);
> printf("rc=%d errno=%d\n", rc, errno);
> }
>
> Thanks,
>
> Darren
>
> >
> >> 2) If not, and we try to catch zero_page scenario, we need to ensure RO
> >> sparse file mapping work. Peter notes that these are probably fine,
> >> filemap.c doesn't seem to use the zero page - but we need to test.
> >>
> >> It's agreed, I believe, between myself, Peter, and Shawn that we will
> >> document Private RW mappings as unsupported.

--
Shawn


---------------------------------------------------------------
This email, along with any attachments, is confidential. If you
believe you received this message in error, please contact the
sender immediately and delete all copies of the message.
Thank you.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/