Re: Change in functionality of futex() system call.

From: Andrew Lutomirski
Date: Tue Jun 07 2011 - 16:12:53 EST

Next message: Andrew Morton: "Re: writeback merge status, was Re: [PATCH 00/18] writeback fixes and cleanups for 2.6.40 (v3)"
Previous message: Matt Mackall: "Re: ketchup script and 3.0"
In reply to: David Oliver: "Re: Change in functionality of futex() system call."
Next in thread: Kyle Moffett: "Re: Change in functionality of futex() system call."

On Tue, Jun 7, 2011 at 4:04 PM, David Oliver <david@xxxxxxxxxxxxxxx> wrote:
> On Tue, Jun 7, 2011 at 2:53 PM, Andrew Lutomirski <luto@xxxxxxx> wrote:
>> On Tue, Jun 7, 2011 at 3:33 PM, David Oliver <david@xxxxxxxxxxxxxxx> wrote:
>>> On Tue, Jun 7, 2011 at 2:19 PM, Andrew Lutomirski <luto@xxxxxxx> wrote:
>>>> On Tue, Jun 7, 2011 at 3:10 PM, David Oliver <david@xxxxxxxxxxxxxxx> wrote:
>>>>> On Tue, Jun 7, 2011 at 1:43 PM, Andrew Lutomirski <luto@xxxxxxx> wrote:
>>>>>> On Tue, Jun 7, 2011 at 11:58 AM, Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
>>>>>>> On Tuesday 07 June 2011 at 10:44 -0400, Andy Lutomirski wrote:
>>>>>>>> On 06/06/2011 11:13 PM, Darren Hart wrote:
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On 06/06/2011 11:11 AM, Eric Dumazet wrote:
>>>>>>>> >> On Monday 06 June 2011 at 10:53 -0700, Darren Hart wrote:
>>>>>>>> >>>
>>>>>>>> >>
>>>>>>>> >>> If I understand the problem correctly, RO private mapping really doesn't
>>>>>>>> >>> make any sense and we should probably explicitly not support it, while
>>>>>>>> >>> special casing the RO shared mapping in support of David's scenario.
>>>>>>>> >>>
>>>>>>>> >>
>>>>>>>> >> We supported them in 2.6.18 kernels, apparently. This might sound
>>>>>>>> >> stupid, but who knows?
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > I guess this is actually the key point we need to agree on to provide a
>>>>>>>> > solution. This particular case "worked" in 2.6.18 kernels, but that
>>>>>>>> > doesn't necessarily mean it was supported, or even intentional.
>>>>>>>> >
>>>>>>>> > It sounds to me that we agree that we should support RO shared mappings.
>>>>>>>> > The question remains about whether we should introduce deliberate
>>>>>>>> > support of RO private mappings, and if so, if the forced COW approach is
>>>>>>>> > appropriate or not.
>>>>>>>> >
>>>>>>>>
>>>>>>>> I disagree.
>>>>>>>>
>>>>>>>> FUTEX_WAIT has side-effects. Specifically, it eats one wakeup sent by
>>>>>>>> FUTEX_WAKE. So if something uses futexes on a file mapping, then a
>>>>>>>> process with only read access could (if the semantics were changed) DoS
>>>>>>>> the other processes by spawning a bunch of threads and FUTEX_WAITing
>>>>>>>> from each of them.
>>>>>>>>
>>>>>>>> If there were a FUTEX_WAIT_NOCONSUME that did not consume a wakeup and
>>>>>>>> worked on RO mappings, I would drop my objection.
>>>>>>>
>>>>>>> If a group of cooperating processes uses a memory segment to exchange
>>>>>>> critical information, do you really think this memory segment will be
>>>>>>> readable by other unrelated processes on the machine ?
>>>>>>
>>>>>> Depends on the design.
>>>>>>
>>>>>> I have some software I'm working on that uses shared files and could
>>>>>> easily use futexes.
>>>>>>
>>>>> I have software which currently uses shared files for a one way
>>>>> transfer of information, which is modeled precisely by the futex (as
>>>>> contrasted to the mutex) model. In this case, the number of receivers
>>>>> is undetermined, so the number of wakeups is set to maxint.
>>>>>
>>>>> The receivers are minimally trusted: they have read access to the
>>>>> files, so they cannot accidentally affect other processes' use of the
>>>>> data. Requiring my files to be writeable by all clients would require
>>>>> a serious increase in the amount of software needing to be trusted.
>>>>
>>>> What's wrong with adding a FUTEX_WAIT_NOCONSUME flag then? Your
>>>> program can use it to get exactly the semantics it wants and my
>>>> program can use it or not depending on which semantics it wants.
>>>>
>>> 1. I would prefer that my programs not have to check the kernel
>>> version (code named "working", "regressed", and "altered") to decide
>>> which parameters to pass to the futex call.
>>
>> You don't have to check for kernel version. Just try
>> FUTEX_WAIT_NOCONSUME first and retry with FUTEX_WAIT if it returns
>> -EINVAL.
>>
> ... and punt if that gives me an EFAULT. Possible but clumsy.
> Fortunately, I'm not writing code for general consumption.
>
>> I think you've already lost on regressed kernels regardless :-/
>>
>>> 2. Doing FUTEX_WAIT_NOCONSUME would change the semantics of
>>> futex_wake() between the "working" and "altered" kernels, as it would
>>> no longer return the number of processes woken.
>>
>> True, but that change couldn't affect old code because old code
>> wouldn't use FUTEX_WAIT_NOCONSUME.
>>
> So, how would I find out the number of processes awakened by
> futex_wake()? I only care for statistical purposes.

Add a FUTEX_WAKE_COUNT_NOCONSUME or some such magic flag. Yeah, not so pretty.
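
For comparison, here is a minimal sketch of what the plain FUTEX_WAKE side
gives you today: the sender publishes a value, asks for INT_MAX ("maxint")
wakeups, and the syscall's return value is the number of waiters woken. The
helper name publish_and_wake_all() is made up for illustration, and the
sketch assumes Linux with the raw futex syscall reachable through syscall(2).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): store a new value in the
 * futex word, then wake every waiter currently blocked on it.
 *
 * Returns the number of waiters woken (0 if nobody was waiting),
 * or -1 with errno set on failure.
 */
long publish_and_wake_all(uint32_t *word, uint32_t value)
{
    assert(word != NULL);

    /* Make the new value visible before any waiter is woken. */
    __atomic_store_n(word, value, __ATOMIC_SEQ_CST);

    /* INT_MAX wakeups: the number of receivers is not known in advance. */
    return syscall(SYS_futex, word, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
}
```

With a non-consuming wait, that return value would stop counting such
waiters, which is why a separate counting flag comes up at all.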

>
>>>
>>> It seems that FUTEX_WAIT_NOCONSUME would be rather like a
>>> non-consuming read on a pipe.
>>
>> More like a nonconsuming read on an eventfd, which sounds very useful.
>> (Actually, I'm porting code from Windows to Linux right now that
>> wants that feature...)
>>
>> The reason I bring this up now is that I've been annoyed that
>> FUTEX_WAIT can be used on an R/O mapping to interfere with futexes in
>> that mapping. Under the original semantics this would have been
>> pretty much impossible to fix, but the regression has been there for
>> long enough that we have the option right now to fix it better instead
>> of restoring the original behavior.
>>
> Not being a kernel developer, the change seems very recent - about
> when I started finding my code failing with EFAULTs.
>
> From my perspective, that's a real case of my futexes being interfered with :).

Fair enough. But it's a little late to prevent the regression.
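
For what it's worth, the try-then-fall-back idea from earlier in the thread
could look roughly like the sketch below. FUTEX_WAIT_NOCONSUME is only a
proposal here and exists in no kernel, so the op number is a placeholder.
All function names are made up, and the two fake_*_kernel() functions only
simulate the "working" and "regressed" behaviours so the logic can be
exercised without a patched kernel. I believe kernels answer an unknown
futex op with ENOSYS rather than EINVAL, so the sketch treats both as
"not supported".

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Plain FUTEX_WAIT is op 0 in <linux/futex.h>. */
#define SKETCH_FUTEX_WAIT 0
/* HYPOTHETICAL: proposed in this thread only; placeholder op number. */
#define SKETCH_FUTEX_WAIT_NOCONSUME 0x7e

/* Same convention as syscall(2): 0 on success, -1 with errno set. */
typedef int (*futex_wait_fn)(uint32_t *uaddr, int op, uint32_t expected);

/*
 * Try the non-consuming wait first; if the kernel does not know the op,
 * retry once with plain FUTEX_WAIT. *op_used reports which op ran last.
 */
int wait_with_fallback(futex_wait_fn wait_fn, uint32_t *uaddr,
                       uint32_t expected, int *op_used)
{
    assert(wait_fn != NULL && op_used != NULL);

    *op_used = SKETCH_FUTEX_WAIT_NOCONSUME;
    int rc = wait_fn(uaddr, *op_used, expected);
    if (rc == 0 || (errno != EINVAL && errno != ENOSYS))
        return rc; /* woken, or failed for a reason a retry won't fix */

    /*
     * Unknown op: fall back. A result of -1 with errno == EFAULT from
     * this call is the "regressed" case, where the caller has to punt.
     */
    *op_used = SKETCH_FUTEX_WAIT;
    return wait_fn(uaddr, *op_used, expected);
}

/* Simulated "working" kernel: unknown op rejected, FUTEX_WAIT succeeds. */
int fake_working_kernel(uint32_t *uaddr, int op, uint32_t expected)
{
    (void)uaddr;
    (void)expected;
    if (op != SKETCH_FUTEX_WAIT) {
        errno = ENOSYS;
        return -1;
    }
    return 0;
}

/* Simulated "regressed" kernel: FUTEX_WAIT on an R/O mapping gives EFAULT. */
int fake_regressed_kernel(uint32_t *uaddr, int op, uint32_t expected)
{
    (void)uaddr;
    (void)expected;
    errno = (op == SKETCH_FUTEX_WAIT) ? EFAULT : ENOSYS;
    return -1;
}

#ifdef __linux__
#include <sys/syscall.h>
#include <unistd.h>
#ifdef SYS_futex
/* Real backend: blocks until woken, with no timeout. */
int real_futex_wait(uint32_t *uaddr, int op, uint32_t expected)
{
    return (int)syscall(SYS_futex, uaddr, op, expected, NULL, NULL, 0);
}
#endif
#endif
```

In a real program you would pass real_futex_wait as the first argument. The
function pointer is only there so the retry logic can be checked against the
simulated kernels.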

--Andy
