Re: can select with infinite timeout return 0

Tristan Savatier (tristan@mpegtv.com)
Thu, 20 Aug 1998 01:01:27 -0700


This is a multi-part message in MIME format.
--------------D1DAE88D4FB3F11309F7F034
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Brian McCauley wrote:
>
> Tristan Savatier <tristan@mpegtv.com> writes:
>
> > Another thing bothers me: when select returns 0,
> > one of the fd showed data ready (i.e. FD_ISSET(fd)
> > was TRUE), even though no data was immediately
> > available on that fd. Is that legal ? IMHO, when select returns 0,
> > FD_ISSET(fd) should be 0 on all the fd in the set.
> >
> > Could someone comment on that ?
>
> Just curious, did anyone comment on this bit?
>
> I've looked at the Linux kernel code and it clearly quite deliberately
> leaves the sets unchanged rather than clearing them when it returns
> zero.
>
> I agree with you that intuitively it should clear them.

Here are the answers that I got on that (attached).

-t
--------------D1DAE88D4FB3F11309F7F034

Return-Path: <kenyap@research.canon.com.au>
Received: from kwanon.research.canon.com.au (kwanon.research.canon.com.au [203.12.172.254])
by paris.bok.net (8.8.7/8.8.7) with SMTP id UAA25722
for <tristan@mpegtv.com>; Wed, 5 Aug 1998 20:39:32 -0700
Received: (qmail 24461 invoked from network); 6 Aug 1998 03:40:58 -0000
Received: from grainger.research.canon.com.au (203.12.174.130)
by kwanon-le1.research.canon.com.au with SMTP; 6 Aug 1998 03:40:58 -0000
Received: (qmail 22840 invoked from network); 6 Aug 1998 03:40:57 -0000
Received: from sid.research.canon.com.au (203.12.174.126)
by grainger.research.canon.com.au with SMTP; 6 Aug 1998 03:40:57 -0000
Received: from research.canon.com.au (mydland.research.canon.com.au [10.2.0.136])
by sid.research.canon.com.au (8.8.5/8.8.5) with ESMTP id NAA06880
for <tristan@mpegtv.com>; Thu, 6 Aug 1998 13:42:21 +1000 (EST)
Message-Id: <199808060342.NAA06880@sid.research.canon.com.au>
To: Tristan Savatier <tristan@mpegtv.com>
Reply-To: ken.yap@research.canon.com.au (Ken Yap)
Subject: Re: can select with infinite timeout return 0
Content-type: text/plain; charset="iso-8859-1"
X-Newsgroups: comp.os.linux.development.system 47239
In-Reply-To: <35C917EA.9543646A@mpegtv.com>
X-Snail: CISRA, 1 Thomas Holt Drive, North Ryde NSW 2113, Australia
X-Phone: (+61 2) 9805-2790 Fax: (+61 2) 9805-2929
X-Face: bak'McMAD{%JrA$mQ(j_Ex_o?a/F8/Nt<W\sbPkh?,)PF7TK1{Lh"HJLuQfhE}(Dj!g:c(U =wh/r[<MJUF}hXzR*URO0e/Lh'mn_YUpU+;ycf6:0>ng*t2KX(NcfGalVs^Ke^C61:F
Date: Thu, 06 Aug 1998 13:44:10 +1000
From: Ken Yap <kenyap@research.canon.com.au>

Hi Tristan,

|My understanding is that with a NULL timeout, select should
|never return 0. It should only return n > 0 (or -1 in case
|of error or EINTR).

I'm not sure what the relevant standard says, but from reading the man
page, it seems murky. It says select *can* block indefinitely if the
timeout is NULL. I interpret that to mean it's not required to block. If
that is the case then I suppose the program has to be prepared to deal
with the possibility and retry the select. Do post what you find out.

|Another thing bothers me: when select returns 0,
|one of the fd showed data ready (i.e. FD_ISSET(fd)
|was TRUE), even though no data was immediately
|available on that fd. Is that legal ? IMHO, when select returns 0,
|FD_ISSET(fd) should be 0 on all the fd in the set.

I think this is allowed. Since the return is 0, the sets should not
even be examined. It might have been changed in the time between select
returning and fdset being examined.

Cheers, Ken

--------------D1DAE88D4FB3F11309F7F034

Return-Path: <kenyap@research.canon.com.au>
Received: from kwanon.research.canon.com.au (kwanon.research.canon.com.au [203.12.172.254])
by paris.bok.net (8.8.7/8.8.7) with SMTP id AAA31203
for <tristan@mpegtv.com>; Thu, 6 Aug 1998 00:21:17 -0700
Received: (qmail 16877 invoked from network); 6 Aug 1998 07:23:47 -0000
Received: from grainger.research.canon.com.au (203.12.174.130)
by kwanon-le1.research.canon.com.au with SMTP; 6 Aug 1998 07:23:47 -0000
Received: (qmail 25865 invoked from network); 6 Aug 1998 07:23:46 -0000
Received: from sid.research.canon.com.au (203.12.174.126)
by grainger.research.canon.com.au with SMTP; 6 Aug 1998 07:23:46 -0000
Received: from research.canon.com.au (mydland.research.canon.com.au [10.2.0.136])
by sid.research.canon.com.au (8.8.5/8.8.5) with ESMTP id RAA25703
for <tristan@mpegtv.com>; Thu, 6 Aug 1998 17:25:11 +1000 (EST)
Message-Id: <199808060725.RAA25703@sid.research.canon.com.au>
To: Tristan Savatier <tristan@mpegtv.com>
Subject: Re: can select with infinite timeout return 0
Reply-To: Ken.Yap@research.canon.com.au (Ken Yap)
In-reply-to: Your message of Thu, 06 Aug 1998 00:09:55 -0700.
<35C956C3.74CDA81@mpegtv.com>
Content-type: text/plain; charset="iso-8859-1"
X-Snail: CISRA, 1 Thomas Holt Drive, North Ryde 2113, Australia
X-Phone: (+61 2) 9805-2790 Fax: (+61 2) 9805-2929
X-Face: bak'McMAD{%JrA$mQ(j_Ex_o?a/F8/Nt<W\sbPkh?,)PF7TK1{Lh"HJLuQfhE}(Dj!g:
c(U=wh/r[<MJUF}hXzR*URO0e/Lh'mn_YUpU+;ycf6:0>ng*t2KX(NcfGalVs^Ke^C61:F
Date: Thu, 06 Aug 1998 17:26:57 +1000
From: Ken Yap <kenyap@research.canon.com.au>

>> |Another thing bothers me: when select returns 0,
>> |one of the fd showed data ready (i.e. FD_ISSET(fd)
>> |was TRUE), even though no data was immediately
>> |available on that fd. Is that legal ? IMHO, when select returns 0,
>> |FD_ISSET(fd) should be 0 on all the fd in the set.
>>
>> I think this is allowed. Since the return is 0, the sets should not
>> even be examined. It might have been changed in the time between select
>> returning and fdset being examined.
>
>The Linux man page says:
>
> On success, select returns the number of descriptors contained
> in the descriptor sets, which may be zero if the
> timeout expires before anything interesting happens. On
> error, -1 is returned, and errno is set appropriately; the
> sets and timeout become undefined, so do not rely on their
> contents after an error.
>
>it also says:
>
> Four macros are provided to manipulate the sets. FD_ZERO
> will clear a set. FD_SET and FD_CLR add or remove a given
> descriptor from a set. FD_ISSET tests to see if a
> descriptor is part of the set; this is useful after select
> returns.
>
>it does not say that FD_ISSET should be used ONLY if select returns >0.
>It just says that FD_ISSET should not be used after an error i.e.
>select returns -1.

Yes, but nowhere does it say that a fdset has to be consistent with the
return value when the return value is <= 0. In fact, I can see that the
spec leaves the door open that more descriptors may actually be ready
than the return value says. If the return value says no descriptors are
ready, the program should not examine any sets at all. I don't see that
this is a hardship.
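
A minimal sketch of that rule in C, assuming a single descriptor watched for readability. The helper name wait_readable is hypothetical and not part of any library; the point is only that the set is examined when select() returns a positive count and at no other time.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd is reported readable, 0 if select()
 * reported no ready descriptors, -1 on error. A NULL timeout means "block". */
static int wait_readable(int fd, struct timeval *timeout)
{
    fd_set readfds;

    assert(fd >= 0 && fd < FD_SETSIZE);   /* FD_SET is undefined otherwise */
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    int n = select(fd + 1, &readfds, NULL, NULL, timeout);
    if (n < 0)
        return -1;   /* error: sets and timeout are undefined, do not look */
    if (n == 0)
        return 0;    /* nothing reported ready: do not look at the set */
    return FD_ISSET(fd, &readfds) ? 1 : 0;   /* only now is the set meaningful */
}
```

A caller would treat 0 as "nothing to do yet" and call again, rather than trusting whatever bits happen to be left in the set.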

--------------D1DAE88D4FB3F11309F7F034

Return-Path: <root>
Received: by tristan (Smail3.1.29.1 #29)
id m0z4V26-000QXvC; Thu, 6 Aug 98 11:45 PDT
Received: from oxygen.dynamic-realities.com ([207.170.36.196])
by paris.bok.net (8.8.7/8.8.7) with SMTP id LAA19012
for <tristan@mpegtv.com>; Thu, 6 Aug 1998 11:40:11 -0700
Received: (from andys@localhost) by oxygen.dynamic-realities.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) id NAA17635; Thu, 6 Aug 1998 13:42:42 -0500
Message-ID: <19980806134241.A17515@oxygen.dynamic-realities.com>
Date: Thu, 6 Aug 1998 13:42:41 -0500
From: Andy Sloane <andude@guildsoftware.com>
To: linux-kernel@vger.rutgers.edu
Cc: tristan@mpegtv.com
Subject: Re: can select with infinite timeout return 0
References: <35C917EA.9543646A@mpegtv.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.94.2i
In-Reply-To: <35C917EA.9543646A@mpegtv.com>; from Tristan Savatier on Wed, Aug 05, 1998 at 07:41:46PM -0700

On Wed, Aug 05, 1998 at 07:41:46PM -0700, Tristan Savatier wrote:

> We observed that select sometimes returns 0 even though
> a NULL pointer is passed for the timeout.
>
> According to the man page and all our books, select returns
> 0 only if no fd is ready when the timeout expires.
> Consequently it should never return 0 if a NULL pointer is
> passed for the timeout (infinite timeout).

select returns when either the timeout is reached, a filedescriptor
becomes readable or writable, or _when a signal interrupts it_. You're
using a multithreading package (pthreads, I'm assuming, although even
linuxthreads uses signals to communicate between threads if I'm not
mistaken) which delivers signals to other threads to force context
switches. That's why it only happens during multithreading. What you can
do is check errno for EINTR and repeat the select call if necessary.
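
A rough sketch of that retry in C, assuming one descriptor watched for readability with a NULL timeout. The wrapper name select_read_retry is hypothetical. Note that an interrupted select() reports the interruption as -1 with errno set to EINTR, so that is the case the loop tests for.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical wrapper: blocks until fd is readable, calling select() again
 * whenever a signal interrupts it. Returns select()'s final return value. */
static int select_read_retry(int fd)
{
    assert(fd >= 0 && fd < FD_SETSIZE);

    for (;;) {
        fd_set readfds;

        FD_ZERO(&readfds);        /* rebuild the set before every call */
        FD_SET(fd, &readfds);

        int n = select(fd + 1, &readfds, NULL, NULL, NULL);
        if (n < 0 && errno == EINTR)
            continue;             /* interrupted by a signal: try again */
        return n;                 /* ready count, or -1 for a real error */
    }
}
```

The set is rebuilt on each pass because its contents are undefined after an error return.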

Incidentally, I like mpegtv. Keep up the good work. :)

-- 
Andy Sloane
andude@guildsoftware.com

--------------D1DAE88D4FB3F11309F7F034

Return-Path: <root>
Received: by tristan (Smail3.1.29.1 #29)
id m0z4Wz5-000QXvC; Thu, 6 Aug 98 13:50 PDT
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3])
by paris.bok.net (8.8.7/8.8.7) with ESMTP id NAA23986
for <tristan@mpegtv.com>; Thu, 6 Aug 1998 13:45:01 -0700
Received: from the-village.bc.nu (the-village.bc.nu [163.164.160.21])
by snowcrash.cymru.net (8.8.7/8.7.1) with SMTP id VAA31463
for <tristan@mpegtv.com>; Thu, 6 Aug 1998 21:47:24 +0100
Received: by the-village.bc.nu (Smail3.1.29.1 #2)
id m0z4Wtv-000aNFC; Thu, 6 Aug 98 21:44 BST
Message-Id: <m0z4Wtv-000aNFC@the-village.bc.nu>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: can select with infinite timeout return 0
To: tristan@mpegtv.com (Tristan Savatier)
Date: Thu, 6 Aug 1998 21:44:43 +0100 (BST)
In-Reply-To: <35C917EA.9543646A@mpegtv.com> from "Tristan Savatier" at Aug 5, 98 07:41:46 pm
Content-Type: text

> I use 2.0.27 but our problem was also seen on Red-Hat 5.0
> systems (x86).

Select hasn't changed until 2.0.34, where there is a bug fix, but not quite for what you describe. It may however be the same thing. Kernels before 2.0.34 mishandled one thread closing an fd the other was selecting on.

Please test a 2.0.34 or 2.0.35 kernel. If you can duplicate it there, mail me directly and I'll try to figure out what's going on. If you happen to have a test program that shows the bug, include that too.

Alan

--------------D1DAE88D4FB3F11309F7F034

Return-Path: <root>
Received: by tristan (Smail3.1.29.1 #29)
id m0z4ZFS-000QXvC; Thu, 6 Aug 98 16:15 PDT
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3])
by paris.bok.net (8.8.7/8.8.7) with ESMTP id QAA29149
for <tristan@mpegtv.com>; Thu, 6 Aug 1998 16:08:42 -0700
Received: from the-village.bc.nu (the-village.bc.nu [163.164.160.21])
by snowcrash.cymru.net (8.8.7/8.7.1) with SMTP id AAA01440
for <tristan@mpegtv.com>; Fri, 7 Aug 1998 00:10:52 +0100
Received: by the-village.bc.nu (Smail3.1.29.1 #2)
id m0z4Z8k-000aNFC; Fri, 7 Aug 98 00:08 BST
Message-Id: <m0z4Z8k-000aNFC@the-village.bc.nu>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: can select with infinite timeout return 0
To: tristan@mpegtv.com (Tristan Savatier)
Date: Fri, 7 Aug 1998 00:08:09 +0100 (BST)
Cc: alan@lxorguk.ukuu.org.uk
In-Reply-To: <35CA3753.F6A31801@mpegtv.com> from "Tristan Savatier" at Aug 6, 98 04:08:03 pm
Content-Type: text

> Q1: Is it possible (i.e. legal) that select returns 0
> when given an infinite (NULL) timeout ?

Not as I understand it.

> Q2: The man page just says that FD_ISSET should not be used
> after an error i.e. select returns -1. Is it legal to use
> FD_ISSET when select returns 0 ? (of course, if it is, then it
> should return FALSE on all fd's).

That's a very interesting question, and not one I know a definitive answer to.

> In the case we're talking about, select returns 0
> when given an infinite (NULL) timeout, and
> FD_ISSET returns TRUE on one fd after
> select returned 0.

OK, I'll look into how that could occur.

--------------D1DAE88D4FB3F11309F7F034

Return-Path: <tristan>
Received: by tristan (Smail3.1.29.1 #29)
id m0z4qRB-000QXvC; Fri, 7 Aug 98 10:36 PDT
Received: from waldorf.informatik.uni-dortmund.de (waldorf.informatik.uni-dortmund.de [129.217.4.42])
by paris.bok.net (8.8.7/8.8.7) with ESMTP id CAA14344
for <tristan@mpegtv.com>; Fri, 7 Aug 1998 02:43:37 -0700
Received: from issan.informatik.uni-dortmund.de (issan.informatik.uni-dortmund.de [129.217.27.163])
by waldorf.informatik.uni-dortmund.de with SMTP id LAA06540;
Fri, 7 Aug 1998 11:45:57 +0200 (MES)
Received: by issan.informatik.uni-dortmund.de id AA05025; Fri, 7 Aug 98 11:45:56 +0200
To: Andy Sloane <andude@guildsoftware.com>
Cc: linux-kernel@vger.rutgers.edu, tristan@mpegtv.com
Subject: Re: can select with infinite timeout return 0
References: <35C917EA.9543646A@mpegtv.com> <19980806134241.A17515@oxygen.dynamic-realities.com>
X-Yow: I feel better about world problems now!
From: Andreas Schwab <schwab@issan.informatik.uni-dortmund.de>
Date: 07 Aug 1998 11:45:55 +0200
In-Reply-To: Andy Sloane's message of "Thu, 6 Aug 1998 13:42:41 -0500"
Message-Id: <vyzemut15ws.fsf@issan.informatik.uni-dortmund.de>
X-Mailer: Gnus v5.6.27/Emacs 19.34

Andy Sloane <andude@guildsoftware.com> writes:

|> On Wed, Aug 05, 1998 at 07:41:46PM -0700, Tristan Savatier wrote:
|>
|> > We observed that select sometimes returns 0 even though
|> > a NULL pointer is passed for the timeout.
|> >
|> > According to the man page and all our books, select returns
|> > 0 only if no fd is ready when the timeout expires.
|> > Consequently it should never return 0 if a NULL pointer is
|> > passed for the timeout (infinite timeout).
|>
|> select returns when either the timeout is reached, a filedescriptor
|> becomes readable or writable, or _when a signal interrupts it_. You're
|> using a multithreading package (pthreads, I'm assuming, although even
|> linuxthreads uses signals to communicate between threads if I'm not
|> mistaken) which delivers signals to other threads to force context
|> switches. That's why it only happens during multithreading. What you can
|> do is check errno for EINTR and repeat the select call if necessary.

But if select is interrupted it returns -1, not 0.

-- 
Andreas Schwab                                      "And now for something
schwab@issan.informatik.uni-dortmund.de              completely different"
schwab@gnu.org

--------------D1DAE88D4FB3F11309F7F034

Return-Path: <root>
Received: by tristan (Smail3.1.29.1 #29)
id m0z4xRP-000QXvC; Fri, 7 Aug 98 18:05 PDT
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3])
by paris.bok.net (8.8.7/8.8.7) with ESMTP id SAA07949
for <tristan@mpegtv.com>; Fri, 7 Aug 1998 18:01:48 -0700
Received: from the-village.bc.nu (the-village.bc.nu [163.164.160.21])
by snowcrash.cymru.net (8.8.7/8.7.1) with SMTP id CAA30733
for <tristan@mpegtv.com>; Sat, 8 Aug 1998 02:04:12 +0100
Received: by the-village.bc.nu (Smail3.1.29.1 #2)
id m0z4xNe-000aNFC; Sat, 8 Aug 98 02:01 BST
Message-Id: <m0z4xNe-000aNFC@the-village.bc.nu>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: can select with infinite timeout return 0
To: tristan@mpegtv.com (Tristan Savatier)
Date: Sat, 8 Aug 1998 02:01:09 +0100 (BST)
In-Reply-To: <35CBA337.43A7AA59@mpegtv.com> from "Tristan Savatier" at Aug 7, 98 06:00:39 pm
Content-Type: text

> It is not really a surprise: it looks like some sort of
> race condition, probably related to using threads. The case
> can be reproduced with our MPEG Player, but only the first
> time we run it after a reboot (and not even each time... it
> seems to depend on the machine, the speed of the processor etc).
>
> Would you be interested to get the program that can cause
> this situation to occur ?

Looking at the kernel code, it cannot occur in kernel space. It appears to be a small bug in the pthreads library: when one thread is in select and another gets a signal and handles it, the select can return 0.

I suspect an strace will show the kernel select doesn't return 0.

--------------D1DAE88D4FB3F11309F7F034

Return-Path: <glen.turner@adelaide.edu.au>
Received: from jarrah.itd.adelaide.edu.au (jarrah.itd.adelaide.edu.au [129.127.134.1])
by paris.bok.net (8.8.7/8.8.7) with ESMTP id LAA04719
for <tristan@mpegtv.com>; Wed, 12 Aug 1998 11:38:17 -0700
Received: from adelaide.edu.au (jacaranda.itd.adelaide.edu.au [129.127.134.20])
by jarrah.itd.adelaide.edu.au (8.8.5/8.8.5/UofA-1.5) with ESMTP id EAA12836
for <tristan@mpegtv.com>; Thu, 13 Aug 1998 04:10:56 +0930
Sender: gturner@its.adelaide.edu.au
Message-ID: <35D1E1B7.C078E547@adelaide.edu.au>
Date: Thu, 13 Aug 1998 04:10:55 +0930
From: Glen Turner <glen.turner@adelaide.edu.au>
Organization: IT Division, University of Adelaide, South Australia
X-Mailer: Mozilla 4.05 [en] (X11; I; OSF1 V3.2 alpha)
MIME-Version: 1.0
Newsgroups: comp.os.linux.development.system
To: Tristan Savatier <tristan@mpegtv.com>
Subject: Re: can select with infinite timeout return 0
References: <35C917EA.9543646A@revue.linux-kernel-mpegtv.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Tristan Savatier wrote:
> We recently spent 2 days tracking a bug in our multithreaded
> MPEG player and eventually found that it was caused by a
> behavior of select that we think may be a bug.

> We observed that select sometimes returns 0 even though
> a NULL pointer is passed for the timeout.
>
> According to the man page and all our books, select returns
> 0 only if no fd is ready when the timeout expires.
> Consequently it should never return 0 if a NULL pointer is
> passed for the timeout (infinite timeout).
>
> My understanding is that with a NULL timeout, select should
> never return 0. It should only return n > 0 (or -1 in case
> of error or EINTR).
>
> Is my understanding correct ? If so, is that a bug in the
> kernel ?

select() returning 0 is a well-documented UNIX design decision. This is explained on page 182 of "The design and implementation of the 4.2 BSD UNIX operating system".

Basically, if two processes are select()ing the same resource, both will be woken, but only one will have a non-zero return value.

This explains your note:

> Note: This situation only occurs in a multithreaded
> environment, with at least two threads running.

This also explains:

> Another thing bothers me: when select returns 0,
> one of the fd showed data ready (i.e. FD_ISSET(fd)
> was TRUE), even though no data was immediately
> available on that fd.

as the same fd value is given to both processes.

> IMHO, when select returns 0, FD_ISSET(fd) should be
> 0 on all the fd in the set.

The 4.3BSD doco allows Linux's behaviour.

-- 
 Glen Turner                               Network Specialist
 Tel: (08) 8303 3936          Information Technology Services
 Fax: (08) 8303 4400         The University of Adelaide  5005
 Email: glen.turner@adelaide.edu.au           South Australia

--------------D1DAE88D4FB3F11309F7F034

Return-Path: <root>
Received: by tristan (Smail3.1.29.1 #29)
id m0z6jde-000QSxC; Wed, 12 Aug 98 15:45 PDT
Received: from snowcrash.cymru.net (snowcrash.cymru.net [163.164.160.3])
by paris.bok.net (8.8.7/8.8.7) with ESMTP id PAA12048
for <tristan@mpegtv.com>; Wed, 12 Aug 1998 15:38:34 -0700
Received: from the-village.bc.nu (the-village.bc.nu [163.164.160.21])
by snowcrash.cymru.net (8.8.7/8.7.1) with SMTP id XAA21863;
Wed, 12 Aug 1998 23:41:01 +0100
Received: by the-village.bc.nu (Smail3.1.29.1 #2)
id m0z6kTH-000aNFC; Thu, 13 Aug 98 00:38 BST
Message-Id: <m0z6kTH-000aNFC@the-village.bc.nu>
From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: can select with infinite timeout return 0
To: tristan@mpegtv.com (Tristan Savatier)
Date: Thu, 13 Aug 1998 00:38:23 +0100 (BST)
Cc: glen.turner@adelaide.edu.au, alan@lxorguk.ukuu.org.uk, bok@bok.net
In-Reply-To: <35D21989.E3EECF1@mpegtv.com> from "Tristan Savatier" at Aug 12, 98 03:39:05 pm
Content-Type: text

> However this behavior is absolutely not documented in
> the select man page.

Linux select does not have the behaviour you describe. It's a quirk (bug imho) of the threads library in user space. I also don't agree with the other poster on behaviour. Select on Linux and all other platforms I've ever met returns the ready indicators to _all_ the select() users, as you'd expect.
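
A small experiment along those lines, sketched in C under the assumption that two processes (a parent and a forked child) stand in for the two select() users. The names sees_ready and count_ready_selectors are hypothetical. Both processes block in select() on the same pipe, one byte is written, and nobody reads it.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Blocks with a NULL timeout; returns 1 only if select() reported exactly
 * one ready descriptor and fd is marked in the set. */
static int sees_ready(int fd)
{
    fd_set readfds;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    int n = select(fd + 1, &readfds, NULL, NULL, NULL);
    return n == 1 && FD_ISSET(fd, &readfds);
}

/* Hypothetical experiment: returns how many of the two selectors (0..2)
 * saw the descriptor as ready, or -1 if the setup failed. */
static int count_ready_selectors(void)
{
    int p[2];
    int status = 0;

    if (pipe(p) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(sees_ready(p[0]) ? 0 : 1);   /* child reports via exit status */

    if (write(p[1], "x", 1) != 1)          /* make the read end ready */
        return -1;
    int parent_ok = sees_ready(p[0]);

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    int child_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;

    close(p[0]);
    close(p[1]);
    return parent_ok + child_ok;
}
```

Because the byte is never consumed, the result does not depend on which process reaches select() first. This exercises processes only; it says nothing about what a user-space threads library does on top of the system call.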

Alan

--------------D1DAE88D4FB3F11309F7F034--
