Re: kerneld/request-route

Eric Schenk (schenk@cs.toronto.edu)
Wed, 3 Jul 1996 18:18:05 -0400


"Andrew E. Mileski" <aem@nic.ott.hookup.net> writes:
>I use ip-up and ip-down to create/destroy files with all the PPP
>link info. This allows me to monitor the link with the same command
>regardless of the actual device. Dialing is handled by kerneld
>and a script, re-dialing is handled similarly.

The main reason I don't do this is that I don't want to
have to give instructions to people on how to change their ip-up and
ip-down scripts to make the demand dialer work. I tried that back
when I wrote the first version of diald.

The secondary reason is that I wanted more control over the connection
policy than I could get by just running the user's startup script.
In the end I decided to have diald control the selection of the
modem line and run the connection script. It just calls pppd to
deal with the protocol once the connection has been made.

For example, diald allows you to specify a bank of modem devices
to use and will start each dialing cycle at a different device in the
bank to avoid getting hung up on a modem that is fried.
Yes, all this can be done in user scripts, but putting it all in
one place makes it a lot easier to just configure things once and go.
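
The rotation described above can be sketched roughly as follows. This is
only an illustration in Python, not diald's actual code; the ModemBank
class, its method name, and the device names are hypothetical.

```python
class ModemBank:
    """Rotate the starting device for each dialing cycle (illustrative only)."""

    def __init__(self, devices):
        if not devices:
            raise ValueError("need at least one modem device")
        self.devices = list(devices)
        self._cycle = 0  # number of dialing cycles started so far

    def next_cycle(self):
        """Return the devices in the order to try them for this cycle."""
        start = self._cycle % len(self.devices)
        self._cycle += 1
        # Begin at a different device each cycle so that one dead modem
        # at the head of the list cannot stall every attempt.
        return self.devices[start:] + self.devices[:start]


# Hypothetical device names, for illustration.
bank = ModemBank(["/dev/cua0", "/dev/cua1", "/dev/cua2"])
first_order = bank.next_cycle()
second_order = bank.next_cycle()
```

Each call to next_cycle() gives the full list, just starting one device
later, so every modem still gets tried within a cycle.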

Anyway, from a technical point of view the ip-up/ip-down approach works
OK, although you have to do a bit of hand-standing to deal with
multiple simultaneous demand dialed links. Also, there is one
lurking little nasty you need to watch out for that makes it
necessary to determine the process id of the pppd process that
is controlling a link (you should be able to get this by checking
the parent process id in the ip-up script).
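
A minimal sketch of recording that process id from ip-up, assuming ip-up
is a program run directly by pppd (so its parent process is pppd) and
that pppd passes the interface name as the first argument. The state
directory, file naming, and function names are hypothetical.

```python
import os


def write_link_record(state_dir, ifname, pppd_pid):
    """Record which pppd process owns `ifname`; return the file path."""
    path = os.path.join(state_dir, ifname + ".pid")
    with open(path, "w") as f:
        f.write(f"{pppd_pid}\n")
    return path


def read_link_record(state_dir, ifname):
    """Return the pppd pid recorded for `ifname`."""
    with open(os.path.join(state_dir, ifname + ".pid")) as f:
        return int(f.read().strip())


# In ip-up itself this might be called as (hypothetical state directory):
#   write_link_record("/var/run/links", sys.argv[1], os.getppid())
# If ip-up is a shell script that launches this code, os.getppid() would
# be the shell, not pppd, so the pid would have to be passed in instead.
```

Keeping one file per interface is one way to keep multiple simultaneous
links apart.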

[Aside 1: If you haven't already seen this problem you probably never will,
but it happens once a session for some people. Drove me nuts until
I figured out what was going on.]

[Aside 2: Since diald starts the pppd process, it just needs to watch
for a signal to notice that pppd died. No need to check the existence
of a process every 15 seconds or minute or whatever.]
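
The idea in Aside 2 can be sketched in Python roughly like this. It is
not how diald is written; it assumes Linux (signal.sigtimedwait is not
available everywhere) and that it runs in the main thread of a
single-threaded program. The function name is hypothetical.

```python
import signal
import subprocess
import sys


def run_until_sigchld(argv, timeout=10.0):
    """Start argv and sleep until SIGCHLD reports that it exited.

    Returns the child's exit status, or None if no SIGCHLD arrived
    within `timeout` seconds (the child is then killed).
    """
    # Block SIGCHLD before starting the child so the signal stays
    # pending until we explicitly wait for it.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGCHLD})
    try:
        child = subprocess.Popen(argv)
        while True:
            info = signal.sigtimedwait({signal.SIGCHLD}, timeout)
            if info is None:
                child.kill()
                child.wait()
                return None
            status = child.poll()  # reap the child if it has exited
            if status is not None:
                return status
            # Otherwise the child was only stopped or continued; wait again.
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


status = run_until_sigchld([sys.executable, "-c", "raise SystemExit(3)"])
```

The parent sleeps in the kernel until the child changes state, so no
periodic existence check is needed.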

A packet hits the kerneld routing trap, kerneld kicks your script.
Your script in turn runs some stuff to start up pppd, which in
turn runs ip-up when the link comes up. You now start monitoring
the link in whatever way you want to do that.

Now, something goes wrong on the far end and pppd decides that it
needs to renegotiate the ip layer. It takes the ip layer down, deletes all
the routes, runs ip-down, and proceeds to renegotiate the ip layer.
Your monitoring code cannot now assume that pppd has died. It hasn't;
in fact it will bring the ip layer back up in about 30 seconds.
So, how do you tell the difference between a real death of pppd,
and this temporary death? You need to watch to see if the pppd
process itself has died. Also, make sure you get the right pppd
process in systems where there are multiple pppd processes running.
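
One way to make that distinction, sketched in Python. It assumes the
pid of the right pppd was recorded when the link came up; the function
names and the three state strings are hypothetical.

```python
import os


def pppd_alive(pid):
    """Return True if a process with this pid currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def link_state(pppd_pid, ip_layer_up):
    """Classify a link as 'up', 'renegotiating', or 'dead'."""
    if ip_layer_up:
        return "up"
    # ip-down has run; the link is only really gone if pppd is gone too.
    return "renegotiating" if pppd_alive(pppd_pid) else "dead"
```

Note that a pid can be reused by an unrelated process after pppd exits,
so this check is a heuristic rather than a guarantee.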

Finally, during one of these renegotiation windows any packets that
hit the kerneld routing trap will cause /bin/request-route to be run,
so your /bin/request-route had better be able to handle this correctly.
It must not return until the link has been re-established
by the still-existing pppd process, since to do otherwise will
open up a window during which the link appears to be down.
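
A rough sketch of the waiting logic such a request-route would need,
in Python. How request-route is invoked and what it should return are
not covered here; wait_for_link, the link_is_up callback, and the
timeout values are all hypothetical.

```python
import os
import time


def wait_for_link(pppd_pid, link_is_up, timeout=60.0, poll=1.0):
    """Block until the existing pppd has the link back up.

    Returns True once link_is_up() reports the link is up, or False if
    pppd_pid no longer exists or `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if link_is_up():
            return True
        try:
            os.kill(pppd_pid, 0)  # signal 0: existence check only
        except ProcessLookupError:
            return False  # pppd really died; the caller must redial
        except PermissionError:
            pass  # the process exists but belongs to another user
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

A False return is the only case where starting a fresh pppd would make
sense; otherwise the existing process is still working on the link.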

All this ignores the fact that the way kerneld currently does
the routing trap causes a sleep in the kernel at a point where the
routing code assumes there will never be one. This is a crash just
waiting to happen.

This is not to say that diald is perfect. In fact there is one
thing that really annoys me about the current design, and that
is the fact that diald just does not get along with gated.
The reason for this is that diald keeps two interfaces up with
the same IP address. Gated complains about this once a minute or so.
Now, I could have diald shut down the extra interface, but then
I open up a window where the network appears down. Since one
of the major goals of diald is to make it appear as though your
connection is never down I'm somewhat loath to do this.
If anyone can suggest a fix to this problem I'd love to hear it.

(Of course there are other things I don't like about the current
design, but this is getting pretty long, and this is going off
topic for this list. If people really want to discuss this further
I suggest we either take it over to linux-diald or stick to private email.)

Cheers,

-- eric

---------------------------------------------------------------------------
Eric Schenk www: http://www.cs.toronto.edu/~schenk
Department of Computer Science email: schenk@cs.toronto.edu
University of Toronto