Re: Failover root devices

From: Austin S Hemmelgarn
Date: Fri Sep 18 2015 - 10:34:29 EST


On 2015-09-17 13:30, Drew DeVault wrote:
That said, using the term failover for this is probably not the best
idea; many people associate it almost exclusively with online failover
and high-availability setups, and trying to do something like that with
the root file system is just asking for trouble (I'll be happy to go
into specifics as to why if someone asks).

Do you have a suggestion for another name for this feature? Maybe we can
just call it "multiple root devices". The issue comes with the
associated command line options, like "rootfailoverdelay". Perhaps it
could be called "rootcycledelay". "rootdelay" is the obvious one, but
it's taken for another feature.
Possibly 'multirootdelay'?

However, can you think of any case where you'd want rootdelay and the per-device scan delay to differ, other than a per-device scan delay of 0 with rootdelay > 0?

The way I'd probably write it would be:
1. Wait rootdelay seconds
2. Check for 1st device
3. If first device is not there, check for 2nd
4. If second device is not there, check next one
5. Repeat 4 until all devices are checked.
6. If no device was found, check whether we were told to loop until one is found, and if so, start again at step 1.
And then add an option to tell it to wait 'rootdelay' seconds between checking each device.
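A rough user-space sketch of that scan order, simulated with a fake clock so it can actually be run. It assumes a hypothetical device model (each device "appears" at a fixed time), and the per_device_delay and loop switches are hypothetical stand-ins for the options being proposed; only rootdelay corresponds to an existing kernel parameter. This is not the do_mounts code, just an illustration of the ordering.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated devices (hypothetical): device i appears at time appear_at[i]
 * seconds; a negative value means it never appears. */
struct sim {
    const int *appear_at;
    int ndev;
    int now; /* simulated clock in seconds */
};

static bool device_present(const struct sim *s, int i)
{
    return s->appear_at[i] >= 0 && s->now >= s->appear_at[i];
}

/* Steps 1-6 above. Returns the index of the first device found, or -1
 * ("halt like it does now"). per_device_delay and loop are hypothetical
 * switches for the proposed options; max_passes only exists so the
 * simulation terminates, a real "loop" option would retry indefinitely. */
static int scan_roots(struct sim *s, int rootdelay, bool per_device_delay,
                      bool loop, int max_passes)
{
    assert(s->ndev > 0);
    for (int pass = 0; pass < max_passes; pass++) {
        s->now += rootdelay;                  /* 1. wait rootdelay seconds */
        for (int i = 0; i < s->ndev; i++) {   /* 2-5. check each device in order */
            if (i > 0 && per_device_delay)
                s->now += rootdelay;          /* optional wait between devices */
            if (device_present(s, i))
                return i;
        }
        if (!loop)                            /* 6. only start over if told to */
            break;
    }
    return -1;
}
```

With devices that appear at {never, 5s} and rootdelay=3, a single pass halts, enabling the per-device delay finds the 2nd device at t=6, and looping finds it on the second pass.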

1. Wait rootdelay seconds
2. Check 1st device, not present
3. Recheck 1st device until rootfailoverdelay seconds have passed
4. Move on to 2nd device, present -> boot

Or:

1. Wait rootdelay seconds
2. Check 1st device, not present
3. Recheck 1st device until rootfailoverdelay seconds have passed
4. Move on to 2nd device, not present
5. Recheck 2nd device until rootfailoverdelay seconds have passed
6. GOTO 2

And so on.
As for this, I'd say default to the first method, and then provide an
option to switch to the second (both have practical uses).
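A sketch of the rootfailoverdelay variant described above, in the same simulated style: each device is polled (here once per simulated second, an arbitrary choice) until it shows up or failover_delay expires, then the next device is tried, and with cycle set it wraps back to the first device ("GOTO 2"). The names failover_delay and cycle are hypothetical, mirroring the proposed "rootfailoverdelay" option; max_cycles only bounds the simulation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model: device i appears at appear_at[i] seconds,
 * or never if the value is negative. */
struct sim {
    const int *appear_at;
    int ndev;
    int now; /* simulated clock in seconds */
};

static bool device_present(const struct sim *s, int i)
{
    return s->appear_at[i] >= 0 && s->now >= s->appear_at[i];
}

/* Returns the index of the device booted from, or -1 if none appeared. */
static int failover_scan(struct sim *s, int rootdelay, int failover_delay,
                         bool cycle, int max_cycles)
{
    assert(s->ndev > 0);
    s->now += rootdelay;                        /* 1. wait rootdelay seconds */
    for (int c = 0; c < max_cycles; c++) {
        for (int i = 0; i < s->ndev; i++) {
            int deadline = s->now + failover_delay;
            for (;;) {                          /* recheck until the delay passes */
                if (device_present(s, i))
                    return i;                   /* present -> boot */
                if (s->now >= deadline)
                    break;                      /* move on to the next device */
                s->now += 1;                    /* poll once per simulated second */
            }
        }
        if (!cycle)                             /* only "GOTO 2" when cycling */
            break;
    }
    return -1;
}
```

In the first scenario (1st device missing, 2nd present) the loop falls through to the 2nd device after failover_delay; in the second (both missing at first) it keeps alternating until one appears.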

Sorry to cause confusion - these are actually the same method, but
handling different scenarios. The first is dealing with the first device
being nonexistent, and the second device existing. The second is dealing
with both being nonexistent, and cycling between them until one of them
shows up. After further thought, though, I think the best solution is a
bit different: a new command line option called "rootmultiwait" or
similar, which is a maximum amount of time to wait for the user's first
choice of root device to become available, then testing all devices
until that time runs out or the first choice becomes available.
I think there's value in being able to tell it to go through each one exactly once, and halt like it does now if it can't find the filesystem on any of them. That should probably be the default behavior in fact, as it's more similar to what's done now.

Secondarily, I've been thinking more about this, and I think it would be wonderful to have such functionality in the nfsroot code as well (and for that matter, also in any other built-in networked root filesystem support).
