Re: Mis-Design of Btrfs?

From: Erik Jensen
Date: Thu Jul 14 2011 - 16:50:58 EST


On Thu, Jul 14, 2011 at 12:50 PM, John Stoffel <john@xxxxxxxxxxx> wrote:
>>>>>> "Alasdair" == Alasdair G Kergon <agk@xxxxxxxxxx> writes:
>
> Alasdair> On Thu, Jul 14, 2011 at 04:38:36PM +1000, Neil Brown wrote:
>>> It might make sense for a device to be able to report what the maximum
>>> 'N' supported is... that might make stacked raid easier to manage...
>
> Alasdair> I'll just say that any solution ought to be stackable.
>
> I've been mulling this over too and wondering how you'd handle this,
> because upper layers really can't peak down into lower layers easily.
> As far as I understand things.
>
> So if you have btrfs -> luks -> raid1 -> raid6 -> nbd -> remote disks
>
> How does btrfs handle errors (or does it even see them!) from the
> raid6 level when a single nbd device goes away? Or taking the
> original example, when btrfs notices a checksum isn't correct, how
> would it push down multiple levels to try and find the correct data?
>
> Alasdair> This means understanding both that the number of data access
> Alasdair> routes may vary as you move through the stack, and that this
> Alasdair> number may depend on the offset within the device.
>
> It almost seems to me that the retry needs to be done at each level on
> it's own, without pushing down or up the stack. But this doesn't
> handle the wrong file checksum issue.
>
> Hmm... maybe instead of just one number, we need another to count the
> levels down you go (or just split 16bit integer in half, bottom half
> being count of tries, the upper half being levels down to try that
> read?)
>
> It seems to defeat the purpose of layers if you can go down and find
> out how many layers there are underneath you....
>
> John

A random thought: What if we allow the number to wrap at each level,
and, each time it wraps, increment the number passed to the next lower
level.

A zero would propagate down, letting each level do what it wants:
luks: 0
raid1: 0
raid6: 0
nbd: 0

And higher numbers would indicate the method at each level:

For a 1:
luks: 1
raid1: 1
raid6: 1
nbd: 1

For a 3:
luks: 1 (only one possibility, passes three down)
raid1: 1 (two possibilities, so we wrap back to one and pass two down,
since we wrapped once)
raid6: 2 (not wrapped)
nbd: 1

When the bottom-most level gets an N that it can't handle, it would
return EINVAL, which would be propagated up the stack.

This would allow the same algorithm of incrementing N until we receive
good data or EINVAL, and would exhaust all ways of reading the data at
all levels.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/