Re: EFI partition code broken..

From: Matt Domsch
Date: Tue Nov 09 2004 - 16:52:10 EST


On Sun, Nov 07, 2004 at 01:13:00PM -0800, Linus Torvalds wrote:
> On Sun, 7 Nov 2004, Matt Domsch wrote:
> >
> > The check for invalid primary master boot record (PMBR) needs to be
> > moved up ahead of the reads for the GPTs (primary at the start of the
> > disk, alternate/backup at the end of the disk). When first written,
> > Intel didn't specify that the PMBR (a normal DOS-like MBR partition
> > table with a single entry of type 0xEE covering the whole disk) *had*
> > to exist, so we let the code try looking for GPTs first. When the
> > PMBR test was added, it should have been added ahead of the GPT tests,
> > not after. I'll work up a patch to do just that, and will then remove
> > the dependency on IA64. Fair enough?
>
> Yes, sounds good. Also, if I understand you correctly, I would suggest
> actually taking the size of the disk from the PMBR 0xEE entry itself,
> rather than depend on what size the disk reports (perhaps double-check
> that it is sane, of course - the more careful the better).

While the PMBR 0xEE entry should cover the whole disk, I've got older
disks from Itanium1 Linux systems where it doesn't quite. Likewise,
Windows systems appear to put 0xFFFFFFFF there, regardless of drive
size. So that value can't be trusted.

> With people doing things like concatenating partitions with "md", the size
> of the disk itself is less important than what the partition table was set
> up with - even if the size reporting of the disk itself is reliable.
>
> For example, let's say that you create a EFI table on a RAID (striped or
> whatever), and that in turn then means that the first sub-disk used for
> the RAID will contain (as part of the RAID array) the EFI signature in its
> PMBR, we don't want to look at the GPT at the end of _that_ disk. See what
> I'm saying?

Yes.

There are two ways to create software RAID volumes.

a) partition disks, put partitions in the array, then partition /dev/md_d0.

This is handled properly already. The space projected to the OS as
the RAID volume does not include the underlying disks' partition table
sectors or last few sectors. I tested this by making an md_d0 raid0
disk set, then partitioning that with GPT using parted, and all works
swimmingly, /dev/md_d0p[123456] all appeared as expected. This is
the method that RAID autostart expects to find.

b) put whole disks in the array, then partition /dev/md_d0.

This isn't handled properly by existing code, because as you point
out, the MBR for the first disk and the MBR for the partitioned array
would occupy the same blocks. In such a case, the GPT code would
blindly set up the disk with partitions (which were too big to have
fit on the disk), and then md would have done the same (with
partitions that fit in the md). I've modified the GPT code to ignore
a partition table where the start or end blocks of the disk are
larger than the reported size of the disk.

There's a pathological case here, where you create a single-whole-disk
/dev/md_d0 device, then partition /dev/md_d0 into pieces. I don't
know how to solve this, as the partition table data would occupy the
same space, and be valid for both configurations, but you would want
it to ignore the table on the disk and only care about the table on
the md. I think this is just a case of "don't do that".

Patch to solve the immediate problem will follow in the next email.

Thanks,
Matt

--
Matt Domsch
Sr. Software Engineer, Lead Engineer
Dell Linux Solutions linux.dell.com & www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/