Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages

From: Don Zickus
Date: Tue Aug 08 2006 - 15:32:13 EST


On Mon, Aug 07, 2006 at 11:01:53PM -0600, Eric W. Biederman wrote:
> Don Zickus <dzickus@xxxxxxxxxx> writes:
>
> >> >> >
> >> >> >>
> >> >> >> The error is the uncompressed length does not math the stored length
> >> >> >> of the data before from before we compressed it. Now what is
> >> >> >> fascinating is that our crc's match (as that check is performed first).
> >> >> >>
> >> >> >> Something is very slightly off and I don't see what it is.
> >> >> >
> >> >> > I printed out orig_len -> 5910532 (which matches vmlinux.bin)
> >> >> > bytes_out -> 5910531
> >> >> >
> >> >> >>
> >> > It seems to be an AMD64 vs EM64T problem. AMD chipsets work but Intel
> >> > chipsets don't.
> >> >
> >> > I also blindly incremented bytes_out (as a really cheap hack), it didn't
> >> > work until I added some random putstr's below it (timing??). Then the
> >> > kernel booted.
> >> >
> >> > Still looking into things.
> >>
> >> Odd. I wonder if I'm missing a serializing instruction somewhere,
> >> to ensure the effects of ``self modifying code'' aren't a problem.
> >> As I read Intels Documentation if you have a jump before you get
> >> to the code there shouldn't be a problem.
> >>
> >> Still that doesn't really explain bytes_out.
> >>
> >
> > So I narrowed down the problem but it isn't obvious to me why this problem
> > exists. Basically, even though bytes_out is supposed to be initialized to
> > 0, it becomes -1 before entering decompress_kernel(). Of course, the
> > fallout is in flush_window() bytes_out wounds up being one less than
> > outcnt and hence my original problem.
> >
> > Any thoughts on how to debug where this could be getting corrupted?
>
> Looking at my build it appears bytes_out is being placed in the .bss.
> A little odd since it is zero initialized but no big deal.
> Could you confirm that bytes_out is being placed in the .bss section
> by inspecting arch/x86_64/boot/compresssed/misc.o and
> arch/x86_64/boot_compressed/vmlinux. "readelf -a $file" and then
> looking up the section number and looking at the section table to see
> which section it is was my technique.

Yes bytes_out is in the .bss for both files.

>
> If bytes_out is in the .bss for you then I suspect something is not
> correctly zeroing the .bss. Or else the .bss is being stomped.
>
> I'm not certain how rep stosb can be done wrong but some bad pointer
> math could have done it.

Even worse, from the time the .bss is cleared to the time gunzip() is
called inside decompress_kernel(), there is very little code to do some
stomping.

So I am stuck trying to debug this. This code seems very fragile. The
more debug code I add (ie putstr) the more the length is off (varies from
-32 to +1). Makes me scratch my head as to what is really going on here.

I created a really pathetic patch to get the thing to boot but even that
doesn't make sense.


diff --git a/arch/x86_64/boot/compressed/misc.c b/arch/x86_64/boot/compressed/misc.c
index 0e6c4b7..614416e 100644
--- a/arch/x86_64/boot/compressed/misc.c
+++ b/arch/x86_64/boot/compressed/misc.c
@@ -183,6 +183,7 @@ #define OLD_CL_MAGIC 0xA33F
extern unsigned char input_data[];
extern int input_len;

+static long dummy;
static long bytes_out = 0;

static void *malloc(int size);
@@ -594,6 +595,7 @@ asmlinkage void decompress_kernel(void *
if ((ulg)output >= 0xffffffffffUL)
error("Destination address too large");

+ bytes_out = 0;
makecrc();
putstr(".\nDecompressing Linux...");
gunzip();

And yes, the 'dummy' variable needs to be there.
I am trying to use gdb on vmlinux to fish for clues. But I am at a loss
right now.

Cheers,
Don

>
> Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/