Re: Should we automatically generate a module signing key at all?
From: Andy Lutomirski
Date:  Tue May 19 2015 - 10:36:40 EST
On Tue, May 19, 2015 at 1:53 AM, David Howells <dhowells@xxxxxxxxxx> wrote:
> Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
>> I think we should get rid of the idea of automatically generated signing keys
>> entirely.  Instead I think we should generate, at build time, a list of all
>> the module hashes and link that into vmlinux.
>
> Just in Fedora 21:
>
> warthog>rpm -ql kernel-modules | grep [.]ko | wc -l
> 3604
> warthog>rpm -ql kernel-modules-extra | grep [.]ko | wc -l
> 480
>
> So that's >4000 modules, each signed with a SHA256 sum (32 bytes).  That's
> more than 125K of unswappable memory.  And it's uncompressible as Dave pointed
> out.  And that doesn't include any metadata to match a module to a digest, but
> rather assumes we just scan through the entire list comparing against each
> SHA256 sum until we find one that matches.

Let's go through the numbers.  There are two main things that matter,
I think: non-swappable memory and disk space.  For simplicity and
because it doesn't really matter, I'll ignore things like the
filesystem block size.

I'll assume that everyone uses a 256-bit hash.  (This is charitable to
the status quo, since hash size doesn't really matter for public-key
signatures, and the default is SHA-1.)  I'll further assume that there
are 4096 modules or so.

The current kernel uses 4096-bit RSA.  The kernel text needed for
verification seems to be around 21kB (9kB asymmetric_keys + 12kB MPI).
The public key is tiny, and the signature is 512 bytes per module.
(Actually, it's probably more because of PKCS garbage.  I'll ignore
that.)  This is a total of ~21kB of non-swappable storage and 2MB of
disk space for all the signatures.

If the goal were to optimize for size, the kernel should probably use
a much more compact signature scheme, probably some compressed EC
signature.  Ed25519 is 64 bytes per signature, which seems to be more
or less optimal.  That would reduce disk space used to 64 bytes per
module or 256kB for 4k modules.
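
The disk-space figures for the two signature schemes are plain
multiplication.  A minimal sketch of that arithmetic, assuming the
4096-module count and the per-signature sizes quoted above (the helper
name total_kib is made up for illustration):

```python
MODULES = 4096  # module count assumed in the message


def total_kib(bytes_per_module, modules=MODULES):
    """Total per-module overhead across all modules, in KiB."""
    return bytes_per_module * modules // 1024


rsa4096_kib = total_kib(512)  # one 4096-bit RSA signature is 512 bytes
ed25519_kib = total_kib(64)   # one Ed25519 signature is 64 bytes

print(rsa4096_kib, ed25519_kib)  # 2048 256
```

That is 2MB for RSA against 256kB for Ed25519, ignoring any PKCS
framing around the RSA signature.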

With the hash-based scheme I outlined, the kernel text needed is
nearly zero.  The overhead in each .ko file is zero, and
module_hashes.ko is 32 bytes per module or 128kB for 4k modules.  It
wins the disk space competition hands down.  Naively, though, all of
that space is non-swappable.  Note that any sensible implementation
would sort the hash list, making hash checks very fast.
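
To illustrate why a sorted list makes the check cheap, here is a rough
userspace sketch in Python, not kernel code.  The module contents and
the function names are placeholders invented for the example; the only
idea taken from the text is "sort the SHA-256 digests, then binary
search".

```python
import bisect
import hashlib


def build_hash_list(module_images):
    """Return the sorted list of SHA-256 digests, one per module image."""
    return sorted(hashlib.sha256(img).digest() for img in module_images)


def module_is_known(hash_list, module_image):
    """Binary-search the sorted digest list for this module's digest."""
    digest = hashlib.sha256(module_image).digest()
    i = bisect.bisect_left(hash_list, digest)
    return i < len(hash_list) and hash_list[i] == digest


# Placeholder "modules": 4096 distinct byte strings standing in for .ko files.
modules = [b"module-%d" % i for i in range(4096)]
hash_list = build_hash_list(modules)

print(module_is_known(hash_list, b"module-7"))   # True
print(module_is_known(hash_list, b"tampered"))   # False
print(len(hash_list) * 32 // 1024)               # 128 (kB of digests)
```

With 4096 entries the lookup takes about 12 comparisons instead of a
scan over the whole list.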

One improvement would be to unload module_hashes.ko when you're done
with it.  That's annoying.  A different approach would be to use a
hash tree.  For a basic binary hash tree, the root (module_hashes.ko,
for example) is a single hash, i.e. 32 bytes.  (For simplicity,
we'd store the number of hashes, too.  That would add a couple of
bytes.)  Each module needs log2(number of modules) - 1 hashes stored.
(There's no need for a module to store its own hash, and if the hashes
are sorted before the hash tree is generated, then the edge directions
are all implicit.)  For 4k modules, that's 11 hashes or 352 bytes per
module, for a total of 1408kB for 4k modules.  The kernel text
required is almost zero (while efficiently generating hash trees takes
some thought, verifying them is a very simple loop over the hash
function).  This already beats the status quo in terms of both
non-swappable memory and disk space.  It still loses to Ed25519 or
similar, though.
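
The "very simple loop" for verification can be sketched as follows.
This is an illustrative Python model under my own assumptions, not a
proposed kernel implementation: the leaf count is a power of two, a
parent is SHA-256 over the concatenation of its two children, and the
leaf index is passed in explicitly rather than derived from the sorted
order.  All function names are invented for the example.  In this
particular layout the proof holds one sibling hash per level below the
root, so the exact per-module count can differ by one from the figure
above depending on how the leaf level is defined.

```python
import hashlib


def h(data):
    """SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return every tree level, leaves first and the root level last.

    Assumes len(leaves) is a power of two.
    """
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def make_proof(levels, index):
    """Collect the sibling hash at each level below the root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])  # sibling shares the parent
        index //= 2
    return proof


def verify(root, module_image, index, proof):
    """Recompute the path from the module's own hash up to the root."""
    node = h(module_image)
    for sibling in proof:
        # Even index: node is the left child; odd index: the right child.
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


# Placeholder modules; the leaf hashes are sorted before the tree is built.
modules = [b"module-%d" % i for i in range(8)]
leaves = sorted(h(m) for m in modules)
levels = build_levels(leaves)
root = levels[-1][0]

idx = leaves.index(h(b"module-3"))
proof = make_proof(levels, idx)
print(verify(root, b"module-3", idx, proof))           # True
print(verify(root, b"module-3-patched", idx, proof))   # False
```

Only the 32-byte root has to be trusted; the proof travels with the
module (or is generated on demand) and the module's own hash is
recomputed rather than stored.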

As David Woodhouse pointed out, if kmod were changed, most of the
overhead could go away.  kmod could generate the proof at module load
time, so the .ko files would not need to carry per-module proofs.  That
reduces the total overhead to just the list of hashes.

In summary, I think that the hash scheme does quite well for space
efficiency, although the comparison is a bit unfair because the
current code is unnecessarily inefficient.
>
>> Then, if anyone actually wants to use a public key to verify modules, they can
>> build the public key into a module as opposed to dragging all of the public
>> key crud into the main kernel image.
>
> A chunk of the 'public key crud' has to be in the kernel for other reasons
> (the integrity stuff, I think, which has to start before you load any modules)
> and the public key stuff is used for other things too (such as kexec and may
> well be used for firmware validation in future) - though that doesn't preclude
> it being modularised, it does mean that you are likely to load it anyway in
> future.

What integrity stuff?  IIRC dm-verity doesn't use asymmetric crypto at
all.  IMA probably does, though.

For firmware validation, there's no good reason it couldn't work
exactly like module signatures.  Alternatively, firmware validation
could still use loadable public key crypto.  (Again, it could be
unloaded after boot, which is currently impossible.)

For kexec, I think that the main use is for crash dumps, in which case
the hash of the crash kernel could be built in.  Alternatively, if the
crash kernel is identical to the original kernel, it would be
reasonably straightforward to arrange for the kernel to accept itself
as a valid kexec image.
>
>> We autogenerate module_hashes.ko
>
> This just makes things worse.  I suspect all distributions would have to load
> it anyway - and you don't really win as it will just make the initramfs bigger
> instead of the bzImage.

For initramfs use, the hash tree approach works quite well, since the
hash list doesn't need to live in the initramfs.

--Andy