Re: [PATCH 1/1][NEW] crypto API: rsa algorithm module patch (kernelversion 2.6.20.3)

From: Randy Dunlap
Date: Thu Mar 22 2007 - 13:47:37 EST


On Thu, 22 Mar 2007 10:36:16 +0200 Tasos Parisinos wrote:

> > Hi,
> > Lots of good progress here, but still a few comments below.
> >
> > Needs to apply to current mainline.
> >
> >
> What do you mean by mainline?

Linus's current kernel tree, i.e., the latest git tree or
git snapshot preferably, or at least use the latest -rc if there
is one. IIRC, the patch was against 2.6.20.2 (or .3), but it does
not apply cleanly to the current mainline tree.

> > Most (probably all) of these functions should also be "static"
> > unless they are meant for use outside of this module.
> >
> > Which leads to the question: which functions are meant for
> > external/exported use?
> >
> > And it would be really Good if those function headers were
> > documented in kernel-doc notation (not much of a change from
> > what you already have here; see Documentation/kernel-doc-nano-HOWTO.txt).
> >
>
> I think that if someone wants to do modular exponentiation rsa-style
> (where n is always odd)
> using the montgomery algorithm i implement, only rsa_modexp should be static
>
> But the rest of the API can be used for multi-precision integer
> arithmetics, so they could almost all
> be exported (except rsa_load and rsa_unload). Do you think that this is
> namespace pollution?
> In that case i should make them all static except rsa_modexp

Sounds like they should all be static except for rsa_modexp().
If other functions need to be made non-static later, that's trivial
to do. And if loadable modules should be able to use rsa_modexp(),
then it needs one of these:

EXPORT_SYMBOL(rsa_modexp);
or
EXPORT_SYMBOL_GPL(rsa_modexp);


> About exportable comments, i just missed it, i will fix that

> >> = tmp = buf[i + j] + (abuf[j] * (u64)bbuf[i]) + (tmp >> 32);
> >>
> >
> > Break that line to < 80 columns, please.
> >
> >
> >> + tmp[j] = product = tmp[j] + (m * (u64)nbuf[j]) + (product >> 32);
> >>
> >
> > Line too long.
> >
> >
> >
> Well this is going to be a problem, because i believe that in this way
> gcc will create the best code. These internal product
> computations are essential and they get run many-many times. I will do
> my best to write it in less than 80 columns

You can split the line's source code without making it be multiple
C-language statements. I'm just talking about source line length here.
E.g., instead of (this one long line):

+ tmp[j] = product = tmp[j] + (m * (u64)nbuf[j]) + (product >> 32);

use:
tmp[j] = product = tmp[j] +
(m * (u64)nbuf[j]) + (product >> 32);

or:
product = tmp[j] + (m * (u64)nbuf[j]) + (product >> 32);
tmp[j] = product;


> Then i need to comment on some of the changes that where suggested on
> the previous patch but where not carried out
>
> First of all the
>
> UINT32_T_MAX
>
> This is nowhere in linux/kernel.h but there are some equivalents but they are
> defined as macros that do some computation to compute the max 32 bit unsigned integer
> and this would steal some (enough) clocks in loops that execute a lot. So i use the static value
> 0xFFFFFFFF which is more efficient.

Well, from a higher level, the code looks very 32-bit focused.
How well does it work on 64-bit CPUs?

> Secondly the chunk that initializes the mpi data
>
> /* Copy the data */
> for (i = size - 1, j = 0; i >= 0; i--, j++)
> buf[j / 4] |= ((u32)str[i] << ((j % 4) * 8));
>
> Ok this is not pretty, its rather cruel. What i want to do is have a byte array for example
>
> \xab\x11\xed\x43\x54\x34\x56\xcd
>
> and make the mpi having data[0] = 0x543456cd and data[1]=0xab11ed43
>
> I could reverse the byte array and then memcpy, what do you think?
> Is there a faster (and prettier) way to do this? someone said ntohl but im not sure

I don't see a good way to do that. I guess we need to set up a
foundation and have a coding contest. 8;)

Hm, won't swab32() help with that? and access the byte array
4 bytes at a time instead of 1 byte at a time?

Something like (not tested, with size converted to number of
u32's in the str[]):

for (i = size - 1, j = 0; i >= 0; i--, j++)
data[j] = swab32((u32)str[4 * i]);



> Also there are places that memset or memcpy could be used, but i can't figure out if it
> is more efficient to execute the set/copy loop my own or call such a function
> What is your opinion on that?

How time-critical is the code? Is it used often or just once when
loading a module? If it's not used frequently, I'd go for readability,
maintainability, and understandability.

> All the other changes suggested where carried out


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/