Re: Kernel-Messages translation

Keith Rohrer (kwrohrer@uiuc.edu)
Thu, 12 Jun 1997 08:20:28 -0500 (CDT)


Executive summary:
1) global string tables, loading only one at a time to avoid bloating
the binary (nobody seems to care about kernel source bloat)
2) code which doesn't explicitly do the table lookup will work the
same as always
3) use multiple tables/table sections to minimize message allocation
conflicts

> > Oh, no, not again.
> > Translation of kernel messages belongs, if anywhere, in userspace.
> It cannot be in userspace, e.g. 'klogd', because most of the problems
> happen while booting, so no klogd.
Choosing one of several sets of constant string literals shouldn't be
much more of an "un-kernel" thing than using constant string literals
scattered throughout the code, or including the verbose SCSI messages
into the kernel...

> > Putting it in the kernel would (a) bloat the kernel and (b) make it
> > harder to debug problems. (developers would have to know what
> > all sorts of translated messages meant for at least their subsystem,
> > rather than merely having to know the English messages.)
> a) Sure, but I don't give a sh*t on my 64MB machine :-)
So long as you only link with one set of messages, there's no bloat in
the compiled kernel; just the sources get bigger, and nobody seems
to care a bit if the sources grow by megabytes...

> b) Right, but you can always disable the translation via a
> 'null'-translator. Still this won't fit for one-time-problems, but
> problems should be reproducible, right?
If the coder doesn't want to use the tables and get his messages
translated, he can just printk a string literal as usual. It should
also be easy enough to grep the table source for a translated message
and look that index up in the base/English table, especially if the
coding standard requires each string to be labeled with a comment
giving its index number.

> > I've been toying around with this idea off and on, but haven't done
> [...]
> > First off, this seems like a very good idea, but IMHO, it wouldn't be
> > necessary to completely implement it. There are many such kernel
> > messages that really aren't expected to stay around (debugging and such)
> > and it would be a heck of a lot better for the author to not have to
> > worry about referencing string tables and adding a new string to a list,
> > etc.
> You are right, comments like 'this message should never appear' aren't
> subject to the first releases.
Just because a feature's there doesn't mean you have to use it. :-)

> > Secondly, this has a great potential to treat English messages in the
> > same manner as the foreign language ones and could end up as a
> > major/minor rewrite in the means for outputting kernel messages. You
> > could probably use swappable lookup-tables that could be loaded like a
> > module and configured similarly. (I'm not 100% sure of the logistics
> > behind this, actually. In any case, it wouldn't be difficult to implement
> > your own method of swapping in and out string translation tables.)
> Maybe this ain't possible for all messages ('KERNEL-PANIC: swap failed').
I'd suggest just a few global symbols for the tables of strings (one table
for each subsystem to reduce the fight for new strings), rather than
some massive change to printk. I don't want to break anything that
already exists. If you want to print a message that's been translated:

printk("...%s...",...,intl_SUBSYS_messages[SYMBOLIC_CONSTANT],...);
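To make that concrete, here is a minimal userspace sketch of one
per-subsystem table. The subsystem name (scsi), the enum constants, the
message texts and format_scsi_event() are all invented for illustration,
and snprintf() stands in for printk(); none of this is existing kernel
code.

```c
#include <stdio.h>

/* Hypothetical symbolic indexes for one subsystem's messages. */
enum scsi_msg_id {
    SCSI_MSG_TIMEOUT,   /* 0 */
    SCSI_MSG_RESET,     /* 1 */
    SCSI_MSG_COUNT
};

/* Base/English table; each string is labeled with its index so it is
 * easy to find with grep. */
static const char *const intl_scsi_messages_en[SCSI_MSG_COUNT] = {
    /* 0 */ "command timed out",
    /* 1 */ "resetting bus",
};

/* The single global symbol callers index into. Only one language's
 * table is linked in, so the binary carries one copy of the strings. */
const char *const *intl_scsi_messages = intl_scsi_messages_en;

/* Userspace stand-in for a printk() call that uses a translated string. */
int format_scsi_event(char *buf, size_t len, int host, enum scsi_msg_id id)
{
    return snprintf(buf, len, "scsi%d: %s\n", host, intl_scsi_messages[id]);
}
```

Code that never touches intl_scsi_messages keeps working exactly as
before, since the format string itself is still an ordinary literal.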


> > I'm thinking something like changing the standard printk into something
> > like translated_printk(PK_TOKEN) where the token would reference a string
> > in the table, but printk() could still be used when no entry in the
> > table is available or necessary. (And, heck, it could "fall back" on an
> > English string if the foreign language string table doesn't contain all
> > of the required strings.)
> My primary goal was as little interference with the current kernel as
> possible. I worked with indexed error messages on a project and it was
> quite ok, but then 'printk()' is similar to 'printf()', so you can put a
> lot of code in it. Some kernel-messages contain strings that have to be
> translated, too, e.g.
> printk( "Mounting root device (%s)%s", whoknowsstring, bRO ? "read-only" :
> "" );
>
> For this I implemented a feature that feeds the 'sprintf()'-ed string
> again into the translator but remembering the original hashvalue to find
> it again.
Hmm...I'd think you'd want each intl_foo_strings[x] to have the same
number of %conversions in it, and "read-only" would be intl_foo_strings[y]...
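A rough sketch of what I mean, again in userspace C. The table names,
the indexes, the German strings (a rough illustrative translation) and
the helper functions are all assumptions made up for this example; the
checker only counts '%' conversions and does not compare their types.

```c
#include <stdio.h>

enum fs_msg_id { FS_MSG_MOUNT_ROOT, FS_MSG_READ_ONLY, FS_MSG_EMPTY, FS_MSG_COUNT };

/* Same indexes in every language; "read-only" is its own entry. */
static const char *const intl_fs_strings_en[FS_MSG_COUNT] = {
    /* 0 */ "Mounting root device (%s) %s",
    /* 1 */ "read-only",
    /* 2 */ "",
};

static const char *const intl_fs_strings_de[FS_MSG_COUNT] = {
    /* 0 */ "Root-Geraet (%s) wird eingehaengt %s",
    /* 1 */ "nur lesend",
    /* 2 */ "",
};

/* Count conversions in a format string, skipping literal "%%". */
int count_conversions(const char *fmt)
{
    int n = 0;
    for (; *fmt; fmt++) {
        if (*fmt != '%')
            continue;
        if (fmt[1] == '%')
            fmt++;          /* literal percent sign, not a conversion */
        else
            n++;
    }
    return n;
}

/* Return 1 if every entry has the same conversion count in both tables. */
int tables_agree(const char *const *a, const char *const *b, int count)
{
    for (int i = 0; i < count; i++)
        if (count_conversions(a[i]) != count_conversions(b[i]))
            return 0;
    return 1;
}

/* The flag text comes from the same table as the format string. */
int format_mount(char *buf, size_t len, const char *const *tbl,
                 const char *dev, int read_only)
{
    return snprintf(buf, len, tbl[FS_MSG_MOUNT_ROOT], dev,
                    tbl[read_only ? FS_MSG_READ_ONLY : FS_MSG_EMPTY]);
}
```

A check like tables_agree() could run at build time over each
translated table, so a translation that drops or adds a conversion is
caught before it ever reaches a printk() call.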

> Token tables are harder to use and you can achieve the same goal with a
> tool that replaces the strings in the source-codes. You cannot apply any
> patches after that, but the kernel is tight! I wanted to implement the
> language-files as a database, but didn't know how to implement the feature
> mentioned above.
If you want to be that gross and non-reconfigurable, instead of using
arrays and indexes, just #define all the string constants and have the
former index as part of the constant name. No need to munge the actual
code, and it saves you from bugs which accidentally translate code or
comments...I don't believe C source code can contain even all ASCII
characters...
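The #define variant might look something like the following sketch. The
macro names, the INTL_LANG_DE switch, the message texts (the German is
a rough illustration) and format_root_mounted() are all hypothetical,
and snprintf() again stands in for printk().

```c
#include <stdio.h>

/* Hypothetical build-time language switch; in practice each language
 * would probably live in its own header. The former table index is
 * part of the macro name. */
#ifdef INTL_LANG_DE
#define INTL_FS_0001 "VFS: Root-Geraet eingehaengt"
#define INTL_FS_0002 "nur lesend"
#else
#define INTL_FS_0001 "VFS: mounted root device"
#define INTL_FS_0002 "read-only"
#endif

/* Adjacent string literals are concatenated by the compiler, so the
 * macros can be embedded in a larger format string with no table
 * lookup at run time. */
int format_root_mounted(char *buf, size_t len, const char *dev)
{
    return snprintf(buf, len, INTL_FS_0001 " (%s, " INTL_FS_0002 ")\n", dev);
}
```

The language is fixed when the kernel is compiled, which is the
non-reconfigurable part, but the calling code stays readable and still
takes patches.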

> > ps If you implemented the English strings in the same method as the
> > foreign strings, your '400K' loss wouldn't really exist.
> right, see above.
The problem I see is implementing the ability to switch languages via
insmod; having global char **'s the module setup routine could change
wouldn't be a biggie, but that would take up extra space until/unless
you somehow unloaded the original table...plus, rmmodding the "new"
message module should somehow restore the previous messages. But
that should just be a SMOP...
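For what it's worth, the swap-and-restore part could be sketched as
below. This is plain userspace C, not module code: intl_table_load()
and intl_table_unload() are invented names standing in for what a
message module's init_module() and cleanup_module() would do, and the
"xx" table is a placeholder language. Refusing to unload a table that
is not the active one is just one possible policy.

```c
#include <stddef.h>

enum { MSG_OOM, MSG_BAD_INODE, MSG_COUNT };

/* Built-in base/English table. */
static const char *const messages_en[MSG_COUNT] = {
    /* 0 */ "out of memory",
    /* 1 */ "bad inode",
};

/* Placeholder table a hypothetical message module would carry. */
static const char *const messages_xx[MSG_COUNT] = {
    /* 0 */ "xx: out of memory",
    /* 1 */ "xx: bad inode",
};

/* The global pointer that message-printing code indexes into. */
const char *const *intl_core_messages = messages_en;

struct intl_table {
    const char *const *strings;   /* this module's strings */
    const char *const *previous;  /* table that was active before load */
};

/* What the module's setup routine would do: remember the old table,
 * then point the global at the new one. */
void intl_table_load(struct intl_table *t)
{
    t->previous = intl_core_messages;
    intl_core_messages = t->strings;
}

/* What the module's cleanup routine would do: restore the previous
 * table, but only if this table is still the active one. */
int intl_table_unload(struct intl_table *t)
{
    if (intl_core_messages != t->strings)
        return -1;
    intl_core_messages = t->previous;
    return 0;
}
```

This does nothing about the space the original table keeps occupying
while a replacement is loaded; that part is still open.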

Keith