XML /proc first caveat

Clayton Weaver (cgweav@eskimo.com)
Thu, 29 Oct 1998 04:33:20 -0800 (PST)


The first hangup with this approach to portable, easily parseable /proc
files is probably going to be character encoding. XML parsers for
multi-byte locales are going to expect UCS-4 (i.e. wchar_t in C terms)
characters, while a filesystem should deliver UTF-8 (variable length, no
embedded '\0' bytes).

So conversion tools (for ps, etc.) that want to use standard SGML/DSSSL
parsers and translators like nsgmls and jade would likely need to parse
the document and convert UTF-8 to wchar_t on the fly in multi-byte
locales. This is probably not an issue in byte-per-character locales.
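
As a rough illustration of that on-the-fly conversion, here is a minimal
sketch of a UTF-8 to UCS-4 decoder in C. The function names utf8_decode
and utf8_to_ucs4 are made up for this example. It stores code points in
uint32_t rather than wchar_t, since the width of wchar_t varies between
platforms, and it assumes the current UTF-8 definition (RFC 3629, at most
4 bytes, nothing above U+10FFFF) rather than the older 6-byte form.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one UTF-8 sequence starting at s, with n bytes available.
 * Stores the code point in *out and returns the number of bytes consumed,
 * or 0 if the sequence is truncated, malformed, overlong, or out of range.
 */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *out)
{
    /* Smallest code point that legitimately needs 2, 3, or 4 bytes. */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    uint32_t cp;

    if (n == 0)
        return 0;

    if (s[0] < 0x80) {                      /* plain ASCII byte */
        *out = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        len = 2; cp = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {
        len = 3; cp = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {
        len = 4; cp = s[0] & 0x07;
    } else {
        return 0;                           /* stray continuation or bad lead byte */
    }

    if (n < len)
        return 0;                           /* truncated sequence */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                       /* not a continuation byte */
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    if (cp < min_cp[len])
        return 0;                           /* overlong encoding */
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                           /* outside Unicode or a surrogate */

    *out = cp;
    return len;
}

/*
 * Convert a whole UTF-8 buffer into an array of code points.
 * Returns the number of code points written, or (size_t)-1 on bad input
 * or if dst (capacity cap) is too small.
 */
size_t utf8_to_ucs4(const unsigned char *src, size_t n,
                    uint32_t *dst, size_t cap)
{
    size_t in = 0, count = 0;

    while (in < n) {
        size_t used;

        if (count == cap)
            return (size_t)-1;
        used = utf8_decode(src + in, n - in, &dst[count]);
        if (used == 0)
            return (size_t)-1;
        in += used;
        count++;
    }
    return count;
}
```

A tool would call utf8_to_ucs4 on each chunk read from the /proc file
before handing the result to a wide-character parser. In a
byte-per-character locale with ASCII-only content the conversion is just
a widening copy.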

Another item: XML doesn't have any numeric data types; characters are
the only data type during parsing. So (until the XML schema proposal is
a workable standard) you can't range check the numeric data in a /proc
XML file with XML validation techniques, verify byte width for the type,
or anything of that nature (an ASN.1 parser might be able to do this,
because of the association of numeric type names with values). You can
always do that in C before generating the specific element values, of
course.
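
The same kind of check can also be made on the reading side, where the
tool only ever sees character data. The following is a minimal sketch,
not a proposed interface: parse_ranged is a hypothetical helper, and the
limits passed to it are whatever the tool author knows the field should
satisfy.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse the character data of one element as a decimal integer and check
 * that it lies in [min, max]. Returns 0 and stores the value in *out on
 * success, or -1 if the text is empty, has trailing junk, overflows a
 * long, or falls outside the range. Leading whitespace is accepted
 * because strtol skips it.
 */
int parse_ranged(const char *text, long min, long max, long *out)
{
    char *end;
    long v;

    if (text == NULL || *text == '\0')
        return -1;

    errno = 0;
    v = strtol(text, &end, 10);
    if (errno == ERANGE)
        return -1;              /* does not fit in a long at all */
    if (end == text || *end != '\0')
        return -1;              /* no digits, or junk after the digits */
    if (v < min || v > max)
        return -1;              /* syntactically fine but out of range */

    *out = v;
    return 0;
}
```

For example, a field expected to fit a signed 16-bit type could be read
with parse_ranged(text, 0, 32767, &value); the XML parser itself would
accept "40000" or "12ab" without complaint, so this is where such input
gets rejected.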

The best way to proceed is probably to take one /proc file as the target,
try generating it both XML and ASN.1 encoded, and then try hacking some
(currently ad-hoc) admin tool that uses /proc data to read it in those
formats, to see which one is more hassle in practice.
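
To give a feel for what that experiment would compare, here is a small
sketch that encodes a single integer field both ways. Everything here is
an assumption made for illustration: the function names, the element name
passed by the caller, and the choice of a bare DER INTEGER (tag 0x02,
short-form length, minimal two's complement content) with no surrounding
SEQUENCE or MIB-style definition.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * XML form: <name>value</name> as text.
 * Returns the number of bytes written (excluding the NUL), or 0 if the
 * buffer is too small.
 */
size_t xml_encode_int32(const char *name, int32_t v, char *out, size_t cap)
{
    int n = snprintf(out, cap, "<%s>%ld</%s>", name, (long)v, name);

    if (n < 0 || (size_t)n >= cap)
        return 0;
    return (size_t)n;
}

/*
 * ASN.1 DER form of an INTEGER: tag 0x02, one length byte, then the value
 * as big-endian two's complement using the fewest bytes possible.
 * Returns the number of bytes written, or 0 if the buffer is too small.
 */
size_t der_encode_int32(int32_t v, unsigned char *out, size_t cap)
{
    unsigned char content[4];
    uint32_t u = (uint32_t)v;
    size_t start = 0, len, i;

    content[0] = (unsigned char)(u >> 24);
    content[1] = (unsigned char)((u >> 16) & 0xFF);
    content[2] = (unsigned char)((u >> 8) & 0xFF);
    content[3] = (unsigned char)(u & 0xFF);

    /* Drop leading bytes that only repeat the sign bit of the next byte. */
    while (start < 3 &&
           ((content[start] == 0x00 && (content[start + 1] & 0x80) == 0) ||
            (content[start] == 0xFF && (content[start + 1] & 0x80) != 0)))
        start++;

    len = 4 - start;
    if (cap < 2 + len)
        return 0;

    out[0] = 0x02;                  /* universal tag for INTEGER */
    out[1] = (unsigned char)len;    /* short-form length, always 1..4 here */
    for (i = 0; i < len; i++)
        out[2 + i] = content[start + i];
    return 2 + len;
}
```

The XML output can be read by eye and by any XML parser but carries no
type information; the DER output is compact and typed, but needs a
decoder and an agreed definition before a tool can make sense of it.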

Either encoding provides a standard data encoding that can be passed
around among developers to make sure that everyone is on the same page
with regard to /proc file formats. The question is which is less trouble
and less code to implement and get right in both single-byte and
multi-byte locales.

Looking at the MIBs in the RFCs, it doesn't look like ASN.1 objects have
the transparent extensibility of, for example, HTML documents ("if you
don't have a defined content model for it then it isn't there"). When
you extend the information in an ASN.1-encoded /proc file, you would
probably have to distribute the new version to developers of other tools
or announce it on a mailing list (probably a good idea, proc-asn1@...
would get the redefinition out to other developers that use such
definitions). Then of course you have the return of distribution
versionitis, but we have that now. It can't be worse than having to do
the same thing for ad-hoc /proc file encoding. With proper abstraction in
the tool, ...

Regards, Clayton Weaver <mailto:cgweav@eskimo.com> (Seattle)
