Re: UTF-8 and Alt key in the console

From: H. Peter Anvin
Date: Tue Apr 01 2008 - 20:39:10 EST


David Newall wrote:
> Jan Engelhardt wrote:
>> Hence the proposal of using definite start and end markers:
>>
>> echo -e '\x1B43m\x1D wonderful \x1B0m\x1D' | cosmicrays | cat
>
> I see no merit in the idea. Most seriously, there isn't any real-world
> problem being solved. In addition, it proposes creating yet another
> type of terminal emulation. If there's something you don't like about
> VT escape codes, use a different emulation. For example, Televideo
> terminals used almost exclusively single-character control codes,
> reducing the scope of being mid-sequence to, well, much closer to zero.
>
> You need to make quite clear that your proposal is to discontinue use of
> VT terminal emulation.

Okay, let's put this to rest once and for all:

*** ISO 6429 sequences are self-terminating. ***

No, you can't tell you're inside one if you miss the leading CSI, but as has been pointed out, there really isn't a huge case for it.

The standard is available for free under the name ECMA-48:
http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-048.pdf

It references ISO 2022, a.k.a. ECMA-35:
http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-035.pdf


These standards use a decimalized hexadecimal notation, so if you see "05/10" it means 0x5a. A "column" refers to a 16-character set, so "column 4" refers to bytes 0x40 to 0x4f.
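A tiny helper makes the notation concrete. This is only an illustrative sketch; the function names are made up and come from neither standard.

```python
def ecma_to_byte(notation: str) -> int:
    """Convert ECMA-48/ECMA-35 'column/row' notation (e.g. "05/10") to a byte."""
    column, row = (int(part, 10) for part in notation.split("/"))
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError(f"out of range: {notation!r}")
    return column * 16 + row  # column is the high nibble, row the low nibble


def column_range(column: int) -> range:
    """All 16 byte values that make up one column of the code table."""
    return range(column * 16, column * 16 + 16)


print(hex(ecma_to_byte("05/10")))  # 0x5a
print(hex(ecma_to_byte("01/11")))  # 0x1b (ESC)
r = column_range(4)
print(hex(r[0]), hex(r[-1]))       # 0x40 0x4f
```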


The structure defined in section 5.4 of ISO 6429/ECMA-48:

-----------
5.4 Control sequences
A control sequence is a string of bit combinations starting with the control function CONTROL SEQUENCE INTRODUCER (CSI) followed by one or more bit combinations representing parameters, if any, and by one or more bit combinations identifying the control function. The control function CSI itself is an element of the C1 set.
The format of a control sequence is
CSI P ... P I ... I F
where
a) CSI is represented by bit combinations 01/11 (representing ESC) and 05/11 in a 7-bit code or by bit combination 09/11 in an 8-bit code, see 5.3;
b) P ... P are Parameter Bytes, which, if present, consist of bit combinations from 03/00 to 03/15;
c) I ... I are Intermediate Bytes, which, if present, consist of bit combinations from 02/00 to 02/15. Together with the Final Byte F, they identify the control function;
NOTE The number of Intermediate Bytes is not limited by this Standard; in practice, one Intermediate Byte will be sufficient since with sixteen different bit combinations available for the Intermediate Byte over one thousand control functions may be identified.
d) F is the Final Byte; it consists of a bit combination from 04/00 to 07/14; it terminates the control sequence and together with the Intermediate Bytes, if present, identifies the control function. Bit combinations 07/00 to 07/14 are available as Final Bytes of control sequences for private (or experimental) use.
-----------
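To illustrate what "self-terminating" buys you, here is a minimal Python sketch (the function names are hypothetical) that finds the end of a control sequence using nothing but the byte ranges quoted above. No separate end marker like the \x1D in the quoted proposal is needed: the scan stops at the first Final Byte. It assumes the 7-bit ESC [ form only (not the 8-bit 09/11 CSI) and simply gives up on malformed input rather than imitating any particular terminal's error recovery.

```python
from typing import Optional


# Byte classes from ECMA-48 section 5.4 (column/row notation in the comments).
def is_parameter(b: int) -> bool:
    return 0x30 <= b <= 0x3F      # 03/00 .. 03/15


def is_intermediate(b: int) -> bool:
    return 0x20 <= b <= 0x2F      # 02/00 .. 02/15


def is_final(b: int) -> bool:
    return 0x40 <= b <= 0x7E      # 04/00 .. 07/14; 07/00 .. 07/14 are private use


def csi_length(buf: bytes, start: int) -> Optional[int]:
    """Length of the 7-bit control sequence starting at buf[start] (ESC '['),
    or None if there is no CSI there or the sequence is incomplete/malformed."""
    if buf[start:start + 2] != b"\x1b[":
        return None
    i = start + 2
    while i < len(buf) and is_parameter(buf[i]):      # P ... P
        i += 1
    while i < len(buf) and is_intermediate(buf[i]):   # I ... I
        i += 1
    if i < len(buf) and is_final(buf[i]):             # F terminates the sequence
        return i + 1 - start
    return None                                       # ran out of data or bad byte


data = b"\x1b[43mwonderful\x1b[0m"
print(csi_length(data, 0))    # 5
print(csi_length(data, 14))   # 4
```

Because the three byte classes are disjoint, a scanner never has to look ahead past the Final Byte to know the sequence is over.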

Note: DEC added nonstandard control sequences initiated with SS3 (ESC O) as well as CSI (ESC [); otherwise they use the same format.

The Final Byte is easy enough to spot, as is writing a generic parser which can pick this apart, including parameter handling.
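As a rough illustration of such a generic parser, here is a Python sketch that splits a byte stream into plain text and parsed control sequences, accepting both CSI (ESC [) and DEC's SS3 (ESC O) introducers per the note above. The names tokenize, parse_params and ControlSequence are hypothetical, and the parameter handling is a simplifying assumption: fields are split on ';', an empty field comes back as None (meaning "use the default"), a leading '<', '=', '>' or '?' is reported as a private-use marker, and anything else (such as ':' sub-parameters) is left as an unparsed string. Incomplete or malformed sequences are passed through as plain bytes.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

# ESC, then '[' (CSI) or 'O' (DEC SS3), then Parameter Bytes 0x30-0x3F,
# Intermediate Bytes 0x20-0x2F and a single Final Byte 0x40-0x7E.
# The byte classes are disjoint, so the Final Byte unambiguously ends a match.
SEQ_RE = re.compile(rb"\x1b([\[O])([\x30-\x3f]*)([\x20-\x2f]*)([\x40-\x7e])")

Param = Union[int, str, None]


@dataclass
class ControlSequence:
    introducer: str          # "CSI" or "SS3"
    private: Optional[str]   # leading '<', '=', '>' or '?' marker, if present
    params: List[Param]
    intermediates: bytes
    final: str


def parse_params(raw: bytes) -> Tuple[Optional[str], List[Param]]:
    """Split a Parameter Byte string into (private marker, fields)."""
    private = None
    if raw[:1] and raw[:1] in b"<=>?":
        private, raw = raw[:1].decode("ascii"), raw[1:]
    if not raw:
        return private, []
    fields: List[Param] = []
    for field in raw.decode("ascii").split(";"):
        if field == "":
            fields.append(None)        # omitted: the function's default applies
        elif field.isdigit():
            fields.append(int(field))
        else:
            fields.append(field)       # e.g. ':' sub-parameters, left unparsed
    return private, fields


def tokenize(data: bytes) -> List[Union[bytes, ControlSequence]]:
    """Split data into runs of plain bytes and parsed control sequences.

    Incomplete or malformed sequences do not match the pattern, so they
    stay inside the plain-byte runs."""
    out: List[Union[bytes, ControlSequence]] = []
    pos = 0
    for m in SEQ_RE.finditer(data):
        if m.start() > pos:
            out.append(data[pos:m.start()])
        introducer = "CSI" if m.group(1) == b"[" else "SS3"
        private, params = parse_params(m.group(2))
        out.append(ControlSequence(introducer, private, params,
                                   m.group(3), m.group(4).decode("ascii")))
        pos = m.end()
    if pos < len(data):
        out.append(data[pos:])
    return out


for token in tokenize(b"\x1b[1;31mred\x1b[0m \x1b[?25l\x1bOP"):
    print(token)
```

For that sample input the tokens are a CSI with params [1, 31] and final 'm', the text b"red", a CSI with params [0] and final 'm', the text b" ", a private ('?') CSI with params [25] and final 'l', and an SS3 with final 'P'. A real terminal would also need to decide what to do with C0 controls arriving mid-sequence; this sketch does not attempt that.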

-hpa