Re: [PATCH v2 0/4] have the vt console preserve unicode characters

From: Nicolas Pitre
Date: Tue Jun 19 2018 - 11:34:43 EST


On Tue, 19 Jun 2018, Adam Borowski wrote:

> On Sun, Jun 17, 2018 at 03:07:02PM -0400, Nicolas Pitre wrote:
> > The vt code translates UTF-8 strings into glyph index values and stores
> > those glyph values directly in the screen buffer. Because there can only
> > be at most 512 glyphs, it is impossible to represent most unicode
> > characters, in which case a default glyph (often '?') is displayed
> > instead. The original unicode value is then lost.
> >
> > The 512-glyph limitation is inherent to VGA displays, but users of
> > /dev/vcs* shouldn't have to be restricted to a narrow unicode space from
> > lossy screen content because of that. This is especially true for
> > accessibility applications such as BRLTTY that rely on /dev/vcs to render
> > screen content onto braille terminals.
>
> You're thinking small. Those 256 possible values for Braille are easily
> encodable within the 512-glyph space (256 char + stolen fg brightness bit,
> another CGA peculiarity).

Braille is not just about 256 possible patterns. It is often the case
that a single print character is transcoded into a sequence of braille
characters given that there are more than 256 possible print characters.
And there are different transcoding rules for different languages, and
even different rules across different countries with the same language.
This may get complicated very quickly and you really don't want that
processing to live in the kernel.

The point is not to have a font that displays braille but to let user
space access the actual unicode character that corresponds to a given
screen position.
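
To illustrate what such user space access could look like, here is a
minimal Python sketch. It assumes that the unicode device returns one
32-bit code point per screen cell in native byte order, stored row by row
with no header. The function name char_at and the explicit cols parameter
are hypothetical and not taken from the patches, and the buffer is built
by hand rather than read from a real device.

```python
import struct

def char_at(buf: bytes, cols: int, row: int, col: int) -> str:
    """Return the character stored at (row, col) in a buffer of 32-bit cells."""
    # Assumption: one native-endian 32-bit code point per cell, row-major order.
    offset = (row * cols + col) * 4
    (code_point,) = struct.unpack_from("=I", buf, offset)
    return chr(code_point)

# Build a fake 2x3 screen instead of reading a real device.
screen = "añ€日本語"
buf = struct.pack("=6I", *(ord(c) for c in screen))

print(char_at(buf, cols=3, row=0, col=2))
print(char_at(buf, cols=3, row=1, col=0))
```

Every cell keeps its full code point, so a reader such as a braille
translator can apply its own language-specific rules afterwards.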

> Your patchset, though, can be used for proper
> Unicode support for the rest of us.

Absolutely. I think it is generic enough so that display drivers that
would benefit from it may do so already. My patchset introduces one
user: vc_screen. The selection code could be yet another easy conversion.
Beyond that it is a matter of extending the kernel interface for larger
font definitions, etc. But being sight impaired myself I won't play with
actual display driver code.

> The 256/512 value limitation applies only to CGA-compatible hardware; these
> days this means vgacon. But most people use other drivers. Nouveau forces
> graphical console, on arm* there's no such thing as VGA[1], etc.

I do agree with you.
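
For readers unfamiliar with where the 256/512 limit comes from, the
following Python sketch may help. It assumes the usual VGA text-mode
layout: a 16-bit cell with the character code in the low byte and the
attribute in the high byte, where in 512-glyph mode attribute bit 3 (the
foreground brightness bit) selects the second font bank. The names
HI_FONT_MASK and glyph_index are illustrative only.

```python
# Attribute bit 3 of the high byte is bit 11 of the 16-bit cell.
HI_FONT_MASK = 0x0800

def glyph_index(cell: int) -> int:
    """Map a 16-bit text-mode cell to a glyph index in the range 0..511."""
    low = cell & 0x00FF                 # 8-bit character code
    high = (cell & HI_FONT_MASK) >> 3   # stolen brightness bit becomes bit 8
    return low | high

print(glyph_index(0x0741))  # brightness bit clear: first bank
print(glyph_index(0x0F41))  # brightness bit set: second bank
```

Whatever the cell contains, the result never exceeds 511, which is why
most unicode characters cannot be represented this way.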

> Thus, it'd be nice to use the structure you add to implement full Unicode
> range for the vast majority of people. This includes even U+2800..FF. :)

Be my guest if you want to use this structure. As for U+2800..FF, like I
said earlier, this is not what most people use when communicating, so it
is of little interest even to blind users except for displaying native
braille documents, or showing off. ;-)

> > This patch series introduces unicode support to /dev/vcs* devices,
> > allowing full unicode access from userspace to the vt console which
> > can, amongst other purposes, appropriately translate actual unicode
> > screen content into braille. Memory is allocated, and possible CPU
> > overhead introduced, only if /dev/vcsu is read at least once.
>
> What about doing so if any updated console driver is loaded? Possibly, once
> the vt in question has been switched to (>99% people never see anything but
> tty1 during boot-up, all others showing nothing but getty). Or perhaps the
> moment any non-ASCII character is output to the given vt.

Right now it is activated only when an actual user manifests itself. I
think this is the right thing to do. If an updated console driver is
loaded then it will activate unicode handling right away as you say.

> If memory usage is a concern, it's possible to drop the old structure and
> convert back only in the rare case the driver is unloaded; reads of old-
> style /dev/vc{s,sa}\d* are not speed-critical thus can use conversion on the
> fly. Unicode takes only 21 bits out of 32 you allocate, that's plenty of
> space for attributes: they currently take 8 bits; naive way gives us free 3
> bits that could be used for additional attributes.

If the core console code makes the switch to full unicode then yes, that
would be the way to go to maintain backward compatibility. However,
vgacon users would see a performance drop when switching between VTs,
and 20 years ago we used to brag about how fast the Linux console was.
Does it still matter today?

> > I'm a prime user of this feature, as well as the BRLTTY maintainer Dave Mielke
> > who implemented support for this in BRLTTY. There is therefore a vested
> > interest in maintaining this feature as necessary. And this received
> > extensive testing as well at this point.
>
> So, you care only about people with faulty wetware. Thus, it sounds like
> work that benefits sighted people would need to be done by people other than
> you.

Hard for me to contribute more if I can't enjoy the result.

> So I'm only mentioning possible changes; they could possibly go after
> your patchset goes in:
>
> A) if memory is considered to be at premium, what about storing only one
> 32-bit value, masked 21 bits char 11 bits attr? On non-vgacon, there's
> no reason to keep the old structures.

Absolutely. As soon as vgacon is officially relegated to second-class
citizen status, i.e. it performs the glyph translation each time it
requires a refresh instead of dictating how the core console code works,
then the central glyph buffer can go.
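
As a rough sketch of what variant A's single 32-bit cell could look like,
here is some Python. The placement of the attribute in the upper 11 bits
is an assumption made for illustration, as are the names pack_cell and
unpack_cell; neither the quoted proposal nor the patchset specifies a
layout.

```python
CHAR_BITS = 21                       # enough for U+0000..U+10FFFF
ATTR_BITS = 11                       # what remains of a 32-bit word
CHAR_MASK = (1 << CHAR_BITS) - 1
ATTR_MASK = (1 << ATTR_BITS) - 1

def pack_cell(code_point: int, attr: int) -> int:
    """Combine a code point and an attribute into one 32-bit value."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of unicode range")
    if not 0 <= attr <= ATTR_MASK:
        raise ValueError("attribute does not fit in 11 bits")
    # Assumed layout: attribute in the high bits, character in the low bits.
    return (attr << CHAR_BITS) | code_point

def unpack_cell(cell: int) -> tuple[int, int]:
    """Split a 32-bit cell back into (code_point, attr)."""
    return cell & CHAR_MASK, (cell >> CHAR_BITS) & ATTR_MASK

cell = pack_cell(ord("€"), 0x07)
print(hex(cell), unpack_cell(cell))
```

With the current 8 attribute bits stored this way, 3 bits would remain
free, as the quoted text points out.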

> B) if being this frugal wrt memory is ridiculous today, what about instead
> going for 32 bits char (wasteful) 32 bits attr? This would be much nicer
> 15 bit fg color + 15 bit bg color + underline + CJK or something.
> You already triple memory use; variant A) above would reduce that to 2x,
> variant B) to 4x.
>
> Considering that modern machines can draw complex scenes of several
> megapixels 60 times a second, it could be reasonable to drop the complexity
> of two structures even on vgacon: converting characters on the fly during vt
> switch is beyond notice on any hardware Linux can run.

You certainly won't find any objections from me.

In the meantime, both systems may work in parallel for a smooth
transition.
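
The memory figures quoted above (3x, 2x, 4x) can be checked with a few
lines of Python. This assumes that the existing screen buffer uses 16 bits
per cell (glyph plus attribute) and that the unicode buffer adds 32 bits
per cell. The 80x25 screen size and the helper name buffer_bytes are only
illustrative.

```python
def buffer_bytes(rows: int, cols: int, bits_per_cell: int) -> int:
    """Bytes needed to store one screen at the given cell width."""
    return rows * cols * bits_per_cell // 8

ROWS, COLS = 25, 80  # illustrative screen size

baseline = buffer_bytes(ROWS, COLS, 16)         # glyph + attribute only
with_patch = buffer_bytes(ROWS, COLS, 16 + 32)  # both buffers in parallel
variant_a = buffer_bytes(ROWS, COLS, 32)        # 21-bit char + 11-bit attr
variant_b = buffer_bytes(ROWS, COLS, 64)        # 32-bit char + 32-bit attr

for name, size in [("baseline", baseline), ("with patch", with_patch),
                   ("variant A", variant_a), ("variant B", variant_b)]:
    print(f"{name}: {size} bytes ({size // baseline}x)")
```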


Nicolas