[PATCH 00/53] Get rid of UTF-8 chars that can be mapped as ASCII

From: Mauro Carvalho Chehab
Date: Mon May 10 2021 - 06:35:42 EST


There are several non-ASCII UTF-8 characters in the kernel's documentation.

Several of them come from the process of converting files from
DocBook, LaTeX, HTML and Markdown. They were probably introduced
by the conversion tools used at that time.

Other UTF-8 characters were added over time, but they're easily
replaceable by ASCII chars.

As Linux developers are spread all around the globe, and not everybody has
UTF-8 as their default charset, it is better to use UTF-8 only in cases
where it is really needed.

The first 3 patches in this series were manually written, in order to solve
a few special cases.

The remaining patches in the series address such cases in *.rst files and
inside Documentation/ABI, using this Perl map table to do the
charset conversion:

my %char_map = (
	0x2010 => '-',		# HYPHEN
	0xad   => '-',		# SOFT HYPHEN
	0x2013 => '-',		# EN DASH
	0x2014 => '-',		# EM DASH

	0x2018 => "'",		# LEFT SINGLE QUOTATION MARK
	0x2019 => "'",		# RIGHT SINGLE QUOTATION MARK
	0xb4   => "'",		# ACUTE ACCENT

	0x201c => '"',		# LEFT DOUBLE QUOTATION MARK
	0x201d => '"',		# RIGHT DOUBLE QUOTATION MARK

	0x2212 => '-',		# MINUS SIGN
	0x2217 => '*',		# ASTERISK OPERATOR
	0xd7   => 'x',		# MULTIPLICATION SIGN

	0xbb   => '>',		# RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK

	0xa0   => ' ',		# NO-BREAK SPACE
	0xfeff => '',		# ZERO WIDTH NO-BREAK SPACE
);
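
For readers who want to try the mapping on their own text, here is a
minimal Python sketch of the same table. It is not the script used to
produce this series (that one is in Perl and is not shown here); the
names CHAR_MAP and to_ascii are made up for illustration, and only the
code points and replacements are taken from the table above.

```python
# Illustrative Python equivalent of the Perl %char_map above.
# CHAR_MAP and to_ascii() are hypothetical names, not part of the series.
CHAR_MAP = {
    0x2010: "-",   # HYPHEN
    0x00AD: "-",   # SOFT HYPHEN
    0x2013: "-",   # EN DASH
    0x2014: "-",   # EM DASH
    0x2018: "'",   # LEFT SINGLE QUOTATION MARK
    0x2019: "'",   # RIGHT SINGLE QUOTATION MARK
    0x00B4: "'",   # ACUTE ACCENT
    0x201C: '"',   # LEFT DOUBLE QUOTATION MARK
    0x201D: '"',   # RIGHT DOUBLE QUOTATION MARK
    0x2212: "-",   # MINUS SIGN
    0x2217: "*",   # ASTERISK OPERATOR
    0x00D7: "x",   # MULTIPLICATION SIGN
    0x00BB: ">",   # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
    0x00A0: " ",   # NO-BREAK SPACE
    0xFEFF: "",    # ZERO WIDTH NO-BREAK SPACE (dropped)
}


def to_ascii(text: str) -> str:
    """Replace every mapped code point; leave all other characters alone."""
    # str.translate() accepts a dict keyed by code point whose values are
    # replacement strings; an empty string removes the character.
    return text.translate(CHAR_MAP)


if __name__ == "__main__":
    print(to_ascii("\u201cfoo\u201d \u2013 3\u00d74"))
```

Characters that are not in the table, such as U+00b5 MICRO SIGN, pass
through unchanged, which matches the intent of keeping the characters
listed below.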

After the conversion, these UTF-8 chars will be kept:

- U+00a9 ('©'): COPYRIGHT SIGN
- U+00ac ('¬'): NOT SIGN # only at Documentation/powerpc/transactional_memory.rst
- U+00ae ('®'): REGISTERED SIGN
- U+00b0 ('°'): DEGREE SIGN
- U+00b1 ('±'): PLUS-MINUS SIGN
- U+00b2 ('²'): SUPERSCRIPT TWO
- U+00b5 ('µ'): MICRO SIGN
- U+00b7 ('·'): MIDDLE DOT # See below
- U+00bd ('½'): VULGAR FRACTION ONE HALF
- U+00c7 ('Ç'): LATIN CAPITAL LETTER C WITH CEDILLA
- U+00df ('ß'): LATIN SMALL LETTER SHARP S
- U+00e1 ('á'): LATIN SMALL LETTER A WITH ACUTE
- U+00e4 ('ä'): LATIN SMALL LETTER A WITH DIAERESIS
- U+00e6 ('æ'): LATIN SMALL LETTER AE
- U+00e7 ('ç'): LATIN SMALL LETTER C WITH CEDILLA
- U+00e9 ('é'): LATIN SMALL LETTER E WITH ACUTE
- U+00ea ('ê'): LATIN SMALL LETTER E WITH CIRCUMFLEX
- U+00eb ('ë'): LATIN SMALL LETTER E WITH DIAERESIS
- U+00f3 ('ó'): LATIN SMALL LETTER O WITH ACUTE
- U+00f4 ('ô'): LATIN SMALL LETTER O WITH CIRCUMFLEX
- U+00f6 ('ö'): LATIN SMALL LETTER O WITH DIAERESIS
- U+00f8 ('ø'): LATIN SMALL LETTER O WITH STROKE
- U+00fa ('ú'): LATIN SMALL LETTER U WITH ACUTE
- U+00fc ('ü'): LATIN SMALL LETTER U WITH DIAERESIS
- U+00fd ('ý'): LATIN SMALL LETTER Y WITH ACUTE
- U+011f ('ğ'): LATIN SMALL LETTER G WITH BREVE
- U+0142 ('ł'): LATIN SMALL LETTER L WITH STROKE
- U+03bc ('μ'): GREEK SMALL LETTER MU
- U+2026 ('…'): HORIZONTAL ELLIPSIS
- U+2122 ('™'): TRADE MARK SIGN
- U+2191 ('↑'): UPWARDS ARROW
- U+2192 ('→'): RIGHTWARDS ARROW
- U+2193 ('↓'): DOWNWARDS ARROW
- U+2264 ('≤'): LESS-THAN OR EQUAL TO
- U+2265 ('≥'): GREATER-THAN OR EQUAL TO
- U+2500 ('─'): BOX DRAWINGS LIGHT HORIZONTAL
- U+2502 ('│'): BOX DRAWINGS LIGHT VERTICAL
- U+2514 ('└'): BOX DRAWINGS LIGHT UP AND RIGHT
- U+251c ('├'): BOX DRAWINGS LIGHT VERTICAL AND RIGHT
- U+2b0d ('⬍'): UP DOWN BLACK ARROW
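
A list in the format above can be regenerated for any piece of text with
a few lines of Python. This is only a sketch under the assumption that
the input is already decoded as UTF-8; the function name
non_ascii_report is hypothetical and this is not the tool that produced
the list in this cover letter.

```python
import unicodedata


def non_ascii_report(text: str) -> list[str]:
    """Return one "- U+xxxx ('c'): NAME" line per distinct non-ASCII char."""
    # Collect each distinct character above the ASCII range, sorted by
    # code point so the output order matches the list above.
    chars = sorted({ch for ch in text if ord(ch) > 0x7F})
    return [
        f"- U+{ord(ch):04x} ('{ch}'): {unicodedata.name(ch, 'UNKNOWN')}"
        for ch in chars
    ]


if __name__ == "__main__":
    for line in non_ascii_report("10 \u00b0C \u00b1 2 \u00b5s"):
        print(line)
```

Running the same function before and after the conversion gives a quick
way to check which characters a file still contains.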

PS.: maintainers were Bcc'ed on patch 00/53, in order to reduce the
risk of patch 00 being rejected by list servers.

-

For U+00b7 ('·'): MIDDLE DOT, I opted to keep it in a few places:

- Documentation/devicetree/bindings/clock/qcom,rpmcc.txt

As this file will someday be converted to YAML, where the
MIDDLE DOT will be removed, I guess it is not worth touching it.

- Documentation/scheduler/sched-deadline.rst

There, it is used in math expressions. So, it is better to keep it.

- Documentation/devicetree/bindings/media/video-interface-devices.yaml

There, it is part of an ASCII artwork.

- translations/zh_CN

I prefer not to touch it, as it might have some special meaning in Simplified Chinese.

Mauro Carvalho Chehab (53):
docs: cdrom-standard.rst: get rid of uneeded UTF-8 chars
docs: ABI: remove a meaningless UTF-8 character
docs: ABI: remove some spurious characters
docs: index.rst: avoid using UTF-8 chars
docs: hwmon: avoid using UTF-8 chars
docs: admin-guide: avoid using UTF-8 chars
docs: admin-guide: media: ipu3.rst: avoid using UTF-8 chars
docs: admin-guide: sysctl: kernel.rst: avoid using UTF-8 chars
docs: admin-guide: perf: imx-ddr.rst: avoid using UTF-8 chars
docs: admin-guide: pm: avoid using UTF-8 chars
docs: trace: coresight: coresight-etm4x-reference.rst: avoid using
UTF-8 chars
docs: driver-api: avoid using UTF-8 chars
docs: driver-api: fpga: avoid using UTF-8 chars
docs: driver-api: iio: avoid using UTF-8 chars
docs: driver-api: thermal: avoid using UTF-8 chars
docs: driver-api: media: drivers: avoid using UTF-8 chars
docs: driver-api: firmware: other_interfaces.rst: avoid using UTF-8
chars
docs: driver-api: nvdimm: btt.rst: avoid using UTF-8 chars
docs: fault-injection: nvme-fault-injection.rst: avoid using UTF-8
chars
docs: usb: avoid using UTF-8 chars
docs: process: avoid using UTF-8 chars
docs: block: data-integrity.rst: avoid using UTF-8 chars
docs: userspace-api: media: fdl-appendix.rst: avoid using UTF-8 chars
docs: userspace-api: media: v4l: avoid using UTF-8 chars
docs: userspace-api: media: dvb: avoid using UTF-8 chars
docs: vm: zswap.rst: avoid using UTF-8 chars
docs: filesystems: f2fs.rst: avoid using UTF-8 chars
docs: filesystems: ext4: avoid using UTF-8 chars
docs: kernel-hacking: avoid using UTF-8 chars
docs: hid: avoid using UTF-8 chars
docs: security: tpm: avoid using UTF-8 chars
docs: security: keys: trusted-encrypted.rst: avoid using UTF-8 chars
docs: riscv: vm-layout.rst: avoid using UTF-8 chars
docs: networking: scaling.rst: avoid using UTF-8 chars
docs: networking: devlink: devlink-dpipe.rst: avoid using UTF-8 chars
docs: networking: device_drivers: avoid using UTF-8 chars
docs: x86: avoid using UTF-8 chars
docs: scheduler: sched-deadline.rst: avoid using UTF-8 chars
docs: dev-tools: testing-overview.rst: avoid using UTF-8 chars
docs: power: powercap: powercap.rst: avoid using UTF-8 chars
docs: ABI: avoid using UTF-8 chars
docs: doc-guide: contributing.rst: avoid using UTF-8 chars
docs: PCI: acpi-info.rst: avoid using UTF-8 chars
docs: gpu: avoid using UTF-8 chars
docs: sound: kernel-api: writing-an-alsa-driver.rst: avoid using UTF-8
chars
docs: arm64: arm-acpi.rst: avoid using UTF-8 chars
docs: infiniband: tag_matching.rst: avoid using UTF-8 chars
docs: timers: no_hz.rst: avoid using UTF-8 chars
docs: misc-devices: ibmvmc.rst: avoid using UTF-8 chars
docs: firmware-guide: acpi: lpit.rst: avoid using UTF-8 chars
docs: firmware-guide: acpi: dsd: graph.rst: avoid using UTF-8 chars
docs: virt: kvm: avoid using UTF-8 chars
docs: RCU: avoid using UTF-8 chars

.../obsolete/sysfs-kernel-fadump_registered | 2 +-
.../obsolete/sysfs-kernel-fadump_release_mem | 2 +-
...sfs-class-chromeos-driver-cros-ec-lightbar | 2 +-
.../ABI/testing/sysfs-class-net-cdc_ncm | 2 +-
.../ABI/testing/sysfs-devices-platform-ipmi | 2 +-
.../testing/sysfs-devices-platform-trackpoint | 2 +-
Documentation/ABI/testing/sysfs-devices-soc | 4 +-
Documentation/ABI/testing/sysfs-module | 4 +-
Documentation/PCI/acpi-info.rst | 26 +-
.../Data-Structures/Data-Structures.rst | 52 ++--
.../Expedited-Grace-Periods.rst | 40 +--
.../Tree-RCU-Memory-Ordering.rst | 10 +-
.../RCU/Design/Requirements/Requirements.rst | 126 ++++-----
Documentation/admin-guide/index.rst | 2 +-
Documentation/admin-guide/media/ipu3.rst | 2 +-
Documentation/admin-guide/module-signing.rst | 4 +-
Documentation/admin-guide/perf/imx-ddr.rst | 2 +-
Documentation/admin-guide/pm/intel_idle.rst | 4 +-
Documentation/admin-guide/pm/intel_pstate.rst | 4 +-
Documentation/admin-guide/ras.rst | 94 +++----
.../admin-guide/reporting-issues.rst | 12 +-
Documentation/admin-guide/sysctl/kernel.rst | 2 +-
Documentation/arm64/arm-acpi.rst | 8 +-
Documentation/block/data-integrity.rst | 2 +-
Documentation/cdrom/cdrom-standard.rst | 30 +--
Documentation/dev-tools/testing-overview.rst | 4 +-
Documentation/doc-guide/contributing.rst | 2 +-
.../driver-api/firmware/other_interfaces.rst | 2 +-
Documentation/driver-api/fpga/fpga-bridge.rst | 10 +-
Documentation/driver-api/fpga/fpga-mgr.rst | 12 +-
.../driver-api/fpga/fpga-programming.rst | 8 +-
Documentation/driver-api/fpga/fpga-region.rst | 20 +-
Documentation/driver-api/iio/buffers.rst | 8 +-
Documentation/driver-api/iio/hw-consumer.rst | 10 +-
.../driver-api/iio/triggered-buffers.rst | 6 +-
Documentation/driver-api/iio/triggers.rst | 10 +-
Documentation/driver-api/index.rst | 2 +-
Documentation/driver-api/ioctl.rst | 8 +-
.../media/drivers/sh_mobile_ceu_camera.rst | 8 +-
.../driver-api/media/drivers/vidtv.rst | 4 +-
.../driver-api/media/drivers/zoran.rst | 2 +-
Documentation/driver-api/nvdimm/btt.rst | 2 +-
.../driver-api/thermal/cpu-idle-cooling.rst | 14 +-
.../driver-api/thermal/intel_powerclamp.rst | 6 +-
.../thermal/x86_pkg_temperature_thermal.rst | 2 +-
.../fault-injection/nvme-fault-injection.rst | 2 +-
Documentation/filesystems/ext4/attributes.rst | 20 +-
Documentation/filesystems/ext4/bigalloc.rst | 6 +-
Documentation/filesystems/ext4/blockgroup.rst | 8 +-
Documentation/filesystems/ext4/blocks.rst | 2 +-
Documentation/filesystems/ext4/directory.rst | 16 +-
Documentation/filesystems/ext4/eainode.rst | 2 +-
Documentation/filesystems/ext4/inlinedata.rst | 6 +-
Documentation/filesystems/ext4/inodes.rst | 6 +-
Documentation/filesystems/ext4/journal.rst | 8 +-
Documentation/filesystems/ext4/mmp.rst | 2 +-
.../filesystems/ext4/special_inodes.rst | 4 +-
Documentation/filesystems/ext4/super.rst | 10 +-
Documentation/filesystems/f2fs.rst | 6 +-
.../firmware-guide/acpi/dsd/graph.rst | 2 +-
Documentation/firmware-guide/acpi/lpit.rst | 2 +-
Documentation/gpu/i915.rst | 2 +-
Documentation/gpu/komeda-kms.rst | 2 +-
Documentation/hid/hid-sensor.rst | 70 ++---
Documentation/hid/intel-ish-hid.rst | 246 +++++++++---------
Documentation/hwmon/ir36021.rst | 2 +-
Documentation/hwmon/ltc2992.rst | 2 +-
Documentation/hwmon/pm6764tr.rst | 2 +-
Documentation/hwmon/tmp103.rst | 4 +-
Documentation/index.rst | 4 +-
Documentation/infiniband/tag_matching.rst | 8 +-
Documentation/kernel-hacking/hacking.rst | 2 +-
Documentation/kernel-hacking/locking.rst | 2 +-
Documentation/misc-devices/ibmvmc.rst | 8 +-
.../device_drivers/ethernet/intel/i40e.rst | 12 +-
.../device_drivers/ethernet/intel/iavf.rst | 6 +-
.../device_drivers/ethernet/netronome/nfp.rst | 12 +-
.../networking/devlink/devlink-dpipe.rst | 2 +-
Documentation/networking/scaling.rst | 18 +-
Documentation/power/powercap/powercap.rst | 210 +++++++--------
Documentation/process/code-of-conduct.rst | 2 +-
.../process/kernel-enforcement-statement.rst | 2 +-
Documentation/riscv/vm-layout.rst | 2 +-
Documentation/scheduler/sched-deadline.rst | 4 +-
.../security/keys/trusted-encrypted.rst | 4 +-
Documentation/security/tpm/tpm_event_log.rst | 2 +-
Documentation/security/tpm/xen-tpmfront.rst | 2 +-
.../kernel-api/writing-an-alsa-driver.rst | 68 ++---
Documentation/timers/no_hz.rst | 2 +-
.../coresight/coresight-etm4x-reference.rst | 16 +-
Documentation/usb/ehci.rst | 2 +-
Documentation/usb/gadget_printer.rst | 2 +-
Documentation/usb/mass-storage.rst | 36 +--
Documentation/usb/mtouchusb.rst | 2 +-
Documentation/usb/usb-serial.rst | 2 +-
.../media/dvb/audio-set-bypass-mode.rst | 2 +-
.../userspace-api/media/dvb/audio.rst | 2 +-
.../userspace-api/media/dvb/dmx-fopen.rst | 2 +-
.../userspace-api/media/dvb/dmx-fread.rst | 2 +-
.../media/dvb/dmx-set-filter.rst | 2 +-
.../userspace-api/media/dvb/intro.rst | 6 +-
.../userspace-api/media/dvb/video.rst | 2 +-
.../userspace-api/media/fdl-appendix.rst | 64 ++---
.../userspace-api/media/v4l/biblio.rst | 8 +-
.../userspace-api/media/v4l/crop.rst | 16 +-
.../userspace-api/media/v4l/dev-decoder.rst | 6 +-
.../userspace-api/media/v4l/diff-v4l.rst | 2 +-
.../userspace-api/media/v4l/open.rst | 2 +-
.../media/v4l/vidioc-cropcap.rst | 4 +-
Documentation/virt/kvm/api.rst | 28 +-
.../virt/kvm/running-nested-guests.rst | 12 +-
Documentation/vm/zswap.rst | 4 +-
Documentation/x86/resctrl.rst | 2 +-
Documentation/x86/sgx.rst | 4 +-
114 files changed, 807 insertions(+), 807 deletions(-)

--
2.30.2