Re: [PATCH v7 4/4] fs: unicode: Add utf8 module and a unicode layer

From: Gabriel Krisman Bertazi
Date: Thu Apr 08 2021 - 15:10:26 EST


Shreeya Patel <shreeya.patel@xxxxxxxxxxxxx> writes:

> utf8data.h_shipped has a large database table which is an auto-generated
> decodification trie for the unicode normalization functions.
> It is not necessary to load this large table in the kernel if no
> filesystem is using it, hence make UTF-8 encoding loadable by converting
> it into a module.
>
> Modify the file called unicode-core which will act as a layer for
> unicode subsystem. It will load the UTF-8 module and access it's functions
> whenever any filesystem that needs unicode is mounted.
> Currently, only UTF-8 encoding is supported but if any other encodings
> are supported in future then the layer file would be responsible for
> loading the desired encoding module.
>
> Also, indirect calls using function pointers are slow, use static calls to
> avoid overhead caused in case of repeated indirect calls. Static calls
> improves the performance by directly calling the functions as opposed to
> indirect calls.
>
> Signed-off-by: Shreeya Patel <shreeya.patel@xxxxxxxxxxxxx>
> ---
> Changes in v7
> - Update the help text in Kconfig
> - Handle the unicode_load_static_call function failure by decrementing
> the reference.
> - Correct the code for handling built-in utf8 option as well.
> - Correct the synchronization for accessing utf8mod.
> - Make changes to unicode_unload() for handling the situation where
> utf8mod != NULL and um == NULL.
>
> Changes in v6
> - Add spinlock to protect utf8mod and avoid NULL pointer
> dereference.
> - Change the static call function names for being consistent with
> kernel coding style.
> - Merge the unicode_load_module function with unicode_load as it is
> not really needed to have a separate function.
> - Use try_then_module_get instead of module_get to avoid loading the
> module even when it is already loaded.
> - Improve the commit message.
>
> Changes in v5
> - Rename global variables and default static call functions for better
> understanding
> - Make only config UNICODE_UTF8 visible and config UNICODE to be always
> enabled provided UNICODE_UTF8 is enabled.
> - Improve the documentation for Kconfig
> - Improve the commit message.
>
> Changes in v4
> - Return error from the static calls instead of doing nothing and
> succeeding even without loading the module.
> - Remove the complete usage of utf8_ops and use static calls at all
> places.
> - Restore the static calls to default values when module is unloaded.
> - Decrement the reference of module after calling the unload function.
> - Remove spinlock as there will be no race conditions after removing
> utf8_ops.
>
> Changes in v3
> - Add a patch which checks if utf8 is loaded before calling utf8_unload()
> in ext4 and f2fs filesystems
> - Return error if strscpy() returns value < 0
> - Correct the conditions to prevent NULL pointer dereference while
> accessing functions via utf8_ops variable.
> - Add spinlock to avoid race conditions.
> - Use static_call() for preventing speculative execution attacks.
>
> Changes in v2
> - Remove the duplicate file from the last patch.
> - Make the wrapper functions inline.
> - Remove msleep and use try_module_get() and module_put()
> for ensuring that module is loaded correctly and also
> doesn't get unloaded while in use.
> - Resolve the warning reported by kernel test robot.
> - Resolve all the checkpatch.pl warnings.
>
> fs/unicode/Kconfig | 26 +++-
> fs/unicode/Makefile | 5 +-
> fs/unicode/unicode-core.c | 297 ++++++++++++++------------------------
> fs/unicode/unicode-utf8.c | 264 +++++++++++++++++++++++++++++++++
> include/linux/unicode.h | 96 ++++++++++--
> 5 files changed, 483 insertions(+), 205 deletions(-)
> create mode 100644 fs/unicode/unicode-utf8.c
>
> diff --git a/fs/unicode/Kconfig b/fs/unicode/Kconfig
> index 2c27b9a5cd6c..0c69800a2a37 100644
> --- a/fs/unicode/Kconfig
> +++ b/fs/unicode/Kconfig
> @@ -2,13 +2,31 @@
> #
> # UTF-8 normalization
> #
> +# CONFIG_UNICODE will be automatically enabled if CONFIG_UNICODE_UTF8
> +# is enabled. This config option adds the unicode subsystem layer which loads
> +# the UTF-8 module whenever any filesystem needs it.
> config UNICODE
> - bool "UTF-8 normalization and casefolding support"
> + bool
> +
> +config UNICODE_UTF8
> + tristate "UTF-8 module"

"UTF-8 module" is the text that will appear in menuconfig and other
configuration utilities. This string not very helpful to describe what
this code is about or why it is different from NLS_utf8. People come to
this option looking for the case-insensitive feature in ext4, so I'd
prefer to keep the mention to 'casefolding'. or even improve the
original a bit to say:

tristate: "UTF-8 support for native Case-Insensitive filesystems"

Other than these and what Eric mentioned, the code looks good to me. I
gave this series a try and it seems to work fine.

It does raise a new warning, though

/home/krisman/src/linux/fs/unicode/unicode-core.c: In function ‘unicode_load’:
/home/krisman/src/linux/include/linux/kmod.h:28:8: warning: the omitted middle operand in ‘?:’ will always be ‘true’, suggest explicit middle operand [-Wparentheses]
28 | ((x) ?: (__request_module(true, mod), (x)))
| ^
/home/krisman/src/linux/fs/unicode/unicode-core.c:123:7: note: in expansion of macro ‘try_then_request_module’
123 | if (!try_then_request_module(utf8mod_get(), "utf8")) {

But in this specific case, i think gcc is just being silly. What would
be the right way to avoid it?

--
Gabriel Krisman Bertazi