Re: [PATCH 0/3] Unexport kallsyms_lookup_name() and kallsyms_on_each_symbol()

From: Mathieu Desnoyers
Date: Mon Mar 02 2020 - 15:17:11 EST


----- On Mar 2, 2020, at 2:39 PM, Greg Kroah-Hartman gregkh@xxxxxxxxxxxxxxxxxxx wrote:

> On Mon, Mar 02, 2020 at 08:36:58PM +0100, Greg Kroah-Hartman wrote:
>> On Mon, Mar 02, 2020 at 02:28:11PM -0500, Mathieu Desnoyers wrote:
>> > On 21-Feb-2020 11:44:01 AM, Will Deacon wrote:
>> > > Hi folks,
>> > >
>> > > Despite having just a single modular in-tree user that I could spot,
>> > > kallsyms_lookup_name() is exported to modules and provides a mechanism
>> > > for out-of-tree modules to access and invoke arbitrary, non-exported
>> > > kernel symbols when kallsyms is enabled.
>> > >
>> > > This patch series fixes up that one user and unexports the symbol along
>> > > with kallsyms_on_each_symbol(), since that could also be abused in a
>> > > similar manner.
>> >
>> > Hi,
>> >
>> > I maintain a GPL kernel tracer (LTTng) since 2005 which happens to be
>> > out-of-tree, even though we have made unsuccessful attempts to upstream
>> > it in the past. It uses kallsyms_lookup_name() to fetch a few symbols. I
>> > would be very glad to have them GPL-exported upstream rather than
>> > relying on this work-around. Here is the list of symbols we would need
>> > to GPL-export:
>> >
>> > stack_trace_save
>> > stack_trace_save_user
>> > vmalloc_sync_all (CONFIG_X86)
>> > get_pfnblock_flags_mask
>> > disk_name
>> > block_class
>> > disk_type
>>
>> I hate to ask, but why does anyone need block_class? or disk_name or
>> disk_type? I need to put them behind a driver core namespace or
>> something soon...
>

In LTTng, we have a "statedump" which dumps all the relevant state of
the kernel at trace start (or when the user asks for it) into the
kernel trace. It is used to collect information which helps translating
compact numeric data into human-readable information at post-processing.

For block devices, the statedump contains the mapping between the
device number and the disk name [1]. It uses the "block_class",
"disk_name", and "disk_type" symbols for this. Here is the
post-processing output:

[14:48:41.388934812] (+?.?????????) compudjdev lttng_statedump_block_device: { cpu_id = 0 }, { dev = 1048576, diskname = "ram0" }
[...]
[14:48:41.442548745] (+0.003574998) compudjdev lttng_statedump_block_device: { cpu_id = 0 }, { dev = 1048591, diskname = "ram15" }
[14:48:41.446064977] (+0.003516232) compudjdev lttng_statedump_block_device: { cpu_id = 0 }, { dev = 265289728, diskname = "vda" }
[14:48:41.449579781] (+0.003514804) compudjdev lttng_statedump_block_device: { cpu_id = 0 }, { dev = 265289729, diskname = "vda1" }
[14:48:41.453113808] (+0.003534027) compudjdev lttng_statedump_block_device: { cpu_id = 0 }, { dev = 265289744, diskname = "vdb" }
[14:48:41.456640876] (+0.003527068) compudjdev lttng_statedump_block_device: { cpu_id = 0 }, { dev = 265289745, diskname = "vdb1" }

This information is then used in our I/O analyses to show information
comprehensible to a user.

> Wait, disk_type is a static variable. And there's multiple ones of
> them, how does that work?

Yes, this is far from ideal. Here are the ones I observe in the kernel
sources:

block/genhd.c
40:static const struct device_type disk_type; <---- the one we use

lib/raid6/test/test.c
41:static char disk_type(int d) <---- this is a stand-alone user-space test program, not part of the kernel image.

crypto/async_tx/raid6test.c (depends on CONFIG_ASYNC_RAID6_TEST)
44:static char disk_type(int d, int disks) <---- the compiler optimizes away this function, so this symbol is not present in the kernel image.

I think a better approach to solve this would be to implement and expose an
iterator function in the core kernel which could invoke a callback. However,
the main issue remains: if the only user is out-of-tree, I cannot justify
adding an exported kernel helper for this.

Thanks,

Mathieu

[1] https://github.com/lttng/lttng-modules/blob/master/lttng-statedump-impl.c#L125

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com