Re: [PATCH] thermal: add thermal_zone_remove_device_groups()

From: Zhang Rui
Date: Tue Jan 03 2017 - 23:35:28 EST


On Thu, 2016-12-15 at 16:47 -0500, Yasuaki Ishimatsu wrote:
> When offlining all cores on a CPU, the following system panic
> occurs:
>
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: strlen+0x0/0x20
> <snip>
> Call Trace:
>  ? kernfs_name_hash+0x17/0x80
>  kernfs_find_ns+0x3f/0xd0
>  kernfs_remove_by_name_ns+0x36/0xa0
>  remove_files.isra.1+0x36/0x70
>  sysfs_remove_group+0x44/0x90
>  sysfs_remove_groups+0x2e/0x50
>  device_remove_attrs+0x5e/0x90
>  device_del+0x1ea/0x350
>  device_unregister+0x1a/0x60
>  thermal_zone_device_unregister+0x1f2/0x210
>  pkg_thermal_cpu_offline+0x14f/0x1a0 [x86_pkg_temp_thermal]
>  ? kzalloc.constprop.2+0x10/0x10 [x86_pkg_temp_thermal]
>  cpuhp_invoke_callback+0x8d/0x3f0
>  cpuhp_down_callbacks+0x42/0x80
>  cpuhp_thread_fun+0x8b/0xf0
>  smpboot_thread_fn+0x110/0x160
>  kthread+0x101/0x140
>  ? sort_range+0x30/0x30
>  ? kthread_park+0x90/0x90
>  ret_from_fork+0x25/0x30
>
> thermal_zone_create_device_group() sets the attribute groups in
> thermal_zone_attribute_groups[] to tz->device.groups. But these
> attribute groups do not have a name set.
>
I'm a little confused here. In remove_files(), it is
(struct attribute *)->name that is passed to kernfs_remove_by_name(),
not attribute_group->name.

IMO, an attribute group with a NULL name should not cause any problem.
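
To make the point concrete, here is a rough user-space sketch of that
loop. The structures are simplified stand-ins, and remove_by_name() and
demo_unnamed_group() are hypothetical helpers for illustration only, not
the actual kernel code. It only shows that the group name is never
dereferenced when the files are removed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel structures (not the real definitions). */
struct attribute {
	const char *name;
};

struct attribute_group {
	const char *name;		/* optional subdirectory name, may be NULL */
	struct attribute **attrs;	/* NULL-terminated array */
};

/* Hypothetical stand-in for kernfs_remove_by_name(): needs a valid file name. */
static int removed_count;

static void remove_by_name(const char *name)
{
	/* The real lookup hashes the name, so a NULL name would fault in strlen(). */
	assert(name != NULL);
	printf("removing %s\n", name);
	removed_count++;
}

/* Mirrors the shape of remove_files(): only attribute names are used. */
static void remove_files(const struct attribute_group *grp)
{
	struct attribute *const *attr;

	if (!grp->attrs)
		return;
	for (attr = grp->attrs; *attr; attr++)
		remove_by_name((*attr)->name);
}

/* Removes the files of a group whose own name is NULL; returns files removed. */
static int demo_unnamed_group(void)
{
	struct attribute temp = { "temp" };
	struct attribute mode = { "mode" };
	struct attribute *attrs[] = { &temp, &mode, NULL };
	struct attribute_group grp = { .name = NULL, .attrs = attrs };

	removed_count = 0;
	remove_files(&grp);
	return removed_count;
}
```

In this model a NULL grp->name is harmless; the assertion would only
fire if one of the attribute names themselves were NULL.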

thanks,
rui

> So when offlining all cores on a CPU and executing
> thermal_zone_device_unregister(), the panic occurs in strlen()
> called from kernfs_name_hash() because the name argument is NULL.
>
> The patch adds thermal_zone_remove_device_groups() to free
> tz->device.groups and set the pointer to NULL.
>
> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@xxxxxxxxxxxxxx>
> CC: Zhang Rui <rui.zhang@xxxxxxxxx>
> CC: Eduardo Valentin <edubezval@xxxxxxxxx>
> ---
>  drivers/thermal/thermal_core.c  | 3 ++-
>  drivers/thermal/thermal_core.h  | 1 +
>  drivers/thermal/thermal_sysfs.c | 6 ++++++
>  3 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/thermal/thermal_core.c
> b/drivers/thermal/thermal_core.c
> index 641faab..926e385 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -1251,6 +1251,7 @@ struct thermal_zone_device *
>
>  unregister:
>   release_idr(&thermal_tz_idr, &thermal_idr_lock, tz->id);
> + thermal_zone_remove_device_groups(tz);
>   device_unregister(&tz->device);
>   return ERR_PTR(result);
>  }
> @@ -1315,8 +1316,8 @@ void thermal_zone_device_unregister(struct
> thermal_zone_device *tz)
>   release_idr(&thermal_tz_idr, &thermal_idr_lock, tz->id);
>   idr_destroy(&tz->idr);
>   mutex_destroy(&tz->lock);
> + thermal_zone_remove_device_groups(tz);
>   device_unregister(&tz->device);
> - kfree(tz->device.groups);
>  }
>  EXPORT_SYMBOL_GPL(thermal_zone_device_unregister);
>
> diff --git a/drivers/thermal/thermal_core.h
> b/drivers/thermal/thermal_core.h
> index 2412b37..e3a60db 100644
> --- a/drivers/thermal/thermal_core.h
> +++ b/drivers/thermal/thermal_core.h
> @@ -70,6 +70,7 @@ void thermal_zone_device_unbind_exception(struct
> thermal_zone_device *,
>  int thermal_build_list_of_policies(char *buf);
>
>  /* sysfs I/F */
> +void thermal_zone_remove_device_groups(struct thermal_zone_device
> *tz);
>  int thermal_zone_create_device_groups(struct thermal_zone_device *,
> int);
>  void thermal_cooling_device_setup_sysfs(struct
> thermal_cooling_device *);
>  /* used only at binding time */
> diff --git a/drivers/thermal/thermal_sysfs.c
> b/drivers/thermal/thermal_sysfs.c
> index a694de9..3dfd29b 100644
> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c
> @@ -605,6 +605,12 @@ static int create_trip_attrs(struct
> thermal_zone_device *tz, int mask)
>   return 0;
>  }
>
> +void thermal_zone_remove_device_groups(struct thermal_zone_device
> *tz)
> +{
> + kfree(tz->device.groups);
> + tz->device.groups = NULL;
> +}
> +
>  int thermal_zone_create_device_groups(struct thermal_zone_device
> *tz,
>          int mask)
>  {