Re: [PATCH 3/3] mm/slub: Fix potential deadlock problem in slab_attr_store()

From: Waiman Long
Date: Wed Feb 12 2020 - 15:40:40 EST


On 2/11/20 6:30 PM, Waiman Long wrote:
> On 2/10/20 6:10 PM, Andrew Morton wrote:
>> On Mon, 10 Feb 2020 17:14:31 -0500 Waiman Long <longman@xxxxxxxxxx> wrote:
>>
>>>>> --- a/mm/slub.c
>>>>> +++ b/mm/slub.c
>>>>> @@ -5536,7 +5536,12 @@ static ssize_t slab_attr_store(struct kobject *kobj,
>>>>> if (slab_state >= FULL && err >= 0 && is_root_cache(s)) {
>>>>> struct kmem_cache *c;
>>>>>
>>>>> - mutex_lock(&slab_mutex);
>>>>> + /*
>>>>> + * Timeout after 100ms
>>>>> + */
>>>>> + if (mutex_timed_lock(&slab_mutex, 100) < 0)
>>>>> + return -EBUSY;
>>>>> +
>>>> Oh dear. Surely there's a better fix here. Does slab really need to
>>>> hold slab_mutex while creating that sysfs file? Why?
>>>>
>>>> If the issue is two threads trying to create the same sysfs file
>>>> (unlikely, given that both will need to have created the same cache)
>>>> then can we add a new mutex specifically for this purpose?
>>>>
>>>> Or something else.
>>>>
>>> Well, the current code iterates all the memory cgroups to set the same
>>> value in all of them. I believe the reason for holding the slab mutex is
>>> to make sure that memcg hierarchy is stable during this iteration
>>> process.
>> But that is unrelated to creation of the sysfs file?
>>
> OK, I will take a closer look at that.

During the creation of a sysfs file:

static int sysfs_slab_add(struct kmem_cache *s)
{
  :
        if (unmergeable) {
                /*
                 * Slabcache can never be merged so we can use the name proper.
                 * This is typically the case for debug situations. In that
                 * case we can catch duplicate names easily.
                 */
                sysfs_remove_link(&slab_kset->kobj, s->name);
                name = s->name;

The code is trying to remove the sysfs files of a cache with a
conflicting name. So it seems that kmem_cache_create() was called with a
name that had been used before. If a write to one of the sysfs files
being removed happens at the same time, a deadlock can occur.
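
The ordering problem can be sketched in user space. This is only an
illustration under stated assumptions, not kernel code: both locks are
plain pthread mutexes with made-up names, and the sysfs (kernfs) active
reference is modeled as a mutex even though it is really a reference
count that removal waits on. One thread plays slab_attr_store() (holds
the active reference, then wants slab_mutex); the other plays
kmem_cache_create() (holds slab_mutex, then wants the active reference
to go away). pthread_mutex_trylock() is used so the inversion is
reported instead of hanging.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for illustration; these are not kernel objects. */
static pthread_mutex_t slab_mutex_sim = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t kernfs_active_sim = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t sync_point;

struct lock_path {
	pthread_mutex_t *first;   /* lock taken first on this path */
	pthread_mutex_t *second;  /* lock wanted while holding the first */
	int second_result;        /* return value of the trylock */
};

static void *run_path(void *arg)
{
	struct lock_path *p = arg;

	pthread_mutex_lock(p->first);
	/* Wait until both threads hold their first lock. */
	pthread_barrier_wait(&sync_point);
	/* A blocking lock here would hang forever; trylock just reports. */
	p->second_result = pthread_mutex_trylock(p->second);
	/* Keep the first lock until the other thread has tried as well. */
	pthread_barrier_wait(&sync_point);
	if (p->second_result == 0)
		pthread_mutex_unlock(p->second);
	pthread_mutex_unlock(p->first);
	return NULL;
}

/* Returns 1 if each path was blocked by the other (ABBA inversion). */
int simulate_abba(void)
{
	/* Modeled sysfs write path: active reference, then slab mutex. */
	struct lock_path store = { &kernfs_active_sim, &slab_mutex_sim, 0 };
	/* Modeled cache creation path: slab mutex, then active reference. */
	struct lock_path create = { &slab_mutex_sim, &kernfs_active_sim, 0 };
	pthread_t t1, t2;
	int rc;

	rc = pthread_barrier_init(&sync_point, NULL, 2);
	assert(rc == 0);
	rc = pthread_create(&t1, NULL, run_path, &store);
	assert(rc == 0);
	rc = pthread_create(&t2, NULL, run_path, &create);
	assert(rc == 0);
	(void)rc;

	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	pthread_barrier_destroy(&sync_point);

	return store.second_result == EBUSY &&
	       create.second_result == EBUSY;
}
```

Calling simulate_abba() from a small main() (built with -pthread) should
return 1, since each thread finds the lock it wants held by the other.
With blocking locks in place of the trylock, neither thread could make
progress.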

In this particular case, the kmem_cache_create() call comes from the
mlx5_core module.

        steering->fgs_cache = kmem_cache_create("mlx5_fs_fgs",
                                                sizeof(struct mlx5_flow_group), 0,
                                                0, NULL);

Perhaps the module was somehow unloaded and then loaded again.
Unfortunately, this lockdep error was seen only once. It is hard to work
out the right fix without an easy way to reproduce it.

So I will table this for now until there is a way to reproduce it.

Thanks,
Longman