Re: [syzbot] WARNING in mntput_no_expire (3)

From: Al Viro
Date: Wed May 18 2022 - 01:37:29 EST


On Wed, May 18, 2022 at 04:57:46AM +0000, Al Viro wrote:

> Gotcha.
> percpu_ref_init():
>	ref->percpu_count_ptr = (unsigned long)
>		__alloc_percpu_gfp(sizeof(unsigned long), align, gfp);
>	if (!ref->percpu_count_ptr)
>		return -ENOMEM;
>	data = kzalloc(sizeof(*ref->data), gfp);
>	if (!data) {
>		free_percpu((void __percpu *)ref->percpu_count_ptr);
>		return -ENOMEM;
>	}
>
> cgroup_create():
>	err = percpu_ref_init(&css->refcnt, css_release, 0, GFP_KERNEL);
>	if (err)
>		goto err_free_css;
>
>	err = cgroup_idr_alloc(&ss->css_idr, NULL, 2, 0, GFP_KERNEL);
>	if (err < 0)
>		goto err_free_css;
>
> Now note that we end up hitting the same path in case of successful and
> failed percpu_ref_init(). With no way to tell if css->refcnt.percpu_count_ptr
> is an already freed object or needs to be freed. And sure enough, we have
>
> err_free_css:
>	list_del_rcu(&css->rstat_css_node);
>	INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn);
>	queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork);
>
> with css_free_rwork_fn() starting with
>	percpu_ref_exit(&css->refcnt);
>
> which will give that double free. That might not be the only cause of
> trouble, but this looks like a bug and a plausible source of the
> symptoms observed here. Let's see if this helps:
>
> diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
> index af9302141bcf..e5c5315da274 100644
> --- a/lib/percpu-refcount.c
> +++ b/lib/percpu-refcount.c
> @@ -76,6 +76,7 @@ int percpu_ref_init(struct percpu_ref *ref, percpu_ref_func_t *release,
>  	data = kzalloc(sizeof(*ref->data), gfp);
>  	if (!data) {
>  		free_percpu((void __percpu *)ref->percpu_count_ptr);
> +		ref->percpu_count_ptr = 0;
>  		return -ENOMEM;
>  	}
>

... and it appears to fix the damn thing. 10 minutes and still running;
without that it usually fails within a few seconds.
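
For anyone who wants to see the pattern in isolation, here is a minimal
user-space sketch of it. This is a model, not kernel code: ref_model,
tracked_alloc(), tracked_free(), ref_init_model(), ref_exit_model() and
run_error_path() are all hypothetical names made up for the illustration,
and the "allocator" only records whether its single object is live so that
a second free can be counted instead of actually corrupting anything. It
assumes the second allocation in the init function always fails, and that
freeing a NULL pointer is a no-op.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* One tracked object standing in for the percpu counter allocation. */
struct tracked {
	bool live;
};

static struct tracked slot;
static int double_frees;

static void *tracked_alloc(void)
{
	slot.live = true;
	return &slot;
}

static void tracked_free(void *p)
{
	struct tracked *t = p;

	if (!t)			/* freeing NULL is a no-op in this model */
		return;
	if (!t->live)		/* already freed: count it as a double free */
		double_frees++;
	t->live = false;
}

/* Hypothetical stand-in for struct percpu_ref. */
struct ref_model {
	void *count_ptr;	/* models ref->percpu_count_ptr */
};

/*
 * Models the failure branch of the init function: the first allocation
 * succeeds, the second is assumed to fail, and the first is freed again.
 * clear_on_failure selects whether the stale pointer is zeroed.
 */
static int ref_init_model(struct ref_model *ref, bool clear_on_failure)
{
	ref->count_ptr = tracked_alloc();
	/* pretend the second allocation failed here */
	tracked_free(ref->count_ptr);
	if (clear_on_failure)
		ref->count_ptr = NULL;
	return -ENOMEM;
}

/* Models the exit function reached from the shared error path. */
static void ref_exit_model(struct ref_model *ref)
{
	tracked_free(ref->count_ptr);
	ref->count_ptr = NULL;
}

/*
 * Models the caller: on init failure it takes the common error path,
 * which ends up calling the exit function on the same object.
 * Returns the number of double frees seen.
 */
int run_error_path(bool clear_on_failure)
{
	struct ref_model ref = { NULL };

	double_frees = 0;
	if (ref_init_model(&ref, clear_on_failure))
		ref_exit_model(&ref);
	return double_frees;
}
```

In this model run_error_path(false) returns 1 (the stale pointer gets freed
a second time), while run_error_path(true) returns 0, since the exit side
then sees NULL and has nothing to free.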