Re: [PATCH v5] devcoredump : Serialize devcd_del work

From: Mukesh Ojha
Date: Wed Jul 20 2022 - 08:34:21 EST


Gentle reminder for review.

-Mukesh

On 7/1/2022 8:23 PM, Mukesh Ojha wrote:
Thanks @greg.

Hi @johannes,

Could you review this patch?

-Mukesh

On 6/27/2022 6:41 PM, Greg KH wrote:
On Fri, May 27, 2022 at 07:33:40PM +0530, Mukesh Ojha wrote:
In the following scenario (see diagram), thread X running dev_coredumpm()
adds the devcd device to the framework, which sends a uevent notification to
userspace. Another thread Y reads this uevent and calls
devcd_data_write(), which eventually tries to delete the queued timer before
it has been initialized or queued.

As a result, debug objects reports a warning. In the meantime, the timer is
initialized and queued from the X path. From the Y path it then gets
reinitialized, which sets timer->entry.pprev=NULL, and try_to_grab_pending()
gets stuck.

To fix this, introduce a mutex and a boolean flag to serialize the behaviour.
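
The serialization idea can be sketched in userspace C. This is only an analogy under stated assumptions, not the patch itself: pthreads stand in for the kernel mutex and workqueue APIs, and every name below (devcd_sketch, write_path, run_race_sketch and the struct fields) is hypothetical. The add path holds the lock from the moment the "uevent" can reach a writer until the work is set up, and a flag ensures only the first writer modifies the work.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the devcd entry; not the kernel structure. */
struct devcd_sketch {
    pthread_mutex_t lock;      /* serializes the add path and the write path */
    bool work_ready;           /* stands in for INIT_DELAYED_WORK() + schedule */
    bool delete_requested;     /* set by the first writer only */
    int touched_uninitialized; /* writers that saw the work before it was set up */
    int mod_work_calls;        /* how often the work would have been modified */
};

/* Path Y: a write from userspace asks for early deletion. */
static void *write_path(void *arg)
{
    struct devcd_sketch *d = arg;

    pthread_mutex_lock(&d->lock);
    if (!d->delete_requested) {
        d->delete_requested = true;
        if (!d->work_ready)
            d->touched_uninitialized++;
        d->mod_work_calls++;   /* stands in for mod_delayed_work() */
    }
    pthread_mutex_unlock(&d->lock);
    return NULL;
}

/* Path X: returns 0 if no writer ever saw uninitialized work, -1 on setup error. */
int run_race_sketch(void)
{
    struct devcd_sketch d = { .work_ready = false };
    pthread_t y1, y2;

    pthread_mutex_init(&d.lock, NULL);
    pthread_mutex_lock(&d.lock);

    /* "uevent sent": writers may start now, but they block on the lock. */
    if (pthread_create(&y1, NULL, write_path, &d) != 0) {
        pthread_mutex_unlock(&d.lock);
        pthread_mutex_destroy(&d.lock);
        return -1;
    }
    if (pthread_create(&y2, NULL, write_path, &d) != 0) {
        pthread_mutex_unlock(&d.lock);
        pthread_join(y1, NULL);
        pthread_mutex_destroy(&d.lock);
        return -1;
    }

    d.work_ready = true;       /* work is initialized before the lock drops */
    pthread_mutex_unlock(&d.lock);

    pthread_join(y1, NULL);
    pthread_join(y2, NULL);
    pthread_mutex_destroy(&d.lock);

    assert(d.delete_requested);
    return (d.touched_uninitialized == 0 && d.mod_work_calls == 1) ? 0 : 1;
}
```

Calling run_race_sketch() from a small main() and checking for a zero return is enough to exercise it. Without the lock held across the "uevent" and the work setup, a writer could run in between, which is the window the diagram below describes.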

      cpu0(X)                            cpu1(Y)

     dev_coredump() uevent sent to user space
     device_add()  ======================> user space process Y reads the
                                           uevent and writes to the devcd fd,
                                           which results in a call to

                                          devcd_data_write()
                                            mod_delayed_work()
                                              try_to_grab_pending()
                                                del_timer()
                                                  debug_assert_init()
    INIT_DELAYED_WORK()
    schedule_delayed_work()
                                                    debug_object_fixup()
                                                      timer_fixup_assert_init()
                                                        timer_setup()
                                                          do_init_timer()
                                                        /*
                                                         Above call reinitializes
                                                         the timer to
                                                         timer->entry.pprev=NULL
                                                         and this will be checked
                                                         later in timer_pending() call.
                                                        */
                                                  timer_pending()
                                                   !hlist_unhashed_lockless(&timer->entry)
                                                     !h->pprev
                                                 /*
                                                   del_timer() checks h->pprev and finds
                                                   it to be NULL due to which
                                                   try_to_grab_pending() gets stuck.
                                                 */
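
The pprev check at the end of the diagram can be illustrated with a small userspace sketch. It is a simplified model, not kernel code: the types and helpers (hnode, timer_sketch, timer_init_sketch, timer_enqueue_sketch, reinit_demo) are hypothetical and only mimic the idea that a node counts as "hashed" while its pprev pointer is non-NULL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of an hlist node: "hashed" means pprev is non-NULL. */
struct hnode {
    struct hnode *next;
    struct hnode **pprev;
};

struct timer_sketch {
    struct hnode entry;
};

static bool node_unhashed(const struct hnode *h)
{
    return !h->pprev;
}

/* Mirrors the check in the diagram: pending means "not unhashed". */
bool timer_pending_sketch(const struct timer_sketch *t)
{
    return !node_unhashed(&t->entry);
}

/* Stands in for the (re)initialization done on the fixup path. */
void timer_init_sketch(struct timer_sketch *t)
{
    t->entry.next = NULL;
    t->entry.pprev = NULL;
}

/* Insert the timer at the head of a bucket, as queueing would. */
void timer_enqueue_sketch(struct timer_sketch *t, struct hnode **head)
{
    t->entry.next = *head;
    if (*head)
        (*head)->pprev = &t->entry.next;
    *head = &t->entry;
    t->entry.pprev = head;
}

/* Queue a timer, reinitialize it behind the queue's back, report both states. */
void reinit_demo(bool *pending_before, bool *pending_after)
{
    struct hnode *bucket = NULL;
    struct timer_sketch t;

    timer_init_sketch(&t);
    timer_enqueue_sketch(&t, &bucket);
    *pending_before = timer_pending_sketch(&t);

    timer_init_sketch(&t);  /* bucket still points at t, but pprev is NULL */
    *pending_after = timer_pending_sketch(&t);
}
```

After the second initialization the bucket still references the timer, yet the pending check reports false, which is the inconsistent state the commit message describes.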

Link: https://lore.kernel.org/lkml/2e1f81e2-428c-f11f-ce92-eb11048cb271@xxxxxxxxxxx/
Signed-off-by: Mukesh Ojha <quic_mojha@xxxxxxxxxxx>
---

I need an ack from the devcoredump maintainer before I can take this...

thanks,

greg k-h