Re: [PATCH 3/3] mm/kmemleak: Prevent soft lockup in first object iteration loop of kmemleak_scan()

From: Waiman Long
Date: Tue Jun 14 2022 - 14:22:52 EST


On 6/14/22 13:27, Catalin Marinas wrote:
> On Tue, Jun 14, 2022 at 06:15:14PM +0100, Catalin Marinas wrote:
>> On Sun, Jun 12, 2022 at 02:33:01PM -0400, Waiman Long wrote:
>>> @@ -1437,10 +1440,25 @@ static void kmemleak_scan(void)
>>>  #endif
>>>  		/* reset the reference count (whiten the object) */
>>>  		object->count = 0;
>>> -		if (color_gray(object) && get_object(object))
>>> +		if (color_gray(object) && get_object(object)) {
>>>  			list_add_tail(&object->gray_list, &gray_list);
>>> +			gray_list_cnt++;
>>> +			object_pinned = true;
>>> +		}
>>>  		raw_spin_unlock_irq(&object->lock);
>>> +
>>> +		/*
>>> +		 * With object pinned by a positive reference count, it
>>> +		 * won't go away and we can safely release the RCU read
>>> +		 * lock and do a cond_resched() to avoid soft lockup every
>>> +		 * 64k objects.
>>> +		 */
>>> +		if (object_pinned && !(gray_list_cnt & 0xffff)) {
>>> +			rcu_read_unlock();
>>> +			cond_resched();
>>> +			rcu_read_lock();
>>> +		}
>> I'm not sure this gains much. There should be very few gray objects
>> initially (those passed to kmemleak_not_leak() for example). The
>> majority should be white objects.

I may have the mistaken belief that setting count to 0 will make most
objects gray. Apparently, that may not be the case here.
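
For reference, the color predicates in mm/kmemleak.c look roughly like
this (paraphrased from the kernel source): with count reset to 0, only
objects registered with min_count == 0 come out gray, while ordinary
allocations (min_count == 1) stay white until the scan finds references
to them.

static bool color_white(const struct kmemleak_object *object)
{
	return object->count != KMEMLEAK_BLACK &&
	       object->count < object->min_count;
}

static bool color_gray(const struct kmemleak_object *object)
{
	return object->count != KMEMLEAK_BLACK &&
	       object->count >= object->min_count;
}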

>> If we drop the fine-grained object->lock, we could instead take
>> kmemleak_lock outside the loop with a cond_resched_lock(&kmemleak_lock)
>> within the loop. I think we can get away with not having an
>> rcu_read_lock() at all for list traversal with the big lock outside the
>> loop.
> Actually this doesn't work if the current object in the iteration is
> freed. Does list_for_each_rcu_safe() help?
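
The suggested pattern is roughly the following (a sketch, not actual
kmemleak code; big_lock is a placeholder spinlock_t), and it runs into
exactly the problem noted above, because cond_resched_lock() may drop
the lock mid-iteration:

	spin_lock(&big_lock);
	list_for_each_entry(object, &object_list, object_list) {
		/* ... whiten the object ... */

		/*
		 * May briefly release big_lock to reschedule; nothing
		 * protects 'object' or object->next while it is dropped.
		 */
		cond_resched_lock(&big_lock);
	}
	spin_unlock(&big_lock);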

list_for_each_rcu_safe() helps if we are worried about the current object being freed. However, it won't help if object->next is freed instead.
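
Concretely, the failure mode with a _safe-style iterator looks roughly
like this (a sketch; the iterator caches the next entry, which only
protects against the *current* entry being removed):

	struct kmemleak_object *object, *next;

	rcu_read_lock();
	list_for_each_entry_safe(object, next, &object_list, object_list) {
		/* 'next' was loaded from object->next at the top of
		 * this iteration.
		 */
		rcu_read_unlock();	/* nothing pins 'next' now */
		cond_resched();		/* 'next' may be freed here */
		rcu_read_lock();
		/* the loop advances via the stale 'next': use-after-free */
	}
	rcu_read_unlock();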

How about something like:

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 7dd64139a7c7..fd836e43cb16 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -1417,12 +1417,16 @@ static void kmemleak_scan(void)
        struct zone *zone;
        int __maybe_unused i;
        int new_leaks = 0;
+       int loop1_cnt = 0;

        jiffies_last_scan = jiffies;

        /* prepare the kmemleak_object's */
        rcu_read_lock();
        list_for_each_entry_rcu(object, &object_list, object_list) {
+               bool obj_pinned = false;
+
+               loop1_cnt++;
                raw_spin_lock_irq(&object->lock);
 #ifdef DEBUG
                /*
@@ -1437,10 +1441,32 @@ static void kmemleak_scan(void)
 #endif
                /* reset the reference count (whiten the object) */
                object->count = 0;
-               if (color_gray(object) && get_object(object))
+               if (color_gray(object) && get_object(object)) {
                        list_add_tail(&object->gray_list, &gray_list);
+                       obj_pinned = true;
+               }

                raw_spin_unlock_irq(&object->lock);
+
+               /*
+                * Do a cond_resched() to avoid soft lockup every 64k objects.
+                * Make sure a reference has been taken so that the object
+                * won't go away without RCU read lock.
+                */
+               if (!(loop1_cnt & 0xffff)) {
+                       if (!obj_pinned && !get_object(object)) {
+                               /* Try the next object instead */
+                               loop1_cnt--;
+                               continue;
+                       }
+
+                       rcu_read_unlock();
+                       cond_resched();
+                       rcu_read_lock();
+
+                       if (!obj_pinned)
+                               put_object(object);
+               }
        }
        rcu_read_unlock();
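
For context, the pinning above relies on kmemleak's use_count reference
counting, which (paraphrased from mm/kmemleak.c) behaves roughly like:

/*
 * get_object() succeeds only while use_count is non-zero, i.e. while
 * the object is not already on its way to being freed; put_object()
 * defers the actual free through RCU.
 */
static int get_object(struct kmemleak_object *object)
{
	return atomic_inc_not_zero(&object->use_count);
}

static void put_object(struct kmemleak_object *object)
{
	if (!atomic_dec_and_test(&object->use_count))
		return;

	call_rcu(&object->rcu, free_object_rcu);
}

So when get_object() fails in the loop above, the object is already
dying and it is safe to just move on to the next one.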

Cheers,
Longman