Re: [GIT PULL] iov_iter: Improve page extraction (pin or just list)

From: David Hildenbrand
Date: Tue Jan 31 2023 - 10:14:10 EST


On 31.01.23 16:04, Jens Axboe wrote:
On 1/31/23 8:02?AM, David Hildenbrand wrote:
On 31.01.23 15:50, Jens Axboe wrote:
On 1/31/23 6:48?AM, David Hildenbrand wrote:
On 31.01.23 14:41, David Howells wrote:
David Hildenbrand <david@xxxxxxxxxx> wrote:

percpu counters maybe - add them up at the point of viewing?
They are percpu, see my last email. But for every 108 changes (on
my system), they will do two atomic_long_adds(). So not very
useful for anything but low frequency modifications.


Can we just treat the whole acquired/released accounting as a debug mechanism
to detect missing releases and do it only for debug kernels?


The pcpu counter is an s8, so we have to flush on a regular basis and cannot
really defer it any longer ... but I'm curious if it would be of any help to
only have a single PINNED counter that goes into both directions (inc/dec on
pin/release), to reduce the flushing.

Of course, once we pin/release more than ~108 pages in one go or we switch
CPUs frequently it won't be that much of a help ...

What are the stats actually used for? Is it just debugging, or do we actually
have users for them (control groups spring to mind)?

As it's really just "how many pinning events" vs. "how many unpinning
events", I assume it's only for debugging.

For example, if you pin the same page twice it would not get accounted
as "a single page is pinned".

How about something like the below then? I can send it out as a real
patch, will run a sanity check on it first but would be surprised if
this doesn't fix it.


diff --git a/mm/gup.c b/mm/gup.c
index f45a3a5be53a..41abb16286ec 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -168,7 +168,9 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
*/
smp_mb__after_atomic();
+#ifdef CONFIG_DEBUG_VM
node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
+#endif
return folio;
}
@@ -180,7 +182,9 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
{
if (flags & FOLL_PIN) {
+#ifdef CONFIG_DEBUG_VM
node_stat_mod_folio(folio, NR_FOLL_PIN_RELEASED, refs);
+#endif
if (folio_test_large(folio))
atomic_sub(refs, folio_pincount_ptr(folio));
else
@@ -236,8 +240,9 @@ int __must_check try_grab_page(struct page *page, unsigned int flags)
} else {
folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
}
-
+#ifdef CONFIG_DEBUG_VM
node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1);
+#endif
}
return 0;


We might want to hide the counters completely by defining them only
with CONFIG_DEBUG_VM.

Are all of them debug aids only? If so, yes we should just have
node_stat_* under CONFIG_DEBUG_VM.


Rather only these 2. Smth like:

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 815c7c2edf45..a526964b65ce 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -196,8 +196,10 @@ enum node_stat_item {
NR_WRITTEN, /* page writings since bootup */
NR_THROTTLED_WRITTEN, /* NR_WRITTEN while reclaim throttled */
NR_KERNEL_MISC_RECLAIMABLE, /* reclaimable non-slab kernel pages */
+#ifdef CONFIG_DEBUG_VM
NR_FOLL_PIN_ACQUIRED, /* via: pin_user_page(), gup flag: FOLL_PIN */
NR_FOLL_PIN_RELEASED, /* pages returned via unpin_user_page() */
+#endif
NR_KERNEL_STACK_KB, /* measured in KiB */
#if IS_ENABLED(CONFIG_SHADOW_CALL_STACK)
NR_KERNEL_SCS_KB, /* measured in KiB */
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 1ea6a5ce1c41..5cbd9a1924bf 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1227,8 +1227,10 @@ const char * const vmstat_text[] = {
"nr_written",
"nr_throttled_written",
"nr_kernel_misc_reclaimable",
+#ifdef CONFIG_DEBUG_VM
"nr_foll_pin_acquired",
"nr_foll_pin_released",
+#endif
"nr_kernel_stack",
#if IS_ENABLED(CONFIG_SHADOW_CALL_STACK)
"nr_shadow_call_stack",

--
Thanks,

David / dhildenb