Re: [PATCH 0/3] mm/vmalloc: Cache the /proc/meminfo vmalloc statistics

From: Ingo Molnar
Date: Sun Aug 23 2015 - 02:05:27 EST



* George Spelvin <linux@xxxxxxxxxxx> wrote:

> Linus wrote:
> > I don't think any of this can be called "correct", in that the
> > unlocked accesses to the cached state are clearly racy, but I think
> > it's very much "acceptable".
>
> I'd think you could easily fix that with a seqlock-like system.
>
> What makes it so simple is that you can always fall back to
> calc_vmalloc_info if there's any problem, rather than looping or blocking.
>
> The basic idea is that you have a seqlock counter, but if either of
> the two lsbits are set, the cached information is stale.
>
> Basically, you need a seqlock and a spinlock. The seqlock does
> most of the work, and the spinlock ensures that there's only one
> updater of the cache.
>
> vmap_unlock() does set_bit(0, &seq->sequence). This marks the information
> as stale.
>
> get_vmalloc_info reads the seqlock. There are two case:
> - If the two lsbits are 10, the cached information is valid.
> Copy it out, re-check the seqlock, and loop if the sequence
> number changes.
> - In any other case, the cached information is
> not valid.
> - Try to obtain the spinlock. Do not block if it's unavailable.
> - If unavailable, do not block.
> - If the lock is acquired:
> - Set the sequence to (sequence | 3) + 1 (we're the only writer)
> - This bumps the sequence number and leaves the lsbits at 00 (invalid)
> - Memory barrier TBD. Do the RCU ops in calc_vmalloc_info do it for us?
> - Call calc_vmalloc_info
> - If we obtained the spinlock earlier:
> - Copy our vmi to cached_info
> - smp_wmb()
> - set_bit(1, &seq->sequence). This marks the information as valid,
> as long as bit 0 is still clear.
> - Release the spinlock.
>
> Basically, bit 0 says "vmalloc info has changed", and bit 1 says
> "vmalloc cache has been updated". This clears bit 0 before
> starting the update so that an update during calc_vmalloc_info
> will force a new update.
>
> So the three case are basically:
> 00 - calc_vmalloc_info() in progress
> 01 - vmap_unlock() during calc_vmalloc_info()
> 10 - cached_info is valid
> 11 - vmap_unlock has invalidated cached_info, awaiting refresh
>
> Logically, the sequence number should be initialized to ...01,
> but the code above handles 00 okay.

I think this is too complex.

How about something simple like the patch below (on top of the third patch)?

It makes the vmalloc info transactional - /proc/meminfo will always print a
consistent set of numbers. (Not that we really care about races there, but it
looks really simple to solve so why not.)

( I also moved the function-static cache next to the flag and seqlock - this
should further compress the cache footprint. )

Have I missed anything? Very lightly tested: booted in a VM.

Thanks,

Ingo

=========================>

mm/vmalloc.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ef48e557df5a..66726f41e726 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -278,7 +278,15 @@ EXPORT_SYMBOL(vmalloc_to_pfn);

static __cacheline_aligned_in_smp DEFINE_SPINLOCK(vmap_area_lock);

+/*
+ * Seqlock and flag for the vmalloc info cache printed in /proc/meminfo.
+ *
+ * The assumption of the optimization is that it's read frequently, but
+ * modified infrequently.
+ */
+static DEFINE_SEQLOCK(vmap_info_lock);
static int vmap_info_changed;
+static struct vmalloc_info vmap_info_cache;

static inline void vmap_lock(void)
{
@@ -2752,10 +2760,14 @@ static void calc_vmalloc_info(struct vmalloc_info *vmi)

void get_vmalloc_info(struct vmalloc_info *vmi)
{
- static struct vmalloc_info cached_info;
+ if (!READ_ONCE(vmap_info_changed)) {
+ unsigned int seq;
+
+ do {
+ seq = read_seqbegin(&vmap_info_lock);
+ *vmi = vmap_info_cache;
+ } while (read_seqretry(&vmap_info_lock, seq));

- if (!vmap_info_changed) {
- *vmi = cached_info;
return;
}

@@ -2764,8 +2776,9 @@ void get_vmalloc_info(struct vmalloc_info *vmi)

calc_vmalloc_info(vmi);

- barrier();
- cached_info = *vmi;
+ write_seqlock(&vmap_info_lock);
+ vmap_info_cache = *vmi;
+ write_sequnlock(&vmap_info_lock);
}

#endif
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/