Re: [PATCH] add hook to read_from_oldmem() to check for non-rampages

From: Andrew Morton
Date: Thu May 05 2011 - 17:26:32 EST


On Tue, 3 May 2011 21:08:06 +0200
Olaf Hering <olaf@xxxxxxxxx> wrote:

>
> The balloon driver in a Xen guest frees guest pages and marks them as
> mmio. When the kernel crashes and the crash kernel attempts to read the
> oldmem via /proc/vmcore a read from ballooned pages will generate 100%
> load in dom0 because Xen asks qemu-dm for the page content. Since the
> reads come in as 8byte requests each ballooned page is tried 512 times.
>
> With this change a hook can be registered which checks wether the given
> pfn is really ram. The hook has to return a value > 0 for ram pages, a
> value < 0 on error (because the hypercall is not known) and 0 for
> non-ram pages.
>
> This will reduce the time to read /proc/vmcore. Without this change a
> 512M guest with 128M crashkernel region needs 200 seconds to read it,
> with this change it takes just 2 seconds.

Seems reasonable, I suppose.

Is there some suitable ifdef we can put around this stuff to avoid
adding it to kernel builds which will never use it?

> ...
>
> --- linux-2.6.39-rc5.orig/fs/proc/vmcore.c
> +++ linux-2.6.39-rc5/fs/proc/vmcore.c
> @@ -18,6 +18,7 @@
> #include <linux/init.h>
> #include <linux/crash_dump.h>
> #include <linux/list.h>
> +#include <linux/wait.h>
> #include <asm/uaccess.h>
> #include <asm/io.h>
>
> @@ -35,6 +36,44 @@ static u64 vmcore_size;
>
> static struct proc_dir_entry *proc_vmcore = NULL;
>
> +/* returns > 0 for RAM pages, 0 for non-RAM pages, < 0 on error */
> +static int (*oldmem_pfn_is_ram)(unsigned long pfn);
> +static DECLARE_WAIT_QUEUE_HEAD(oldmem_fn_waitq);
> +static atomic_t oldmem_fn_refcount = ATOMIC_INIT(0);
> +
> +void register_oldmem_pfn_is_ram(int (*fn)(unsigned long))
> +{
> + if (oldmem_pfn_is_ram == NULL)
> + oldmem_pfn_is_ram = fn;
> +}

This is racy, and it should return a success code. And we may as well
mark it __must_check to prevent people from cheating.

> +void unregister_oldmem_pfn_is_ram(void)
> +{
> + wait_event(oldmem_fn_waitq, atomic_read(&oldmem_fn_refcount) == 0);
> + oldmem_pfn_is_ram = NULL;
> + wmb();
> +}

I'd say we should do away with the (also racy) refcount thing.
Instead, require that callers not be using the thing when they
unregister. ie: that callers not be buggy.

> +static int pfn_is_ram(unsigned long pfn)
> +{
> + int (*fn)(unsigned long);
> + /* pfn is ram unless fn() checks pagetype */
> + int ret = 1;
> +
> + atomic_inc(&oldmem_fn_refcount);
> + smp_mb__after_atomic_inc();
> + fn = oldmem_pfn_is_ram;
> + if (fn)
> + ret = fn(pfn);
> + if (atomic_dec_and_test(&oldmem_fn_refcount))
> + wake_up(&oldmem_fn_waitq);
> +
> + return ret;
> +}

This function would have been a suitable place at which to document the
entire feature. As it stands, anyone who is reading this code won't
have any clue why it exists.

> +EXPORT_SYMBOL_GPL(register_oldmem_pfn_is_ram);
> +EXPORT_SYMBOL_GPL(unregister_oldmem_pfn_is_ram);

Each export should be placed immediately after the function which is
being exported, please. Checkpatch reports this. Please use checkpatch.


> /* Reads a page from the oldmem device from given offset. */
> static ssize_t read_from_oldmem(char *buf, size_t count,
> u64 *ppos, int userbuf)
> @@ -55,9 +94,14 @@ static ssize_t read_from_oldmem(char *bu
> else
> nr_bytes = count;
>
> - tmp = copy_oldmem_page(pfn, buf, nr_bytes, offset, userbuf);
> - if (tmp < 0)
> - return tmp;
> + /* if pfn is not ram, return zeros for spares dump files */

typo.

Also, sentences start with capital letters!

> + if (pfn_is_ram(pfn) == 0)
> + memset(buf, 0, nr_bytes);
> + else {
> + tmp = copy_oldmem_page(pfn, buf, nr_bytes, offset, userbuf);
> + if (tmp < 0)
> + return tmp;
> + }
> *ppos += nr_bytes;
> count -= nr_bytes;
> buf += nr_bytes;
> Index: linux-2.6.39-rc5/include/linux/crash_dump.h
> ===================================================================
> --- linux-2.6.39-rc5.orig/include/linux/crash_dump.h
> +++ linux-2.6.39-rc5/include/linux/crash_dump.h
> @@ -66,6 +66,11 @@ static inline void vmcore_unusable(void)
> if (is_kdump_kernel())
> elfcorehdr_addr = ELFCORE_ADDR_ERR;
> }
> +
> +#define HAVE_OLDMEM_PFN_IS_RAM 1

What's this for?

> +extern void register_oldmem_pfn_is_ram(int (*fn)(unsigned long));

"unsigned long pfn" in the declaration, please. This has good
documentation value.

> +extern void unregister_oldmem_pfn_is_ram(void);


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/