Re: [PATCH] add hook to read_from_oldmem() to check for non-ram pages

From: Olaf Hering
Date: Fri May 06 2011 - 06:55:55 EST


On Thu, May 05, Andrew Morton wrote:

> On Tue, 3 May 2011 21:08:06 +0200
> Olaf Hering <olaf@xxxxxxxxx> wrote:

> > This will reduce the time to read /proc/vmcore. Without this change a
> > 512M guest with 128M crashkernel region needs 200 seconds to read it,
> > with this change it takes just 2 seconds.
>
> Seems reasonable, I suppose.

Andrew,
Thanks for your feedback.

> Is there some suitable ifdef we can put around this stuff to avoid
> adding it to kernel builds which will never use it?

The change is for pv-on-hvm guests. In this setup the (out-of-tree)
paravirtualized drivers shutdown the emulated hardware, then they
communicate directly with the backend.
There is no ifdef right now. I guess at some point, when xen is fully
merged, this hook can be put into some CONFIG_XEN_PV_ON_HVM thing.

> > +void register_oldmem_pfn_is_ram(int (*fn)(unsigned long))
> > +{
> > + if (oldmem_pfn_is_ram == NULL)
> > + oldmem_pfn_is_ram = fn;
> > +}
>
> This is racy, and it should return a success code. And we may as well
> mark it __must_check to prevent people from cheating.

I will update that part.

> > +void unregister_oldmem_pfn_is_ram(void)
> > +{
> > + wait_event(oldmem_fn_waitq, atomic_read(&oldmem_fn_refcount) == 0);
> > + oldmem_pfn_is_ram = NULL;
> > + wmb();
> > +}
>
> I'd say we should do away with the (also racy) refcount thing.
> Instead, require that callers not be using the thing when they
> unregister. ie: that callers not be buggy.

I think oldmem_pfn_is_ram can be cleared unconditionally, the NULL check
in pfn_is_ram() below will prevent a crash.
This whole refcount thing is to prevent a module unload while
pfn_is_ram() is calling the hook, in other words the called code should
not go away until the hook returns to pfn_is_ram().
Should the called code increase/decrease the modules refcount instead?
I remember there was some MODULE_INC/MODULE_DEC macro (cant remember the
exact name) at some point. What needs to be done inside the module to
prevent an unload while its still in use? Is it __module_get/module_put
for each call of fn()?

The called function which will go into the xen source at some point
looks like shown below. HVMOP_get_mem_type was just merged into xen-unstable.

xen-unstable.hg/unmodified_drivers/linux-2.6/platform-pci/platform-pci.c

#ifdef HAVE_OLDMEM_PFN_IS_RAM
static int xen_oldmem_pfn_is_ram(unsigned long pfn)
{
struct xen_hvm_get_mem_type a;
int ret;

a.domid = DOMID_SELF;
a.pfn = pfn;
if (HYPERVISOR_hvm_op(HVMOP_get_mem_type, &a))
return -ENXIO;

switch (a.mem_type) {
case HVMMEM_mmio_dm:
ret = 0;
break;
case HVMMEM_ram_rw:
case HVMMEM_ram_ro:
default:
ret = 1;
break;
}

return ret;
}
#endif

static int __devinit platform_pci_init(...)
{
/* other init stuff */
#ifdef HAVE_OLDMEM_PFN_IS_RAM
register_oldmem_pfn_is_ram(&xen_oldmem_pfn_is_ram);
#endif
/* other init stuff */
}


Also, this xen driver has no module_exit. So for xen the unregister is
not an issue. I havent looked at the to-be-merged pv-on-hvm drivers,
maybe they can be properly unloaded.

> > +static int pfn_is_ram(unsigned long pfn)
> > +{
> > + int (*fn)(unsigned long);
> > + /* pfn is ram unless fn() checks pagetype */
> > + int ret = 1;
> > +
> > + atomic_inc(&oldmem_fn_refcount);
> > + smp_mb__after_atomic_inc();
> > + fn = oldmem_pfn_is_ram;
> > + if (fn)
> > + ret = fn(pfn);
> > + if (atomic_dec_and_test(&oldmem_fn_refcount))
> > + wake_up(&oldmem_fn_waitq);
> > +
> > + return ret;
> > +}
>
> This function would have been a suitable place at which to document the
> entire feature. As it stands, anyone who is reading this code won't
> have any clue why it exists.

I will add a comment.

> > +EXPORT_SYMBOL_GPL(register_oldmem_pfn_is_ram);
> > +EXPORT_SYMBOL_GPL(unregister_oldmem_pfn_is_ram);
>
> Each export should be placed immediately after the function which is
> being exported, please. Checkpatch reports this. Please use checkpatch.

Will do.

> > +++ linux-2.6.39-rc5/include/linux/crash_dump.h
> > @@ -66,6 +66,11 @@ static inline void vmcore_unusable(void)
> > if (is_kdump_kernel())
> > elfcorehdr_addr = ELFCORE_ADDR_ERR;
> > }
> > +
> > +#define HAVE_OLDMEM_PFN_IS_RAM 1
>
> What's this for?

So that out-of-tree drivers dont fail to compile when they call that
hook unconditionally. Perhaps they could use the kernel version.

Olaf
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/