Re: [PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

From: Naoya Horiguchi
Date: Fri Aug 10 2012 - 18:01:31 EST


On Fri, Aug 10, 2012 at 05:41:53PM -0400, Naoya Horiguchi wrote:
...
> +/*
> * Dirty pagecache page
> * Issues: when the error hit a hole page the error is not properly
> * propagated.
> */
> static int me_pagecache_dirty(struct page *p, unsigned long pfn)
> {
> - /*
> - * The original memory error handling on dirty pagecache has
> - * a bug that user processes who use corrupted pages via read()
> - * or write() can't be aware of the memory error and result
> - * in throwing out dirty data silently.
> - *
> - * Until we solve the problem, let's close the path of memory
> - * error handling for dirty pagecache. We just leave errors
> - * for the 2nd MCE to trigger panics.
> - */
> - return IGNORED;
> + struct address_space *mapping = page_mapping(p);
> +
> + SetPageError(p);
> + if (mapping) {
> + struct hwp_dirty *hwp;
> + struct inode *inode = mapping->host;
> +
> + /*
> + * Memory error is reported to userspace by AS_HWPOISON flags
> + * in mapping->flags. The mechanism is similar to that of
> + * AS_EIO, but we have separate flags because there are two
> + * differences between them:
> + * 1. Expected userspace handling. When user processes get
> + * -EIO, they can retry writeback hoping the error in IO
> + * devices is temporary, switch to write to other devices,
> + * or do some other application-specific handling.
> + * For -EHWPOISON, we can clear the error by overwriting
> + * the corrupted page.
> + * 2. When to clear. For -EIO, we can think that we recover
> + * from the error when writeback succeeds. For -EHWPOISON
> + * OTOH, we can see that things are back to normal when
> + * corrupted data are overwritten from user buffer.
> + */
> + hwp = kmalloc(sizeof(struct hwp_dirty), GFP_ATOMIC);
> + hwp->page = p;
> + hwp->fpage = NULL;
> + hwp->mapping = mapping;
> + hwp->index = page_index(p);

> + hwp->ino = inode->i_ino;
> + hwp->dev = inode->i_sb->s_dev;

Sorry, these two members (ino and dev) are not in struct hwp_dirty in the
current version of the patch. Please ignore these two assignments.

Thanks,
Naoya

> + add_hwp_dirty(hwp);
> +
> + pr_err("MCE %#lx: Corrupted dirty pagecache, dev %u:%u, inode:%lu, index:%lu\n",
> + pfn, MAJOR(inode->i_sb->s_dev),
> + MINOR(inode->i_sb->s_dev), inode->i_ino, page_index(p));
> + mapping_set_error(mapping, -EHWPOISON);
> + }
> +
> + return me_pagecache_clean(p, pfn);
> }
>
> /*
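
For readers following the comment in the quoted hunk, the difference in
"when to clear" between the two flags can be sketched as a small userspace
toy model. This is only an illustration under stated assumptions, not code
from the patch: struct toy_mapping, the TOY_AS_* bits and all toy_*()
helpers are hypothetical names, and the fallback value for EHWPOISON is
only used on systems whose errno.h does not define it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* fallback only; Linux's errno.h normally defines it */
#endif

/* Hypothetical flag bits; the real AS_* values live in the kernel. */
#define TOY_AS_EIO	(1u << 0)
#define TOY_AS_HWPOISON	(1u << 1)

/* Toy stand-in for struct address_space: only the error flags. */
struct toy_mapping {
	unsigned int flags;
};

/* Record an error on the mapping, keeping the two kinds apart. */
static void toy_set_error(struct toy_mapping *m, int error)
{
	assert(m != NULL);
	if (error == -EHWPOISON)
		m->flags |= TOY_AS_HWPOISON;
	else if (error)
		m->flags |= TOY_AS_EIO;
}

/* What a read()/write()-style path would report for this mapping. */
static int toy_check_error(const struct toy_mapping *m)
{
	assert(m != NULL);
	if (m->flags & TOY_AS_HWPOISON)
		return -EHWPOISON;
	if (m->flags & TOY_AS_EIO)
		return -EIO;
	return 0;
}

/* A successful writeback recovers from an I/O error only. */
static void toy_writeback_done(struct toy_mapping *m)
{
	assert(m != NULL);
	m->flags &= ~TOY_AS_EIO;
}

/* Overwriting the corrupted data from a user buffer clears the poison. */
static void toy_overwrite_done(struct toy_mapping *m)
{
	assert(m != NULL);
	m->flags &= ~TOY_AS_HWPOISON;
}
```

In this model, calling toy_writeback_done() on a poisoned mapping leaves
toy_check_error() returning -EHWPOISON; only toy_overwrite_done() brings
it back to 0. That is the reason the comment gives for not reusing AS_EIO.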