Re: [BUG] infinite loop in find_get_pages()

From: Justin Piszcz
Date: Thu Sep 15 2011 - 07:11:13 EST




On Thu, 15 Sep 2011, Pawel Sikora wrote:

On Wednesday 14 of September 2011 08:34:21 Lin Ming wrote:

[3.0.2-stable] BUG: soft lockup - CPU#13 stuck for 22s! [kswapd2:1092]
http://marc.info/?l=linux-kernel&m=131469584117857&w=2

Hi,

I'm not sure that this is fully related to this thread, but I found
new warnings about memory pages in dmesg today:

[650697.716481] ------------[ cut here ]------------
[650697.716498] WARNING: at mm/page-writeback.c:1176 __set_page_dirty_nobuffers+0x10a/0x140()
[650697.716501] Hardware name: H8DGU
[650697.716502] Modules linked in: nfs fscache binfmt_misc nfsd lockd nfs_acl auth_rpcgss sunrpc ipmi_si ipmi_devintf ipmi_msghandler sch_sfq iptable_nat nf_nat nf_conntrack_ipv4
nf_conntrack nf_defrag_ipv4 iptable_filter xt_TCPMSS xt_tcpudp iptable_mangle ip_tables ip6table_filter ip6_tables x_tables ext4 jbd2 crc16 raid10 raid0 dm_mod uvesafb autofs4
dummy aoe joydev usbhid hid ide_cd_mod cdrom ata_generic pata_acpi pata_atiixp sp5100_tco ohci_hcd ide_pci_generic ssb ehci_hcd pcmcia igb pcmcia_core psmouse mmc_core evdev
i2c_piix4 atiixp ide_core k10temp usbcore amd64_edac_mod edac_core i2c_core dca hwmon edac_mce_amd ghes serio_raw button hed processor pcspkr sg sd_mod crc_t10dif raid1 md_mod ext3
jbd mbcache ahci libahci libata scsi_mod [last unloaded: scsi_wait_scan]
[650697.716569] Pid: 16806, comm: m_xilinx Not tainted 3.0.4 #5
[650697.716572] Call Trace:
[650697.716582] [<ffffffff810470da>] warn_slowpath_common+0x7a/0xb0
[650697.716586] [<ffffffff81047125>] warn_slowpath_null+0x15/0x20
[650697.716590] [<ffffffff810e71ba>] __set_page_dirty_nobuffers+0x10a/0x140
[650697.716596] [<ffffffff81127eb8>] migrate_page_copy+0x1c8/0x1d0
[650697.716600] [<ffffffff81127ef5>] migrate_page+0x35/0x50
[650697.716623] [<ffffffffa04b6f19>] nfs_migrate_page+0x59/0xf0 [nfs]
[650697.716627] [<ffffffff81127fb9>] move_to_new_page+0xa9/0x260
[650697.716630] [<ffffffff811286bd>] migrate_pages+0x3fd/0x4c0
[650697.716635] [<ffffffff8142988e>] ? apic_timer_interrupt+0xe/0x20
[650697.716641] [<ffffffff8111cbf0>] ? ftrace_define_fields_mm_compaction_isolate_template+0x70/0x70
[650697.716645] [<ffffffff8111d5da>] compact_zone+0x52a/0x8c0
[650697.716649] [<ffffffff8111dade>] compact_zone_order+0x7e/0xb0
[650697.716653] [<ffffffff8111dbcd>] try_to_compact_pages+0xbd/0xf0
[650697.716657] [<ffffffff810e5148>] __alloc_pages_direct_compact+0xa8/0x180
[650697.716661] [<ffffffff810e588d>] __alloc_pages_nodemask+0x66d/0x7f0
[650697.716667] [<ffffffff8110a92d>] ? page_add_new_anon_rmap+0x9d/0xb0
[650697.716671] [<ffffffff8111b865>] alloc_pages_vma+0x95/0x180
[650697.716676] [<ffffffff8112c2f8>] do_huge_pmd_anonymous_page+0x138/0x310
[650697.716680] [<ffffffff81102ace>] handle_mm_fault+0x21e/0x310
[650697.716685] [<ffffffff81424bed>] do_page_fault+0x11d/0x4e0
[650697.716688] [<ffffffff811077a7>] ? do_mmap_pgoff+0x357/0x370
[650697.716692] [<ffffffff8110790d>] ? sys_mmap_pgoff+0x14d/0x220
[650697.716697] [<ffffffff811371b8>] ? do_sys_open+0x168/0x1d0
[650697.716701] [<ffffffff81421d5f>] page_fault+0x1f/0x30
[650697.716704] ---[ end trace 4255de435c6def21 ]---

BR,
Paweł.


Hi Pawel,

I had the same issues. Either try the latest patch that was recommended,
or try the older ones (I am using these three and have not had a memory
error, oops, etc. in 24 hours).

Before patches:
Aug 30 05:00:48 p34 kernel: [122150.720173] [<ffffffff8103798a>] warn_slowpath_common+0x7a/0xb0
Sep 10 20:59:39 p34 kernel: [531189.671424] [<ffffffff810379ba>] warn_slowpath_common+0x7a/0xb0

After patches:
(no errors)

Patches you need (against 3.1-rc4):

(for the igb problem/memory allocation issue)
0001-Fix-pointer-dereference-before-call-to-pcie_bus_conf.patch
0002-PCI-Remove-MRRS-modification-from-MPS-setting-code.patch

(for the RCU/memory errors)
0003-filemap.patch

I've attached them to this e-mail; they seem to have fixed all of my problems so far.

Justin.

From eric.dumazet@xxxxxxxxx Wed Sep 14 06:20:11 2011
Date: Wed, 14 Sep 2011 06:20:08
From: Eric Dumazet <eric.dumazet@xxxxxxxxx>
To: Justin Piszcz <jpiszcz@xxxxxxxxxxxxxxx>
Cc: Lin Ming <mlin@xxxxxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, Alan Piszcz <ap@xxxxxxxxxxxxx>, "Li, Shaohua" <shaohua.li@xxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxx>
Subject: Re: 3.0.1: pagevec_lookup+0x1d/0x30, SLAB issues?

On Wednesday, 14 September 2011 at 05:47 -0400, Justin Piszcz wrote:
>
> On Wed, 14 Sep 2011, Lin Ming wrote:
>
> > On Mon, Sep 12, 2011 at 6:44 AM, Justin Piszcz <jpiszcz@xxxxxxxxxxxxxxx> wrote:
> > Hi, Justin
> >
> > There is a similar bug report at:
> > http://marc.info/?t=131594190600005&r=1&w=2
> >
> > The attached patch from Shaohua fixed the bug.
> >
> > Could you give it a try?
> >
>
> Hi Lin/LKML,
>
> Can you please provide text patch files for what you want me to apply?
> I did read that e-mail thread and that could be the culprit, I will patch
> and apply as soon as someone points me to the patch locations :)

diff --git a/mm/filemap.c b/mm/filemap.c
index 645a080..7771871 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -827,13 +827,14 @@ unsigned find_get_pages(struct address_space *mapping, pgoff_t start,
{
unsigned int i;
unsigned int ret;
- unsigned int nr_found;
+ unsigned int nr_found, nr_skip;

rcu_read_lock();
restart:
nr_found = radix_tree_gang_lookup_slot(&mapping->page_tree,
(void ***)pages, NULL, start, nr_pages);
ret = 0;
+ nr_skip = 0;
for (i = 0; i < nr_found; i++) {
struct page *page;
repeat:
@@ -856,6 +857,7 @@ repeat:
* here as an exceptional entry: so skip over it -
* we only reach this from invalidate_mapping_pages().
*/
+ nr_skip++;
continue;
}

@@ -876,7 +878,7 @@ repeat:
* If all entries were removed before we could secure them,
* try again, because callers stop trying once 0 is returned.
*/
- if (unlikely(!ret && nr_found))
+ if (unlikely(!ret && nr_found > nr_skip))
goto restart;
rcu_read_unlock();
return ret;
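
The effect of the nr_skip change in the filemap patch above can be modelled in user space. What follows is a minimal sketch, not kernel code: slot_kind, would_restart() and the use_skip flag are hypothetical names invented for illustration, and the real find_get_pages() repeats the radix-tree lookup on restart instead of returning a flag. It only shows why "!ret && nr_found" can restart forever when every entry found is an exceptional one, while "!ret && nr_found > nr_skip" does not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for what a looked-up slot can turn out to be. */
enum slot_kind {
	SLOT_PAGE,        /* a normal page we managed to take a reference on */
	SLOT_EXCEPTIONAL, /* an exceptional entry, skipped on purpose */
	SLOT_GONE         /* entry removed before we could secure it */
};

/*
 * Model one pass over the looked-up slots.
 * Returns 1 if the caller would "goto restart", 0 otherwise.
 * use_skip == 0 models the old condition, use_skip != 0 the patched one.
 */
static int would_restart(const enum slot_kind *slots, unsigned int nr_found,
			 int use_skip)
{
	unsigned int i, ret = 0, nr_skip = 0;

	assert(slots != NULL || nr_found == 0);

	for (i = 0; i < nr_found; i++) {
		if (slots[i] == SLOT_EXCEPTIONAL) {
			nr_skip++;	/* the patch counts these */
			continue;
		}
		if (slots[i] == SLOT_GONE)
			continue;	/* raced with removal */
		ret++;
	}

	if (use_skip)
		return !ret && nr_found > nr_skip;
	return !ret && nr_found;
}
```

In this model, three SLOT_EXCEPTIONAL entries make the old condition ask for a restart (and the same lookup would return the same three entries again), whereas the patched condition returns without restarting. A batch in which every entry was removed still restarts under both conditions, which matches the comment in the patch about callers stopping once 0 is returned.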

From 74d81235f8e4bd60859d539a27e51d3a09d183cf Mon Sep 17 00:00:00 2001
From: Jon Mason <mason@xxxxxxxx>
Date: Thu, 8 Sep 2011 12:59:00 -0500
Subject: [PATCH 2/2] PCI: Remove MRRS modification from MPS setting code

Modifying the Maximum Read Request Size to 0 (value of 128Bytes) has
massive negative ramifications on some devices. Without knowing which
devices have this issue, do not modify from the default value when
walking the PCI-E bus in pcie_bus_safe mode. Also, make pcie_bus_safe
the default procedure.

Tested-by: Sven Schnelle <svens@xxxxxxxxxxxxxx>
Tested-by: Simon Kirby <sim@xxxxxxxxxx>
Tested-by: Stephen M. Cameron <scameron@xxxxxxxxxxxxxxxxxx>
Reported-and-tested-by: Eric Dumazet <eric.dumazet@xxxxxxxxx>
Reported-and-tested-by: Niels Ole Salscheider <niels_ole@xxxxxxxxxxxxxxxxxxxxx>
References: https://bugzilla.kernel.org/show_bug.cgi?id=42162
Signed-off-by: Jon Mason <mason@xxxxxxxx>
---
drivers/pci/pci.c | 2 +-
drivers/pci/probe.c | 41 ++++++++++++++++++++++-------------------
2 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 0ce6742..4e84fd4 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -77,7 +77,7 @@ unsigned long pci_cardbus_mem_size = DEFAULT_CARDBUS_MEM_SIZE;
unsigned long pci_hotplug_io_size = DEFAULT_HOTPLUG_IO_SIZE;
unsigned long pci_hotplug_mem_size = DEFAULT_HOTPLUG_MEM_SIZE;

-enum pcie_bus_config_types pcie_bus_config = PCIE_BUS_PERFORMANCE;
+enum pcie_bus_config_types pcie_bus_config = PCIE_BUS_SAFE;

/*
* The default CLS is used if arch didn't set CLS explicitly and not
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 0820fc1..b1187ff 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1396,34 +1396,37 @@ static void pcie_write_mps(struct pci_dev *dev, int mps)

static void pcie_write_mrrs(struct pci_dev *dev, int mps)
{
- int rc, mrrs;
+ int rc, mrrs, dev_mpss;

- if (pcie_bus_config == PCIE_BUS_PERFORMANCE) {
- int dev_mpss = 128 << dev->pcie_mpss;
+ /* In the "safe" case, do not configure the MRRS. There appear to be
+ * issues with setting MRRS to 0 on a number of devices.
+ */

- /* For Max performance, the MRRS must be set to the largest
- * supported value. However, it cannot be configured larger
- * than the MPS the device or the bus can support. This assumes
- * that the largest MRRS available on the device cannot be
- * smaller than the device MPSS.
- */
- mrrs = mps < dev_mpss ? mps : dev_mpss;
- } else
- /* In the "safe" case, configure the MRRS for fairness on the
- * bus by making all devices have the same size
- */
- mrrs = mps;
+ if (pcie_bus_config != PCIE_BUS_PERFORMANCE)
+ return;
+
+ dev_mpss = 128 << dev->pcie_mpss;

+ /* For Max performance, the MRRS must be set to the largest supported
+ * value. However, it cannot be configured larger than the MPS the
+ * device or the bus can support. This assumes that the largest MRRS
+ * available on the device cannot be smaller than the device MPSS.
+ */
+ mrrs = min(mps, dev_mpss);

/* MRRS is a R/W register. Invalid values can be written, but a
- * subsiquent read will verify if the value is acceptable or not.
+ * subsequent read will verify if the value is acceptable or not.
* If the MRRS value provided is not acceptable (e.g., too large),
* shrink the value until it is acceptable to the HW.
*/
while (mrrs != pcie_get_readrq(dev) && mrrs >= 128) {
+ dev_warn(&dev->dev, "Attempting to modify the PCI-E MRRS value"
+ " to %d. If any issues are encountered, please try "
+ "running with pci=pcie_bus_safe\n", mrrs);
rc = pcie_set_readrq(dev, mrrs);
if (rc)
- dev_err(&dev->dev, "Failed attempting to set the MRRS\n");
+ dev_err(&dev->dev,
+ "Failed attempting to set the MRRS\n");

mrrs /= 2;
}
@@ -1436,13 +1439,13 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
if (!pci_is_pcie(dev))
return 0;

- dev_info(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n",
+ dev_dbg(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n",
pcie_get_mps(dev), 128<<dev->pcie_mpss, pcie_get_readrq(dev));

pcie_write_mps(dev, mps);
pcie_write_mrrs(dev, mps);

- dev_info(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n",
+ dev_dbg(&dev->dev, "Dev MPS %d MPSS %d MRRS %d\n",
pcie_get_mps(dev), 128<<dev->pcie_mpss, pcie_get_readrq(dev));

return 0;
--
1.7.6
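
The size arithmetic in the MRRS patch above can be tried outside the kernel. This is a rough sketch under stated assumptions: field_to_bytes() and target_mrrs() are hypothetical helpers, the "performance" argument stands in for the pcie_bus_config == PCIE_BUS_PERFORMANCE test, and a return value of 0 is an invented convention meaning "leave MRRS alone". It covers only the encoding (a field value of 0 means 128 bytes, as in "128 << dev->pcie_mpss") and the min(mps, dev_mpss) clamp, not the write-and-read-back loop.

```c
#include <assert.h>

/* Decode a size field the way the patch does: 0 -> 128 bytes, 1 -> 256, ... */
static int field_to_bytes(unsigned int field)
{
	return 128 << field;
}

/*
 * Pick the MRRS (in bytes) that the patched code would try first.
 * performance:    nonzero models PCIE_BUS_PERFORMANCE, zero models any other mode
 * mps:            bus-wide Max Payload Size in bytes
 * dev_mpss_field: encoded Max Payload Size Supported of the device
 * Returns 0 (hypothetical sentinel) when MRRS must not be modified.
 */
static int target_mrrs(int performance, int mps, unsigned int dev_mpss_field)
{
	int dev_mpss;

	assert(mps >= 128);

	if (!performance)
		return 0;	/* "safe" case: do not configure the MRRS */

	dev_mpss = field_to_bytes(dev_mpss_field);
	return mps < dev_mpss ? mps : dev_mpss;	/* min(mps, dev_mpss) */
}
```

For example, with a bus MPS of 256 bytes and a device whose MPSS field is 2 (512 bytes), the performance case picks 256; in any other mode the helper reports that MRRS should stay at whatever the device already has.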

From cf822aed99fd8851d82ae5f2df11c29b79e316c8 Mon Sep 17 00:00:00 2001
From: Shyam Iyer <shyam.iyer.t@xxxxxxxxx>
Date: Wed, 31 Aug 2011 12:21:42 -0400
Subject: [PATCH 1/2] Fix pointer dereference before call to
pcie_bus_configure_settings

There is a potential NULL pointer dereference in calls to
pcie_bus_configure_settings due to attempts to access pci_bus self
variables when the self pointer is NULL. To correct this, verify that
the self pointer in pci_bus is non-NULL before dereferencing it.

Reported-by: Stanislaw Gruszka <sgruszka@xxxxxxxxxx>
Signed-off-by: Shyam Iyer <shyam_iyer@xxxxxxxx>
Signed-off-by: Jon Mason <mason@xxxxxxxx>
---
arch/x86/pci/acpi.c | 9 +++++++--
drivers/pci/hotplug/pcihp_slot.c | 4 +++-
drivers/pci/probe.c | 3 ---
3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index c953302..039d913 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -365,8 +365,13 @@ struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root *root)
*/
if (bus) {
struct pci_bus *child;
- list_for_each_entry(child, &bus->children, node)
- pcie_bus_configure_settings(child, child->self->pcie_mpss);
+ list_for_each_entry(child, &bus->children, node) {
+ struct pci_dev *self = child->self;
+ if (!self)
+ continue;
+
+ pcie_bus_configure_settings(child, self->pcie_mpss);
+ }
}

if (!bus)
diff --git a/drivers/pci/hotplug/pcihp_slot.c b/drivers/pci/hotplug/pcihp_slot.c
index 753b21a..3ffd9c1 100644
--- a/drivers/pci/hotplug/pcihp_slot.c
+++ b/drivers/pci/hotplug/pcihp_slot.c
@@ -169,7 +169,9 @@ void pci_configure_slot(struct pci_dev *dev)
(dev->class >> 8) == PCI_CLASS_BRIDGE_PCI)))
return;

- pcie_bus_configure_settings(dev->bus, dev->bus->self->pcie_mpss);
+ if (dev->bus && dev->bus->self)
+ pcie_bus_configure_settings(dev->bus,
+ dev->bus->self->pcie_mpss);

memset(&hpp, 0, sizeof(hpp));
ret = pci_get_hp_params(dev, &hpp);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 8473727..0820fc1 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1456,9 +1456,6 @@ void pcie_bus_configure_settings(struct pci_bus *bus, u8 mpss)
{
u8 smpss = mpss;

- if (!bus->self)
- return;
-
if (!pci_is_pcie(bus->self))
return;

--
1.7.6
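
The caller-side check that patch 1/2 adds in pci_acpi_scan_root() can be illustrated with mock types. This is only a sketch: fake_dev, fake_bus, fake_configure() and configure_children() are hypothetical stand-ins for struct pci_dev, struct pci_bus, pcie_bus_configure_settings() and the list_for_each_entry() loop, and an array replaces the kernel list. The point is that child->self is tested before self->pcie_mpss is read, so a bus without a bridge device is skipped instead of being dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct fake_dev {
	unsigned char pcie_mpss;
};

struct fake_bus {
	struct fake_dev *self;	/* bridge device, may be NULL */
};

static unsigned char last_mpss;	/* records the last value passed down */

/* Stand-in for pcie_bus_configure_settings(); just records its argument. */
static void fake_configure(struct fake_bus *bus, unsigned char mpss)
{
	assert(bus != NULL);
	last_mpss = mpss;
}

/*
 * Walk the child buses and configure only those that have a bridge device.
 * Returns how many buses were configured.
 */
static int configure_children(struct fake_bus **children, size_t n)
{
	size_t i;
	int configured = 0;

	for (i = 0; i < n; i++) {
		struct fake_dev *self = children[i]->self;

		if (!self)
			continue;	/* nothing to dereference, skip this bus */

		fake_configure(children[i], self->pcie_mpss);
		configured++;
	}
	return configured;
}
```

Given one bus whose self pointer is NULL and one with a bridge whose pcie_mpss is 3, configure_children() configures exactly one bus and passes 3 down; without the check the first bus would be a NULL pointer dereference.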