AMD-IOMMU and problem with __init(data)?

From: Alexander Holler
Date: Wed Sep 23 2015 - 06:22:37 EST


Hello,

It looks like I have a problem with the AMD IOMMU and its handling of __init or __initdata.

I'm working on something which stores some structs right after INIT_CALLS but before CON_INITCALL (see include/asm-generic/vmlinux.lds.h).

These structures are accessed right after the initcalls of level fs (5, see init/main.c) have been called.

Here is how that structure is defined:
--
struct _annotated_initcall {
	initcall_t initcall;
	unsigned driver_id;
	unsigned *dependencies;
	struct device_driver *driver;
};
extern struct _annotated_initcall __annotated_initcall_start[],
				  __annotated_initcall_end[];
--

The code which uses that is
--
struct _annotated_initcall *ac;

ac = __annotated_initcall_start;
for (; ac < __annotated_initcall_end; ++ac)
	pr_info("AHO: ac %p ic %p ID %u deps %p drv %p\n",
		ac, ac->initcall, ac->driver_id,
		ac->dependencies, ac->driver);
--
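
For readers who want to experiment with the start/end-symbol pattern outside the kernel, here is a minimal userspace sketch. It is not the code from my tree: it assumes GCC or Clang with a GNU-ld-compatible linker on ELF, where the linker provides __start_<section> and __stop_<section> symbols for sections whose names are valid C identifiers. The names annotated_entry, ANNOTATE, init_a and init_b are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*initcall_fn)(void);

/* Hypothetical userspace stand-in for struct _annotated_initcall. */
struct annotated_entry {
	initcall_fn initcall;
	unsigned driver_id;
	unsigned *dependencies;
	void *driver;
};

/* Place one entry into the ELF section "annotated_initcalls". */
#define ANNOTATE(fn, id)						\
	static struct annotated_entry entry_##fn			\
	__attribute__((used, section("annotated_initcalls"))) =	\
		{ fn, id, NULL, NULL }

static int init_a(void) { return 0; }
static int init_b(void) { return 0; }

ANNOTATE(init_a, 176);
ANNOTATE(init_b, 177);

/* Provided by the linker (assumption: GNU ld or compatible, ELF target). */
extern struct annotated_entry __start_annotated_initcalls[];
extern struct annotated_entry __stop_annotated_initcalls[];

/* Walk the section like the kernel loop above and count the entries. */
static size_t count_entries(void)
{
	const struct annotated_entry *e;
	size_t n = 0;

	for (e = __start_annotated_initcalls; e < __stop_annotated_initcalls; ++e)
		n++;
	return n;
}

/* Sum the IDs; the order of entries within the section is not relied upon. */
static unsigned sum_ids(void)
{
	const struct annotated_entry *e;
	unsigned sum = 0;

	for (e = __start_annotated_initcalls; e < __stop_annotated_initcalls; ++e)
		sum += e->driver_id;
	return sum;
}

/* Sanity check: every entry must carry a non-NULL initcall. */
static void check_entries(void)
{
	const struct annotated_entry *e;

	for (e = __start_annotated_initcalls; e < __stop_annotated_initcalls; ++e)
		assert(e->initcall != NULL);
}
```

The walk only works if the start symbol really points at the first entry and the entries are laid out with a stride of sizeof(struct annotated_entry); check_entries() is the kind of test that would trip on the bad output shown further down.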

With CONFIG_AMD_IOMMU enabled, the following happens:

--
(...)
[ 1.240362] io scheduler noop registered
[ 1.395764] iommu: Adding device 0000:00:00.0 to group 0
[ 1.401478] iommu: Adding device 0000:00:01.0 to group 1
[ 1.406828] iommu: Adding device 0000:00:01.1 to group 1
[ 1.412487] iommu: Adding device 0000:00:10.0 to group 2
[ 1.417839] iommu: Adding device 0000:00:10.1 to group 2
[ 1.423501] iommu: Adding device 0000:00:11.0 to group 3
[ 1.429157] iommu: Adding device 0000:00:12.0 to group 4
[ 1.434510] iommu: Adding device 0000:00:12.2 to group 4
[ 1.440166] iommu: Adding device 0000:00:13.0 to group 5
[ 1.445520] iommu: Adding device 0000:00:13.2 to group 5
[ 1.451196] iommu: Adding device 0000:00:14.0 to group 6
[ 1.456551] iommu: Adding device 0000:00:14.2 to group 6
[ 1.461904] iommu: Adding device 0000:00:14.3 to group 6
[ 1.467568] iommu: Adding device 0000:00:14.4 to group 7
[ 1.473226] iommu: Adding device 0000:00:15.0 to group 8
[ 1.480807] iommu: Adding device 0000:00:15.1 to group 8
[ 1.486470] iommu: Adding device 0000:00:18.0 to group 9
[ 1.491822] iommu: Adding device 0000:00:18.1 to group 9
[ 1.497176] iommu: Adding device 0000:00:18.2 to group 9
[ 1.502528] iommu: Adding device 0000:00:18.3 to group 9
[ 1.507883] iommu: Adding device 0000:00:18.4 to group 9
[ 1.513237] iommu: Adding device 0000:00:18.5 to group 9
[ 1.518593] iommu: Adding device 0000:03:00.0 to group 8
[ 1.523932] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
[ 1.529276] AMD-Vi: Extended features: PreF PPR GT IA
[ 1.534776] AMD-Vi: Interrupt remapping enabled
[ 1.539496] AMD-Vi: Lazy IO/TLB flushing enabled
[ 1.545741] AHO: count_annotated 25
[ 1.549259] AHO: build inventory
[ 1.552517] AHO: ac ffffffff81d400d8 ic (null) ID 2177560225 deps 00000000000000b0 drv ffffffff81d25090
[ 1.562801] BUG: unable to handle kernel paging request at 00000000039c2af5
(...)
--

The BUG is triggered because the driver_id field of the structure (and likely the other fields too) contains a wrong value.
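
One possible reading of the bad line, offered as a guess and not as a diagnosis: the values look like a valid entry read 8 bytes too early. The logged ID 2177560225 is 0x81caeea1, which would be the low 32 bits of a kernel text address, deps is 0xb0 = 176 (the ID seen in the working case), and drv looks like a dependencies pointer. The sketch below models that in userspace. It assumes a little-endian machine and the x86_64 layout of the structure (8 + 4 + 4 padding + 8 + 8 bytes); the initcall address 0xffffffff81caeea1 is inferred from the logged ID, it does not appear in the log itself, and entry_model, read_skewed and logged_case are names made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fixed-width model of struct _annotated_initcall as laid out on x86_64. */
struct entry_model {
	uint64_t initcall;
	uint32_t driver_id;
	uint32_t pad;		/* implicit padding made explicit */
	uint64_t dependencies;
	uint64_t driver;
};

/* Read one entry starting `skew` bytes in front of the real entry. */
static struct entry_model read_skewed(const struct entry_model *real, size_t skew)
{
	unsigned char buf[2 * sizeof(*real)] = { 0 };
	struct entry_model seen;

	assert(skew + sizeof(*real) <= sizeof(buf));
	/* The zero bytes in front model padding before the real entry. */
	memcpy(buf + skew, real, sizeof(*real));
	memcpy(&seen, buf, sizeof(seen));
	return seen;
}

/* Reproduce the field values of the bad log line from a plausible entry. */
static struct entry_model logged_case(void)
{
	struct entry_model real = {
		.initcall     = 0xffffffff81caeea1ULL,	/* assumed, see above */
		.driver_id    = 176,
		.pad          = 0,
		.dependencies = 0xffffffff81d25090ULL,
		.driver       = 0,
	};

	return read_skewed(&real, 8);
}
```

With a skew of 8, logged_case() yields initcall 0, driver_id 2177560225, dependencies 0xb0 and driver 0xffffffff81d25090, the same four values as in the log line above. If that reading is right, __annotated_initcall_start would sit 8 bytes in front of the first entry; it does not explain why enabling CONFIG_AMD_IOMMU makes a difference.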

If I disable CONFIG_AMD_IOMMU, the output looks as it should, and as it does on ARM and Intel systems:
--
(...)
[ 1.151906] io scheduler noop registered
[ 1.307088] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[ 1.313563] software IO TLB [mem 0x894ca000-0x8d4ca000] (64MB) mapped at [ffff8800894ca000-ffff88008d4c9fff]
[ 1.323411] AHO: count_annotated 25
[ 1.326934] AHO: build inventory
[ 1.330189] AHO: ac ffffffff81d3cea0 ic ffffffff81cadcb4 ID 176 deps ffffffff81d22090 drv (null)
(...)
--

Does anyone have an idea what's going on?

Kernel is 4.2.1 (x86_64) and HW is an A10-5800K.

If necessary, I could try to put together a small patch which kills the system (reproducible here).

Thanks in advance,

Alexander Holler