Re: BUG: unable to handle page fault for address
From: Suren Baghdasaryan
Date: Fri May 16 2025 - 20:14:00 EST
On Fri, May 16, 2025 at 10:03 AM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> Hi David,
>
> On Fri, May 16, 2025 at 6:13 AM David Wang <00107082@xxxxxxx> wrote:
> >
> > Hi,
> >
> > I caught a page fault when I was changing my nvidia driver:
> > (This happens randomly, I can reproduce it with about 1/3 probability)
> >
> > [Fri May 16 12:05:41 2025] BUG: unable to handle page fault for address: ffff9d28984c3000
> > [Fri May 16 12:05:41 2025] #PF: supervisor read access in kernel mode
> > [Fri May 16 12:05:41 2025] #PF: error_code(0x0000) - not-present page
> > ...
> > [Fri May 16 12:05:41 2025] RIP: 0010:release_module_tags+0x103/0x1b0
> > ...
> > [Fri May 16 12:05:41 2025] Call Trace:
> > [Fri May 16 12:05:41 2025] <TASK>
> > [Fri May 16 12:05:41 2025] codetag_unload_module+0x135/0x160
> > [Fri May 16 12:05:41 2025] free_module+0x19/0x1a0
> > ...
> > (full kernel logs are pasted at the end.)
> >
> > Using a image with DEBUG_INFO, the calltrack parses as:
> >
> > RIP: 0010:release_module_tags (./include/linux/alloc_tag.h:134 lib/alloc_tag.c:352 lib/alloc_tag.c:573)
> > [Fri May 16 12:05:41 2025] codetag_unload_module (lib/codetag.c:355)
> > [Fri May 16 12:05:41 2025] free_module (kernel/module/main.c:1305)
> > [Fri May 16 12:05:41 2025] __do_sys_delete_module (kernel/module/main.c:795)
> >
> > The offending lines in my codebase:
> > 126 static inline struct alloc_tag_counters alloc_tag_read(struct alloc_tag *tag)
> > 127 {
> > ...
> > 131
> > 132 for_each_possible_cpu(cpu) {
> > 133 counter = per_cpu_ptr(tag->counters, cpu);
> > >>>> 134 v.bytes += counter->bytes; <--------------here
> > 135 v.calls += counter->calls;
> >
> >
> > Nvidia drivers are out-tree... there could be some strange behavior in it causes this.. but,
> > when I check the code, I got concerned about lifecycle of tag->counters.
> > Based on following defination:
> > 108 #define DEFINE_ALLOC_TAG(_alloc_tag) \
> > 109 static DEFINE_PER_CPU(struct alloc_tag_counters, _alloc_tag_cntr); \
> > 110 static struct alloc_tag _alloc_tag __used __aligned(8) \
> > 111 __section(ALLOC_TAG_SECTION_NAME) = { \
> > 112 .ct = CODE_TAG_INIT, \
> > 113 .counters = &_alloc_tag_cntr };
> > 114
> > _alloc_tag_cntr is the data referenced by tag->counters, but they are in different section,
> > and alloc_tag only prepare storage for section ALLOC_TAG_SECTION_NAME.
> > right?
> > Then what happens to those ".data..percpu" section when module is unloaded?
> > Is it safe to keep using those ".data..percpu" section after module unloaded,
> > or even during module is unloading?
>
> Yes, I think you are right, free_module() calls percpu_modfree() which
> would free the per-cpu memory allocated for the module. Before
> 0db6f8d7820a ("alloc_tag: load module tags into separate contiguous
> memory") we would not unload the module if there were tags which were
> still in use. After that change we load module tags into separate
> memory, so I expected this to work but due to this external reference
> it indeed should lead to UAF.
> I think the simplest way to fix this would be to bypass
> percpu_modfree() inside free_module() when there are module tags still
> referenced, store mod->percpu inside alloc_tag_module_section and free
> it inside clean_unused_module_areas_locked() once we know the counters
> are not used anymore. I'll take a stab at it and will send a patch for
> testing today.
Ok, I went with another implementation, instead dynamically allocating
percpu memory for modules at the module load time. This has another
advantage of not needing extra PERCPU_MODULE_RESERVE currently
required for memory allocation tagging to work.
David, the patch is posted at [1]. Please give it a try and let me
know if the fix works for you.
Thanks,
Suren.
[1] https://lore.kernel.org/all/20250517000739.5930-1-surenb@xxxxxxxxxx/
> Thanks,
> Suren.
>
> >
> > I was expecting page fault when I make following experiments:
> >
> > 1.Load module A
> > 2.Load module B
> > 3.module B alloc memory, and handover the memory to A
> > 4.unload module B
> > (memory profiling report module B has memory not freed)
> > ... after a while....
> > 5.unload module A, where A free the memory. (when A kfree the memory,
> > the counters used to be in module B's ".data..percpu" section should
> > be referenced, it that section is gone, a pagefault should happen).
> >
> > But, after several trials, not page fault reported....
> > Would kernel keeps ".data..percpu" since ALLOC_TAG_SECTION_NAME has reference to it,
> > or I just need wait longer for kernel to "purge" those sections.
> >
> >
> > Full logs:
> > [Fri May 16 12:02:28 2025] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 550.144.03 Mon Dec 30 17:10:10 UTC 2024
> > [Fri May 16 12:02:28 2025] [drm] [nvidia-drm] [GPU ID 0x00002600] Loading driver
> > [Fri May 16 12:02:28 2025] [drm] Initialized nvidia-drm 0.0.0 for 0000:26:00.0 on minor 0
> > [Fri May 16 12:03:45 2025] [drm] [nvidia-drm] [GPU ID 0x00002600] Unloading driver
> > [Fri May 16 12:03:46 2025] nvidia-modeset: Unloading
> > [Fri May 16 12:03:46 2025] nvidia-nvlink: Unregistered Nvlink Core, major device number 240
> > [Fri May 16 12:04:38 2025] VFIO - User Level meta-driver version: 0.3
> > [Fri May 16 12:04:38 2025] nvidia-nvlink: Nvlink Core is being initialized, major device number 239
> >
> > [Fri May 16 12:04:38 2025] nvidia 0000:26:00.0: vgaarb: VGA decodes changed: olddecodes=none,decodes=none:owns=none
> > [Fri May 16 12:04:38 2025] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 570.144 Thu Apr 10 20:33:29 UTC 2025
> > [Fri May 16 12:04:38 2025] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 570.144 Thu Apr 10 20:03:03 UTC 2025
> > [Fri May 16 12:04:38 2025] nvidia_drm: Unknown symbol drm_client_setup (err -2)
> > [Fri May 16 12:04:40 2025] nvidia-modeset: Unloading
> > [Fri May 16 12:04:40 2025] nvidia-nvlink: Unregistered Nvlink Core, major device number 239
> > [Fri May 16 12:05:40 2025] VFIO - User Level meta-driver version: 0.3
> > [Fri May 16 12:05:40 2025] nvidia-nvlink: Nvlink Core is being initialized, major device number 239
> >
> > [Fri May 16 12:05:40 2025] nvidia 0000:26:00.0: vgaarb: VGA decodes changed: olddecodes=none,decodes=none:owns=none
> > [Fri May 16 12:05:40 2025] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 570.144 Thu Apr 10 20:33:29 UTC 2025
> > [Fri May 16 12:05:40 2025] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 570.144 Thu Apr 10 20:03:03 UTC 2025
> > [Fri May 16 12:05:40 2025] [drm] [nvidia-drm] [GPU ID 0x00002600] Loading driver
> > [Fri May 16 12:05:40 2025] [drm] Initialized nvidia-drm 0.0.0 for 0000:26:00.0 on minor 0
> > [Fri May 16 12:05:40 2025] [drm] [nvidia-drm] [GPU ID 0x00002600] Unloading driver
> > [Fri May 16 12:05:40 2025] nvidia-modeset: Unloading
> > [Fri May 16 12:05:41 2025] nvidia-nvlink: Unregistered Nvlink Core, major device number 239
> > [Fri May 16 12:05:41 2025] BUG: unable to handle page fault for address: ffff9d28984c3000
> > [Fri May 16 12:05:41 2025] #PF: supervisor read access in kernel mode
> > [Fri May 16 12:05:41 2025] #PF: error_code(0x0000) - not-present page
> > [Fri May 16 12:05:41 2025] PGD 163001067 P4D 163001067 PUD 0
> > [Fri May 16 12:05:41 2025] Oops: Oops: 0000 [#1] SMP NOPTI
> > [Fri May 16 12:05:41 2025] CPU: 0 UID: 0 PID: 35898 Comm: modprobe Tainted: G OE 6.15.0-rc6-linan-0 #587 PREEMPT(voluntary)
> > [Fri May 16 12:05:41 2025] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> > [Fri May 16 12:05:41 2025] Hardware name: Micro-Star International Co., Ltd. MS-7B89/B450M MORTAR MAX (MS-7B89), BIOS 2.L0 07/18/2024
> > [Fri May 16 12:05:41 2025] RIP: 0010:release_module_tags+0x103/0x1b0
> > [Fri May 16 12:05:41 2025] Code: e3 4c 03 64 24 08 4c 39 e3 72 bf 8b 0d 86 02 f6 00 31 ed 31 c0 eb 17 48 63 f0 49 8b 54 24 20 83 c0 01 48 8b 34 f5 40 e6 d4 95 <48> 03 2c 16 89 ce 48 63 d0 48 c7 c7 c0 fd d4 95 e8 78 3f f5 ff 8b
> > [Fri May 16 12:05:41 2025] RSP: 0018:ffffb2eb40bdfe00 EFLAGS: 00010202
> > [Fri May 16 12:05:41 2025] RAX: 0000000000000001 RBX: ffffffffc04609d7 RCX: 0000000000000008
> > [Fri May 16 12:05:41 2025] RDX: 0000000000000000 RSI: ffff9d28984c3000 RDI: ffffffff95d4fdc0
> > [Fri May 16 12:05:41 2025] RBP: 0000000000000000 R08: ffff9d25035aed08 R09: 000000000000fcf8
> > [Fri May 16 12:05:41 2025] R10: ffffb2eb40bdfe00 R11: 0000000000000001 R12: ffffffffc04609b0
> > [Fri May 16 12:05:41 2025] R13: ffffffffc19ef000 R14: ffff9d2502ab6870 R15: 0000000000000000
> > [Fri May 16 12:05:41 2025] FS: 00007fe9ee051040(0000) GS:ffff9d28984c3000(0000) knlGS:0000000000000000
> > [Fri May 16 12:05:41 2025] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [Fri May 16 12:05:41 2025] CR2: ffff9d28984c3000 CR3: 000000011278d000 CR4: 0000000000350ef0
> > [Fri May 16 12:05:41 2025] Call Trace:
> > [Fri May 16 12:05:41 2025] <TASK>
> > [Fri May 16 12:05:41 2025] codetag_unload_module+0x135/0x160
> > [Fri May 16 12:05:41 2025] free_module+0x19/0x1a0
> > [Fri May 16 12:05:41 2025] __do_sys_delete_module+0x274/0x310
> > [Fri May 16 12:05:41 2025] ? srso_return_thunk+0x5/0x5f
> > [Fri May 16 12:05:41 2025] ? fpregs_assert_state_consistent+0x21/0x50
> > [Fri May 16 12:05:41 2025] ? srso_return_thunk+0x5/0x5f
> > [Fri May 16 12:05:41 2025] do_syscall_64+0x4b/0x120
> > [Fri May 16 12:05:41 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [Fri May 16 12:05:41 2025] RIP: 0033:0x7fe9ed928a67
> > [Fri May 16 12:05:41 2025] Code: 73 01 c3 48 8b 0d 99 83 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 69 83 0c 00 f7 d8 64 89 01 48
> > [Fri May 16 12:05:41 2025] RSP: 002b:00007ffe17759568 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
> > [Fri May 16 12:05:41 2025] RAX: ffffffffffffffda RBX: 0000564f850b7e70 RCX: 00007fe9ed928a67
> > [Fri May 16 12:05:41 2025] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000564f850b7ed8
> > [Fri May 16 12:05:41 2025] RBP: 0000000000000000 R08: 1999999999999999 R09: 0000000000000000
> > [Fri May 16 12:05:41 2025] R10: 00007fe9ed99bac0 R11: 0000000000000206 R12: 0000000000000000
> > [Fri May 16 12:05:41 2025] R13: 0000000000000000 R14: 00007ffe177595a0 R15: 00007ffe1775aa60
> > [Fri May 16 12:05:41 2025] </TASK>
> > [Fri May 16 12:05:41 2025] Modules linked in: vfio_pci_core(E-) vfio(E) drm_client_lib(E) amd_atl(E) intel_rapl_msr(E) intel_rapl_common(E) binfmt_misc(E) edac_mce_amd(E) kvm_amd(E) snd_hda_codec_realtek(E) nls_ascii(E) iwlmvm(E) nls_cp437(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) kvm(E) snd_hda_scodec_component(E) mac80211(E) irqbypass(E) uvcvideo(E) vfat(E) snd_hda_intel(E) libarc4(E) ghash_clmulni_intel(E) fat(E) videobuf2_vmalloc(E) snd_intel_dspcfg(E) sha256_ssse3(E) uvc(E) snd_usb_audio(E) sha1_ssse3(E) snd_hda_codec(E) videobuf2_memops(E) iwlwifi(E) drm_ttm_helper(E) videobuf2_v4l2(E) aesni_intel(E) snd_usbmidi_lib(E) snd_hda_core(E) ttm(E) snd_rawmidi(E) gf128mul(E) snd_pcsp(E) snd_seq_device(E) ppdev(E) snd_hwdep(E) crypto_simd(E) drm_kms_helper(E) videodev(E) cryptd(E) cfg80211(E) snd_pcm(E) videobuf2_common(E) rapl(E) snd_timer(E) sp5100_tco(E) drm(E) mc(E) wmi_bmof(E) k10temp(E) tpm_crb(E) rfkill(E) snd(E) sha3_generic(E) video(E) soundcore(E) jitterentropy_rng(E) sha512_ssse3(E) tpm_tis(E) ccp(E)
> > [Fri May 16 12:05:41 2025] sha512_generic(E) tpm_tis_core(E) joydev(E) drbg(E) tpm(E) ansi_cprng(E) libaescfb(E) ecdh_generic(E) ecc(E) parport_pc(E) parport(E) sg(E) rng_core(E) evdev(E) msr(E) loop(E) fuse(E) efi_pstore(E) dm_mod(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) btrfs(E) blake2b_generic(E) efivarfs(E) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) xor(E) raid6_pq(E) raid1(E) raid0(E) md_mod(E) hid_generic(E) usbhid(E) hid(E) sd_mod(E) ahci(E) xhci_pci(E) libahci(E) nvme(E) xhci_hcd(E) libata(E) i2c_piix4(E) i2c_smbus(E) r8169(E) nvme_core(E) realtek(E) usbcore(E) scsi_mod(E) scsi_common(E) usb_common(E) wmi(E) gpio_amdpt(E) gpio_generic(E) button(E) [last unloaded: vfio_pci_core(E)]
> > [Fri May 16 12:05:41 2025] CR2: ffff9d28984c3000
> > [Fri May 16 12:05:41 2025] ---[ end trace 0000000000000000 ]---
> > [Fri May 16 12:05:41 2025] RIP: 0010:release_module_tags+0x103/0x1b0
> > [Fri May 16 12:05:41 2025] Code: e3 4c 03 64 24 08 4c 39 e3 72 bf 8b 0d 86 02 f6 00 31 ed 31 c0 eb 17 48 63 f0 49 8b 54 24 20 83 c0 01 48 8b 34 f5 40 e6 d4 95 <48> 03 2c 16 89 ce 48 63 d0 48 c7 c7 c0 fd d4 95 e8 78 3f f5 ff 8b
> > [Fri May 16 12:05:41 2025] RSP: 0018:ffffb2eb40bdfe00 EFLAGS: 00010202
> > [Fri May 16 12:05:41 2025] RAX: 0000000000000001 RBX: ffffffffc04609d7 RCX: 0000000000000008
> > [Fri May 16 12:05:41 2025] RDX: 0000000000000000 RSI: ffff9d28984c3000 RDI: ffffffff95d4fdc0
> > [Fri May 16 12:05:41 2025] RBP: 0000000000000000 R08: ffff9d25035aed08 R09: 000000000000fcf8
> > [Fri May 16 12:05:41 2025] R10: ffffb2eb40bdfe00 R11: 0000000000000001 R12: ffffffffc04609b0
> > [Fri May 16 12:05:41 2025] R13: ffffffffc19ef000 R14: ffff9d2502ab6870 R15: 0000000000000000
> > [Fri May 16 12:05:41 2025] FS: 00007fe9ee051040(0000) GS:ffff9d28984c3000(0000) knlGS:0000000000000000
> > [Fri May 16 12:05:41 2025] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [Fri May 16 12:05:41 2025] CR2: ffff9d28984c3000 CR3: 000000011278d000 CR4: 0000000000350ef0
> > [Fri May 16 12:05:41 2025] note: modprobe[35898] exited with irqs disabled
> > [Fri May 16 12:05:41 2025] note: modprobe[35898] exited with preempt_count 1
> >
> >
> >
> >
> > Thanks
> > David
> >