Solved: [PATCH 0/4] mm/gup, drm/i915: refactor gup_fast, convert to pin_user_pages()

From: John Hubbard
Date: Thu May 21 2020 - 16:40:43 EST


On 2020-05-21 12:11, John Hubbard wrote:
On 2020-05-21 11:57, Chris Wilson wrote:
Quoting John Hubbard (2020-05-19 01:21:20)
This needs to go through Andrew's -mm tree, due to adding a new gup.c
routine. However, I would really love to have some testing from the
drm/i915 folks, because I haven't been able to run-time test that part
of it.

CI hit

<4> [185.667750] WARNING: CPU: 0 PID: 1387 at mm/gup.c:2699 internal_get_user_pages_fast+0x63a/0xac0


OK, what happened here is that it's WARN()'ing due to passing in the new
FOLL_FAST_ONLY flag, which was not added to the whitelist.

So the fix is easy, and should be applied to the refactoring patch. I'll
send out a v2 of the series, which will effectively have this applied:


diff --git a/mm/gup.c b/mm/gup.c
index 6cbe98c93466..4f0ca3f849d1 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2696,7 +2696,8 @@ static int internal_get_user_pages_fast(unsigned long start, int nr_pages,
         int nr_pinned = 0, ret = 0;

         if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
-                                       FOLL_FORCE | FOLL_PIN | FOLL_GET)))
+                                       FOLL_FORCE | FOLL_PIN | FOLL_GET |
+                                       FOLL_FAST_ONLY)))
                 return -EINVAL;

         start = untagged_addr(start) & PAGE_MASK;
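
For anyone following along, here is a minimal userspace sketch of why the
check above fired. It is not kernel code: the FOLL_* bit values below are
placeholders chosen for illustration (the real definitions are in the kernel
headers), and check_gup_flags(), ALLOWED_BEFORE, ALLOWED_AFTER and
self_check() are hypothetical names that only mirror the shape of the test in
internal_get_user_pages_fast().

```c
#include <assert.h>
#include <errno.h>

/* Placeholder bit values for illustration only; NOT the kernel's values. */
#define FOLL_WRITE     0x01u
#define FOLL_GET       0x02u
#define FOLL_FORCE     0x04u
#define FOLL_LONGTERM  0x08u
#define FOLL_PIN       0x10u
#define FOLL_FAST_ONLY 0x20u

/* Allowed-flag masks before and after the fix (hypothetical names). */
#define ALLOWED_BEFORE (FOLL_WRITE | FOLL_LONGTERM | FOLL_FORCE | \
                        FOLL_PIN | FOLL_GET)
#define ALLOWED_AFTER  (ALLOWED_BEFORE | FOLL_FAST_ONLY)

/*
 * Return 0 if every bit set in gup_flags is also set in allowed,
 * otherwise -EINVAL. The kernel additionally WARNs once in that case.
 */
int check_gup_flags(unsigned int gup_flags, unsigned int allowed)
{
        if (gup_flags & ~allowed)
                return -EINVAL;
        return 0;
}

/* A fast-only pin request is rejected by the old mask, accepted by the new. */
void self_check(void)
{
        unsigned int flags = FOLL_WRITE | FOLL_PIN | FOLL_FAST_ONLY;

        assert(check_gup_flags(flags, ALLOWED_BEFORE) == -EINVAL);
        assert(check_gup_flags(flags, ALLOWED_AFTER) == 0);
}
```

The point of the sketch is only that any bit outside the allowed mask makes
the whole call fail, so a new flag has to be added to the mask in the same
patch that starts passing it.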


<4> [185.667752] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 mei_hdcp x86_pkg_temp_thermal coretemp snd_hda_intel snd_intel_dspcfg crct10dif_pclmul snd_hda_codec crc32_pclmul snd_hwdep snd_hda_core ghash_clmulni_intel cdc_ether usbnet mii snd_pcm e1000e mei_me ptp pps_core mei intel_lpss_pci prime_numbers
<4> [185.667774] CPU: 0 PID: 1387 Comm: gem_userptr_bli Tainted: G     U 5.7.0-rc5-CI-Patchwork_17704+ #1
<4> [185.667777] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP, BIOS ICLSFWR1.R00.3234.A01.1906141750 06/14/2019
<4> [185.667782] RIP: 0010:internal_get_user_pages_fast+0x63a/0xac0
<4> [185.667785] Code: 24 40 08 48 39 5c 24 38 49 89 df 0f 85 74 fc ff ff 48 83 44 24 50 08 48 39 5c 24 58 49 89 dc 0f 85 e0 fb ff ff e9 14 fe ff ff <0f> 0b b8 ea ff ff ff e9 36 fb ff ff 4c 89 e8 48 21 e8 48 39 e8 0f
<4> [185.667789] RSP: 0018:ffffc90001133c38 EFLAGS: 00010206
<4> [185.667792] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8884999ee800
<4> [185.667795] RDX: 00000000000c0001 RSI: 0000000000000100 RDI: 00007f419e774000
<4> [185.667798] RBP: ffff888453dbf040 R08: 0000000000000000 R09: 0000000000000001
<4> [185.667800] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888453dbf380
<4> [185.667803] R13: ffff8884999ee800 R14: ffff888453dbf3e8 R15: 0000000000000040
<4> [185.667806] FS:  00007f419e875e40(0000) GS:ffff88849fe00000(0000) knlGS:0000000000000000
<4> [185.667808] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [185.667811] CR2: 00007f419e873000 CR3: 0000000458bd2004 CR4: 0000000000760ef0
<4> [185.667814] PKRU: 55555554
<4> [185.667816] Call Trace:
<4> [185.667912]  ? i915_gem_userptr_get_pages+0x1c6/0x290 [i915]
<4> [185.667918]  ? mark_held_locks+0x49/0x70
<4> [185.667998]  ? i915_gem_userptr_get_pages+0x1c6/0x290 [i915]
<4> [185.668073]  ? i915_gem_userptr_get_pages+0x1c6/0x290 [i915]

and then panicked, across a range of systems.
-Chris


btw, the panic seems to indicate an additional, pre-existing problem:
i915_gem_userptr_get_pages(), in this case at least, is not able to
recover from a get_user_pages/pin_user_pages failure.


thanks,
--
John Hubbard
NVIDIA