Re: [PATCH v2] fbdev: Fix invalid page access after closing deferred I/O devices

From: Takashi Iwai
Date: Mon Jan 30 2023 - 06:18:14 EST


On Mon, 30 Jan 2023 10:28:16 +0100,
Thomas Zimmermann wrote:
>
> Hi
>
> Am 30.01.23 um 09:52 schrieb Takashi Iwai:
> > On Mon, 30 Jan 2023 09:28:36 +0100,
> > Thomas Zimmermann wrote:
> >>
> >> Hi
> >>
> >> Am 29.01.23 um 09:28 schrieb Takashi Iwai:
> >>> When a fbdev with deferred I/O is once opened and closed, the dirty
> >>> pages still remain queued in the pageref list, and eventually later
> >>> those may be processed in the delayed work. This may lead to a
> >>> corruption of pages, hitting an Oops.
> >>
> >> Do you have more information on this problem?
> >
> > The details are in SUSE bugzilla, but that's an internal bug entry
> > (and you know the number :) It happens at the following at least:
> >
> > - A VM is started with VGA console, no fb, on the installer
> > - VM is switched to bochs drm
> > - Start fbiterm on VT1, switching to the graphics mode on VT
> > - Exit fbiterm, going back to the text mode on VT;
> > at this moment, it gets Oops like:
> >
> > [ 42.338319][ T122] BUG: unable to handle page fault for address:
> > ffffe570c1000030
> > [ 42.340063][ T122] #PF: supervisor read access in kernel mode
> > [ 42.340519][ T122] #PF: error_code(0x0000) - not-present page
> > [ 42.340979][ T122] PGD 34c38067 P4D 34c38067 PUD 34c37067 PMD 0
> > [ 42.341456][ T122] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > [ 42.341853][ T122] CPU: 1 PID: 122 Comm: kworker/1:2 Not tainted
> > 5.14.21-150500.5.g2ad24ee-default #1 SLE15-SP5 (unreleased)
> > b7a28d028376a517e888a7ff28c5e5dede93267c
> > [ 42.343000][ T122] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
> > [ 42.343929][ T122] Workqueue: events fb_deferred_io_work
> > [ 42.344355][ T122] RIP: 0010:page_mapped+0x5e/0x90
> > [ 42.344743][ T122] Code: a8 01 75 d7 8b 47 30 f7 d0 c1 e8 1f c3 cc cc cc cc
> > 48 89 df e8 33 9c 05 00 89 c1 31 c0 85 c9 74 13 eb d3 48 c1 e2 06 48 01 da <8b>
> > 42 30 85 c0 79 c0 83 c1 01 48 8b 33 48 63 d1 b8 01 00 00 00 f7
> > [ 42.346285][ T122] RSP: 0018:ffffb68640207e08 EFLAGS: 00010286
> > [ 42.346749][ T122] RAX: 00000000b3aea8f0 RBX: ffffe570c0f00000 RCX:
> > 0000000000004000
> > [ 42.347355][ T122] RDX: ffffe570c1000000 RSI: 000fffffc0010009 RDI:
> > ffffe570c0f00000
> > [ 42.347960][ T122] RBP: ffffffffc0503050 R08: 0000000000000000 R09:
> > 0000000000000001
> > [ 42.348568][ T122] R10: 0000000000000000 R11: ffffb68640207c88 R12:
> > ffffffffc0503020
> > [ 42.349180][ T122] R13: ffff921281dcdc00 R14: ffff9212bcf08000 R15:
> > ffffe570c0f00000
> > [ 42.349789][ T122] FS: 0000000000000000(0000) GS:ffff9212b3b00000(0000)
> > knlGS:0000000000000000
> > [ 42.350471][ T122] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 42.350975][ T122] CR2: ffffe570c1000030 CR3: 000000001b810000 CR4:
> > 00000000000006e0
> > [ 42.351588][ T122] Call Trace:
> > [ 42.351845][ T122] <TASK>
> > [ 42.352069][ T122] page_mkclean+0x6e/0xc0
> > [ 42.352400][ T122] ? page_referenced_one+0x190/0x190
> > [ 42.353714][ T122] ? pmdp_collapse_flush+0x60/0x60
> > [ 42.354106][ T122] fb_deferred_io_work+0x13d/0x190
> > [ 42.354496][ T122] process_one_work+0x267/0x440
> > [ 42.354866][ T122] ? process_one_work+0x440/0x440
> > [ 42.355247][ T122] worker_thread+0x2d/0x3d0
> > [ 42.355590][ T122] ? process_one_work+0x440/0x440
> > [ 42.355972][ T122] kthread+0x156/0x180
> > [ 42.356281][ T122] ? set_kthread_struct+0x50/0x50
> > [ 42.356662][ T122] ret_from_fork+0x22/0x30
> > [ 42.357006][ T122] </TASK>
> >
> > The page info shows that it's a compound page but it's somehow
> > broken. On VM, it's triggered reliably with the scenario above,
> > always at the same position.
> >
> > FWIW, the Oops is hit even if there is no rewrite on the screen.
> > That is, another procedure is:
> > - Start VM, run fbiterm on VT1
> > - Switch to VT2, text mode
> > - On VT2, kill fbiterm; the crash still happens even if no screen
> > change is performed
> >
> >> The mmap'ed buffer of the fbdev device comes from a vmalloc call. That
> >> memory's location never changes; even across pairs of open/close on
> >> the device file. I'm surprised that a page entry becomes invalid.
> >>
> >> In drm_fbdev_cleanup(), we first remove the fbdefio at [1] and then
> >> vfree() the shadow buffer. So the memory should still be around until
> >> fbdevio is gone.
> >
> > Yes, that's the puzzling part, too. Also, another thing is that the
> > bug couldn't be triggered easily when the fb is started in a different
> > way. e.g. when you run fbiterm & exit on the VM that had efifb, it
> > didn't hit.
> >
> > So, overall, it might be that I'm scratching a wrong surface. But at
> > least it "fixes" the problem above apparently, and the deferred io
> > base code itself has certainly the potential problem in general as my
> > patch suggests.
>
> Are there multiple graphics devices? There's just recently been a
> bugfix where graphics devices accidentally shared the same list of
> deferred pages. See
>
>
> https://lore.kernel.org/dri-devel/20230121192418.2814955-4-javierm@xxxxxxxxxx/

No, it's a KVM with a single VGA.

> >> [1]
> >> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_fb_helper.c#L2146
> >>
> >>>
> >>> This patch makes sure to cancel the delayed work and clean up the
> >>> pageref list at closing the device for addressing the bug. A part of
> >>> the cleanup code is factored out as a new helper function that is
> >>> called from the common fb_release().
> >>
> >> The delayed work is required to copy the framebuffer to the device
> >> output. So if it's just canceled, could this result in missing
> >> updates?
> >>
> >> There's a call to cancel_delayed_work_sync() in the new helper
> >> fb_deferred_io_release(). Is this the right function? Maybe
> >> flush_delayed_work() is a better choice.
> >
> > I thought of that, but took a shorter path.
> > OK, let's check whether this keeps working with that change.
>
> I read that cancel_() is not enough and needs to be followed by a
> flush_() to ensure quiescence.

That's not the case, I believe. As long as the code after that cleans
up the pending pages, the cancel_sync alone must be fine.
(Otherwise so many other code paths would hit the problem.)

> So maybe we should call that flush_
> unconditionally.
>
> >>> Reviewed-by: Patrik Jakobsson <patrik.r.jakobsson@xxxxxxxxx>
> >>> Cc: <stable@xxxxxxxxxxxxxxx>
> >>> Signed-off-by: Takashi Iwai <tiwai@xxxxxxx>
> >>
> >> This could use a Fixes tag. It's not exactly clear to me when this
> >> problem got originally introduced, but the recent refactoring seems a
> >> candidate.
> >>
> >> Fixes: 56c134f7f1b5 ("fbdev: Track deferred-I/O pages in pageref struct")
> >
> > Hrm, this might be. Maybe Patrik can test with the revert of this?
>
> That's not easily revertable.

Indeed.

> >> Cc: Thomas Zimmermann <tzimmermann@xxxxxxx>
> >> Cc: Javier Martinez Canillas <javierm@xxxxxxxxxx>
> >> Cc: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx>
> >> Cc: Maxime Ripard <mripard@xxxxxxxxxx>
> >> Cc: Zack Rusin <zackr@xxxxxxxxxx>
> >> Cc: VMware Graphics Reviewers <linux-graphics-maintainer@xxxxxxxxxx>
> >> Cc: Jaya Kumar <jayalk@xxxxxxxxxxxx>
> >> Cc: Daniel Vetter <daniel@xxxxxxxx>
> >> Cc: "K. Y. Srinivasan" <kys@xxxxxxxxxxxxx>
> >> Cc: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> >> Cc: Wei Liu <wei.liu@xxxxxxxxxx>
> >> Cc: Dexuan Cui <decui@xxxxxxxxxxxxx>
> >> Cc: Steve Glendinning <steve.glendinning@xxxxxxxxxxx>
> >> Cc: Bernie Thompson <bernie@xxxxxxxxxxxx>
> >> Cc: Helge Deller <deller@xxxxxx>
> >> Cc: Andy Shevchenko <andriy.shevchenko@xxxxxxxxxxxxxxx>
> >> Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> >> Cc: Stephen Kitt <steve@xxxxxxx>
> >> Cc: Peter Suti <peter.suti@xxxxxxxxxxxxxxxxxxx>
> >> Cc: Sam Ravnborg <sam@xxxxxxxxxxxx>
> >> Cc: Geert Uytterhoeven <geert+renesas@xxxxxxxxx>
> >> Cc: ye xingchen <ye.xingchen@xxxxxxxxxx>
> >> Cc: Petr Mladek <pmladek@xxxxxxxx>
> >> Cc: John Ogness <john.ogness@xxxxxxxxxxxxx>
> >> Cc: Tom Rix <trix@xxxxxxxxxx>
> >> Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx
> >> Cc: linux-fbdev@xxxxxxxxxxxxxxx
> >> Cc: linux-hyperv@xxxxxxxxxxxxxxx
> >> Cc: <stable@xxxxxxxxxxxxxxx> # v5.19+
> >
> > Nah, please don't. Too many Cc's, literally a spam.
>
> Ok.
>
> >
> >>> ---
> >>> v1->v2: Fix build error without CONFIG_FB_DEFERRED_IO
> >>>
> >>> drivers/video/fbdev/core/fb_defio.c | 10 +++++++++-
> >>> drivers/video/fbdev/core/fbmem.c | 4 ++++
> >>> include/linux/fb.h | 1 +
> >>> 3 files changed, 14 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
> >>> index c730253ab85c..583cbcf09446 100644
> >>> --- a/drivers/video/fbdev/core/fb_defio.c
> >>> +++ b/drivers/video/fbdev/core/fb_defio.c
> >>> @@ -313,7 +313,7 @@ void fb_deferred_io_open(struct fb_info *info,
> >>> }
> >>> EXPORT_SYMBOL_GPL(fb_deferred_io_open);
> >>> -void fb_deferred_io_cleanup(struct fb_info *info)
> >>> +void fb_deferred_io_release(struct fb_info *info)
> >>> {
> >>> struct fb_deferred_io *fbdefio = info->fbdefio;
> >>> struct page *page;
> >>> @@ -327,6 +327,14 @@ void fb_deferred_io_cleanup(struct fb_info *info)
> >>> page = fb_deferred_io_page(info, i);
> >>> page->mapping = NULL;
> >>> }
> >>> +}
> >>> +EXPORT_SYMBOL_GPL(fb_deferred_io_release);
> >>
> >> It's all in the same module. No need to export this symbol.
> >
> > I noticed it, too, but just keep the same style as other functions :)
> > That said, the other exported symbols are also useless. I can prepare
> > another patch to clean it up.
>
> Your choice, but appreciated.

Rather a choice by the maintainer, I'd say.


thanks,

Takashi