Re: [Intel-wired-lan] [PATCH v2] ice: wait for reset completion in ice_resume()
From: Aaron Ma
Date: Tue Apr 28 2026 - 03:59:54 EST
On Mon, Apr 27, 2026 at 6:13 PM Paul Menzel <pmenzel@xxxxxxxxxxxxx> wrote:
>
> Dear Aaron,
>
>
> Thank you for your patch.
>
> On 24.04.26 at 05:03, Aaron Ma via Intel-wired-lan wrote:
> > ice_resume() schedules an asynchronous PF reset and returns
> > immediately. The reset runs later in ice_service_task(). If
> > userspace tries to bring up the net device before the reset
> > finishes, ice_open() fails with -EBUSY:
> >
> >   ice_resume()
> >     ice_schedule_reset()         # sets ICE_PFR_REQ, returns
> >   ...
> >   ice_open()
> >     ice_is_reset_in_progress()   # ICE_PFR_REQ still set, -EBUSY
> >   ...
> >   ice_service_task()
> >     ice_do_reset()
> >       ice_rebuild()              # clears ICE_PFR_REQ, too late
> >
> > Reproduced on E800 series NICs during suspend/resume with irdma
> > enabled, where the aux device probe widens the race window.
>
> Please document, how you reproduced it, and also paste possible messages
> by Linux or NetworkManager, so that people can easily search for the commit.
>
The error message is "can't open net device while reset is in progress".
I can add it in v3 if you like.

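For anyone searching later, the ordering in the trace above can be modelled in plain userspace C. This is only a sketch: struct pf_model and the model_*() helpers are hypothetical stand-ins, not driver code, and a single bool stands in for the ICE_PFR_REQ state bit.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the PF state; one flag models ICE_PFR_REQ. */
struct pf_model {
	bool pfr_req;
};

/* Models the resume path: request a reset and return immediately. */
static void model_schedule_reset(struct pf_model *pf)
{
	pf->pfr_req = true;
}

/* Models the service task finishing the reset and rebuild later on. */
static void model_service_task(struct pf_model *pf)
{
	pf->pfr_req = false;
}

/* Models the open path: refuse to open while a reset is pending. */
static int model_open(const struct pf_model *pf)
{
	return pf->pfr_req ? -EBUSY : 0;
}
```

Calling model_open() between model_schedule_reset() and model_service_task() yields -EBUSY, which is the window the patch closes by waiting.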
> > Wait for the reset to complete before returning from ice_resume().
>
> Please mention the delay length in the commit message.
The timeout is 10 * HZ jiffies (10 seconds), matching the existing usage
in ice_devlink_info_get() for the same ice_wait_for_reset() call. In
practice the wait completes in about 300 ms.
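To show why 10 * HZ means 10 seconds whatever the tick rate, here is a small userspace model of the arithmetic. The helper names are made up for illustration, and HZ is passed in as a parameter here, whereas in the kernel it is a build-time constant.

```c
#include <assert.h>

/* Userspace model only: number of ticks in `seconds` at tick rate `hz`.
 * With hz == HZ and seconds == 10 this is what "10 * HZ" evaluates to.
 */
static unsigned long ticks_for_seconds(unsigned long hz, unsigned long seconds)
{
	assert(hz > 0);
	return seconds * hz;
}

/* Userspace model only: convert a tick count back to milliseconds. */
static unsigned long ticks_to_msecs(unsigned long hz, unsigned long ticks)
{
	assert(hz > 0);
	return ticks * 1000UL / hz;
}
```

With hz = 250, 10 seconds is 2500 ticks, and converting back gives 10000 ms; the same round trip holds for hz = 100 or hz = 1000.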
>
> > Fixes: 769c500dcc1e ("ice: Add advanced power mgmt for WoL")
> > Cc: stable@xxxxxxxxxxxxxxx
> > Signed-off-by: Aaron Ma <aaron.ma@xxxxxxxxxxxxx>
> > ---
> > v2: reword comment to clarify best-effort semantics (Kohei Enju)
> >
> > drivers/net/ethernet/intel/ice/ice_main.c | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> > index 5f92377d4dfc2..a81eb21ea87c1 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_main.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> > @@ -5635,6 +5635,15 @@ static int ice_resume(struct device *dev)
> >  	/* Restart the service task */
> >  	mod_timer(&pf->serv_tmr, round_jiffies(jiffies + pf->serv_tmr_period));
> >
> > +	/* Best-effort wait for the scheduled reset to finish so that the
> > +	 * device is operational before returning. Without this, userspace
> > +	 * (e.g. NetworkManager) may try to open the net device while the
> > +	 * asynchronous reset is still in progress, hitting -EBUSY.
> > +	 */
> > +	ret = ice_wait_for_reset(pf, 10 * HZ);
>
> Why not pass a delay in micro/milliseconds?
ice_wait_for_reset() takes its timeout in jiffies; that is the existing API.
>
> > +	if (ret)
> > +		dev_err(dev, "Wait for reset failed during resume: %d\n", ret);
>
> Mention the delay?
Good point. I'll include the timeout in the error message in v3.
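Something along these lines, modelled in userspace so it can be tried outside the kernel. The wording and the helper name format_reset_timeout_msg() are placeholders, not the final v3 text; in the driver the string would go through dev_err() instead of snprintf().

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build an error line that names the timeout used.
 * Returns the snprintf() result, i.e. the length the full message needs.
 */
static int format_reset_timeout_msg(char *buf, size_t len,
				    unsigned int timeout_ms, int err)
{
	return snprintf(buf, len,
			"Wait for reset failed during resume after %u ms: %d\n",
			timeout_ms, err);
}
```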
Thanks,
Aaron
>
> > +
> >  	return 0;
> >  }
> >
>
>
> Kind regards,
>
> Paul