Re: [PATCH v2] xen-netfront: Fix Rx stall during network stress and OOM

From: Vineeth Remanan Pillai
Date: Mon Jan 30 2017 - 11:50:43 EST



On 01/29/2017 03:09 PM, Boris Ostrovsky wrote:

> There are a couple of problems with this patch.
> 1. The 'if' clause now evaluates to true on pretty much every call to xennet_alloc_rx_buffers().

Thanks for catching this. I did not notice it in my testing, mostly because of the nature of the workload I was running.

> 2. It tickles a latent bug during resume where the timer triggers before we reconnect. The trouble is that we now try to dereference queue->rx.sring, which is NULL since we disconnect in netfront_resume(). (Curiously, I only observe it with 32-bit guests.)

I think we may hit this bug even after removing the timer. We call RING_PUSH_REQUESTS_AND_CHECK_NOTIFY soon after, and that also dereferences queue->rx.sring.
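
The following is a minimal user-space sketch of the ordering problem in point 2. It assumes that a disconnected queue can be recognized by its shared ring pointer being NULL. This is not the netfront code. The model_* types and functions are hypothetical stand-ins for the netfront Rx queue, for xennet_alloc_rx_buffers() and for RING_PUSH_REQUESTS_AND_CHECK_NOTIFY. Because the guard sits at the top of the refill path, neither a late timer nor a direct call can reach the push while sring is NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared ring page (queue->rx.sring). */
struct model_sring {
	unsigned int req_prod;       /* producer index published to the backend */
};

/* Hypothetical stand-in for a netfront Rx queue. */
struct model_rx_queue {
	struct model_sring *sring;   /* NULL while disconnected (e.g. during resume) */
	unsigned int req_prod_pvt;   /* private producer index, not yet published */
};

/* Models RING_PUSH_REQUESTS_AND_CHECK_NOTIFY: it writes through sring,
 * so calling it on a disconnected queue would dereference NULL. */
static void model_push_requests(struct model_rx_queue *q)
{
	assert(q->sring != NULL);
	q->sring->req_prod = q->req_prod_pvt;
}

/* Models the refill path, whether it is reached directly or from a timer.
 * Returns true if new requests were published to the shared ring. */
static bool model_alloc_rx_buffers(struct model_rx_queue *q, unsigned int n)
{
	/* Assumed guard: bail out before touching the ring if we run after
	 * a disconnect but before the reconnect has set up sring again. */
	if (q->sring == NULL)
		return false;

	q->req_prod_pvt += n;
	model_push_requests(q);
	return true;
}

/* Models the disconnect done on resume: the shared ring goes away. */
static void model_disconnect(struct model_rx_queue *q)
{
	q->sring = NULL;
}
```

Placing the check in the refill function itself, instead of only cancelling the timer, also covers any other caller that could reach the push while the queue is disconnected.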

Thanks,
Vineeth