Re: [External] : Re: [PATCH 1/1] IB/mlx5: Add a signature check to received EQEs and CQEs

From: Rohit Nair
Date: Mon Nov 07 2022 - 12:52:11 EST


On 11/6/22 10:03 AM, Leon Romanovsky wrote:

> > rds-stress exercises the codepath we are modifying here. rds-stress did not
> > show much performance degradation when we ran it internally. We also asked
> > our DB team to run performance regression testing, and this change passed
> > their test suite. This motivated us to submit it upstream.
> >
> > If there is any other test that is better suited for this change, I am
> > willing to run it. Please let me know if you have something in mind. We can
> > revisit this patch after such a test, maybe.
> >
> > I agree that this was a rare debug scenario, but it took a lot longer than
> > necessary to narrow down [we engaged the vendor in live sessions]. We are
> > adding this in the hope of finding the cause as early as possible, or at
> > least of learning which direction to look in. We also asked the vendor [mlx]
> > to include some diagnostics [HW counter] that can help us narrow it down
> > faster next time. This is our attempt to add the kernel side of the
> > diagnostics.
>
> The thing is that the "vendor" failed to explain internally whether this
> debug code is useful. Like I said, extremely rare debug code shouldn't be
> part of the main data path.
>
> Thanks


I understand.
Thank you for taking the time to review this patch.


Best,
Rohit.