Re: Performance impact in networking data path tests in Linux 5.5 Kernel

From: Vincent Guittot
Date: Wed Feb 26 2020 - 09:10:48 EST


On Wed, 26 Feb 2020 at 12:45, Rajender M <manir@xxxxxxxxxx> wrote:
>
> Thanks for your response, Vincent.
> Just curious to know if there is any room for optimizing
> the additional CPU cost.

That's difficult to say; the additional cost is probably linked to how
the CPU is involved in the data path. IIUC your results, there is +30%
CPU for +20% of throughput with the 10G NIC but only +10% CPU for
+25% of throughput with the 40G NIC, which might have more things done
by HW and need less action from the CPU.
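
To make that comparison concrete, here is a rough sketch of the arithmetic
in Python. It only uses the percentages quoted in this thread; the function
and variable names are mine, and it assumes the reported CPU and throughput
deltas can be divided directly to get a relative CPU cost per unit of
throughput.

```python
# Relative CPU cost per unit of throughput, 5.5 vs 5.4.
# Inputs are the percentage changes quoted in this thread; dividing them
# this way is an illustrative assumption, not a measured quantity.

def cpu_per_throughput(cpu_gain_pct, tput_gain_pct):
    """Return (new CPU per unit throughput) / (old CPU per unit throughput)."""
    return (1 + cpu_gain_pct / 100) / (1 + tput_gain_pct / 100)

ratio_10g = cpu_per_throughput(30, 20)  # 10G NIC: +30% CPU, +20% throughput
ratio_40g = cpu_per_throughput(10, 25)  # 40G NIC: +10% CPU, +25% throughput

print(f"10G: {ratio_10g:.3f}")  # prints "10G: 1.083"
print(f"40G: {ratio_40g:.3f}")  # prints "40G: 0.880"
```

Under that assumption, the 10G case spends about 8% more CPU per unit of
throughput, while the 40G case spends about 12% less.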

>
>
> On 26/02/20, 3:18 PM, "Vincent Guittot" <vincent.guittot@xxxxxxxxxx> wrote:
>
> Hi Rajender,
>
> On Tue, 25 Feb 2020 at 06:46, Rajender M <manir@xxxxxxxxxx> wrote:
> >
> > As part of VMware's performance regression testing for Linux Kernel upstream
> > releases, when comparing Linux 5.5 kernel against Linux 5.4 kernel, we noticed
> > 20% improvement in networking throughput performance at the cost of a 30%
> > increase in the CPU utilization.
>
> Thanks for testing and sharing results with us. It's always
> interesting to get feedback from various test cases.
>
> >
> > After performing the bisect between 5.4 and 5.5, we identified the root cause
> > of this behaviour to be a scheduling change from Vincent Guittot's
> > 2ab4092fc82d ("sched/fair: Spread out tasks evenly when not overloaded").
> >
> > The impacted testcases are TCP_STREAM SEND & RECV - on both small
> > (8K socket & 256B message) & large (64K socket & 16K message) packet sizes.
> >
> > We backed out Vincent's commit & reran our networking tests and found that
> > the performance was similar to the 5.4 kernel - the improvements in
> > networking tests were gone.
> >
> > In our current network performance testing, we use Intel 10G NIC to evaluate
> > all Linux Kernel releases. In order to confirm that the impact is also seen in
> > higher bandwidth NIC, we repeated the same test cases with Intel 40G and
> > we were able to reproduce the same behaviour - 25% improvements in
> > throughput with 10% more CPU consumption.
> >
> > The overall results indicate that the new scheduler change has introduced
> > much better network throughput performance at the cost of incremental
> > CPU usage. This can be seen as expected behavior because now the
> > TCP streams are evenly spread across all the CPUs and eventually drive
> > more network packets, with additional CPU consumption.
> >
> >
> > We have also confirmed this theory by parsing the ESX stats for 5.4 and 5.5
> > kernels in a 4vCPU VM running 8 TCP streams - as shown below:
> >
> > 5.4 kernel:
> > "2132149": {"id": 2132149, "used": 94.37, "ready": 0.01, "cstp": 0.00, "name": "vmx-vcpu-0:rhel7x64-0",
> > "2132151": {"id": 2132151, "used": 0.13, "ready": 0.00, "cstp": 0.00, "name": "vmx-vcpu-1:rhel7x64-0",
> > "2132152": {"id": 2132152, "used": 9.07, "ready": 0.03, "cstp": 0.00, "name": "vmx-vcpu-2:rhel7x64-0",
> > "2132153": {"id": 2132153, "used": 34.77, "ready": 0.01, "cstp": 0.00, "name": "vmx-vcpu-3:rhel7x64-0",
> >
> > 5.5 kernel:
> > "2132041": {"id": 2132041, "used": 55.70, "ready": 0.01, "cstp": 0.00, "name": "vmx-vcpu-0:rhel7x64-0",
> > "2132043": {"id": 2132043, "used": 47.53, "ready": 0.01, "cstp": 0.00, "name": "vmx-vcpu-1:rhel7x64-0",
> > "2132044": {"id": 2132044, "used": 77.81, "ready": 0.00, "cstp": 0.00, "name": "vmx-vcpu-2:rhel7x64-0",
> > "2132045": {"id": 2132045, "used": 57.11, "ready": 0.02, "cstp": 0.00, "name": "vmx-vcpu-3:rhel7x64-0",
> >
> > Note that "used" % in the above stats for the 5.5 kernel is spread far more evenly across all vCPUs.
> >
> > On the whole, this change should be seen as a significant improvement for
> > most customers.
> >
> > Rajender M
> > Performance Engineering
> > VMware, Inc.
> >
>
>
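
For reference, the per-vCPU "used" figures from the ESX stats quoted above
can be summarized with a short Python sketch. The values are copied from the
quoted stats; the variable names and the choice of population standard
deviation as the measure of spread are my own assumptions, and this is a
single 4vCPU / 8-stream sample, so it should not be read as a general result.

```python
import statistics

# "used" % per vCPU (vcpu-0 .. vcpu-3), copied from the ESX stats quoted above.
used_54 = [94.37, 0.13, 9.07, 34.77]    # 5.4 kernel
used_55 = [55.70, 47.53, 77.81, 57.11]  # 5.5 kernel

# Total CPU consumed across the four vCPUs.
total_54 = sum(used_54)
total_55 = sum(used_55)

# Population standard deviation as a simple measure of how unevenly
# the load is spread over the vCPUs (lower means more even).
spread_54 = statistics.pstdev(used_54)
spread_55 = statistics.pstdev(used_55)

print(f"5.4: total={total_54:.2f} min={min(used_54):.2f} max={max(used_54):.2f}")
print(f"5.5: total={total_55:.2f} min={min(used_55):.2f} max={max(used_55):.2f}")
print(f"spread 5.4={spread_54:.1f} 5.5={spread_55:.1f}")
```

The lower spread on 5.5 matches the observation that the load is no longer
concentrated on vcpu-0.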