Re: [v3 PATCH 1/2] mm: swap: check if swap backing device is congested or not

From: Yang Shi
Date: Fri Dec 28 2018 - 20:41:11 EST




On 12/28/18 4:42 PM, Andrew Morton wrote:
On Sat, 22 Dec 2018 05:40:19 +0800 Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx> wrote:

Swap readahead reads in a few pages regardless of whether the underlying
device is busy or not. This may incur long wait times if the device is
congested, and it may also exacerbate the congestion.

Use inode_read_congested() to check whether the underlying device is busy,
as file page readahead does, getting the inode from swap_info_struct.
Although we could add the inode to swap_address_space
(address_space->host), that may have unexpected side effects, e.g.
it may break mapping_cap_account_dirty(). Using the inode from
swap_info_struct seems simple and good enough.

The check is only done in vma_cluster_readahead(), since
swap_vma_readahead() is used only for non-rotational devices, which
are much less likely to be congested than a traditional HDD.

Although swap slots may be consecutive on a swap partition, they may still
be fragmented in a swap file, so this check helps reduce excessive stalls
in that case.
Some words about the observed effects of the patch would be more than
appropriate!

Yes, sure. Actually, this could reduce the long-tail latency of do_swap_page() on a congested system.

The test on my virtual machine with emulated HDD shows:

Without swap congestion check:
 page_fault1_thr-1490  [023]   129.311706: funcgraph_entry:      # 57377.796 us |  do_swap_page();
 page_fault1_thr-1490  [023]   129.369103: funcgraph_entry:        5.642 us    |  do_swap_page();
 page_fault1_thr-1490  [023]   129.369119: funcgraph_entry:      # 1289.592 us  |  do_swap_page();
 page_fault1_thr-1490  [023]   129.370411: funcgraph_entry:        4.957 us    |  do_swap_page();
 page_fault1_thr-1490  [023]   129.370419: funcgraph_entry:        1.940 us    |  do_swap_page();
 page_fault1_thr-1490  [023]   129.378847: funcgraph_entry:      # 1411.385 us  |  do_swap_page();
 page_fault1_thr-1490  [023]   129.380262: funcgraph_entry:        3.916 us    |  do_swap_page();
 page_fault1_thr-1490  [023]   129.380275: funcgraph_entry:      # 4287.751 us  |  do_swap_page();


With swap congestion check:
      runtest.py-1417  [020]   301.925911: funcgraph_entry:      # 9870.146 us  |  do_swap_page();
      runtest.py-1417  [020]   301.935785: funcgraph_entry:        9.802 us    |  do_swap_page();
      runtest.py-1417  [020]   301.935799: funcgraph_entry:        3.551 us    |  do_swap_page();
      runtest.py-1417  [020]   301.935806: funcgraph_entry:        2.142 us    |  do_swap_page();
      runtest.py-1417  [020]   301.935853: funcgraph_entry:        6.938 us    |  do_swap_page();
      runtest.py-1417  [020]   301.935864: funcgraph_entry:        3.765 us    |  do_swap_page();
      runtest.py-1417  [020]   301.935871: funcgraph_entry:        3.600 us    |  do_swap_page();
      runtest.py-1417  [020]   301.935878: funcgraph_entry:        7.202 us    |  do_swap_page();


The long-tail latency (>1000 us) is reduced significantly.

BTW, do you need me to resend the patch with the above information appended to the commit log?

Thanks,
Yang