Re: io.latency controller apparently not working
From: Josef Bacik
Date: Fri Aug 16 2019 - 13:59:37 EST
On Fri, Aug 16, 2019 at 07:52:40PM +0200, Paolo Valente wrote:
>
>
> > On 16 Aug 2019, at 15:21, Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
> >
> > On Fri, Aug 16, 2019 at 12:57:41PM +0200, Paolo Valente wrote:
> >> Hi,
> >> I happened to test the io.latency controller, to make a comparison
> >> between this controller and BFQ. But io.latency seems not to work,
> >> i.e., not to reduce latency compared with what happens with no I/O
> >> control at all. Here is a summary of the results for one of the
> >> workloads I tested, on three different devices (latencies in ms):
> >>
> >>             no I/O control   io.latency   BFQ
> >> NVMe SSD    1.9              1.9          0.07
> >> SATA SSD    39               56           0.7
> >> HDD         4500             4500         11
> >>
> >> I have put all details on hardware, OS, scenarios and results in the
> >> attached pdf. For your convenience, I'm pasting the source file too.
> >>
> >
> > Do you have the fio jobs you use for this?
>
> The script mentioned in the draft (executed with the command line
> reported in the draft) executes one fio instance for the target
> process, and one fio instance for each interferer. I couldn't make do
> with just one fio instance executing all jobs, because the weight
> parameter doesn't work in fio jobfiles for some reason, and because
> the ioprio class cannot be set for individual jobs.
>
> In particular, the script generates a job with the following
> parameters for the target process:
>
> ioengine=sync
> loops=10000
> direct=0
> readwrite=randread
> fdatasync=0
> bs=4k
> thread=0
> filename=/mnt/scsi_debug/largefile_interfered0
> iodepth=1
> numjobs=1
> invalidate=1
>
> and a job with the following parameters for each of the interferers,
> in case, e.g., of a workload made of reads:
>
> ioengine=sync
> direct=0
> readwrite=read
> fdatasync=0
> bs=4k
> filename=/mnt/scsi_debug/largefileX
> invalidate=1
>
> Should you fail to reproduce this issue by creating groups, setting
> latencies and starting fio jobs manually, what if you try by just
> executing my script? Maybe this could help us spot the culprit more
> quickly.
Ah ok, you are doing it on a mountpoint. Are you using btrfs? Because otherwise
you are going to have a sad time. The other thing is you are using buffered I/O,
which may or may not hit the disk. This is what I use to test io.latency:
https://patchwork.kernel.org/patch/10714425/
I had to massage it since it didn't apply directly, but running this against the
actual block device, with O_DIRECT so I'm sure to be measuring the actual impact
of the controller, it all works out fine. Thanks,
Josef
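
For illustration only, the target job quoted above could be adapted to the
setup described in the reply (raw block device, O_DIRECT) roughly as follows.
This is a sketch, not the test from the patchwork link: the helper name
render_target_job and the job section name [target] are hypothetical, and
/dev/sdX is a placeholder for the device under test. All other parameters are
copied from the quoted job; only direct and filename are changed.

```python
# Sketch: render a fio job file for the random-read target process.
# Compared with the quoted job, it sets direct=1 (fio then opens the
# file with O_DIRECT, bypassing the page cache) and points filename at
# a block device instead of a file on a mounted filesystem.

def render_target_job(device: str) -> str:
    """Return fio job-file text for the target process (hypothetical helper)."""
    params = {
        "ioengine": "sync",
        "loops": 10000,
        "direct": 1,               # was direct=0 (buffered) in the quoted job
        "readwrite": "randread",   # read-only workload
        "fdatasync": 0,
        "bs": "4k",
        "thread": 0,
        "filename": device,        # was a file under /mnt/scsi_debug
        "iodepth": 1,
        "numjobs": 1,
        "invalidate": 1,
    }
    lines = ["[target]"]           # assumed section name
    lines += [f"{key}={value}" for key, value in params.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # /dev/sdX is a placeholder; substitute the real device before use.
    print(render_target_job("/dev/sdX"), end="")
```

The output can be saved to a file and passed to fio as a job file. The
interferer jobs would need the same two changes for the comparison to be
meaningful.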