Re: SCSI sr driver: parallel writes to optical serialized which hurts performance (sr_mutex)

From: Thomas Schmitt
Date: Mon Mar 07 2016 - 08:11:37 EST


Hi,

i wrote:
> > Given the old reports of Otto Meta about possible race conditions
> > with drives at the same IDE controller, and the rareness of IDE
> > attached drives nowadays, i propose to keep the global sr_mutex lock
> > for IDE attached drives.

One Thousand Gnomes <gnomes@xxxxxxxxxxxxxxxxxxx> wrote:
> If there are race conditions present in the libata drivers then they want
> fixing there.

From the view of software architecture: of course, yes.

But the research of Johan de Jong shows that this patch was
proposed several times and each time stalled without a decision
because of problems that showed up when testing heavy concurrency
on IDE attached drives.

The newest threads known to me (besides this one) were started by
Tim Small in November 2014:
"[PATCH 0/4] Fix performance burning or extracting audio etc.
from multiple optical drives."
http://marc.info/?t=141692734400009&r=1&w=2
"Very slow throughput when using cdparanoia on two SATA CDROM drives with /dev/sr but not /dev/sg"
http://marc.info/?t=141528207400003&r=1&w=2
In the middle of the discussion Jens Axboe was positive towards
the proposal. But then reports of IDE problems came in.

It is not clear to me whether the reported problems existed already
with the Big Kernel Lock and whether they do not exist with the global
sr_mutex lock which is currently in drivers/scsi/sr.c.

Especially the problem reports of Otto Meta in 2013 cannot be explained
solely by misdirected SCSI commands or confused bookkeeping in
the lower drivers. In
http://marc.info/?l=linux-scsi&m=135734072119667&w=2
he reports that a drive tray was stuck out and moved in only on
command eject -t, but not on pressing the drive's eject button.

This is not SCSI MMC (as payload of ATAPI) behavior.
The SCSI command 1Eh PREVENT/ALLOW MEDIUM REMOVAL is defined in MMC-5
to override the definition in SPC-3. MMC-5, 6.14 says about it:
"[...] requests that the Drive enable or disable the removal of the
medium in the Drive. The Drive shall not allow medium removal if any
Host currently has medium removal prevented."
The drive cannot protect the medium when the tray is out. So being
stuck in this state is not normal at the drive firmware level.
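
For illustration, the 6-byte command descriptor block of this command
could be assembled as in the following minimal sketch. The opcode 1Eh
is taken from the text above. The position of the Prevent bit (bit 0 of
byte 4) reflects the usual MMC/SPC layout and should be checked against
the spec before relying on it. The function name is hypothetical and
not part of any kernel or library API.

```c
#include <stdint.h>
#include <string.h>

#define PREVENT_ALLOW_MEDIUM_REMOVAL 0x1E  /* opcode named in the text */

/*
 * Hypothetical helper, not a kernel function: fill a 6-byte CDB.
 * prevent != 0 asks the drive to lock the medium, 0 asks it to unlock.
 * Assumption: bit 0 of byte 4 is the Prevent bit. The Persistent bit
 * (bit 1 of byte 4) is left at 0 in this sketch.
 */
void build_prevent_allow_cdb(uint8_t cdb[6], int prevent)
{
    memset(cdb, 0, 6);                  /* reserved bytes and control = 0 */
    cdb[0] = PREVENT_ALLOW_MEDIUM_REMOVAL;
    cdb[4] = prevent ? 0x01 : 0x00;
}
```

A tray that stays locked open would correspond to no state that this
command is meant to establish.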


> The old IDE drivers are basically obsoleted by libata for
> all real world uses and most "IDE" devices are actually SATA now anyway.

Of course, if we can get reports that a modern kernel on a machine
with two optical drives on the same IDE controller works fine,
then we do not have to care about older kernels.

But given the situation i see, it seems better to handle all IDE
drives like they are handled now, and to only let the SATA or USB
attached drives perform per-drive locking.
We have several positive reports with SATA drives. So i consider it
proven that no concurrency problems exist in the common code which
is passed before SATA processing gets separated from IDE processing.

If concurrency problems still show up on IDE, then they cannot be
blamed on the relaxed locking of the other drives.
If IDE users want no discrimination, one could give them a kernel
configuration option and let them search for problems at their own
risk. Maybe they find out what's really wrong in IDE.
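
The proposed lock selection could look roughly like the following
userspace sketch. It is not a patch for drivers/scsi/sr.c. The pthread
mutexes stand in for kernel mutexes, and struct drive, its fields,
relax_ide_locking, lock_for() and with_drive_lock() are all hypothetical
names invented for this illustration. In particular it assumes that the
driver knows at probe time whether a drive is IDE attached.

```c
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the global sr_mutex in drivers/scsi/sr.c. */
pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical per-drive state, not the kernel's own structure. */
struct drive {
    bool ide_attached;         /* assumption: known at probe time */
    pthread_mutex_t own_lock;  /* per-drive lock for SATA or USB drives */
};

/* Hypothetical switch, standing in for a kernel configuration option. */
bool relax_ide_locking = false;

/* IDE drives keep the global lock unless the user opted out. */
pthread_mutex_t *lock_for(struct drive *d)
{
    if (d->ide_attached && !relax_ide_locking)
        return &global_lock;
    return &d->own_lock;
}

/* Run one operation on a drive while holding the lock chosen above. */
int with_drive_lock(struct drive *d, int (*op)(struct drive *))
{
    pthread_mutex_t *m = lock_for(d);
    int ret;

    pthread_mutex_lock(m);
    ret = op(d);
    pthread_mutex_unlock(m);
    return ret;
}
```

With this arrangement two SATA drives never wait for each other, while
all IDE drives still serialize on the one global lock as they do now.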

(Uninformed guess:
include/uapi/linux/major.h and block/genhd.c function
add_disk(struct gendisk *disk) make me think that one could
possibly recognize IDE attached drives by comparing
static int ide_majors[] =
{IDE0_MAJOR, IDE1_MAJOR, IDE2_MAJOR, IDE3_MAJOR, IDE4_MAJOR,
IDE5_MAJOR, IDE6_MAJOR, IDE7_MAJOR, IDE8_MAJOR, IDE9_MAJOR,
-1};
with
MAJOR(disk_to_dev(disk)->devt)
)
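
A minimal userspace sketch of this guess follows. The major numbers are
copied from include/uapi/linux/major.h. SKETCH_MAJOR() and SKETCH_MKDEV()
are stand-ins which mirror the kernel-internal dev_t layout with 20 minor
bits, and is_ide_major() and is_ide_devt() are hypothetical helper names.
Whether the devices served by sr.c ever carry one of these majors is not
verified here. Major 11 in the usage note is SCSI_CDROM_MAJOR.

```c
/* Values copied from include/uapi/linux/major.h. */
#define IDE0_MAJOR 3
#define IDE1_MAJOR 22
#define IDE2_MAJOR 33
#define IDE3_MAJOR 34
#define IDE4_MAJOR 56
#define IDE5_MAJOR 57
#define IDE6_MAJOR 88
#define IDE7_MAJOR 89
#define IDE8_MAJOR 90
#define IDE9_MAJOR 91

/* Userspace stand-ins for the kernel's MAJOR() and MKDEV() macros. */
#define SKETCH_MINORBITS 20
#define SKETCH_MAJOR(dev) ((unsigned int)((dev) >> SKETCH_MINORBITS))
#define SKETCH_MKDEV(ma, mi) \
    (((unsigned int)(ma) << SKETCH_MINORBITS) | (unsigned int)(mi))

static const int ide_majors[] =
    {IDE0_MAJOR, IDE1_MAJOR, IDE2_MAJOR, IDE3_MAJOR, IDE4_MAJOR,
     IDE5_MAJOR, IDE6_MAJOR, IDE7_MAJOR, IDE8_MAJOR, IDE9_MAJOR,
     -1};

/* Return 1 if the major number is one of the IDE majors, else 0. */
int is_ide_major(unsigned int major)
{
    int i;

    for (i = 0; ide_majors[i] != -1; i++)
        if ((unsigned int)ide_majors[i] == major)
            return 1;
    return 0;
}

/* Same test, starting from a device number as in disk_to_dev(disk)->devt. */
int is_ide_devt(unsigned int devt)
{
    return is_ide_major(SKETCH_MAJOR(devt));
}
```

For example is_ide_devt(SKETCH_MKDEV(22, 0)) yields 1, while a device
number with major 11 yields 0.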


Have a nice day :)

Thomas