The ATA autopsy has a found medium error and decided that reset is notIf you had an error, then you already are guaranteed that you will not see any
required - this is similar in that regard to the "unaligned write to a
sequential write required zone on SMR" error you mentioned from your
test previously. The problem in this is that for hisi_sas we depend on
disk reset to release driver resources associated with ATA QCs. That is
because it is only after reset that we can guarantee that no IO
associated with the disk will complete in HW and it is safe to release
the resources.
completion at all since the SATA drive is in error mode already. But I see the
point here. The HBA internal qc resources need to be cleared and that seems to
be done only with a device reset big hammer.
But pm8001 seems different here with regards releasing resources. I findYou lost me: ata_eh_recover() will call ata_eh_reset() only if the ATA_EH_RESET
that when EH kicks in from NCQ error and libsas tries to abort missing
commands, the pm8001_abort_task() -> sas_execute_internal_abort_single()
causes the original IO to complete as aborted - this is good, as then we
may release the resources there. hisi_sas has no such feature.
But the pm8001 manual and current driver indicate that the
OPC_INB_SATA_ABORT command should be sent after read log ext when
handling NCQ error, regardless of an autopsy. I send OPC_INB_SATA_ABORT
in ata_eh_reset() -> pm8001_I_T_nexus_reset() -> pm8001_send_abort_all()
action flag is set. So are you saying that even though it is not needed, you
still need to set ATA_EH_RESET for pm8001 ?
As I mentioned before, I saw nowhere better to callWe could add a new op ->eh_link_autopsy which we can call if defined after the
pm8001_send_abort_all() for this. I would rather not do it in
ata_eh_reset() -> pm8001_I_T_nexus_reset()
call to ata_eh_analyze_ncq_error() in ata_eh_link_autopsy(). With that, you can
set ATA_EH_RESET for the hisi driver and only do pm8001_send_abort_all() for
pm8001 (that will be done after the read log 10h).
How about this modified approach:See the above. the new op may decided a reset is needed (hisi case) and even if
- Continue to set ATA_EH_RESET in sas_ata_device_link_abort()
- pm8001_I_T_nexus_reset() will only call pm8001_send_abort_all() when
the driver is in NCQ error state and not do a hard reset in that case.
- But I am not sure if that works as the autopsy for NCQ error may have
decided that a hardreset was really required. Hmmm..
the standard autopsy does not make that decision, the flag is set and
ata_eh_recover() will reset the device. For the pm8001 case, no reset set with
the new op, only pm8001_send_abort_all(). And even if ata_eh_link_autopsy()
decides that a reset is needed after calling the new op, that would still be OK
I think.