[git pull] habanalabs pull request for kernel 5.17

From: Oded Gabbay
Date: Mon Dec 27 2021 - 03:03:59 EST


Hi Greg,

This is the habanalabs pull request for the merge window of kernel 5.17.
It mainly enhances the driver to deal with extreme cases, such as
reset-during-reset, events that arrive during reset, and monitoring
applications that need to keep running during reset.

Full details are in the tag.

Thanks,
Oded

The following changes since commit 1bb866dcb8cf5054de88f592fc0ec1f275ad9d63:

Merge tag 'iio-for-5.17a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next (2021-12-22 12:33:01 +0100)

are available in the Git repository at:

https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux.git misc-habanalabs-next-2021-12-27

for you to fetch changes up to ce80098db2439ee44403ec6fccd3a10be21c7aff:

habanalabs: support hard-reset scheduling during soft-reset (2021-12-26 14:42:31 +0200)

----------------------------------------------------------------
This tag contains habanalabs driver changes for v5.17:

- Support reset-during-reset. If the f/w notifies the driver that
  it is about to reset the device, the driver must handle that
  even if it is in the middle of another reset.

- Support events from f/w that arrive during device resets.
  Previously these events were ignored, which is bad because
  critical errors were neither reported nor handled by the driver.

- Don't kill processes that hold the control device open during
  hard-reset of the device. Control device operations are safe to
  perform during hard-reset. Moreover, the control device is usually
  used only by monitoring applications, so killing them defeats
  their purpose.

- Fix handling of hwmon nodes when working with legacy f/w.

- Replace the compute context pointer with a boolean. The pointer
  was misused by multiple code paths that wanted fast access to
  the compute context structure; a helper function now provides
  that access.

- Add uapi to fetch historical errors. This is necessary because
  some errors result in a hard-reset that terminates the user
  application, so the error information must remain available
  after the reset.

- Optimize GAUDI's MMU cache invalidation.

- Add support for loading the latest f/w.

- Add uapi to fetch HBM replacement and pending rows information.

- Multiple bug fixes to the reset code.

- Multiple bug fixes for Multi-CS ioctl code.

- Multiple bug fixes for wait-for-interrupt ioctl code.

- Many small bug fixes and cleanups.

----------------------------------------------------------------
Bharat Jauhari (3):
habanalabs: handle abort scenario for user interrupt
habanalabs: rename reset flags
habanalabs: refactor wait-for-user-interrupt function

Dani Liberman (6):
habanalabs: change wait for interrupt timeout to 64 bit
habanalabs: add support for fetching historic errors
habanalabs: fix race condition in multi CS completion
habanalabs: add SOB information to signal submission uAPI
habanalabs: enable access to info ioctl during hard reset
habanalabs: keep control device alive during hard reset

Guy Zadicario (1):
habanalabs/gaudi: fix debugfs dma channel selection

Oded Gabbay (16):
habanalabs/gaudi: recover from CPU WD event
habanalabs: make hdev creation code more readable
habanalabs: prevent false heartbeat message
habanalabs: abort reset on invalid request
habanalabs: fix soft reset accounting
habanalabs: rename late init after reset function
habanalabs/gaudi: return EPERM on non hard-reset
habanalabs: free signal handle on failure
habanalabs: remove redundant check on ctx_fini
habanalabs: save ctx inside encaps signal
habanalabs: fix etr asid configuration
habanalabs: add helper to get compute context
habanalabs: remove compute context pointer
habanalabs: remove in_debug check in device open
habanalabs: fix hwmon handling for legacy f/w
habanalabs: replace some -ENOTTY with -EINVAL

Ofir Bitton (18):
habanalabs: expand clock throttling information uAPI
habanalabs: debugfs support for larger I2C transactions
habanalabs: handle device TPM boot error as warning
habanalabs: fix possible deadlock in cache invl failure
habanalabs: move device boot warnings to the correct location
habanalabs: add more info ioctls support during reset
habanalabs: change misleading IRQ warning during reset
habanalabs: handle events during soft-reset
habanalabs: return correct clock throttling period
habanalabs: add current PI value to cpu packets
habanalabs: sysfs support for two infineon versions
habanalabs: expose soft reset sysfs nodes for inference ASIC
habanalabs: modify cpu boot status error print
habanalabs: fix endianness when reading cpld version
habanalabs: fix comments according to kernel-doc
habanalabs: refactor reset information variables
habanalabs: add a lock to protect multiple reset variables
habanalabs: support hard-reset scheduling during soft-reset

Ohad Sharabi (11):
habanalabs: modify wait for boot fit in dynamic FW load
habanalabs: revise and document use of boot status flags
habanalabs: adding indication of boot fit loaded
habanalabs: use variable poll interval for fw loading
habanalabs: don't clear previous f/w indications
habanalabs: skip PLL freq fetch
habanalabs: skip read fw errors if dynamic descriptor invalid
habanalabs: wait again for multi-CS if no CS completed
habanalabs: clean MMU headers definitions
habanalabs: prevent wait if CS in multi-CS list completed
habanalabs: handle skip multi-CS if handling not done

Rajaravi Krishna Katta (2):
habanalabs: add dedicated message towards f/w to set power
habanalabs: Move frequency change thread to goya_late_init

Tomer Tayar (5):
habanalabs: align debugfs documentation to alphabetical order
habanalabs: add power information type to POWER_GET packet
habanalabs: pass reset flags to reset thread
habanalabs: add missing kernel-doc comments for hl_device fields
habanalabs: add CPU-CP packet for engine core ASID cfg

Yuri Nudelman (5):
habanalabs: print va_range in vm node debugfs
habanalabs: wrong VA size calculation
habanalabs: make last_mask an MMU property
habanalabs: add enum mmu_op_flags
habanalabs: partly skip cache flush when in PMMU map flow

farah kassabri (3):
habanalabs/gaudi: Fix collective wait bug
habanalabs: add new opcodes for INFO IOCTL
habanalabs: change wait_for_interrupt implementation

.../ABI/testing/debugfs-driver-habanalabs | 23 +-
drivers/misc/habanalabs/common/command_buffer.c | 46 ++-
.../misc/habanalabs/common/command_submission.c | 389 +++++++++++++++------
drivers/misc/habanalabs/common/context.c | 39 ++-
drivers/misc/habanalabs/common/debugfs.c | 97 +++--
drivers/misc/habanalabs/common/device.c | 387 ++++++++++----------
drivers/misc/habanalabs/common/firmware_if.c | 253 ++++++++++----
drivers/misc/habanalabs/common/habanalabs.h | 301 +++++++++++-----
drivers/misc/habanalabs/common/habanalabs_drv.c | 150 ++++----
drivers/misc/habanalabs/common/habanalabs_ioctl.c | 195 +++++++++--
drivers/misc/habanalabs/common/hw_queue.c | 5 +-
drivers/misc/habanalabs/common/hwmon.c | 209 +++++++++--
drivers/misc/habanalabs/common/irq.c | 14 +-
drivers/misc/habanalabs/common/memory.c | 78 +++--
drivers/misc/habanalabs/common/mmu/mmu.c | 25 ++
drivers/misc/habanalabs/common/mmu/mmu_v1.c | 18 +-
drivers/misc/habanalabs/common/sysfs.c | 56 ++-
drivers/misc/habanalabs/gaudi/gaudi.c | 313 ++++++++++++-----
drivers/misc/habanalabs/gaudi/gaudiP.h | 4 +-
drivers/misc/habanalabs/gaudi/gaudi_coresight.c | 4 +-
drivers/misc/habanalabs/goya/goya.c | 165 +++++++--
drivers/misc/habanalabs/goya/goyaP.h | 14 +-
drivers/misc/habanalabs/goya/goya_coresight.c | 4 +-
drivers/misc/habanalabs/goya/goya_hwmgr.c | 31 +-
drivers/misc/habanalabs/include/common/cpucp_if.h | 62 +++-
.../misc/habanalabs/include/common/hl_boot_if.h | 4 +
.../habanalabs/include/hw_ip/mmu/mmu_general.h | 19 +-
.../misc/habanalabs/include/hw_ip/mmu/mmu_v1_0.h | 18 +-
.../misc/habanalabs/include/hw_ip/mmu/mmu_v1_1.h | 20 +-
include/uapi/misc/habanalabs.h | 166 +++++++--
30 files changed, 2185 insertions(+), 924 deletions(-)