RE: [PATCH 0/4] Add Toshiba Visconti DNN image processing accelerator driver

From: yuji2.ishikawa
Date: Tue Jul 12 2022 - 01:07:01 EST


Hi Hans,

I'm going to submit DNN driver (with misc driver style) again.
Updated version will include corrections suggested in reviews.

For example:
* apply checkpatch.pl --strict
* uppercase letters in function name -> make them lowercase
* use of integer types: uint32_t -> u32 ( _u32 in user API header )

Please let me know further actions which should be done before submission.

Regards,
Yuji Ishikawa

> -----Original Message-----
> From: ishikawa yuji(石川 悠司 ○RDC□AITC○EA開)
> Sent: Tuesday, July 5, 2022 10:17 AM
> To: Hans Verkuil <hverkuil@xxxxxxxxx>; robh+dt@xxxxxxxxxx; iwamatsu
> nobuhiro(岩松 信洋 □SWC◯ACT) <nobuhiro1.iwamatsu@xxxxxxxxxxxxx>;
> sumit.semwal@xxxxxxxxxx; christian.koenig@xxxxxxx
> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> linux-media@xxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx;
> linaro-mm-sig@xxxxxxxxxxxxxxxx
> Subject: RE: [PATCH 0/4] Add Toshiba Visconti DNN image processing
> accelerator driver
>
> Hi Hans,
>
> Thank you for your advice.
> I understand that V4L2 M2M driver is recommended for a group including
> Affine and Pyramid.
>
> As for DSPIF, our application SW engineer said that there is another version of
> firmware to handle neural network.
> Therefore, DSP can handle not only images, but also some array/list typed
> data.
> It would be better to handle DSPIF as a misc device as well as DNN.
>
> It is still difficult for me to imagine a whole structure of M2M type driver and
> corresponding userland application.
> What do they look like? Is there any existing driver suitable for reference?
>
> Let me think about a Pyramid HW which receives a planar (R/G/B as different
> planes, not in interleaved RGBRGB format) image, and yields 3 planar images.
> Is it covered by only 1 fd, or multiple fds are required?
> * A fd for Pyramid driver:
> * OUTPUT queue for receiving a picture?
> * OUTPUT or OUTPUT_MPLANE?
> * Q: Typically, RGB image is handled with single plane buffer in V4L2. RGB
> images do not have fourcc for multi-planar format (while YUV images do).
> * How should I explicitly request R/G/B planar buffer in V4L2
> semantics?
> * CAPTURE queue for yielding a picture?
> * CAPTURE or CAPTURE_MPLANE?
> * Again: RGB interleaved vs R/G/B planar
> * Q: If I want to capture multiple images (with different resolution) at a time,
> how should I do?
> * Do I have to add other fds for 2nd, 3rd CAPTURE queue?
> * Or, multiple images can be packed in the same CAPTURE buffer?
> * Or, multiple images can be placed in CAPTURE_MPLANE buffer ?
> * Compound control to receive a configuration.
>
> Regards,
> Yuji
>
> > -----Original Message-----
> > From: Hans Verkuil <hverkuil@xxxxxxxxx>
> > Sent: Wednesday, June 29, 2022 10:44 PM
> > To: ishikawa yuji(石川 悠司 ○RDC□AITC○EA開)
> > <yuji2.ishikawa@xxxxxxxxxxxxx>; robh+dt@xxxxxxxxxx; iwamatsu
> > nobuhiro(岩松
> > 信洋 □SWC◯ACT) <nobuhiro1.iwamatsu@xxxxxxxxxxxxx>;
> > sumit.semwal@xxxxxxxxxx; christian.koenig@xxxxxxx
> > Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx;
> > linux-kernel@xxxxxxxxxxxxxxx; linux-media@xxxxxxxxxxxxxxx;
> > dri-devel@xxxxxxxxxxxxxxxxxxxxx; linaro-mm-sig@xxxxxxxxxxxxxxxx
> > Subject: Re: [PATCH 0/4] Add Toshiba Visconti DNN image processing
> > accelerator driver
> >
> > My apologies for the late reply...
> >
> > On 01/06/2022 03:40, yuji2.ishikawa@xxxxxxxxxxxxx wrote:
> > > Hi Hans,
> > >
> > > Thank you for your advice.
> > > I prepared some description of DNN accelerator and its usage.
> > >
> > > #### Handling memory blocks for Visconti5 accelerators
> > >
> > > Visconti5 Image-Processing-Accelerators do not have fine grained
> > > IOMMU,
> > as CPU have.
> > > Therefore, memory region to be passed to the accelerators should be
> > physically contiguous.
> > > We use DMA-BUF backed by CMA (Contiguous Memory Allocator) to
> > > allocate
> > memory regions for sharing between CPU/IPAs.
> > > Originally, in v4.19 based implementation, the ION allocator was
> > > used to
> > allocate DMA-BUF instances.
> > > For the latest implementation, DMA-BUF HEAPS is used.
> > >
> > > Two structure types are used to represent memory region passed to drivers.
> > > * struct drv_ipa_buffer_info
> > > * to describe whole DMA-BUF instance
> > > * struct drv_ipa_addr
> > > * to describe a memory region in a DMA-BUF instance
> > >
> > > for details, see usage sample of each IPA driver
> > >
> > >
> > > #### Image Processing Accelerators overview
> > >
> > > Visconti5 SoC has following image processing accererators
> > >
> > > * AFFINE: 1 input image, 1 output image;
> > Affine transform, Homography transform, Polynomial lens distortion,
> > LUT transform
> > > * DNN: N input feature vector, N output feature vector;
> > Deep neural network operation
> > > * PYRAMID 3 input image, 3 * N output image;
> > Resize grayscale/color image with N different parameters
> > > * DSPIF: M input image, N output image;
> > Various opeations on images
> > > * HOX: 1 input image (multi ROI), 1 input dictionary1 likelihood/feature
> > vector; Extended Histogram of Oriented Gradient based pattern
> > matching
> > > * HAMAT: 2 input feature vectors: 1 output corrdinate vector;
> > Hamming distance matching for stereo vision
> > > * FLMAT: 3 input image, N input feature point, N output matched
> > > point;
> > Optical flow matching
> > > * SMLDB: 1 input image, N input feature point, N output feature
> > > vector;
> > Accelerated-KAZE feature descriptor accelerator
> > > * STMAT: 2 input image, 1 output disparity image;
> > Stereo disparity
> >
> > It's not really easy to decide what is best. I would say that if both
> > input and output are images (RGB, YUV, Grayscale), then a V4L2
> > memory-to-memory driver is what I would expect to see.
> >
> > Where that is not the case, then a more custom approach makes sense.
> >
> > In the list above I would put AFFINE, PYRAMID, DSPIF and possible
> > STMAT in the V4L2 driver group, and the others more as custom drivers.
> >
> > I think it also depends on how it is used: if a captured sensor image
> > is typically passed in for further processing, i.e. it is closely
> > related to the video ISP, then
> > V4L2 is a reasonable choice.
> >
> > A DNN driver, on the other hand, isn't using images at all, so for
> > that something like this driver makes sense.
> >
> > Regards,
> >
> > Hans
> >
> > >
> > > see [0] Fig 7.2.1 for block diagram (of prototype chip)
> > >
> > >
> > > #### DNN accelerator overview
> > >
> > > DNN accelerator is a proprietary CNN/DCNN processing accelerator
> > developed by Toshiba.
> > > Visconti5 SoC has 2 instances of DNN acclerator hardware.
> > > Users convert existing Caffe/ONNX models to Visconti compatible
> > > models
> > with an offline tool.
> > > A converted model "Configuration Binary" includes:
> > > * instruction sequence for given network
> > > * weight/bias information
> > > * DMA configuration from/to global memory (for input/output
> > > feature)
> > >
> > > DNN acccelerator can handle either 1 plane or multiple ROIs at a single call.
> > >
> > > see [0] Fig 7.2.2 for block diagram of DNN accelerator
> > >
> > > CNN: Convolutional Neural Network
> > > DCNN: Deep Convolutional Neural Network
> > >
> > >
> > > #### Input / Output
> > >
> > > Input image or feature: base type is either of FP16, FP32, INT8,
> > > UINT8, INT16 Output feature vector: base type is either of FP16,
> > > FP32, INT8, UINT8, INT16
> > >
> > > Input, Output, Weight, Bias can be placed on global memory and
> > loaded/stored with DMA within DNN accelerator.
> > > These data on global memory can be specified as either of:
> > > * single address to point single data block
> > > * list of address to point multiple data blocks (i.e. ROIs)
> > >
> > > DNN acclerator driver accepts an instance of "struct drv_dnn_descriptor"
> > which includes addresses of input/output features and a configuration
> binary.
> > >
> > >
> > > #### Descriptor Builder at userland
> > >
> > > Following APIs are provided to build a descriptor instance at userland.
> > >
> > > /* defined in drv_dnn_util.h */
> > > int32_t drv_DNN_config_descript_init(struct drv_dnn_descriptor
> > > *desc, struct drv_ipa_buffer_info *buffer, int32_t buffer_num);
> > > int32_t
> > drv_DNN_config_exec_configuration(struct drv_dnn_descriptor *desc,
> > const void *configuration_binary,
> > > struct drv_ipa_addr
> > configuration_binary_addr, struct drv_ipa_addr *src_list,
> > > struct drv_ipa_addr
> > > *dst_list,
> > int32_t list_num, struct drv_ipa_addr temporary_addr,
> > > int32_t temporary_size);
> > > int32_t drv_DNN_config_descript_finalize(struct drv_dnn_descriptor
> > > *desc);
> > >
> > > struct drv_dnn_descriptor is defined in drivers/soc/visconti/uapi/dnn.h.
> > > I think this header should be placed anywhere else to be collected
> > > on "make
> > headers_install" action of kernel building.
> > >
> > >
> > > #### Usage sample (without error handlers)
> > >
> > > #include <linux/dma-heap.h>
> > > #include "drv_ipa.h"
> > > #include "drv_dnn.h"
> > > #include "drv_dnn_util.h"
> > >
> > > int allocate_buffer(int fd_heap, int size)
> > > {
> > > struct dma_heap_allocation_data heap_data_in={0};
> > > int ret;
> > >
> > > heap_data_in.len = ROUNDUP_POW2(size);
> > > heap_data_in.fd_flags = O_RDWR | O_CLOEXEC;
> > >
> > > ret = ioctl(fd_heap, DMA_HEAP_IOCTL_ALLOC, &heap_data_in);
> > > if (ret <0)
> > > return -1;
> > > else
> > > return heap_data_in.fd;
> > > }
> > >
> > > void dnn_sample(int fd_dnn, int fd_conf, int fd_src, int fd_dst,
> > > int
> > fd_temp)
> > > {
> > > int32_t ret;
> > > struct drv_ipa_buffer_info bufinfo[4] = {
> > >
> > {.fd=fd_conf, .coherent=true, .direction=DRV_IPA_DIR_TO_DEVICE},
> > >
> > {.fd=fd_src, .coherent=true, .direction=DRV_IPA_DIR_TO_DEVICE},
> > >
> > {.fd=fd_dst, .coherent=true, .direction=DRV_IPA_DIR_FROM_DEVICE},
> > >
> > {.fd=fd_temp, .coherent=true, .direction=DRV_IPA_DIR_FROM_DEVICE},
> > > };
> > > struct drv_ipa_addr conf_addr = {.buffer_index=0, .offset=0};
> > > struct drv_ipa_addr src_addr = {.buffer_index=1, .offset=0};
> > > struct drv_ipa_addr dst_addr = {.buffer_index=2, .offset=0};
> > > struct drv_ipa_addr temp_addr = {.buffer_index=3, .offset=0};
> > > struct drv_dnn_descriptor desc;
> > >
> > > struct drv_ipa_addr src_list[] = {src_addr};
> > > struct drv_ipa_addr dst_list[] = {dst_addr};
> > >
> > > uint8_t *config = (uint8_t*)mmap(NULL, DNN_CONF_BIN_SIZE,
> > > PROT_READ, MAP_SHARED, fd_conf, 0);
> > >
> > > drv_DNN_config_descript_init(&desc, bufinfo, 4);
> > > drv_DNN_config_exec_configuration(&desc, config, conf_addr,
> > src_list, dst_list, 1, temp_addr, TEMP_BUF_SIZE);
> > > drv_DNN_config_descript_finalize(&desc);
> > >
> > > ioctl(fd_dnn, IOC_IPA_START, &desc);
> > >
> > > {
> > > struct pollfd fds[] =
> > {.fd=fd_dnn, .events=POLL_IN, .revents=0};
> > > poll(fds, 1, 1000);
> > > }
> > > }
> > >
> > > void sample()
> > > {
> > > int fd_dnn, fd_heap, fd_conf, fd_src, fd_dst, fd_temp;
> > >
> > > fd_dnn = open("/dev/dnn0", O_RDWR);
> > > fd_heap = open("/dev/dma_heap/linux,cma", O_RDWR);
> > > fd_conf = allocate_buffer(fd_heap,
> DNN_CONF_BIN_ALLOC_SIZE);
> > > fd_src = allocate_buffer(fd_heap, INPUT_IMG_ALLOC_SIZE);
> > > fd_dst = allocate_buffer(fd_heap, OUTPUT_IMG_ALLOC_SIZE);
> > > fd_temp = allocate_buffer(fd_heap, TEMP_BUF_ALLOC_SIZE);
> > >
> > > /* fill in input image and configuration here */
> > >
> > > dnn_sample(fd_dnn, fd_conf, fd_src, fd_dst, fd_temp);
> > >
> > > ...
> > > };
> > >
> > >
> > > #### Reference
> > >
> > > * [0]
> >
> https://toshiba.semicon-storage.com/content/dam/toshiba-ss-v2/master/e
> > n /company/technical-review/pdf/technical-review-18_e.pdf
> > > * Fig 7.2.1 shows the whole architecture of prototype chip
> > > * Fig 7.2.2 shows the architecture of DNN accelerator
> > >
> > >
> > > Regards,
> > > Yuji
> > >
> > >> -----Original Message-----
> > >> From: Hans Verkuil <hverkuil@xxxxxxxxx>
> > >> Sent: Friday, May 20, 2022 7:03 PM
> > >> To: ishikawa yuji(石川 悠司 ○RDC□AITC○EA開)
> > >> <yuji2.ishikawa@xxxxxxxxxxxxx>; robh+dt@xxxxxxxxxx; iwamatsu
> > >> nobuhiro(岩松
> > >> 信洋 □SWC◯ACT) <nobuhiro1.iwamatsu@xxxxxxxxxxxxx>;
> > >> sumit.semwal@xxxxxxxxxx; christian.koenig@xxxxxxx
> > >> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx;
> > >> linux-kernel@xxxxxxxxxxxxxxx; linux-media@xxxxxxxxxxxxxxx;
> > >> dri-devel@xxxxxxxxxxxxxxxxxxxxx; linaro-mm-sig@xxxxxxxxxxxxxxxx
> > >> Subject: Re: [PATCH 0/4] Add Toshiba Visconti DNN image processing
> > >> accelerator driver
> > >>
> > >> Hi Yuji,
> > >>
> > >> On 5/20/22 11:48, yuji2.ishikawa@xxxxxxxxxxxxx wrote:
> > >>> Hi Hans,
> > >>>
> > >>> Thank you for your comment.
> > >>> I agree that this submission lacks documents sharing basic idea of
> > >>> the
> > >> accelerators; what do they accept and what do they yield.
> > >>> Where can I put a new document? Can I put it as a comment in a
> > >>> source? Can
> > >> I add a file under Documentation/misc-devices directory?
> > >>
> > >> Start with explaining it by replying to this mail. Without knowing
> > >> anything about the hardware, it is difficult to say what the best
> > >> place is. Usually it is either the public API header, or somewhere
> > >> in
> > Documentation.
> > >>
> > >> The first step is to have a better understanding of the Visconti
> > >> image hardware and to see what the best subsystem would be to
> > >> support
> > that hardware.
> > >>
> > >> Regards,
> > >>
> > >> Hans
> > >>
> > >>>
> > >>> Thanks,
> > >>> Yuji Ishikawa
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: Hans Verkuil <hverkuil@xxxxxxxxx>
> > >>>> Sent: Thursday, May 12, 2022 8:15 PM
> > >>>> To: ishikawa yuji(石川 悠司 ○RDC□AITC○EA開)
> > >>>> <yuji2.ishikawa@xxxxxxxxxxxxx>; Rob Herring <robh+dt@xxxxxxxxxx>;
> > >>>> iwamatsu nobuhiro(岩松 信洋 □SWC◯ACT)
> > >>>> <nobuhiro1.iwamatsu@xxxxxxxxxxxxx>; Sumit Semwal
> > >>>> <sumit.semwal@xxxxxxxxxx>; Christian König
> > >> <christian.koenig@xxxxxxx>
> > >>>> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx;
> > >>>> linux-kernel@xxxxxxxxxxxxxxx; linux-media@xxxxxxxxxxxxxxx;
> > >>>> dri-devel@xxxxxxxxxxxxxxxxxxxxx; linaro-mm-sig@xxxxxxxxxxxxxxxx
> > >>>> Subject: Re: [PATCH 0/4] Add Toshiba Visconti DNN image
> > >>>> processing accelerator driver
> > >>>>
> > >>>> Hi Yuji,
> > >>>>
> > >>>> On 4/28/22 15:11, Yuji Ishikawa wrote:
> > >>>>> This series is the DNN image processing accelerator driver for
> > >>>>> Toshiba's ARM
> > >>>> SoC, Visconti[0].
> > >>>>> This provides DT binding documentation, device driver,
> > >>>>> MAINTAINER
> > >> files.
> > >>>>>
> > >>>>> The second patch "soc: visconti: Add Toshiba Visconti image
> > >>>>> processing
> > >>>> accelerator common source"
> > >>>>> and the fourth patch "MAINTAINERS: ..." are the same as the ones
> > >>>>> in the
> > >>>> preceding post for affine driver.
> > >>>>
> > >>>> There appears to be no documentation whatsoever, unless I am
> > >>>> missing something.
> > >>>>
> > >>>> How is the uAPI supposed to be used? What does it do? What
> > >>>> formats does it accept or produce?
> > >>>>
> > >>>> If this processes images, then (as Laurent mentioned) this is
> > >>>> more suitable as a
> > >>>> V4L2 mem2mem driver.
> > >>>>
> > >>>> See
> > >>>> https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/
> > >>>> de
> > >>>> v-
> > >>>> me
> > >>>> m2mem.html
> > >>>> and the many drivers in drivers/media that use it (git grep
> > >> v4l2-mem2mem.h).
> > >>>>
> > >>>> But without any explanation whatsoever I have no idea what does
> > >>>> or does not make sense.
> > >>>>
> > >>>> Regards,
> > >>>>
> > >>>> Hans
> > >>>>
> > >>>>>
> > >>>>> Best regards,
> > >>>>> Yuji
> > >>>>>
> > >>>>> [0]:
> > >>>>>
> > >>>>
> > >>
> > https://toshiba.semicon-storage.com/ap-en/semiconductor/product/image
> > >>>> -
> > >>>>> recognition-processors-visconti.html
> > >>>>>
> > >>>>> Yuji Ishikawa (4):
> > >>>>> dt-bindings: soc: visconti: Add Toshiba Visconti DNN image
> > processing
> > >>>>> accelerator bindings
> > >>>>> soc: visconti: Add Toshiba Visconti image processing accelerator
> > >>>>> common source
> > >>>>> soc: visconti: Add Toshiba Visconti DNN image processing
> > accelerator
> > >>>>> MAINTAINERS: Add entries for Toshiba Visconti DNN image
> > processing
> > >>>>> accelerator
> > >>>>>
> > >>>>> .../soc/visconti/toshiba,visconti-dnn.yaml | 54 ++
> > >>>>> MAINTAINERS | 2 +
> > >>>>> drivers/soc/Kconfig | 1 +
> > >>>>> drivers/soc/Makefile | 1 +
> > >>>>> drivers/soc/visconti/Kconfig | 7 +
> > >>>>> drivers/soc/visconti/Makefile | 8 +
> > >>>>> drivers/soc/visconti/dnn/Makefile | 6 +
> > >>>>> drivers/soc/visconti/dnn/dnn.c | 533
> > >>>> ++++++++++++++++++
> > >>>>> drivers/soc/visconti/dnn/hwd_dnn.c | 183 ++++++
> > >>>>> drivers/soc/visconti/dnn/hwd_dnn.h | 68 +++
> > >>>>> drivers/soc/visconti/dnn/hwd_dnn_reg.h | 228
> ++++++++
> > >>>>> drivers/soc/visconti/ipa_common.c | 55 ++
> > >>>>> drivers/soc/visconti/ipa_common.h | 18 +
> > >>>>> drivers/soc/visconti/uapi/dnn.h | 77 +++
> > >>>>> drivers/soc/visconti/uapi/ipa.h | 88 +++
> > >>>>> 15 files changed, 1329 insertions(+) create mode 100644
> > >>>>>
> > Documentation/devicetree/bindings/soc/visconti/toshiba,visconti-dnn.
> > >>>>> ya ml create mode 100644 drivers/soc/visconti/Kconfig create
> > >>>>> mode
> > >>>>> 100644 drivers/soc/visconti/Makefile create mode 100644
> > >>>>> drivers/soc/visconti/dnn/Makefile create mode 100644
> > >>>>> drivers/soc/visconti/dnn/dnn.c create mode 100644
> > >>>>> drivers/soc/visconti/dnn/hwd_dnn.c
> > >>>>> create mode 100644 drivers/soc/visconti/dnn/hwd_dnn.h
> > >>>>> create mode 100644 drivers/soc/visconti/dnn/hwd_dnn_reg.h
> > >>>>> create mode 100644 drivers/soc/visconti/ipa_common.c create
> > mode
> > >>>>> 100644 drivers/soc/visconti/ipa_common.h create mode 100644
> > >>>>> drivers/soc/visconti/uapi/dnn.h create mode 100644
> > >>>>> drivers/soc/visconti/uapi/ipa.h
> > >>>>>