Re: [PATCH v2 1/2] media: docs-rst: Document memory-to-memory video decoder interface

From: Nicolas Dufresne
Date: Fri Nov 16 2018 - 23:31:52 EST


On Thursday, 15 November 2018 at 15:34 +0100, Hans Verkuil wrote:
> On 10/22/2018 04:48 PM, Tomasz Figa wrote:
> > Due to complexity of the video decoding process, the V4L2 drivers of
> > stateful decoder hardware require specific sequences of V4L2 API calls
> > to be followed. These include capability enumeration, initialization,
> > decoding, seek, pause, dynamic resolution change, drain and end of
> > stream.
> >
> > Specifics of the above have been discussed during Media Workshops at
> > LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> > Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> > originated at those events was later implemented by the drivers we already
> > have merged in mainline, such as s5p-mfc or coda.
> >
> > The only thing missing was the real specification included as a part of
> > Linux Media documentation. Fix it now and document the decoder part of
> > the Codec API.
> >
> > Signed-off-by: Tomasz Figa <tfiga@xxxxxxxxxxxx>
> > ---
> > Documentation/media/uapi/v4l/dev-decoder.rst | 1082 +++++++++++++++++
> > Documentation/media/uapi/v4l/devices.rst | 1 +
> > Documentation/media/uapi/v4l/pixfmt-v4l2.rst | 5 +
> > Documentation/media/uapi/v4l/v4l2.rst | 10 +-
> > .../media/uapi/v4l/vidioc-decoder-cmd.rst | 40 +-
> > Documentation/media/uapi/v4l/vidioc-g-fmt.rst | 14 +
> > 6 files changed, 1137 insertions(+), 15 deletions(-)
> > create mode 100644 Documentation/media/uapi/v4l/dev-decoder.rst
> >
> > diff --git a/Documentation/media/uapi/v4l/dev-decoder.rst b/Documentation/media/uapi/v4l/dev-decoder.rst
> > new file mode 100644
> > index 000000000000..09c7a6621b8e
> > --- /dev/null
> > +++ b/Documentation/media/uapi/v4l/dev-decoder.rst
> > @@ -0,0 +1,1082 @@
> > +.. -*- coding: utf-8; mode: rst -*-
> > +
> > +.. _decoder:
> > +
> > +*************************************************
> > +Memory-to-memory Stateful Video Decoder Interface
> > +*************************************************
> > +
> > +A stateful video decoder takes complete chunks of the bitstream (e.g. Annex-B
> > +H.264/HEVC stream, raw VP8/9 stream) and decodes them into raw video frames in
> > +display order. The decoder is expected not to require any additional information
> > +from the client to process these buffers.
> > +
> > +Performing software parsing, processing etc. of the stream in the driver in
> > +order to support this interface is strongly discouraged. In case such
> > +operations are needed, use of the Stateless Video Decoder Interface (in
> > +development) is strongly advised.
> > +
> > +Conventions and notation used in this document
> > +==============================================
> > +
> > +1. The general V4L2 API rules apply if not specified in this document
> > + otherwise.
> > +
> > +2. The meaning of words "must", "may", "should", etc. is as per RFC
> > + 2119.
> > +
> > +3. All steps not marked âoptionalâ are required.
> > +
> > +4. :c:func:`VIDIOC_G_EXT_CTRLS`, :c:func:`VIDIOC_S_EXT_CTRLS` may be used
> > + interchangeably with :c:func:`VIDIOC_G_CTRL`, :c:func:`VIDIOC_S_CTRL`,
> > + unless specified otherwise.
> > +
> > +5. Single-plane API (see spec) and applicable structures may be used
> > + interchangeably with Multi-plane API, unless specified otherwise,
> > + depending on decoder capabilities and following the general V4L2
> > + guidelines.
> > +
> > +6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
> > + [0..2]: i = 0, 1, 2.
> > +
> > +7. Given an ``OUTPUT`` buffer A, A' represents a buffer on the ``CAPTURE``
> > + queue containing data (decoded frame/stream) that resulted from processing
> > + buffer A.
> > +
> > +.. _decoder-glossary:
> > +
> > +Glossary
> > +========
> > +
> > +CAPTURE
> > + the destination buffer queue; for decoder, the queue of buffers containing
> > + decoded frames; for encoder, the queue of buffers containing encoded
> > + bitstream; ``V4L2_BUF_TYPE_VIDEO_CAPTURE`` or
> > + ``V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE``; data are captured from the hardware
> > + into ``CAPTURE`` buffers
> > +
> > +client
> > + application client communicating with the decoder or encoder implementing
> > + this interface
> > +
> > +coded format
> > + encoded/compressed video bitstream format (e.g. H.264, VP8, etc.); see
> > + also: raw format
> > +
> > +coded height
> > + height for given coded resolution
> > +
> > +coded resolution
> > + stream resolution in pixels aligned to codec and hardware requirements;
> > + typically visible resolution rounded up to full macroblocks;
> > + see also: visible resolution
> > +
> > +coded width
> > + width for given coded resolution
> > +
> > +decode order
> > + the order in which frames are decoded; may differ from display order if the
> > + coded format includes a feature of frame reordering; for decoders,
> > + ``OUTPUT`` buffers must be queued by the client in decode order; for
> > + encoders ``CAPTURE`` buffers must be returned by the encoder in decode order
> > +
> > +destination
> > + data resulting from the decode process; ``CAPTURE``
> > +
> > +display order
> > + the order in which frames must be displayed; for encoders, ``OUTPUT``
> > + buffers must be queued by the client in display order; for decoders,
> > + ``CAPTURE`` buffers must be returned by the decoder in display order
> > +
> > +DPB
> > + Decoded Picture Buffer; an H.264 term for a buffer that stores a decoded
> > + raw frame available for reference in further decoding steps.
> > +
> > +EOS
> > + end of stream
> > +
> > +IDR
> > + Instantaneous Decoder Refresh; a type of a keyframe in H.264-encoded stream,
> > + which clears the list of earlier reference frames (DPBs)
> > +
> > +keyframe
> > + an encoded frame that does not reference frames decoded earlier, i.e.
> > + can be decoded fully on its own.
> > +
> > +macroblock
> > + a processing unit in image and video compression formats based on linear
> > + block transforms (e.g. H.264, VP8, VP9); codec-specific, but for most
> > + popular codecs the size is 16x16 samples (pixels)
> > +
> > +OUTPUT
> > + the source buffer queue; for decoders, the queue of buffers containing
> > + encoded bitstream; for encoders, the queue of buffers containing raw frames;
> > + ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or ``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE``; the
> > + hardware is fed with data from ``OUTPUT`` buffers
> > +
> > +PPS
> > + Picture Parameter Set; a type of metadata entity in H.264 bitstream
> > +
> > +raw format
> > + uncompressed format containing raw pixel data (e.g. YUV, RGB formats)
> > +
> > +resume point
> > + a point in the bitstream from which decoding may start/continue, without
> > + any previous state/data present, e.g.: a keyframe (VP8/VP9) or
> > + SPS/PPS/IDR sequence (H.264); a resume point is required to start decode
> > + of a new stream, or to resume decoding after a seek
> > +
> > +source
> > + data fed to the decoder or encoder; ``OUTPUT``
> > +
> > +source height
> > + height in pixels for given source resolution; relevant to encoders only
> > +
> > +source resolution
> > + resolution in pixels of source frames fed to the encoder and
> > + subject to further cropping to the bounds of visible resolution; relevant to
> > + encoders only
> > +
> > +source width
> > + width in pixels for given source resolution; relevant to encoders only
> > +
> > +SPS
> > + Sequence Parameter Set; a type of metadata entity in H.264 bitstream
> > +
> > +stream metadata
> > + additional (non-visual) information contained inside encoded bitstream;
> > + for example: coded resolution, visible resolution, codec profile
> > +
> > +visible height
> > + height for given visible resolution; display height
> > +
> > +visible resolution
> > + stream resolution of the visible picture, in pixels, to be used for
> > + display purposes; must be smaller than or equal to the coded resolution;
> > + display resolution
> > +
> > +visible width
> > + width for given visible resolution; display width
> > +
> > +State machine
> > +=============
> > +
> > +.. kernel-render:: DOT
> > + :alt: DOT digraph of decoder state machine
> > + :caption: Decoder state machine
> > +
> > + digraph decoder_state_machine {
> > + node [shape = doublecircle, label="Decoding"] Decoding;
> > +
> > + node [shape = circle, label="Initialization"] Initialization;
> > + node [shape = circle, label="Capture\nsetup"] CaptureSetup;
> > + node [shape = circle, label="Dynamic\nresolution\nchange"] ResChange;
> > + node [shape = circle, label="Stopped"] Stopped;
> > + node [shape = circle, label="Drain"] Drain;
> > + node [shape = circle, label="Seek"] Seek;
> > + node [shape = circle, label="End of stream"] EoS;
> > +
> > + node [shape = point]; qi
> > + qi -> Initialization [ label = "open()" ];
> > +
> > + Initialization -> CaptureSetup [ label = "CAPTURE\nformat\nestablished" ];
> > +
> > + CaptureSetup -> Stopped [ label = "CAPTURE\nbuffers\nready" ];
> > +
> > + Decoding -> ResChange [ label = "Stream\nresolution\nchange" ];
> > + Decoding -> Drain [ label = "V4L2_DEC_CMD_STOP" ];
> > + Decoding -> EoS [ label = "EoS mark\nin the stream" ];
> > + Decoding -> Seek [ label = "VIDIOC_STREAMOFF(OUTPUT)" ];
> > + Decoding -> Stopped [ label = "VIDIOC_STREAMOFF(CAPTURE)" ];
> > + Decoding -> Decoding;
> > +
> > + ResChange -> CaptureSetup [ label = "CAPTURE\nformat\nestablished" ];
> > + ResChange -> Seek [ label = "VIDIOC_STREAMOFF(OUTPUT)" ];
> > +
> > + EoS -> Drain [ label = "Implicit\ndrain" ];
> > +
> > + Drain -> Stopped [ label = "All CAPTURE\nbuffers dequeued\nor\nVIDIOC_STREAMOFF(CAPTURE)" ];
> > + Drain -> Seek [ label = "VIDIOC_STREAMOFF(OUTPUT)" ];
> > +
> > + Seek -> Decoding [ label = "VIDIOC_STREAMON(OUTPUT)" ];
> > + Seek -> Initialization [ label = "VIDIOC_REQBUFS(OUTPUT, 0)" ];
> > +
> > + Stopped -> Decoding [ label = "V4L2_DEC_CMD_START\nor\nVIDIOC_STREAMON(CAPTURE)" ];
> > + Stopped -> Seek [ label = "VIDIOC_STREAMOFF(OUTPUT)" ];
> > + }
> > +
> > +Querying capabilities
> > +=====================
> > +
> > +1. To enumerate the set of coded formats supported by the decoder, the
> > + client may call :c:func:`VIDIOC_ENUM_FMT` on ``OUTPUT``.
> > +
> > + * The full set of supported formats will be returned, regardless of the
> > + format set on ``CAPTURE``.
> > +
> > +2. To enumerate the set of supported raw formats, the client may call
> > + :c:func:`VIDIOC_ENUM_FMT` on ``CAPTURE``.
> > +
> > + * Only the formats supported for the format currently active on ``OUTPUT``
> > + will be returned.
> > +
> > + * In order to enumerate raw formats supported by a given coded format,
> > + the client must first set that coded format on ``OUTPUT`` and then
> > + enumerate formats on ``CAPTURE``.
> > +
> > +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> > + resolutions for a given format, passing desired pixel format in
> > + :c:type:`v4l2_frmsizeenum` ``pixel_format``.
> > +
> > + * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` for a coded pixel
> > + format will include all possible coded resolutions supported by the
> > + decoder for given coded pixel format.
> > +
> > + * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` for a raw pixel format
> > + will include all possible frame buffer resolutions supported by the
> > + decoder for given raw pixel format and the coded format currently set on
> > + ``OUTPUT``.
> > +
> > +4. Supported profiles and levels for given format, if applicable, may be
> > + queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
> > +
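
Just to make the enumeration flow above concrete, a minimal client-side
sketch could look like this (purely illustrative: it assumes the
multi-planar API and an already opened decoder fd, and leaves out all
error handling):

  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/videodev2.h>

  static void enumerate_formats(int fd)
  {
          struct v4l2_fmtdesc fmt;
          int i;

          /* Coded formats: full set, independent of the CAPTURE format. */
          for (i = 0; ; i++) {
                  memset(&fmt, 0, sizeof(fmt));
                  fmt.index = i;
                  fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
                  if (ioctl(fd, VIDIOC_ENUM_FMT, &fmt) < 0)
                          break;
                  /* fmt.pixelformat is a supported coded format. */
          }

          /* Raw formats: only those valid for the coded format currently
           * set on OUTPUT, so S_FMT(OUTPUT) has to happen first. */
          for (i = 0; ; i++) {
                  memset(&fmt, 0, sizeof(fmt));
                  fmt.index = i;
                  fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
                  if (ioctl(fd, VIDIOC_ENUM_FMT, &fmt) < 0)
                          break;
                  /* fmt.pixelformat is a raw format valid for the coded
                   * format currently set on OUTPUT. */
          }
  }
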
> > +Initialization
> > +==============
> > +
> > +1. **Optional.** Enumerate supported ``OUTPUT`` formats and resolutions. See
> > + `Querying capabilities` above.
> > +
> > +2. Set the coded format on ``OUTPUT`` via :c:func:`VIDIOC_S_FMT`
> > +
> > + * **Required fields:**
> > +
> > + ``type``
> > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > + ``pixelformat``
> > + a coded pixel format
> > +
> > + ``width``, ``height``
> > + required only if it cannot be parsed from the stream for the given
> > + coded format; optional otherwise - set to zero to ignore
> > +
> > + ``sizeimage``
> > + desired size of ``OUTPUT`` buffers; the decoder may adjust it to
> > + match hardware requirements
> > +
> > + other fields
> > + follow standard semantics
> > +
> > + * **Return fields:**
> > +
> > + ``sizeimage``
> > + adjusted size of ``OUTPUT`` buffers
> > +
> > + * If width and height are set to non-zero values, the ``CAPTURE`` format
> > + will be updated with an appropriate frame buffer resolution instantly.
> > + However, for coded formats that include stream resolution information,
> > + after the decoder is done parsing the information from the stream, it will
> > + update the ``CAPTURE`` format with new values and signal a source change
> > + event.
> > +
> > + .. warning::
> > +
> > + Changing the ``OUTPUT`` format may change the currently set ``CAPTURE``
> > + format. The decoder will derive a new ``CAPTURE`` format from the
> > + ``OUTPUT`` format being set, including resolution, colorimetry
> > + parameters, etc. If the client needs a specific ``CAPTURE`` format, it
> > + must adjust it afterwards.
> > +
> > +3. **Optional.** Query the minimum number of buffers required for ``OUTPUT``
> > + queue via :c:func:`VIDIOC_G_CTRL`. This is useful if the client intends to
> > + use more buffers than the minimum required by hardware/format.
> > +
> > + * **Required fields:**
> > +
> > + ``id``
> > + set to ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
> > +
> > + * **Return fields:**
> > +
> > + ``value``
> > + the minimum number of ``OUTPUT`` buffers required for the currently
> > + set format
> > +
> > +4. Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on
> > + ``OUTPUT``.
> > +
> > + * **Required fields:**
> > +
> > + ``count``
> > + requested number of buffers to allocate; greater than zero
> > +
> > + ``type``
> > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > + ``memory``
> > + follows standard semantics
> > +
> > + * **Return fields:**
> > +
> > + ``count``
> > + the actual number of buffers allocated
> > +
> > + .. warning::
> > +
> > + The actual number of allocated buffers may differ from the ``count``
> > + given. The client must check the updated value of ``count`` after the
> > + call returns.
> > +
> > + .. note::
> > +
> > + To allocate more than the minimum number of buffers (for pipeline
> > + depth), the client may query the ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
> > + control to get the minimum number of buffers required by the
> > + decoder/format, and pass the obtained value plus the number of
> > + additional buffers needed in the ``count`` field to
> > + :c:func:`VIDIOC_REQBUFS`.
> > +
> > + Alternatively, :c:func:`VIDIOC_CREATE_BUFS` on the ``OUTPUT`` queue can be
> > + used to have more control over buffer allocation.
> > +
> > + * **Required fields:**
> > +
> > + ``count``
> > + requested number of buffers to allocate; greater than zero
> > +
> > + ``type``
> > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > + ``memory``
> > + follows standard semantics
> > +
> > + ``format``
> > + follows standard semantics
> > +
> > + * **Return fields:**
> > +
> > + ``count``
> > + adjusted to the number of allocated buffers
> > +
> > + .. warning::
> > +
> > + The actual number of allocated buffers may differ from the ``count``
> > + given. The client must check the updated value of ``count`` after the
> > + call returns.
> > +
> > +5. Start streaming on the ``OUTPUT`` queue via :c:func:`VIDIOC_STREAMON`.
> > +
> > +6. **This step only applies to coded formats that contain resolution information
> > + in the stream.**
>
> As far as I know all codecs have resolution/metadata in the stream.

Was this comment about what we currently support in the V4L2 interface?
In real life, there are codecs that work only with out-of-band codec
data. A well-known one is AVC1 (and HVC1). In this mode, the H.264
bitstream does not contain start codes, and the headers are not allowed
in the bitstream itself. This format is much more efficient to process
than AVC Annex B, since you can just read the NAL size and jump over it
instead of scanning for start codes. It is the format used in the very
popular ISO MP4 container.
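
To illustrate the difference (a rough sketch, assuming the usual 4-byte
NAL length field signalled by the avcC codec data; not real parser
code):

  #include <stddef.h>
  #include <stdint.h>

  /* AVC1/ISO-MP4 style: NAL units are length-prefixed, so you hop from
   * one NAL unit to the next without looking at the payload at all. */
  static void walk_length_prefixed(const uint8_t *buf, size_t size)
  {
          size_t pos = 0;

          while (pos + 4 <= size) {
                  uint32_t nal_size = ((uint32_t)buf[pos] << 24) |
                                      ((uint32_t)buf[pos + 1] << 16) |
                                      ((uint32_t)buf[pos + 2] << 8) |
                                      (uint32_t)buf[pos + 3];

                  /* buf + pos + 4 is the start of one NAL unit,
                   * nal_size bytes long. */
                  pos += 4 + nal_size;
          }
  }

With Annex B you instead have to scan every byte for the 00 00 01 start
code to find where one NAL unit ends and the next one begins.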

The other sign that these codecs do exist is the recurrence of the
notion of codec data in pretty much every codec abstraction there is
(FFmpeg, GStreamer, Android MediaCodec).
>
> As discussed in the "[PATCH vicodec v4 0/3] Add support to more pixel formats in
> vicodec" thread, it is easiest to assume that there is always metadata.
>
> Perhaps there should be a single mention somewhere that such codecs are not
> supported at the moment, but to be frank how can you decode a stream without
> it containing such essential information? You are much more likely to implement
> such a codec as a stateless codec.

That is, I believe, a misinterpretation of what a stateless codec is.
Having to set one blob of codec data on a specific control after S_FMT
does not make the codec stateless. The fact that it is not supported
now is simply because we haven't come across hardware that supports it
yet.

FFmpeg offers stateful software codecs, and you still have this
codec_data blob for many of the formats. It also only supports AVC1;
its parser will always convert the stream to that, because it's just a
more efficient format. Android MediaCodec works similarly. What keeps
them stateful is that you don't need to parse this blob, you don't even
need to know what it contains. It is a blob placed in the container
that you pass as-is to the decoder.
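
As a sketch of what I mean (based on the FFmpeg API as I understand it;
'codec', 'codec_data' and 'codec_data_size' are placeholders for data
coming from the container, e.g. the avcC box in MP4):

  #include <string.h>
  #include <libavcodec/avcodec.h>

  static AVCodecContext *open_with_codec_data(const AVCodec *codec,
                                              const uint8_t *codec_data,
                                              int codec_data_size)
  {
          AVCodecContext *ctx = avcodec_alloc_context3(codec);

          /* The blob is copied as-is; the client never parses it. */
          ctx->extradata = av_mallocz(codec_data_size +
                                      AV_INPUT_BUFFER_PADDING_SIZE);
          memcpy(ctx->extradata, codec_data, codec_data_size);
          ctx->extradata_size = codec_data_size;

          avcodec_open2(ctx, codec, NULL);

          /* The decoder is still fully stateful: packets in, frames out. */
          return ctx;
  }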

>
> So I would just drop this sentence here (and perhaps at other places in this
> document or the encoder document as well).
>
> Regards,
>
> Hans