Re: [PATCH] QEMU fw_cfg DMA interface documentation

From: Laszlo Ersek
Date: Thu Aug 06 2015 - 11:17:39 EST


On 08/06/15 16:28, Andrew Jones wrote:
> On Thu, Aug 06, 2015 at 04:19:18PM +0200, Marc Marí wrote:
>> On Thu, 6 Aug 2015 16:08:49 +0200
>> Andrew Jones <drjones@xxxxxxxxxx> wrote:
>>
>>> On Thu, Aug 06, 2015 at 01:03:07PM +0200, Marc Marí wrote:
>>>> Add fw_cfg DMA interface specfication in the fw_cfg documentation.
>>>>
>>>> Signed-off-by: Marc Marí <markmb@xxxxxxxxxx>
>>>> ---
>>>> Documentation/devicetree/bindings/arm/fw-cfg.txt | 36
>>>> ++++++++++++++++++++++++ 1 file changed, 36 insertions(+)
>>>>
>>>> diff --git a/Documentation/devicetree/bindings/arm/fw-cfg.txt
>>>> b/Documentation/devicetree/bindings/arm/fw-cfg.txt index
>>>> 953fb64..c880eec 100644 ---
>>>> a/Documentation/devicetree/bindings/arm/fw-cfg.txt +++
>>>> b/Documentation/devicetree/bindings/arm/fw-cfg.txt @@ -49,6 +49,41
>>>> @@ The guest kernel is not expected to use these registers
>>>> (although it is certainly allowed to); the device tree bindings are
>>>> documented here because this is where device tree bindings reside
>>>> in general. +Starting from revision 2, a DMA interface has also
>>>> been added. This can be used +through a write-only, 64-bit wide
>>>> address register. +
>>>> +In this register, a pointer to a FWCfgDmaAccess structure can be
>>>> written, in +big endian mode. This is the format of the
>>>> FWCfgDmaAccess structure:
>>> s/mode/format/
>>>> +
>>>> +typedef struct FWCfgDmaAccess {
>>>> + uint64_t address;
>>>> + uint32_t length;
>>>> + uint32_t control;
>>>> +} FWCfgDmaAccess;
>>>> +
>>>> +Once the address to this structure has been written, an DMA
>>>> operation is +started. If the "control" field has value 2, a read
>>>> operation will be performed. +"length" bytes for the current
>>>> selector and offset will be mapped into the +address specified by
>>>> the "address" field. +
>>>> +If the field "address" has value 0, the read is considered a skip,
>>>> and +the data is not copied anywhere, but the offset is still
>>>> incremented.
>>>
>>> So we can't DMA to address == 0? That might not generally be a good
>>> idea, but why limit ourselves? Can't we add another control input for
>>> skip instead? Actually, what inputs are accepted now? READ == 2,
>>> WRITE? ??
>>>
>>
>> Write was already disabled for PIO:
>>
>> static void fw_cfg_write(FWCfgState *s, uint8_t value)
>> {
>> /* nothing, write support removed in QEMU v2.4+ */
>> }
>>
>>>> +
>>>> +To check result, read the control register:
>>>> + error bit set -> something went wrong.
>>> echo Stefan's which bit question, and also what types of errors? If
>>> there are many, then how about an error bit, plus field of bits for
>>> an error code?
>>>
>>
>> Bit 0. At this moment the only error possible is DMA mapping failure.
>> But its true that it might be useful to have some bits in the control
>> field or another field to indicate the type of error, in case of future
>> extensions.
>>
>>>> + all bits cleared -> transfer finished successfully.
>>>> + otherwise -> transfer still in progress (doesn't happen
>>>> + today due to implementation not being
>>>> async,
>>>> + but may in the future).
>>> I don't think we need to point out that this isn't implemented yet,
>>> but may be in the future. If that's how it may work, then let's just
>>> document it. And why not specify an in-progress bit?
>>
>> Is this a feature we will want?
>
> I don't know. Need firmware person like Laszlo to give an opinion. Maybe
> he'd want OVMF/AAVMF to be able to output progress while transferring,
> to keep users more patient?

I don't yet know how to use this interface from the firmware.

FWIW the library interface to fw_cfg that we use in OVMF & AAVMF is
synchronous. (It doesn't make sense to do "something else" until the
fw_cfg transfer completes "in the background".)

So I imagined the new interface would behave in one of the following ways:
(1) I perform the "final" MMIO register write, and bam, by the time the
next instruction is executed, the target memory area is populated.
(2) Or, I perform the same final MMIO register write, and then
busy-loop on reading some other status register, until it tells me
that the transfer is done, or there has been an error.

(Under (2), if there's an error, I'll just log an error message, and
hang the firmware. Our internal library interface doesn't return any
status codes (no errors are possible with the MMIO/PIO interfaces, on
ARM and x86 alike), and I don't think this change warrants updating a
zillion call sites, in order to propagate and handle such DMA errors in
all modules that use fw-cfg. (We always check return values -- unless
the return type is VOID. And it is now.))

Regarding progress info: aside from the kernel and the initrd, the
fw-cfg items (files included) are tiny. (Remember, they were meant as
*config* items :)) So, normally, I don't bother about progress info.

The kernel and the initrd are exceptions. So, at the moment the code
that reads them (which is specific Boot Device Selection library code,
not generic fw-cfg client libary code) breaks up the transfer into
chunks of 1MB, and logs progress info in between. That is, the progress
info is not emitted by the direct fw-cfg client, it's printed by the
specific module that relies on the fw-cfg library.

So, regardless of whether I'll have to write (1) or (2), I expect this
same approach to just work with the DMA interface. The DMA thing will be
hidden inside the library, the API will remain synchronous. At best the
module (in BDS) that downloads the kernel/initrd blobs will print those
1MB progress messages "super fast".

Summary: the only thing I need to know is when a transfer has finished,
successfully or unsuccessfully.

(If a transfer can terminate prematurely, but still successfully -- ie.
the firwmare should contine "with the rest" -- then the exact size of
the short transfer should also be returned. But I don't think short
transfers would be very useful.)

Thanks
Laszlo



>
>>
>>>> +
>>>> +Target address goes up and transfer length goes down as the
>>>> transfer +happens, so after a successful transfer the length
>>>> register is zero +and the address register points right after the
>>>> memory block written. +
>>>> +If a partial transfer happened before an error occured the address
>>>> and +length registers indicate how much data has been transfered
>>>> +successfully.
>>>> +
>>>> Required properties:
>>>>
>>>> - compatible: "qemu,fw-cfg-mmio".
>>>> @@ -56,6 +91,7 @@ Required properties:
>>>> - reg: the MMIO region used by the device.
>>>> * Bytes 0x0 to 0x7 cover the data register.
>>>> * Bytes 0x8 to 0x9 cover the selector register.
>>>> + * From revision 2: Bytes 0xa to 0x11 cover the DMA address
>>>> register.
>>>
>>> I think we should naturally align the register, i.e we can reserve
>>> bytes 0xa - 0xf, and then put the DMA register at 0x10.
>>>
>>>> * Further registers may be appended to the region in case of
>>>> future interface revisions / feature bits.
>>>>
>>>> --
>>>> 2.4.3
>>>>
>>>
>>
>> Thanks
>> Marc

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/