Re: [PATCH 3/4] remoteproc: add support for a new 64-bit trace version

From: ClÃment Leger
Date: Fri May 22 2020 - 15:28:53 EST


Hi Suman,

----- On 22 May, 2020, at 20:59, s-anna s-anna@xxxxxx wrote:

> Hi Clement,
>
>> > ----- On 22 May, 2020, at 20:03, ClÃment Leger cleger@xxxxxxxxx wrote:>
>>> Hi Suman,
>>>
>>> ----- On 22 May, 2020, at 19:33, Bjorn Andersson bjorn.andersson@xxxxxxxxxx
>>> wrote:
>>>
>>>> On Fri 22 May 09:54 PDT 2020, Suman Anna wrote:
>>>>
>>>>> On 5/21/20 2:42 PM, Suman Anna wrote:
>>>>>> Hi Bjorn,
>>>>>>
>>>>>> On 5/21/20 1:04 PM, Bjorn Andersson wrote:
>>>>>>> On Wed 25 Mar 13:47 PDT 2020, Suman Anna wrote:
>>>> [..]
>>>>>>>> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
>>>> [..]
>>>>>>>> +struct fw_rsc_trace2 {
>>>>>>>
>>>>>>> Sounds more like fw_rsc_trace64 to me - in particular since the version
>>>>>>> of trace2 is 1...
>>>>>>
>>>>>> Yeah, will rename this.
>>>>>>
>>>>>>>
>>>>>>>> +ÂÂÂ u32 padding;
>>>>>>>> +ÂÂÂ u64 da;
>>>>>>>> +ÂÂÂ u32 len;
>>>>>>>> +ÂÂÂ u32 reserved;
>>>>>>>
>>>>>>> What's the purpose of this reserved field?
>>>>>>
>>>>>> Partly to make sure the entire resource is aligned on an 8-byte, and
>>>>>> partly copied over from fw_rsc_trace entry. I guess 32-bits is already
>>>>>> large enough of a size for trace entries irrespective of 32-bit or
>>>>>> 64-bit traces, so I doubt if we want to make the len field also a u64.
>>>>>
>>>>> Looking at this again, I can drop both padding and reserved fields, if I
>>>>> move the len field before da. Any preferences/comments?
>>>
>> Sorry, my message went a bit too fast... So as I was saying:
>>
>> Not only the in-structure alignment matters but also in the resource table.
>> Since the resource table is often packed (see [1] for instance), if a trace
>> resource is embedded in the resource table after another resource aligned
>> on 32 bits only, your 64 bits trace field will potentially end up
>> misaligned.
>
> Right. Since one can mix and match the resources of different sizes and
> include them in any order, the onus is going to be on the resource table
> constructors to ensure the inter-resource alignments, if any are
> required. The resource table format allows you to add padding fields in
> between if needed, and the remoteproc core relies on the offsets.

Agreed

>
> I can only ensure the alignment within this resource structure with
> ready-available access and conversion to/from a 64-bit type, as long as
> the resource is starting on a 64-bit boundary.
>
>>
>> To overcome this, there is multiple solutions:
>>
>> - Split the 64 bits fields into 32bits low and high parts:
>> Since all resources are aligned on 32bits, it will be ok
>
> Yes, this is one solution. At the same time, this means you need
> additional conversion logic for converting to and from 64-bit field. In
> this particular case, da is the address of the trace buffer pointer on a
> 64-bit processor, so we can directly use the address of the trace
> buffer. Guess it is a question of easier translation vs packing the
> resource table as tight as possible.

You simply have to add two wrapper such as the following:

static inline void rproc_rsc_set_addr(u32 *low, u32 *hi, u64 val)
{
*low = lower_32_bits(val);
*hi = upper_32_bits(val);
}

static inline u64 rproc_rsc_get_addr(u32 low, u32 hi)
{
return ((u64) hi << 32) | low;
}

This is not really difficult to use and will ensure your new trace
resource can be used easily without breaking 32 bits alignment.
Breaking compatibility is an option also and it is probably needed
to document it clearly if it is chosen to do so.

>
>>
>> - Use memcpy_from/to_io when reading/writing such fields
>> As I said in a previous message this should probably be used since
>> the memories that are accessed by rproc are io mem (ioremap in almost
>> all drivers).
>
> Anything running out of DDR actually doesn't need the io mem semantics,
> so we actually need to be fixing the drivers. Blindly changing the
> current memcpy to memcpy_to_io in the core loader is also not right. Any
> internal memories properties will actually depend on the processor and
> SoC. Eg: The R5 TCM interfaces in general can be treated as normal memories.

Agreed, this is most of the case indeed where the memories are actually
accessible directly. But using ioremap potentially creates a mapping that
does not support misaligned accesses and so accesses should be always aligned.
memcpy_from/to_io ensures that.
IMHO, there is probably something to be rework since the drivers are mapping
the memories but the core is accessing this memory, knowing nothing about
how it was mapped.

Regards,

ClÃment

>
> regards
> Suman
>
>>
>> Regards,
>>
>> ClÃment
>>
>> [1]
>> https://github.com/OpenAMP/open-amp/blob/master/apps/machine/zynqmp_r5/rsc_table.h
>>>
>>>
>>>
>>>
>>>>>
>>>>
>>>> Sounds good to me.
>>>>
>>>> Thanks,
> >>> Bjorn