Re: [RFC 3/3] CMDQ: Mediatek CMDQ driver

From: Daniel Kurtz
Date: Fri Jan 29 2016 - 03:43:26 EST


On Fri, Jan 29, 2016 at 3:39 PM, Horng-Shyang Liao <hs.liao@xxxxxxxxxxxx> wrote:
> Hi Dan,
>
> Many thanks for your comments and time.
> I reply my plan inline.
>
>
> On Thu, 2016-01-28 at 12:49 +0800, Daniel Kurtz wrote:
>> Hi HS,
>>
>> Sorry for the delay. It is hard to find time to review a >3700 line
>> driver :-o in detail....
>>
>> Some review comments inline, although I still do not completely
>> understand how all that this driver does and how it works.
>> I'll try to find time to go through this driver in detail again next
>> time you post it for review.
>>
>> On Tue, Jan 19, 2016 at 9:14 PM, <hs.liao@xxxxxxxxxxxx> wrote:
>> > From: HS Liao <hs.liao@xxxxxxxxxxxx>
>> >
>> > This patch is first version of Mediatek Command Queue(CMDQ) driver. The
>> > CMDQ is used to help read/write registers with critical time limitation,
>> > such as updating display configuration during the vblank. It controls
>> > Global Command Engine (GCE) hardware to achieve this requirement.
>> > Currently, CMDQ only supports display related hardwares, but we expect
>> > it can be extended to other hardwares for future requirements.
>> >
>> > Signed-off-by: HS Liao <hs.liao@xxxxxxxxxxxx>
>>
>> [snip]
>>
>> > diff --git a/drivers/soc/mediatek/mtk-cmdq.c b/drivers/soc/mediatek/mtk-cmdq.c
>> > new file mode 100644
>> > index 0000000..7570f00
>> > --- /dev/null
>> > +++ b/drivers/soc/mediatek/mtk-cmdq.c
>>
>> [snip]
>>
>> > +/*
>> > + * Maximum prefetch buffer size.
>> > + * Unit is instructions.
>> > + */
>> > +#define CMDQ_MAX_PREFETCH_INSTUCTION 240
>>
>> INSTRUCTION
>
> will fix
>
>> [snip]
>>
>> > +
>> > +/* get lsb for subsys encoding in arg_a (range: 0 - 31) */
>> > +
>> > +enum cmdq_eng {
>> > + CMDQ_ENG_DISP_UFOE = 0,
>> > + CMDQ_ENG_DISP_AAL,
>> > + CMDQ_ENG_DISP_COLOR0,
>> > + CMDQ_ENG_DISP_COLOR1,
>> > + CMDQ_ENG_DISP_RDMA0,
>> > + CMDQ_ENG_DISP_RDMA1,
>> > + CMDQ_ENG_DISP_RDMA2,
>> > + CMDQ_ENG_DISP_WDMA0,
>> > + CMDQ_ENG_DISP_WDMA1,
>> > + CMDQ_ENG_DISP_OVL0,
>> > + CMDQ_ENG_DISP_OVL1,
>> > + CMDQ_ENG_DISP_GAMMA,
>> > + CMDQ_ENG_DISP_DSI0_CMD,
>> > + CMDQ_ENG_DISP_DSI1_CMD,
>>
>> Why do these last two have "_CMD" at the end?
>
> will remove
>
>> > + CMDQ_MAX_ENGINE_COUNT /* ALWAYS keep at the end */
>> > +};
>> > +
>> > +struct cmdq_command {
>> > + struct cmdq *cqctx;
>> > + u32 scenario;
>> > + /* task priority (NOT HW thread priority) */
>> > + u32 priority;
>> > + /* bit flag of used engines */
>> > + u64 engine_flag;
>> > + /*
>> > + * pointer of instruction buffer
>> > + * This must point to an 64-bit aligned u32 array
>> > + */
>> > + u32 *va_base;
>>
>> All of your "va" and "va_base" should be void *, not u32 *.
>
> will do
>
>> > + /* size of instruction buffer, in bytes. */
>> > + u32 block_size;
>>
>> Better to use size_t for "size in bytes".
>
> will fix
>
>> > +};
>> > +
>> > +enum cmdq_code {
>> > + /* These are actual HW op code. */
>> > + CMDQ_CODE_MOVE = 0x02,
>> > + CMDQ_CODE_WRITE = 0x04,
>> > + CMDQ_CODE_JUMP = 0x10,
>> > + CMDQ_CODE_WFE = 0x20, /* wait for event (and clear) */
>> > + CMDQ_CODE_CLEAR_EVENT = 0x21, /* clear event */
>> > + CMDQ_CODE_EOC = 0x40, /* end of command */
>> > +};
>> > +
>> > +enum cmdq_task_state {
>> > + TASK_STATE_IDLE, /* free task */
>> > + TASK_STATE_BUSY, /* task running on a thread */
>> > + TASK_STATE_KILLED, /* task process being killed */
>> > + TASK_STATE_ERROR, /* task execution error */
>> > + TASK_STATE_DONE, /* task finished */
>> > + TASK_STATE_WAITING, /* allocated but waiting for available thread */
>> > +};
>> > +
>> > +struct cmdq_cmd_buf {
>> > + atomic_t used;
>> > + void *va;
>> > + dma_addr_t pa;
>> > +};
>> > +
>> > +struct cmdq_task_cb {
>> > + /* called by isr */
>> > + cmdq_async_flush_cb isr_cb;
>> > + void *isr_data;
>> > + /* called by releasing task */
>> > + cmdq_async_flush_cb done_cb;
>> > + void *done_data;
>> > +};
>> > +
>> > +struct cmdq_task {
>> > + struct cmdq *cqctx;
>> > + struct list_head list_entry;
>> > +
>> > + /* state for task life cycle */
>> > + enum cmdq_task_state task_state;
>> > + /* virtual address of command buffer */
>> > + u32 *va_base;
>> > + /* physical address of command buffer */
>> > + dma_addr_t mva_base;
>> > + /* size of allocated command buffer */
>> > + u32 buf_size;
>>
>> size_t
>
> will fix
>
>> > + /* It points to a cmdq_cmd_buf if this task use command buffer pool. */
>> > + struct cmdq_cmd_buf *cmd_buf;
>> > +
>> > + int scenario;
>> > + int priority;
>> > + u64 engine_flag;
>> > + u32 command_size;
>> > + u32 num_cmd;
>> > + int reorder;
>> > + /* HW thread ID; CMDQ_INVALID_THREAD if not running */
>> > + int thread;
>>
>> I think this driver will be a lot more clear if you do this:
>>
>> struct cmdq_thread *thread;
>>
>> And then use "NULL" for "invalid thread" instead of CMDQ_INVALID_THREAD.
>
> I will try to use cmdq_thread instead of thread id if possible.
> If CMDQ_INVALID_THREAD id is not necessary any more, I will remove it.
>
>> > + /* flag of IRQ received */
>> > + int irq_flag;
>> > + /* callback functions */
>> > + struct cmdq_task_cb cb;
>> > + /* work item when auto release is used */
>> > + struct work_struct auto_release_work;
>> > +
>> > + unsigned long long submit; /* submit time */
>> > +
>> > + pid_t caller_pid;
>> > + char caller_name[TASK_COMM_LEN];
>> > +};
>> > +
>> > +struct cmdq_thread {
>> > + u32 task_count;
>> > + u32 wait_cookie;
>> > + u32 next_cookie;
>> > + struct cmdq_task *cur_task[CMDQ_MAX_TASK_IN_THREAD];
>> > +};
>> > +
>> > +struct cmdq {
>> > + struct device *dev;
>> > + struct notifier_block pm_notifier;
>> > +
>> > + void __iomem *base_va;
>> > + unsigned long base_pa;
>>
>> I think we can remove base_pa (it is only used in a debug print), and just call
>> "base_va" as "base".
>
> will do
>
>> > + u32 irq;
>> > +
>> > + /*
>> > + * task information
>> > + * task_cache: struct cmdq_task object cache
>> > + * task_free_list: unused free tasks
>> > + * task_active_list: active tasks
>> > + * task_consume_wait_queue_item: task consumption work item
>> > + * task_auto_release_wq: auto-release workqueue
>> > + * task_consume_wq: task consumption workqueue (for queued tasks)
>> > + */
>> > + struct kmem_cache *task_cache;
>> > + struct list_head task_free_list;
>> > + struct list_head task_active_list;
>> > + struct list_head task_wait_list;
>> > + struct work_struct task_consume_wait_queue_item;
>> > + struct workqueue_struct *task_auto_release_wq;
>> > + struct workqueue_struct *task_consume_wq;
>> > + u16 task_count[CMDQ_MAX_THREAD_COUNT];
>>
>> AFAICT, this task_count is not used?
>
> will remove
>
>> > +
>> > + struct cmdq_thread thread[CMDQ_MAX_THREAD_COUNT];
>> > +
>> > + /* mutex, spinlock, flag */
>> > + struct mutex task_mutex; /* for task list */
>> > + struct mutex clock_mutex; /* for clock operation */
>> > + spinlock_t thread_lock; /* for cmdq hardware thread */
>> > + int thread_usage;
>> > + spinlock_t exec_lock; /* for exec task */
>> > +
>> > + /* suspend */
>> > + bool suspended;
>> > +
>> > + /* command buffer pool */
>> > + struct cmdq_cmd_buf cmd_buf_pool[CMDQ_CMD_BUF_POOL_BUF_NUM];
>> > +
>> > + /*
>> > + * notification
>> > + * wait_queue: for task done
>> > + * thread_dispatch_queue: for thread acquiring
>> > + */
>> > + wait_queue_head_t wait_queue[CMDQ_MAX_THREAD_COUNT];
>> > + wait_queue_head_t thread_dispatch_queue;
>> > +
>> > + /* ccf */
>> > + struct clk *clock;
>> > +};
>> > +
>> > +struct cmdq_event_name {
>> > + enum cmdq_event event;
>> > + char *name;
>>
>> const char *
>
> will do
>
>> > +};
>> > +
>> > +struct cmdq_subsys {
>> > + u32 base_addr;
>> > + int id;
>> > + char *grp_name;
>>
>> const char *
>
> will do
>
>> > +};
>> > +
>> > +static const struct cmdq_event_name g_event_name[] = {
>> > + /* Display start of frame(SOF) events */
>> > + {CMDQ_EVENT_DISP_OVL0_SOF, "CMDQ_EVENT_DISP_OVL0_SOF",},
>>
>> You can drop the "," inside "}".
>
> will do
>
>> > + {CMDQ_EVENT_DISP_OVL1_SOF, "CMDQ_EVENT_DISP_OVL1_SOF",},
>> > + {CMDQ_EVENT_DISP_RDMA0_SOF, "CMDQ_EVENT_DISP_RDMA0_SOF",},
>> > + {CMDQ_EVENT_DISP_RDMA1_SOF, "CMDQ_EVENT_DISP_RDMA1_SOF",},
>> > + {CMDQ_EVENT_DISP_RDMA2_SOF, "CMDQ_EVENT_DISP_RDMA2_SOF",},
>> > + {CMDQ_EVENT_DISP_WDMA0_SOF, "CMDQ_EVENT_DISP_WDMA0_SOF",},
>> > + {CMDQ_EVENT_DISP_WDMA1_SOF, "CMDQ_EVENT_DISP_WDMA1_SOF",},
>> > + /* Display end of frame(EOF) events */
>> > + {CMDQ_EVENT_DISP_OVL0_EOF, "CMDQ_EVENT_DISP_OVL0_EOF",},
>> > + {CMDQ_EVENT_DISP_OVL1_EOF, "CMDQ_EVENT_DISP_OVL1_EOF",},
>> > + {CMDQ_EVENT_DISP_RDMA0_EOF, "CMDQ_EVENT_DISP_RDMA0_EOF",},
>> > + {CMDQ_EVENT_DISP_RDMA1_EOF, "CMDQ_EVENT_DISP_RDMA1_EOF",},
>> > + {CMDQ_EVENT_DISP_RDMA2_EOF, "CMDQ_EVENT_DISP_RDMA2_EOF",},
>> > + {CMDQ_EVENT_DISP_WDMA0_EOF, "CMDQ_EVENT_DISP_WDMA0_EOF",},
>> > + {CMDQ_EVENT_DISP_WDMA1_EOF, "CMDQ_EVENT_DISP_WDMA1_EOF",},
>> > + /* Mutex end of frame(EOF) events */
>> > + {CMDQ_EVENT_MUTEX0_STREAM_EOF, "CMDQ_EVENT_MUTEX0_STREAM_EOF",},
>> > + {CMDQ_EVENT_MUTEX1_STREAM_EOF, "CMDQ_EVENT_MUTEX1_STREAM_EOF",},
>> > + {CMDQ_EVENT_MUTEX2_STREAM_EOF, "CMDQ_EVENT_MUTEX2_STREAM_EOF",},
>> > + {CMDQ_EVENT_MUTEX3_STREAM_EOF, "CMDQ_EVENT_MUTEX3_STREAM_EOF",},
>> > + {CMDQ_EVENT_MUTEX4_STREAM_EOF, "CMDQ_EVENT_MUTEX4_STREAM_EOF",},
>> > + /* Display underrun events */
>> > + {CMDQ_EVENT_DISP_RDMA0_UNDERRUN, "CMDQ_EVENT_DISP_RDMA0_UNDERRUN",},
>> > + {CMDQ_EVENT_DISP_RDMA1_UNDERRUN, "CMDQ_EVENT_DISP_RDMA1_UNDERRUN",},
>> > + {CMDQ_EVENT_DISP_RDMA2_UNDERRUN, "CMDQ_EVENT_DISP_RDMA2_UNDERRUN",},
>> > + /* Keep this at the end of HW events */
>> > + {CMDQ_MAX_HW_EVENT_COUNT, "CMDQ_MAX_HW_EVENT_COUNT",},
>> > + /* GPR events */
>>
>> What are "GPR" events?
>
> They are events of General Purpose Register of GCE.
> I will remove them if driver doesn't need to take care of them.
>
>> > + {CMDQ_SYNC_TOKEN_GPR_SET_0, "CMDQ_SYNC_TOKEN_GPR_SET_0",},
>> > + {CMDQ_SYNC_TOKEN_GPR_SET_1, "CMDQ_SYNC_TOKEN_GPR_SET_1",},
>> > + {CMDQ_SYNC_TOKEN_GPR_SET_2, "CMDQ_SYNC_TOKEN_GPR_SET_2",},
>> > + {CMDQ_SYNC_TOKEN_GPR_SET_3, "CMDQ_SYNC_TOKEN_GPR_SET_3",},
>> > + {CMDQ_SYNC_TOKEN_GPR_SET_4, "CMDQ_SYNC_TOKEN_GPR_SET_4",},
>> > + /* This is max event and also can be used as mask. */
>> > + {CMDQ_SYNC_TOKEN_MAX, "CMDQ_SYNC_TOKEN_MAX",},
>> > + /* Invalid event */
>> > + {CMDQ_SYNC_TOKEN_INVALID, "CMDQ_SYNC_TOKEN_INVALID",},
>> > +};
>> > +
>> > +static const struct cmdq_subsys g_subsys[] = {
>> > + {0x1400, 1, "MMSYS"},
>> > + {0x1401, 2, "DISP"},
>> > + {0x1402, 3, "DISP"},
>>
>> This isn't going to scale. These addresses could be different on
>> different chips.
>> Instead of a static table like this, we probably need specify to the
>> connection between gce and other devices via devicetree phandles, and
>> then use the phandles to lookup the corresponding device address
>> range.
>
> I will define them in device tree.
> E.g.
> cmdq {
> reg_domain = 0x14000000, 0x14010000, 0x14020000
> }

The devicetree should only model hardware relationships, not software
considerations.

Is the hardware constraint here for using gce with various other
hardware blocks? I think we already model this by only providing a
gce phandle in the device tree nodes for those devices that can use
gce.

Looking at the driver closer, as far as I can tell, the whole subsys
concept is a purely software abstraction, and only used to debug the
CMDQ_CODE_WRITE command. In fact, AFAICT, everything would work fine
if we just completely removed the 'subsys' concept, and just passed
through the raw address provided by the driver.

So, I recommend just removing 'subsys' completely from the driver -
from this array, and in the masks.

Instead, if there is an error on the write command, just print the
address that fails. There are other ways to deduce the subsystem from
a physical address.

Thanks,

-Dan