Re: [RFC PATCH] Crypto-engine support for parallel requests

From: Iuliana Prodan
Date: Tue Jan 21 2020 - 09:20:33 EST


On 1/21/2020 12:00 PM, Corentin Labbe wrote:
> On Tue, Jan 21, 2020 at 01:32:29AM +0200, Iuliana Prodan wrote:
>> Added support for executing multiple requests, in parallel,
>> for crypto engine.
>> A no_reqs is initialized and set in the new
>> crypto_engine_alloc_init_and_set function.
>> Here, is also set the maximum size for crypto-engine software
>> queue (not hardcoded anymore).
>> On crypto_pump_requests the no_reqs is increased, until the
>> max_no_reqs is reached, and decreased on crypto_finalize_request,
>> or on error path (in case a prepare_request or do_one_request
>> operation was unsuccessful).
>>
>> Signed-off-by: Iuliana Prodan <iuliana.prodan@xxxxxxx>
>> ---
>> crypto/crypto_engine.c | 112 +++++++++++++++++++++++++++++++++---------------
>> include/crypto/engine.h | 11 +++--
>> 2 files changed, 84 insertions(+), 39 deletions(-)
>>
>> diff --git a/crypto/crypto_engine.c b/crypto/crypto_engine.c
>> index eb029ff..5219141 100644
>> --- a/crypto/crypto_engine.c
>> +++ b/crypto/crypto_engine.c
>> @@ -14,6 +14,7 @@
>> #include "internal.h"
>>
>> #define CRYPTO_ENGINE_MAX_QLEN 10
>> +#define CRYPTO_ENGINE_MAX_CONCURRENT_REQS 1
>>
>> /**
>> * crypto_finalize_request - finalize one request if the request is done
>> @@ -22,32 +23,27 @@
>> * @err: error number
>> */
>> static void crypto_finalize_request(struct crypto_engine *engine,
>> - struct crypto_async_request *req, int err)
>> + struct crypto_async_request *req, int err)
>> {
>> unsigned long flags;
>> - bool finalize_cur_req = false;
>> + bool finalize_req = false;
>> int ret;
>> struct crypto_engine_ctx *enginectx;
>>
>> spin_lock_irqsave(&engine->queue_lock, flags);
>> - if (engine->cur_req == req)
>> - finalize_cur_req = true;
>> + if (engine->no_reqs > 0) {
>> + finalize_req = true;
>> + engine->no_reqs--;
>> + }
>> spin_unlock_irqrestore(&engine->queue_lock, flags);
>>
>> - if (finalize_cur_req) {
>> - enginectx = crypto_tfm_ctx(req->tfm);
>> - if (engine->cur_req_prepared &&
>> - enginectx->op.unprepare_request) {
>> - ret = enginectx->op.unprepare_request(engine, req);
>> - if (ret)
>> - dev_err(engine->dev, "failed to unprepare request\n");
>> - }
>> - spin_lock_irqsave(&engine->queue_lock, flags);
>> - engine->cur_req = NULL;
>> - engine->cur_req_prepared = false;
>> - spin_unlock_irqrestore(&engine->queue_lock, flags);
>> + enginectx = crypto_tfm_ctx(req->tfm);
>> + if (finalize_req && enginectx->op.prepare_request &&
>> + enginectx->op.unprepare_request) {
>> + ret = enginectx->op.unprepare_request(engine, req);
>> + if (ret)
>> + dev_err(engine->dev, "failed to unprepare request\n");
>> }
>> -
>> req->complete(req, err);
>>
>> kthread_queue_work(engine->kworker, &engine->pump_requests);
>> @@ -73,8 +69,8 @@ static void crypto_pump_requests(struct crypto_engine *engine,
>>
>> spin_lock_irqsave(&engine->queue_lock, flags);
>>
>> - /* Make sure we are not already running a request */
>> - if (engine->cur_req)
>> + /* Make sure we have space, for more requests to run */
>> + if (engine->no_reqs >= engine->max_no_reqs)
>> goto out;
>>
>> /* If another context is idling then defer */
>> @@ -108,13 +104,16 @@ static void crypto_pump_requests(struct crypto_engine *engine,
>> goto out;
>> }
>>
>> +retry:
>> /* Get the fist request from the engine queue to handle */
>> backlog = crypto_get_backlog(&engine->queue);
>> async_req = crypto_dequeue_request(&engine->queue);
>> if (!async_req)
>> goto out;
>>
>> - engine->cur_req = async_req;
>> + /* Increase the number of concurrent requests that are in execution */
>> + engine->no_reqs++;
>> +
>> if (backlog)
>> backlog->complete(backlog, -EINPROGRESS);
>>
>> @@ -130,7 +129,7 @@ static void crypto_pump_requests(struct crypto_engine *engine,
>> ret = engine->prepare_crypt_hardware(engine);
>> if (ret) {
>> dev_err(engine->dev, "failed to prepare crypt hardware\n");
>> - goto req_err;
>> + goto req_err_2;
>> }
>> }
>>
>> @@ -141,26 +140,45 @@ static void crypto_pump_requests(struct crypto_engine *engine,
>> if (ret) {
>> dev_err(engine->dev, "failed to prepare request: %d\n",
>> ret);
>> - goto req_err;
>> + goto req_err_2;
>> }
>> - engine->cur_req_prepared = true;
>> }
>> if (!enginectx->op.do_one_request) {
>> dev_err(engine->dev, "failed to do request\n");
>> ret = -EINVAL;
>> - goto req_err;
>> + goto req_err_1;
>> }
>> +
>> ret = enginectx->op.do_one_request(engine, async_req);
>> if (ret) {
>> dev_err(engine->dev, "Failed to do one request from queue: %d\n", ret);
>> - goto req_err;
>> + goto req_err_1;
>> }
>> - return;
>> -
>> -req_err:
>> - crypto_finalize_request(engine, async_req, ret);
>> - return;
>>
>> + /*
>> + * If there is still space for concurrent requests,
>> + * try and send a new one
>> + */
>> + spin_lock_irqsave(&engine->queue_lock, flags);
>> + if (engine->no_reqs < engine->max_no_reqs)
>> + goto retry;
>
> You should check if engine->queue.qlen > 0 before retry.
>
>> + goto out;
>> +
>> +req_err_1:
>> + if (enginectx->op.unprepare_request) {
>> + ret = enginectx->op.unprepare_request(engine, async_req);
>> + if (ret)
>> + dev_err(engine->dev, "failed to unprepare request\n");
>> + }
>> +req_err_2:
>> + async_req->complete(async_req, ret);
>> + spin_lock_irqsave(&engine->queue_lock, flags);
>> + /*
>> + * If unable to prepare or execute the request,
>> + * decrease the number of concurrent requests
>> + */
>> + engine->no_reqs--;
>> + goto retry;
>
> You should check if engine->queue.qlen > 0 before retry.
>
Here (and above) is not needed to check for qlen > 0, since on retry,
first thing is tryin to dequeue an async_req from crypto-engine queue.
In crypto_dequeue_request function is a check for qlen, that means than
in pump_request will goto out.

>> out:
>> spin_unlock_irqrestore(&engine->queue_lock, flags);
>> }
>> @@ -386,15 +404,21 @@ int crypto_engine_stop(struct crypto_engine *engine)
>> EXPORT_SYMBOL_GPL(crypto_engine_stop);
>>
>> /**
>> - * crypto_engine_alloc_init - allocate crypto hardware engine structure and
>> - * initialize it.
>> + * crypto_engine_alloc_init_and_set - allocate crypto hardware engine structure
>> + * and initialize it by setting the maximum number of entries in the software
>> + * crypto-engine queue and the maximum number of concurrent requests that can
>> + * be executed at once.
>> * @dev: the device attached with one hardware engine
>> * @rt: whether this queue is set to run as a realtime task
>> + * @max_no_reqs: maximum number of request that can be executed in parallel
>> + * @qlen: maximum size of the crypto-engine queue
>> *
>> * This must be called from context that can sleep.
>> * Return: the crypto engine structure on success, else NULL.
>> */
>> -struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt)
>> +struct crypto_engine *crypto_engine_alloc_init_and_set(struct device *dev,
>> + bool rt, int max_no_reqs,
>> + int qlen)
>> {
>> struct sched_param param = { .sched_priority = MAX_RT_PRIO / 2 };
>> struct crypto_engine *engine;
>> @@ -411,12 +435,13 @@ struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt)
>> engine->running = false;
>> engine->busy = false;
>> engine->idling = false;
>> - engine->cur_req_prepared = false;
>> engine->priv_data = dev;
>> snprintf(engine->name, sizeof(engine->name),
>> "%s-engine", dev_name(dev));
>> + engine->max_no_reqs = max_no_reqs;
>> + engine->no_reqs = 0;
>>
>> - crypto_init_queue(&engine->queue, CRYPTO_ENGINE_MAX_QLEN);
>> + crypto_init_queue(&engine->queue, qlen);
>> spin_lock_init(&engine->queue_lock);
>>
>> engine->kworker = kthread_create_worker(0, "%s", engine->name);
>> @@ -433,6 +458,23 @@ struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt)
>>
>> return engine;
>> }
>> +EXPORT_SYMBOL_GPL(crypto_engine_alloc_init_and_set);
>> +
>> +/**
>> + * crypto_engine_alloc_init - allocate crypto hardware engine structure and
>> + * initialize it.
>> + * @dev: the device attached with one hardware engine
>> + * @rt: whether this queue is set to run as a realtime task
>> + *
>> + * This must be called from context that can sleep.
>> + * Return: the crypto engine structure on success, else NULL.
>> + */
>> +struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt)
>> +{
>> + return crypto_engine_alloc_init_and_set(dev, rt,
>> + CRYPTO_ENGINE_MAX_CONCURRENT_REQS,
>> + CRYPTO_ENGINE_MAX_QLEN);
>> +}
>> EXPORT_SYMBOL_GPL(crypto_engine_alloc_init);
>>
>> /**
>> diff --git a/include/crypto/engine.h b/include/crypto/engine.h
>> index e29cd67..5f9a6df 100644
>> --- a/include/crypto/engine.h
>> +++ b/include/crypto/engine.h
>> @@ -24,7 +24,6 @@
>> * @idling: the engine is entering idle state
>> * @busy: request pump is busy
>> * @running: the engine is on working
>> - * @cur_req_prepared: current request is prepared
>> * @list: link with the global crypto engine list
>> * @queue_lock: spinlock to syncronise access to request queue
>> * @queue: the crypto queue of the engine
>> @@ -38,14 +37,14 @@
>> * @kworker: kthread worker struct for request pump
>> * @pump_requests: work struct for scheduling work to the request pump
>> * @priv_data: the engine private data
>> - * @cur_req: the current request which is on processing
>> + * @max_no_reqs: maximum number of request which can be processed in parallel
>> + * @no_reqs: current number of request which are processed in parallel
>> */
>> struct crypto_engine {
>> char name[ENGINE_NAME_LEN];
>> bool idling;
>> bool busy;
>> bool running;
>> - bool cur_req_prepared;
>>
>> struct list_head list;
>> spinlock_t queue_lock;
>> @@ -61,7 +60,8 @@ struct crypto_engine {
>> struct kthread_work pump_requests;
>>
>> void *priv_data;
>> - struct crypto_async_request *cur_req;
>> + int max_no_reqs;
>> + int no_reqs;
>> };
>>
>> /*
>> @@ -102,6 +102,9 @@ void crypto_finalize_skcipher_request(struct crypto_engine *engine,
>> int crypto_engine_start(struct crypto_engine *engine);
>> int crypto_engine_stop(struct crypto_engine *engine);
>> struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt);
>> +struct crypto_engine *crypto_engine_alloc_init_and_set(struct device *dev,
>> + bool rt, int max_no_reqs,
>> + int qlen);
>> int crypto_engine_exit(struct crypto_engine *engine);
>>
>> #endif /* _CRYPTO_ENGINE_H */
>> --
>> 2.1.0
>>
>
> Hello
>
> In your model, who is running finalize_request() ?
finalize_request() in CAAM, and in other drivers, is called on the _done
callback (stm32, virtio and omap).

> In caam it seems that you have a taskqueue dedicated for that but you cannot assume that all drivers will have this.
> I think the crypto_engine should be sufficient by itself and does not need external thread/taskqueue.
>
> But in your case, it seems that you dont have the choice, since do_one_request does not "do" but simply enqueue the request in the "jobring".
>
But, do_one_request it shouldn't, necessary, execute the request. Is ok
to enqueue it, since we have asynchronous requests. do_one_request is
not blocking.

> What about adding along prepare/do_one_request/unprepare a new enqueue()/can_do_more() function ?
>
> The stream will be:
> retry:
> optionnal prepare
> optionnal enqueue
> optionnal can_do_more() (goto retry)
> optionnal do_one_request
>
> then
> finalize()
> optionnal unprepare
>

I'm planning to improve crypto-engine incrementally, so I'm taking one
step at a time :)
But I'm not sure if adding an enqueue operation is a good idea, since,
my understanding, is that do_one_request is a non-blocking operation and
it shouldn't execute the request.

IMO, the crypto-engine flow should be kept simple:
1. a request comes to hw -> this is doing transfer_request_to_engine;
2. CE enqueue the requests
3. on pump_requests:
3. a) optional prepare operation
3. b) sends the reqs to hw, by do_one_request operation. To wait for
completion here it contradicts the asynchronous crypto API.
do_one_request operation has a crypto_async_request type as argument.
Note: Step 3. b) can be done several times, depending on size of hw queue.
4. in driver, when req is done:
4. a) optional unprepare operation
4. b) crypto_finalize_request is called


Thanks,
Iulia

> The can_do_more simply will tell if we can enqueue more (this will handle your case(ringjob), and my case(batching)
> Instead of storing the limit in the crypto_engine, you keep control on the driver side.
>
> For your case the do_one_request will be unset, for mine I will use to ran the batch.
> But for other drivers, no change will be necessary (appart adding some enqueue=NULL,can_do_more=NULL).
>
> We can also imagine an easier solution like enqueue returning a positive value saying to queue more.
>
> Regards
>