Re: [dm-devel] dm-writeboost testing

From: Mikulas Patocka
Date: Fri Oct 04 2013 - 09:39:14 EST




On Fri, 4 Oct 2013, Akira Hayakawa wrote:

> Hi, Mikulas,
>
> I am sorry to say that
> I don't have such machines to reproduce the problem.
>
> But I agree that I am using the workqueue subsystem
> in a somewhat odd way.
> I should clean that up.
>
> For example,
> the free_cache() routine below is
> the destructor of the cache metadata,
> including all the workqueues.
>
> void free_cache(struct wb_cache *cache)
> {
> 	cache->on_terminate = true;
>
> 	/* Kill in-kernel daemons */
> 	cancel_work_sync(&cache->sync_work);
> 	cancel_work_sync(&cache->recorder_work);
> 	cancel_work_sync(&cache->modulator_work);
>
> 	cancel_work_sync(&cache->flush_work);
> 	destroy_workqueue(cache->flush_wq);
>
> 	cancel_work_sync(&cache->barrier_deadline_work);
>
> 	cancel_work_sync(&cache->migrate_work);
> 	destroy_workqueue(cache->migrate_wq);
> 	free_migration_buffer(cache);
>
> 	/* Destroy in-core structures */
> 	free_ht(cache);
> 	free_segment_header_array(cache);
>
> 	free_rambuf_pool(cache);
> }
>
> The cancel_work_sync() calls before destroy_workqueue()
> can probably be removed because destroy_workqueue() first
> flushes all queued work.
>
> Although I prepare a dedicated workqueue
> for each of flush_work and migrate_work,
> the other four works are queued onto system_wq
> through the schedule_work() routine.
> This asymmetry is not welcome for
> architecture-portable code.
> Dependencies on the subsystem should be minimized.
> In particular, the workqueue subsystem's concurrency support
> is changing considerably, so
> relying only on single-threaded workqueues
> will be a good idea for stability.

The problem is that you are using workqueues the wrong way. You submit a
work item to a workqueue and the work item is active until the device is
unloaded.

If you submit a work item to a workqueue, it is required that the work
item finishes in finite time. Otherwise, it may stall other tasks.
The deadlock when I terminate Xserver is caused by this - the nvidia
driver tries to flush system workqueue and it waits for all work items to
terminate - but your work items don't terminate.

If you need a thread that runs for a long time, you should use
kthread_create, not workqueues (see this
http://people.redhat.com/~mpatocka/patches/kernel/dm-crypt-paralelizace/old-3/dm-crypt-encryption-threads.patch
or this
http://people.redhat.com/~mpatocka/patches/kernel/dm-crypt-paralelizace/old-3/dm-crypt-offload-writes-to-thread.patch
as an example of how to use kthreads).

Mikulas

> To begin with,
> these works never leave the queue
> until the destructor is called;
> they just alternate between running and sleeping.
> Queuing this kind of work onto system_wq
> may be unsupported.
>
> So,
> my strategy is to clean them up so that
> 1. every daemon has its own workqueue
> 2. cancel_work_sync() is never used; only destroy_workqueue() is called
> in the destructor free_cache() and in the error handling of resume_cache().
>
> Could you please run the same test again
> after I fix these points
> to see whether it is still reproducible?
>
>
> > On 3.11.3 on PA-RISC without preemption, the device unloads (although it
> > takes many seconds and vmstat shows that the machine is idle during this
> > time)
> This behavior is benign but should probably be improved.
> The free_cache() above first sets the `on_terminate` flag to true
> to notify all the daemons that we are shutting down.
> Since `update_interval` and `sync_interval` are 60 seconds by default,
> we have to wait a while for them to finish.
>
> Akira
>