Re: [PATCH 4/5] trace: Make removal of ring buffer pages atomic

From: Steven Rostedt
Date: Fri Jul 29 2011 - 17:23:28 EST


On Tue, 2011-07-26 at 15:59 -0700, Vaibhav Nagarnaik wrote:

> + ret = NULL;
> + if (((unsigned long)to_remove & RB_FLAG_MASK) == RB_PAGE_HEAD) {
> + /*
> + * this is a head page, we have to set RB_PAGE_HEAD
> + * flag while updating the next pointer
> + */
> + unsigned long tmp = (unsigned long)next_page |
> + RB_PAGE_HEAD;
> + ret = cmpxchg(&tail_page->next, to_remove,
> + (struct list_head *) tmp);

This is fine, it will work.

> +
> + } else if (((unsigned long)to_remove & ~RB_PAGE_HEAD) ==
> + (unsigned long)to_remove) {
> +
> + /* not a head page, just update the next pointer */
> + ret = cmpxchg(&tail_page->next, to_remove, next_page);

This is not, it wont work.

You can *only* remove the HEAD from the ring buffer without causing
issues.

As you probably know, the trick is done with the list pointers. We or
the pointer with 1 for head, and the writer will or it with 2 when it
updates the page.

This only works if we have a 1 or 2 here. Now if we try to do what you
suggest, by starting with a 0, and ending with 0, we may fail. Between
the to_remove = tail_page->next and the cmpxchg(), the writer could
easily move to the tail page, and you would never know it.

Now we just removed the tail page with no idea that the write is on it.
The writer could have also moved on to the next page, and we just
removed the most recently recorded data.

The only way to really make this work is to always get it from the HEAD
page. If there's data there, we could just store it separately, so that
the read_page can read from it first. We will still need to be careful
with the writer on the page. But I think this is doable.

That is, read the pages from head, if there's no data on it, simply
remove the pages. If there is data, we store it off later. If the writer
happens to be on the page, we will can check that. We could even
continue to get pages, because we will be moving the header page with
the cmpxchg, and the writer does that too. It will take some serious
thought, but it is possible to do this.

-- Steve



> +
> + } else {
> + /*
> + * this means that this page is being operated on
> + * try the next page in the list
> + */
> + }
> +
> + if (ret != to_remove) {
> + /*
> + * Well, try again with the next page.
> + * If we cannot move the page in 3 retries, there are
> + * lot of interrupts on this cpu and probably causing
> + * some weird behavior. Warn in this case and stop
> + * tracing
> + */
> + if (RB_WARN_ON(cpu_buffer, !retries--))
> + break;
> + else
> + continue;


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/