Re: [PATCH] vc4: Add an ioctl for labeling GEM BOs for summary stats

From: Chris Wilson
Date: Fri Jul 28 2017 - 14:53:29 EST


Quoting Eric Anholt (2017-07-25 19:27:25)
> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
>
> > Quoting Eric Anholt (2017-06-22 21:50:54)
> >> This has proven immensely useful for debugging memory leaks and
> >> overallocation (which is a rather serious concern on the platform,
> >> given that we typically run at about 256MB of CMA out of up to 1GB
> >> total memory, with framebuffers that are about 8MB ecah).
> >>
> >> The state of the art without this is to dump debug logs from every GL
> >> application, guess as to kernel allocations based on bo_stats, and try
> >> to merge that all together into a global picture of memory allocation
> >> state. With this, you can add a couple of calls to the debug build of
> >> the 3D driver and get a pretty detailed view of GPU memory usage from
> >> /debug/dri/0/bo_stats (or when we debug print to dmesg on allocation
> >> failure).
> >>
> >> The Mesa side currently labels at the gallium resource level (so you
> >> see that a 1920x20 pixmap has been created, presumably for the window
> >> system panel), but we could extend that to be even more useful with
> >> glObjectLabel() names being sent all the way down to the kernel.
> >>
> >> (partial) example of sorted debugfs output with Mesa labeling all
> >> resources:
> >>
> >> kernel BO cache: 16392kb BOs (3)
> >> tiling shadow 1920x1080: 8160kb BOs (1)
> >> resource 1920x1080@32/0: 8160kb BOs (1)
> >> scanout resource 1920x1080@32/0: 8100kb BOs (1)
> >> kernel: 8100kb BOs (1)
> >>
> >> Signed-off-by: Eric Anholt <eric@xxxxxxxxxx>
> >> ---
> >
> >> static void vc4_bo_stats_dump(struct vc4_dev *vc4)
> >> {
> >
> > Now unused?
>
> Still used from the splat when CMA allocation fails. It's less
> catastrophic than it used to be, but still bad, so we're dumping to
> dmesg for now.
>
> >> - DRM_INFO("num bos allocated: %d\n",
> >> - vc4->bo_stats.num_allocated);
> >> - DRM_INFO("size bos allocated: %dkb\n",
> >> - vc4->bo_stats.size_allocated / 1024);
> >> - DRM_INFO("num bos used: %d\n",
> >> - vc4->bo_stats.num_allocated - vc4->bo_stats.num_cached);
> >> - DRM_INFO("size bos used: %dkb\n",
> >> - (vc4->bo_stats.size_allocated -
> >> - vc4->bo_stats.size_cached) / 1024);
> >> - DRM_INFO("num bos cached: %d\n",
> >> - vc4->bo_stats.num_cached);
> >> - DRM_INFO("size bos cached: %dkb\n",
> >> - vc4->bo_stats.size_cached / 1024);
> >> + int i;
> >> +
> >> + for (i = 0; i < vc4->num_labels; i++) {
> >> + if (!vc4->bo_labels[i].num_allocated)
> >> + continue;
> >> +
> >> + DRM_INFO("%30s: %6dkb BOs (%d)\n",
> >> + vc4->bo_labels[i].name,
> >> + vc4->bo_labels[i].size_allocated / 1024,
> >> + vc4->bo_labels[i].num_allocated);
> >> + }
> >> }
> >>
> >> #ifdef CONFIG_DEBUG_FS
> >
> >> +/* Must be called with bo_lock held. */
> >> +static void vc4_bo_set_label(struct drm_gem_object *gem_obj, int label)
> >> +{
> >> + struct vc4_bo *bo = to_vc4_bo(gem_obj);
> >> + struct vc4_dev *vc4 = to_vc4_dev(gem_obj->dev);
> >
> > lockdep_assert_held(&vc4->bo_lock);
>
> Nice. I've converted the other instances of this comment, too.
>
> >> +
> >> + vc4->bo_labels[label].num_allocated++;
> >> + vc4->bo_labels[label].size_allocated += gem_obj->size;
> >
> > This gets called with label=-1 on destroy.
>
> Oh, good catch, thanks. This code got reworked a couple of times and I
> lost that check.
>
> >> + vc4->bo_labels[bo->label].num_allocated--;
> >> + vc4->bo_labels[bo->label].size_allocated -= gem_obj->size;
> >
> > Ok, preassigned to TYPE_KERNEL on creation.
> >
> >> + if (vc4->bo_labels[bo->label].num_allocated == 0 &&
> >> + is_user_label(bo->label)) {
> >> + /* Free user BO label slots on last unreference. */
> >> + kfree(vc4->bo_labels[bo->label].name);
> >> + vc4->bo_labels[bo->label].name = NULL;
> >> + }
> >
> > This seems dubious. As a user I would expect the label I created to last
> > until I closed the fd. Otherwise if I ever close all bo of one type,
> > wait a few seconds for the cache to be reaped, then reallocate a new bo,
> > someone else may have assigned my label a new name.
> >
> > If that's guarded against, a comment would help.
>
> Userspace can't see the label index, though, and can only set string
> names on BOs. New text:
>
> /* Free user BO label slots on last unreference.
> * Slots are just where we track the stats for a given
> * name, and once a name is unused we can reuse that
> * slot.
> */

Ah. My mistake was in thinking that you passed the name_id to the create
ioctl, but names are assigned to the bo themselves, and so the slot does
only exist whilst the bo exist.

BOs are assigned to slots, and not slots to BOs.

Reviewed-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
-Chris