Re: [PATCH 0/4] slab: implement byte sized indexes for the freelist of a slab

From: Joonsoo Kim
Date: Thu Sep 05 2013 - 02:55:59 EST


On Wed, Sep 04, 2013 at 05:33:05PM +0900, Joonsoo Kim wrote:
> On Tue, Sep 03, 2013 at 02:15:42PM +0000, Christoph Lameter wrote:
> > On Mon, 2 Sep 2013, Joonsoo Kim wrote:
> >
> > > This patchset implements byte sized indexes for the freelist of a slab.
> > >
> > > Currently, the freelist of a slab consists of unsigned int sized indexes.
> > > Most slabs have fewer than 256 objects, so much space is wasted.
> > > To reduce this overhead, this patchset implements byte sized indexes for
> > > the freelist of a slab. With it, we can save 3 bytes for each object.
> > >
> > > This introduces one likely branch to functions used for setting/getting
> > > objects to/from the freelist, but we may get more benefits from
> > > this change.
> > >
> > > Below are some numbers from 'cat /proc/slabinfo' related to my previous posting
> > > and this patchset.
> >
> > You may also want to run some performance tests. The cache footprint
> > should also be reduced with this patchset and therefore performance should
> > be better.
>
> Yes, I did a hackbench test today, but I'm not ready to post it.
> The performance is improved by my previous posting, and further improvement is
> found with this patchset. Perhaps I will post it tomorrow.
>

Here are the results from both patchsets on my 4-CPU machine.

* Before *

Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):

238,309,671 cache-misses ( +- 0.40% )

12.010172090 seconds time elapsed ( +- 0.21% )

* After my previous posting *

Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):

229,945,138 cache-misses ( +- 0.23% )

11.627897174 seconds time elapsed ( +- 0.14% )


* After my previous posting + this patchset *

Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):

218,640,472 cache-misses ( +- 0.42% )

11.504999837 seconds time elapsed ( +- 0.21% )



Cache misses drop with each patchset: about 3.5% after my previous posting, and
a further 4.9% with this patchset on top (8.3% in total against the baseline).
Elapsed time also improves, by about 3.2% and 4.2% against the baseline, respectively.
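
For anyone who wants to re-derive the percentages from the raw perf numbers
above, here is a minimal sketch in Python. The helper name reduction_pct and
the dictionary keys are only illustrative; they are not part of perf or of the
patchsets.

```python
# Figures copied from the three 'perf bench sched messaging' runs above.
cache_misses = {
    "before": 238_309_671,
    "previous": 229_945_138,        # after the previous posting
    "previous+this": 218_640_472,   # after the previous posting + this patchset
}
elapsed_sec = {
    "before": 12.010172090,
    "previous": 11.627897174,
    "previous+this": 11.504999837,
}


def reduction_pct(old, new):
    """Relative reduction from old to new, as a percentage of old."""
    return (old - new) / old * 100.0


if __name__ == "__main__":
    base_miss = cache_misses["before"]
    base_time = elapsed_sec["before"]

    # Step-by-step cache-miss reduction: each patchset against the state before it.
    print("cache-misses, previous vs before:      %.1f%%"
          % reduction_pct(base_miss, cache_misses["previous"]))
    print("cache-misses, this vs previous:        %.1f%%"
          % reduction_pct(cache_misses["previous"], cache_misses["previous+this"]))
    print("cache-misses, both vs before:          %.1f%%"
          % reduction_pct(base_miss, cache_misses["previous+this"]))

    # Elapsed time: both results are compared against the baseline.
    print("elapsed, previous vs before:           %.1f%%"
          % reduction_pct(base_time, elapsed_sec["previous"]))
    print("elapsed, both vs before:               %.1f%%"
          % reduction_pct(base_time, elapsed_sec["previous+this"]))
```

Note that the cache-miss figures are computed step by step (each patchset
against the state before it), while the elapsed-time figures are both computed
against the baseline.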

I think that all the patchsets deserve to be merged, since they reduce memory usage
and also improve performance. :)

Thanks.