Re: some bad numbers with Java/database threading

From: Nick Piggin
Date: Thu Sep 13 2007 - 11:15:52 EST


On Thursday 13 September 2007 17:18, David Schwartz wrote:
> > I was working on some unit tests and thought I'd give CFS a whirl to see
> > if it had any impact on my workloads (to see what the fuss was about),
> > and I came up with some pretty disturbing numbers:
> > http://devloop.org.uk/documentation/database-performance/Linux-Kernels/Kernels-ManyThreads-CombinedTests-noload2.png
> > As above but also showing the load average:
> > http://devloop.org.uk/documentation/database-performance/Linux-Kernels/Kernels-ManyThreads-CombinedTests2.png
> > Looks like a regression to me...
>
> I've tried reasonably diligently to figure out what the hell you're doing

(CCs re-added; please reply to all when replying to lkml)

Hi David,

You might be sounding a bit too abrasive here... I understand you're
also trying to help, but your tone just might be taken the wrong way.

Antoine is really doing the right thing here to test such a new feature
early and on the code he cares about as a user. And most importantly,
reporting it here. This is probably the most useful resource we have in
Linux.

Maybe the workload is quirky, but regardless, if it is a *regression*
from a previous kernel then it is really important that it be brought
to our attention.


> and gone through quite a bit of your documentation, and I just can't figure
> it out. This could entirely be the result of your test's sensitivity to
> execution order.
>
> For example, if you run ten threads that all insert, query, and delete from
> the *same* table, then the exact interleaving pattern will determine the
> size of the results. A slight change in the scheduling quantum could
> multiply the size of the result data by a huge factor. There is a big
> difference between:
>
> 1) Thread A inserts data.
> 2) Thread A queries data.
> 3) Thread A deletes data.
> 4) Thread B inserts data.
> ...
>
>
> and
> 1) Thread A inserts data.
> 2) Thread B inserts data.
> ...
> 101) Thread A queries data.
> 102) Thread B queries data.
> ...
>
> Now, even if they're using separate tables, your test is still very
> sensitive to execution order. If thread A runs to completion and then
> thread B does, the database data will fit better into cache. If thread A
> runs partially, then thread B runs partially, when thread A runs again, its
> database stuff will not be hot.
>
> >* java threads are created first and the data is prepared, then all the
> >threads are started in a tight loop. Each thread runs multiple queries
> >with a 10ms pause (to allow the other threads to get scheduled)
>
> There are a number of ways you might be measuring nothing but how the
> scheduler chooses to interleave your threads. Benchmarking threads that
> yield suggests just this type of thing -- if a thread has useful work to do
> and another thread is not going to help it, *why* *yield*?
>
> Are you worried the scheduler isn't going to schedule other threads?! Or is
> there some sane reason to force suboptimal scheduling when you're trying to
> benchmark a scheduler? Are you trying to see how it deals with pathological
> patterns? ;)
>
> The only documentation I can see about what you're actually *doing* says
> things like "The schema and statements are almost identical to the
> non-threaded tests." Do you see why that's not helpful?
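
The ordering effect described in the quoted text above can be pictured with
a small Java sketch. It is only an illustration under stated assumptions,
not the test that was actually run: the class name, the method names, and
the in-memory list standing in for a shared table are all hypothetical, and
the two orderings are simulated in a single thread so the result is
deterministic. It counts how many rows the queries read when each thread
runs insert/query/delete to completion, versus when all inserts happen
before any query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: a list of owner ids stands in for a shared table.
public class InterleavingDemo {

    // Each "thread" inserts, queries the whole table, then deletes its own
    // rows before the next one starts.
    public static long sequentialRowsRead(int threads, int rowsPerThread) {
        List<Integer> table = new ArrayList<>();
        long rowsRead = 0;
        for (int t = 0; t < threads; t++) {
            for (int r = 0; r < rowsPerThread; r++) {
                table.add(t);                       // insert
            }
            rowsRead += table.size();               // query sees only own rows
            final int owner = t;
            table.removeIf(id -> id == owner);      // delete own rows
        }
        return rowsRead;
    }

    // All "threads" insert first, then all query, then all delete.
    public static long interleavedRowsRead(int threads, int rowsPerThread) {
        List<Integer> table = new ArrayList<>();
        long rowsRead = 0;
        for (int t = 0; t < threads; t++) {
            for (int r = 0; r < rowsPerThread; r++) {
                table.add(t);                       // insert
            }
        }
        for (int t = 0; t < threads; t++) {
            rowsRead += table.size();               // query sees every thread's rows
        }
        for (int t = 0; t < threads; t++) {
            final int owner = t;
            table.removeIf(id -> id == owner);      // delete own rows
        }
        return rowsRead;
    }

    public static void main(String[] args) {
        System.out.println("sequential:  " + sequentialRowsRead(10, 100));
        System.out.println("interleaved: " + interleavedRowsRead(10, 100));
    }
}
```

With ten threads and 100 rows each, the sequential ordering reads 1000 rows
in total and the fully interleaved ordering reads 10000, so in this toy
model the amount of result data scales with the thread count purely as a
function of execution order.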