Re: Mercurial 0.3 vs git benchmarks

From: Magnus Damm
Date: Tue Apr 26 2005 - 11:31:42 EST

On 4/26/05, Chris Mason <mason@xxxxxxxx> wrote:
> On Tuesday 26 April 2005 11:09, Magnus Damm wrote:
> > On 4/26/05, Chris Mason <mason@xxxxxxxx> wrote:
> > > This agrees with my tests here, the time to apply patches is somewhat
> > > disk bound, even for the small 100 or 200 patch series. The io should be
> > > coming from data=ordered, since the commits are still every 5 seconds or
> > > so.
> >
> > Yes, as long as you apply the patches to disk that is. I've hacked up
> > a small backend tool that applies patches to files kept in memory and
> > uses a modified Rabin-Karp search to match hunks. So you basically read
> > once and write once per file instead of moving data around for each
> > applied patch. But it needs two passes.
> >
> > And no, the source code for the entire Linux kernel is not kept in
> > memory - you need a smart frontend to manage the file cache. Drop me a
> > line if you are interested.
> Sorry, you've lost me. Right now the cycle goes like this:

Ehrm, maybe I'm way off. =)

> 1) patch reads patch file, reads source file, writes source file
> 2) update-cache reads source file, writes git file


> Which of those writes are you avoiding? We have a smart way to manage the
> cache already for the source files...the vm does pretty well. There's
> nothing to manage for the git files. For the apply a bunch of patches
> workload, they are write once, read never (except for the index).

Well, maybe I misunderstood everything, but I thought you were
applying a lot of patches and complained that it took a lot of time
because of data=ordered.

When I applied a lot of patches to the kernel recently, the CPU load
dropped to zero after a while, the disk worked hard for a second or
two, and then things picked up again. My primitive guess is that the
ext3 journal had filled up. To work around this I started hacking on
this in-memory patcher.

In the cycle above, I'm trying to speed up step 1.
If the patches modify a source file multiple times (either through
multiple hunks or multiple ---/+++ headers), then the lines below each
hunk in the source file are moved multiple times. And if the source
file is written to disk after each hunk or ---/+++ section is applied,
this generates a lot of writes. Those can be avoided if the entire
patch procedure is broken down into a first pass that analyzes the
patches and a second pass that applies them while keeping the source
files in memory.
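
As a rough illustration of the two-pass idea (not the actual tool), here
is a minimal Python sketch. Everything in it is an assumption made for
the example: a patch is modelled as a list of hypothetical
(path, hint, old_lines, new_lines) tuples rather than parsed unified
diff text, the read_file/write_file callbacks stand in for real disk
I/O, and hunks are located with a plain list comparison instead of the
modified Rabin-Karp search mentioned earlier.

```python
from collections import defaultdict


def find_hunk(lines, old, hint):
    """Return the index where `old` matches in `lines`, trying `hint` first."""
    n = len(old)
    last = len(lines) - n
    candidates = [hint] + [i for i in range(last + 1) if i != hint]
    for i in candidates:
        if 0 <= i <= last and lines[i:i + n] == old:
            return i
    raise ValueError("hunk does not apply")


def group_hunks(patches):
    """Pass 1: collect all hunks per file, preserving patch order."""
    per_file = defaultdict(list)
    for patch in patches:
        for path, hint, old, new in patch:
            per_file[path].append((hint, old, new))
    return per_file


def apply_all(patches, read_file, write_file):
    """Pass 2: read each file once, apply its hunks in memory, write once."""
    for path, hunks in group_hunks(patches).items():
        lines = read_file(path)
        for hint, old, new in hunks:
            at = find_hunk(lines, old, hint)
            lines[at:at + len(old)] = new
        write_file(path, lines)


# Demo with an in-memory "disk" that records every read and write.
store = {"a.c": ["int x;", "int y;", "int z;"]}
reads, writes = [], []


def read_file(path):
    reads.append(path)
    return list(store[path])


def write_file(path, lines):
    writes.append(path)
    store[path] = lines


patches = [
    [("a.c", 0, ["int x;"], ["long x;"])],
    [("a.c", 2, ["int z;"], ["long z;"]), ("a.c", 1, ["int y;"], [])],
]
apply_all(patches, read_file, write_file)
```

In the demo, two patches carrying three hunks touch a.c, yet the file
is read once and written once. The intermediate versions never reach
the disk, which is why this only helps when the final result is all
that matters.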

But my rather trivial observation above is of course only suitable if
you have a lot of patches that should be applied and you are only
interested in the final version of the patched source files. If you
apply one patch at a time and import each source file as a new
revision then my little hack is probably not for you.

/ magnus