Re: Improve lseek scalability v3

From: Andres Freund
Date: Fri Sep 16 2011 - 17:02:47 EST


On Friday, September 16, 2011 10:08:17 PM Benjamin LaHaise wrote:
> On Fri, Sep 16, 2011 at 07:27:33PM +0200, Andres Freund wrote:
> > many tuples does the table have. Those statistics are only updated every
> > now and then though.
> > So it uses those old stats to check how many tuples are normally stored
> > on a page and then uses that to extrapolate the number of tuples from
> > the current nr of pages (which is computed by lseek(SEEK_END) over the
> > 1GB segements of a table).
> >
> > I am not sure how interested you are on the relevant postgres internals?
>
> For such tables, can't Postgres track the size of the file internally? I'm
> assuming it's keeping file descriptors open on the tables it manages, in
> which case when it writes to a file to extend it, the internally stored
> size could be updated. Not making a syscall at all would scale far better
> than even a modified lseek() will perform.
Yes, it tracks the fds internally. The problem is that postgres is process
based so those tables are not reachable by all processes. It could start
tracking those in shared memory but the synchronization overhead for that
would likely be more expensive than the syscall overhead (Given that the
fdsets are possibly (and realistically) disjunct between the individual
backends you would have to reserve enough shared memory for a fully seperate
fds between each process... Which would complicate efficient lookup).

Also with fstat() instead of lseek() there was no bottleneck anymore, so I
don't think the benefits would warrant that.

Greetings,

Andres
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/