Re: POHMELFS is back

From: Evgeniy Polyakov
Date: Tue Sep 20 2011 - 10:19:09 EST


On Tue, Sep 20, 2011 at 09:41:57AM -0400, Valdis.Kletnieks@xxxxxx (Valdis.Kletnieks@xxxxxx) wrote:
> > If you get 10 times more bandwidth you will not be able to saturate it
> > with 10 times less servers.
>
> The point is that the solutions we're looking at are able to drive enough I/O
> *per server* that we need to look at 10GigE and Infiniband connections. Your
> numbers currently indicate about 5T of disk and 75 megabit of throughput per
> node, while current solutions are doing about 100T and pushing a 10GigE per
> node. So you have a *lot* of per-server scaling work to do still...

The number of server nodes is smaller, and the number of physical
servers may be even smaller. There is a fair number of proxy servers for
the cluster.

But overall, of course, no single server saturates its own 1 GigE link,
since, well, our uplinks are just gigabit :)

> > Scaling to hundreds of server nodes is a
> > good result, since we evenly balance all IO between nodes and no single
> > server is disk or network bound.
>
> You missed the point. Scaling to hundreds of server nodes is a nice
> *theoretical* result, but one that's not going to get a lot of traction out in
> the real world, where the *per server* scaling matters too. Which is my boss
> more likely to be willing to spend money on - a solution that has 50 servers
> per datacenter to deliver 4 Gb/sec per data center, or one that is delivering
> that much *per server*? Remember - servers cost money, rack space costs money,

You are not able to set up one server and deliver 4 Gb/sec of random IO.
If you think this is possible, then you actually have not tried to do it
with existing solutions.
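
To make this concrete, a back-of-envelope sketch in Python. The 4 KB
object size and ~150 random IOPS per 7200 RPM spindle are my assumptions
for illustration, not measured numbers:

# Random IOPS needed to sustain 4 Gb/s with small objects.
TARGET_BPS = 4e9 / 8        # 4 Gb/s target, in bytes/sec
OBJECT_SIZE = 4 * 1024      # bytes per random read (assumed)
IOPS_PER_DISK = 150.0       # random IOPS of one 7200 RPM spindle (assumed)

iops_needed = TARGET_BPS / OBJECT_SIZE          # ~122,000 IOPS
spindles_needed = iops_needed / IOPS_PER_DISK   # ~800 spindles

print(f"{iops_needed:,.0f} IOPS -> ~{spindles_needed:,.0f} spindles in one box")

No single box drives that many spindles at full random-seek rate.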

> Looked at differently - if I'm currently targeting multiple gigabytes/sec throughput
> to a petabyte of disk from a half-dozen servers, how big and fast a disk farm
> could I build if I had 50 servers in the room, or 200 across datacenters?

A simple question: what RPS (requests per second) rate do you get for
random reads and writes?

Your solution may scale to bandwidth limits, which is not interesting
for us. A huge single- or few-node solution is random IO limited, but if
you read a big file, then you will be network limited, and can show nice
numbers of Gb/s.
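
To illustrate the two regimes, a minimal sketch with assumed numbers
(a 10 GigE link and 24 spindles at ~150 random IOPS each; again my
assumptions, not anyone's measurements):

# Effective per-server throughput as a function of request size:
# seek-bound for small random requests, network-bound for big files.
LINK_BW = 10e9 / 8          # 10 Gb/s link, in bytes/sec (assumed)
SPINDLES = 24               # disks per server (assumed)
IOPS_PER_DISK = 150.0       # per-spindle random IOPS (assumed)

def throughput(request_bytes):
    seek_bound = SPINDLES * IOPS_PER_DISK * request_bytes
    return min(LINK_BW, seek_bound)

for size in (4 * 1024, 64 * 1024, 1024 * 1024, 64 * 1024 * 1024):
    print(f"{size:>9} B requests -> ~{throughput(size) * 8 / 1e9:.2f} Gb/s")

The same server shows ~0.1 Gb/s with 4 KB requests and fills the pipe
with 64 MB files, which is why Gb/s numbers say nothing about RPS.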

As for the GPFS you mentioned, you apparently have not tried to set up a
cluster with weak links (i.e. between physically different datacenters),
since it resynchronizes nodes on every glitch and does not scale in RPS,
although it is quite good at bulk IO.

So, basically, Elliptics was created for low-latency RPS loads, not
bulk IO.

--
Evgeniy Polyakov