Re: Using compression before encryption in device-mapper

From: Paulo Marques
Date: Wed Apr 14 2004 - 10:28:55 EST


Guillaume Lacôte wrote:

...
Actually (see my reply to Timothy Miller) I really want to do "compression" even if it does not reduce space: it is a matter of increasing the per-bit entropy rather than of gaining space (see http://jsam.sourceforge.net). Moreover, I do not want to use sophisticated algorithms, in order to be able to compute plain-text random distributions that ensure the compressed distributions will be uniform, which is very difficult with e.g. zlib; in particular, having any kind of "meta-data", "signatures" or "dictionary" is a no-go for me. See details at the end of this post.


Point taken.


...

A while ago I started working on a proof of concept: a network block device
server that compressed the data sent to it.

Would it be possible for you to point me to the relevant material?



I just need to tidy it up a little :)

Maybe I can publish it tomorrow or something like that.


...

2 - The compression layer should report a large block size upwards, and use
a small block size downwards, so that compression is as efficient as
possible. Good results are obtained with a 32kB / 512-byte ratio. This can
cause extra read-modify-write cycles upwards.

I fail to understand; could you provide me with more details, please?



If we are to compress on a block basis, the bigger the block, the higher the compression ratio we'll be able to achieve (using zlib, for instance). However, the compressed data will itself have to be stored in blocks on the actual block device.

For instance, if we compress a 32kB block and it only needs 8980 bytes to be stored, we need 18 512-byte blocks to hold it (18 x 512 = 9216). On average, we lose half a device block per "upper level" block: with a 32kB / 512-byte ratio that is 256 bytes per 32kB of data, about 0.8%, which is more than acceptable.
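
To make the arithmetic concrete, here is a minimal user-space sketch (the constants and names are mine; compress2() is the only zlib call) that compresses one 32kB block and reports how many 512-byte sectors it needs and how much of the last sector is padding:

#include <stdio.h>
#include <string.h>
#include <zlib.h>                      /* link with -lz */

#define UPPER_BLOCK (32 * 1024)        /* block size reported upwards */
#define SECTOR      512                /* block size used downwards   */

int main(void)
{
	static unsigned char plain[UPPER_BLOCK];
	/* generous worst-case bound for the compressed output */
	static unsigned char packed[UPPER_BLOCK + UPPER_BLOCK / 1000 + 128];
	uLongf packed_len = sizeof(packed);
	unsigned long sectors, wasted;
	int i;

	/* some mildly compressible test data */
	for (i = 0; i < UPPER_BLOCK; i++)
		plain[i] = (unsigned char)("abcdefgh"[i % 8] + i % 3);

	if (compress2(packed, &packed_len, plain, UPPER_BLOCK,
		      Z_BEST_SPEED) != Z_OK)
		return 1;

	/* round up to whole sectors; the tail of the last one is wasted */
	sectors = (packed_len + SECTOR - 1) / SECTOR;
	wasted  = sectors * SECTOR - packed_len;

	printf("%lu compressed bytes -> %lu sectors, %lu bytes padding\n",
	       (unsigned long)packed_len, sectors, wasted);
	return 0;
}

The padding averages half a sector, i.e. 256 bytes per 32kB block, which is where the ~0.8% figure comes from. It also shows why writes smaller than 32kB turn into read-modify-write cycles: the sector run holds the compressed image of the whole upper-level block, so a partial update means decompressing, patching and recompressing it.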


...
As I said earlier, my point is definitely not to gain space, but to increase the "per-bit entropy". I really want to encode my data even if this grows its length, as is done in http://jsam.sourceforge.net . My final goal is the following: for each plain block, first draw a chunk of random bytes, then compress the random bytes followed by the plain data with a dynamic Huffman encoding. The random bytes are _not_ drawn uniformly, but rather so that the distribution on Huffman trees (and thus on encodings) is uniform. This ensures (?) that an attacker really has no other option to decipher the data than brute force: each and every key is possible, and more precisely, each and every key is equi-probable.
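
To see the moving parts of that framing, here is a rough user-space sketch. All the names in it are mine; rand() merely stands in for the carefully shaped prefix distribution the scheme calls for, and a per-block static Huffman tree stands in for the dynamic coder. It prepends random bytes to a block, builds a Huffman tree over the combined byte frequencies and prints the resulting code lengths, so you can watch the prefix perturb the tree from one block to the next:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define SYMS 256

struct node { unsigned long w; int l, r; };

/* walk the tree; leaves are indices < SYMS and name the symbol itself */
static void assign(const struct node *nd, int i, int depth, int len[SYMS])
{
	if (i < SYMS) {
		len[i] = depth ? depth : 1;  /* degenerate one-symbol tree */
		return;
	}
	assign(nd, nd[i].l, depth + 1, len);
	assign(nd, nd[i].r, depth + 1, len);
}

/* build a Huffman tree over buf[] and return the code length per symbol */
static int huffman_lengths(const unsigned char *buf, size_t n, int len[SYMS])
{
	struct node nd[2 * SYMS];      /* leaves 0..255, internals after */
	int live[SYMS], nlive = 0, next = SYMS;
	size_t i;
	int s;

	memset(nd, 0, sizeof(nd));
	memset(len, 0, SYMS * sizeof(int));
	for (i = 0; i < n; i++)
		nd[buf[i]].w++;
	for (s = 0; s < SYMS; s++)
		if (nd[s].w)
			live[nlive++] = s;
	if (!nlive)
		return -1;

	while (nlive > 1) {
		/* pick the two lightest live nodes (O(n^2), fine for a demo) */
		int a = 0, b = 1, t;
		if (nd[live[b]].w < nd[live[a]].w) { t = a; a = b; b = t; }
		for (t = 2; t < nlive; t++) {
			if (nd[live[t]].w < nd[live[a]].w) { b = a; a = t; }
			else if (nd[live[t]].w < nd[live[b]].w) b = t;
		}
		nd[next].w = nd[live[a]].w + nd[live[b]].w;
		nd[next].l = live[a];
		nd[next].r = live[b];
		live[a] = next++;             /* merged node replaces one... */
		live[b] = live[--nlive];      /* ...and the other drops out  */
	}
	assign(nd, live[0], 0, len);
	return 0;
}

int main(void)
{
	static unsigned char block[64 + 4096];
	int len[SYMS], s;

	/* hypothetical framing: a 64-byte random prefix ahead of the data.
	 * The real scheme draws the prefix from a shaped, non-uniform
	 * distribution; rand() is only a stand-in here.                  */
	srand((unsigned)time(NULL));
	for (s = 0; s < 64; s++)
		block[s] = (unsigned char)(rand() & 0xff);
	memset(block + 64, 'A', 4096);    /* highly regular "plaintext" */

	if (huffman_lengths(block, sizeof(block), len) == 0)
		for (s = 0; s < SYMS; s++)
			if (len[s])
				printf("symbol %3d -> %d bits\n", s, len[s]);
	return 0;
}

Running it twice produces a different tree each time for the same all-'A' payload, which is the effect the scheme relies on: the prefix, not the plaintext alone, determines the encoding.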


Ok, we are definitely fighting different wars here.

Anyway, I'll try to gather what I did with the network block device server and place it somewhere where you can look at it. It will probably help you do some tests, too. Because it is a block device in "user space", it is much simpler to develop and test different approaches, and gather some results, before trying things inside the kernel.

If you want to start now, just go to:

http://nbd.sourceforge.net/

and download the source for the network block device server. My server is probably more complex than the original, because of all the metadata handling.

I hope this helps,

--
Paulo Marques - www.grupopie.com
"In a world without walls and fences who needs windows and gates?"
