Re: linux-kernel EROS discussion

shapj@us.ibm.com
Mon, 28 Jun 1999 15:55:12 -0400


Since the question is generic, I want to answer this publicly, but since it was
sent only to me I'm stripping the name of the person who asked.

>- about Iain McClatchie's memoryful computation: I can see why this
>uses disk bandwidth and wastes CPU time doing it; it writes the whole
>memory state every checkpoint, when that isn't really necessary.

It's necessary to write that memory if you want a recoverable computation. As I
understood it, this wasn't really Iain's objection. His objection was to the
overhead of saving this state, which is ~3%.

>I can
>see that making the working set just a little bit larger ends up
>requiring some swapping, in proportion to how often you page-fault
>(which is likely to be the fraction that the paged-out data is of the
>working set). But I don't see how that makes the issues quite
>different; I just see that it makes them just a little bit different.
>I also don't see how making the working set 75% or 50% of main memory
>makes the issues quite different, either. What am I missing?

I'll answer these in reverse order.

50% of main memory is a magic number, because this is the size at which the
entire computation can be efficiently cloned using in-memory copy-on-write
techniques. You take the snapshot, mark the active state copy-on-write, and
then allow the computation to resume, copying each data page in memory as the
application touches it. EROS already uses in-memory copy-on-write where
possible, so with enough memory this would pretty much cover the case Iain was
concerned about: an application that just barely fits in memory but is not
otherwise swapping or paging. If, that is, you really care that much about a
3% overhead in this one special case.
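
To make the cloning idea concrete, here is a toy userland sketch in C. It
leans on fork() to get the kernel's copy-on-write machinery to do the work.
This is emphatically not the EROS implementation; it only illustrates the
technique:

/* Toy copy-on-write snapshot via fork().  The child holds a frozen
 * image of the state and writes it out; the parent resumes at once,
 * and the kernel copies only the pages the parent then dirties. */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int snapshot(const char *path, const char *state, size_t len)
{
    pid_t pid = fork();              /* all pages become COW-shared */
    if (pid < 0)
        return -1;                   /* fork failed */
    if (pid == 0) {                  /* child: sees the frozen image */
        FILE *f = fopen(path, "wb");
        int ok = f && fwrite(state, 1, len, f) == len;
        if (f) fclose(f);
        _exit(ok ? 0 : 1);
    }
    /* parent: resumes immediately; reap the child with waitpid()
     * at leisure */
    return 0;
}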

The question then becomes "How much marginal memory do I need? Would an extra
33% rather than an extra 100% suffice?" (33% is the case where your application
fills 75% of memory: 25%/75% = 1/3 = ~33%).

75% would probably be a reasonable proportion of memory, because the I/Os for
the page writes to clean memory can be batched and handled sequentially, so in
practice you wouldn't fault on every dirty page. By the time you hit all the
pages, you'd find that a bunch of them had already been cleaned for you. How
many, and what impact this actually has in practice, would need to be measured
on an application-by-application basis.
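
If it helps, here is a toy of the batching I have in mind. The struct, the
block numbers, and write_block() are invented for the example; the point is
only that sorting the dirty pages by disk address turns a pile of random
writes into one sequential sweep:

#include <stdio.h>
#include <stdlib.h>

struct dirty_page {
    unsigned long frame;        /* physical frame holding the data */
    unsigned long block;        /* disk block it should go to      */
};

static void write_block(unsigned long block, unsigned long frame)
{
    printf("frame %lu -> block %lu\n", frame, block);    /* stub */
}

static int by_block(const void *a, const void *b)
{
    const struct dirty_page *x = a, *y = b;
    return (x->block > y->block) - (x->block < y->block);
}

void clean_batch(struct dirty_page *pages, size_t n)
{
    size_t i;
    qsort(pages, n, sizeof *pages, by_block);
    for (i = 0; i < n; i++)          /* one sequential pass */
        write_block(pages[i].block, pages[i].frame);
}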

A small change in fit in the *other* direction (i.e. when the app gets too big
for main memory) is completely different.

Iain's application has an essentially random access pattern. Think about what
happens in a UNIX paging system when the application goes over the memory size.
Let M be the number of memory pages and A be the number of application pages.
The likelihood that a given reference will induce a page fault is roughly
(A-M)/A. This likelihood is stable, because for every page that comes in, a
dirty page has to go out. The performance impact of this is *much* worse than
checkpointing, because it's a continuous hit, not a periodic blip in behavior.
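
If you don't trust that figure, it's easy to check with a quick simulation.
The page counts and the random-replacement policy here are invented for the
example; the steady state is the point:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int A = 1200, M = 1000, REFS = 1000000;
    char *resident = calloc(A, 1);        /* is page i in memory?  */
    int *frame = malloc(M * sizeof(int)); /* which page is where   */
    int loaded = 0, i;
    long faults = 0;

    srand(1);
    for (i = 0; i < REFS; i++) {
        int page = rand() % A;            /* random reference      */
        if (!resident[page]) {
            faults++;
            if (loaded < M) {             /* memory not yet full   */
                frame[loaded++] = page;
            } else {                      /* evict a random victim */
                int v = rand() % M;
                resident[frame[v]] = 0;
                frame[v] = page;
            }
            resident[page] = 1;
        }
    }
    printf("measured %.4f, predicted (A-M)/A = %.4f\n",
           (double)faults / REFS, (double)(A - M) / A);
    free(resident);
    free(frame);
    return 0;
}

Both numbers should come out near 0.17: one reference in six faults, forever.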

Ironically, if your app behaves this way, on EROS it will end up forcing its
own checkpoint to occur as a side-effect of the paging activity, and the
overhead of the checkpoint "magically" disappears. The write that pages out
the displaced dirty page is also the write that checkpoints it, so the
checkpoint itself induces no marginal I/O delay.

So if we were to graph the "overhead" of checkpointing as a function of the
amount of dirty memory your app generates relative to system memory, we'd see a
very strange graph. Roughly 0% overhead up to the "50% of memory" marker,
rising to something in the range of 0.5% overhead at the "75% of memory" point,
rising to a max of 3% when your app just perfectly fills memory, and then
dropping to very near 0% overhead shortly after your app size exceeds the size
of memory. Counterintuitive as hell, which is why a lot of people don't grok
checkpointing.

>- about MAC: why are MAC provably impossible to enforce? Are you
>talking about covert channels? I believe they can be closed
>completely, albeit at a heavy cost.

I probably misspoke.

First, it has now been proven that covert channels cannot, in general, be
detected. A channel you cannot find is a channel you cannot close, so it
follows that they cannot be completely closed. Think steganography.

Second, the access issue is not MAC in general, but principal-based MAC.
Policies of the form "Fred can communicate with Mary, Fred is authorized to see
X, but Mary must not see X" are unenforceable. If Fred and Mary can communicate,
Fred can proxy. Since Fred and Mary are real people who talk in the hall, Fred
and Mary can always communicate. Policies of the form "Principal X must not
perform operation Y on this data" may be useful as statements of social
contract, but they cannot be enforced without the cooperation of all parties.

Incorporating into a design the ability to specify a security policy that cannot
be enforced leads users to mismodel their exposures, which is deeply
counterproductive.

Jonathan S. Shapiro, Ph. D.
IBM T.J. Watson Research Center
Email: shapj@us.ibm.com
Phone: +1 914 784 7085 (Tieline: 863)
Fax: +1 914 784 7595
