[...]
* Pekka J Enberg <penberg@xxxxxxxxxxxxxx> wrote:
> Hi Ingo,
>
> On Mon, 12 Jun 2006, Ingo Molnar wrote:
> > i dont know - i feel uneasy about the 'any pointer' method - it has a
> > high potential for false negatives, especially for structures that
> > contain strings (or other random data), etc.
>
> Is that a problem in practice? Structures that contain data are
> usually allocated from the slab. There needs to be a link to that
> struct from the gc roots to get a false negative. Or am I missing
> something here?

you should think of this in terms of a 'graph of data', where each node
is a block of memory. The edges between nodes are represented by
pointers. The graph roots from .data/bss, but it may go indefinitely
into dynamically allocated blocks as well - just think of a hash-list
where the hash list table is in .data, but all the chain entries are in
allocated blocks and the chaining can be arbitrarily deep.
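
To make the hash-list case concrete, here is a minimal userspace sketch.
It is not kmemleak code: NBUCKETS, struct entry, table_insert() and
chain_depth() are names made up for illustration, and malloc() stands in
for the kernel allocator. The static table is the root; every chain
entry is a separately allocated node reachable only via 'next' edges.

```c
#include <stddef.h>
#include <stdlib.h>

#define NBUCKETS 4	/* hypothetical table size */

/* A chain entry: lives in a dynamically allocated block. */
struct entry {
	struct entry *next;	/* edge to the next block in the chain */
	int key;
};

/* The table itself is a static object, i.e. a root of the graph. */
static struct entry *table[NBUCKETS];

/* Insert a key at the head of its chain; 0 on success, -1 on OOM. */
static int table_insert(int key)
{
	struct entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->key = key;
	e->next = table[(unsigned int)key % NBUCKETS];
	table[(unsigned int)key % NBUCKETS] = e;
	return 0;
}

/* Number of blocks reachable from one root slot via 'next' edges. */
static size_t chain_depth(unsigned int bucket)
{
	size_t n = 0;

	for (struct entry *e = table[bucket % NBUCKETS]; e; e = e->next)
		n++;
	return n;
}
```

Inserting keys 1, 5 and 9 puts three blocks on the same chain, so the
third block is two edges away from the root slot. Nothing bounds that
depth.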

Currently kmemleak does not track the per-block position of 'outgoing
pointers': it assumes that all fields within a block may be an outgoing
pointer. This is a source of false negatives. (fields that do not
contain a real pointer might accidentally contain a value that is
interpreted as a false edge - falsely connecting a leaked block to the
graph.)
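
A rough sketch of how such a false edge comes about, assuming a scanner
that simply walks a block word by word. scan_all_words() is a
hypothetical helper written for this mail, not the actual kmemleak
scanning code, and it only matches the start address of the target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Conservative scan: treat every pointer-sized word of the block at
 * 'src' as a possible outgoing pointer. Returns 1 if any word equals
 * 'target_start', 0 otherwise.
 */
static int scan_all_words(const void *src, size_t len, uintptr_t target_start)
{
	const unsigned char *p = src;
	size_t off;

	for (off = 0; off + sizeof(uintptr_t) <= len; off += sizeof(uintptr_t)) {
		uintptr_t word;

		/* memcpy avoids alignment and aliasing trouble */
		memcpy(&word, p + off, sizeof(word));
		if (word == target_start)
			return 1;
	}
	return 0;
}
```

If a block holds plain data (a counter, a cookie, part of a string)
whose bit pattern happens to equal the address of a leaked block, this
scan reports an edge and the leak goes unnoticed.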

Kmemleak does recognize 'incoming pointers' via the offsetof tracking
method, but it's limited in that it is not a type-accurate method
either: it tracks per-size offsets, so two types accidentally having the
same size merge their 'possible incoming pointer offset' lists, which
introduces false negatives. (a pointer may be considered an incoming
edge while in reality the pointer is not validly pointing into this
structure)
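
The merging effect can be sketched like this. The bookkeeping below is
an assumption made for illustration (struct size_class, add_offset()
and offset_allowed() are invented names, not kmemleak's data
structures); the only property it is meant to show is that a list keyed
by size alone cannot tell two same-sized types apart.

```c
#include <stddef.h>

#define MAX_OFFSETS 8	/* hypothetical per-size limit */

/* Allowed incoming-pointer offsets, keyed only by block size. */
struct size_class {
	size_t size;
	size_t offsets[MAX_OFFSETS];
	size_t count;
};

/* Record 'offset' as a valid incoming offset for blocks of this size. */
static void add_offset(struct size_class *sc, size_t offset)
{
	size_t i;

	for (i = 0; i < sc->count; i++)
		if (sc->offsets[i] == offset)
			return;
	if (sc->count < MAX_OFFSETS)
		sc->offsets[sc->count++] = offset;
}

/* Would a pointer to (block start + offset) count as an incoming edge? */
static int offset_allowed(const struct size_class *sc, size_t offset)
{
	size_t i;

	for (i = 0; i < sc->count; i++)
		if (sc->offsets[i] == offset)
			return 1;
	return 0;
}

/* Two unrelated types that happen to have the same size. */
struct type_a { long pad[3]; long node; };	/* pointed to at 'node' */
struct type_b { long node; long pad[3]; };	/* pointed to at 'node' */
```

Once both offsetof(struct type_a, node) and offsetof(struct type_b,
node) have been added to the same size class, a pointer landing at the
type_a offset inside a type_b block is accepted as a valid incoming
edge, although it does not point at anything meaningful there.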

The full matching that was suggested before would further weaken the
'incoming pointers' logic and would introduce yet another source of
false negatives: we'd match every block pointer against every possible
target address that points to within another block.
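
For comparison, the two matching rules side by side, as a sketch with
invented names (struct block, match_start(), match_anywhere()). The
'full' rule accepts size times as many addresses per block as the
exact rule, which is where the extra false edges would come from.

```c
#include <stddef.h>
#include <stdint.h>

/* A tracked block covering [start, start + size). */
struct block {
	uintptr_t start;
	size_t size;
};

/* Exact match: only the block's start address is an incoming edge. */
static int match_start(uintptr_t value, const struct block *b)
{
	return value == b->start;
}

/* Full match: any address inside the block is an incoming edge. */
static int match_anywhere(uintptr_t value, const struct block *b)
{
	return value >= b->start && value - b->start < b->size;
}
```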

My suggestion would be to attempt to achieve perfect matches: annotate
structures to figure out the offset of pointers, and thus to figure out
the precise source addresses and a precise list of valid target
addresses. This is a quite elaborate task to pull off though, and i'm
not sure it's possible without intolerable maintenance overhead, but we
should consider it nevertheless. It will also be _much_ faster, because
per block we'd only have to scan a handful of outgoing pointers.
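
One possible shape for such an annotation, purely as a sketch: a
per-type table of pointer-field offsets built with offsetof(). struct
item, item_ptr_offsets[] and scan_annotated() are hypothetical, and the
code assumes that void * and uintptr_t have the same size and
representation, which holds on the usual platforms but is not
guaranteed by the C standard.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct item {
	struct item *next;	/* real pointer */
	char name[16];		/* string data, never a pointer */
	void *payload;		/* real pointer */
};

/* Hypothetical annotation: offsets of the pointer fields of struct item. */
static const size_t item_ptr_offsets[] = {
	offsetof(struct item, next),
	offsetof(struct item, payload),
};

/*
 * Scan only the annotated offsets of 'block'. Returns how many of the
 * annotated fields hold exactly 'target'.
 */
static size_t scan_annotated(const void *block, const size_t *offsets,
			     size_t n, uintptr_t target)
{
	const unsigned char *p = block;
	size_t hits = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uintptr_t word;

		memcpy(&word, p + offsets[i], sizeof(word));
		if (word == target)
			hits++;
	}
	return hits;
}
```

With this, bytes in 'name' that happen to look like an address are never
examined, and the scan touches two words per struct item instead of the
whole block.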

This also means that by default we'd have no false positives at all, but
that there is a capable annotation method to reduce the amount of false
negatives, in a gradual and manageable way - down to zero if everything
is annotated.