Re: Net device byte statistics

From: jw schultz (jw@pegasys.ws)
Date: Fri Jul 25 2003 - 16:55:48 EST


On Fri, Jul 25, 2003 at 10:58:18AM -0700, Randy.Dunlap wrote:
> On Fri, 25 Jul 2003 13:55:14 -0400 Jeff Sipek <jeffpc@optonline.net> wrote:
>
> | -----BEGIN PGP SIGNED MESSAGE-----
> | Hash: SHA1
> |
> | On Friday 25 July 2003 13:20, Randy.Dunlap wrote:
> | > Yes, a common solution for this is to use some SNMP agent that does
> | > 64-bit counter accumulation.
> |
> | Interesting...I haven't thought of SNMP.
> |
> | > IETF expects that some high-speed interfaces will have 64-bit
> | > counters. From RFC 2233 (Interfaces Group MIB using SMIv2):
> | >
> | > <quote>
> | > For interfaces that operate at 20,000,000 (20 million) bits per
> | > second or less, 32-bit byte and packet counters MUST be used.
> | > For interfaces that operate faster than 20,000,000 bits/second,
> | > and slower than 650,000,000 bits/second, 32-bit packet counters
> | > MUST be used and 64-bit octet counters MUST be used. For
> | > interfaces that operate at 650,000,000 bits/second or faster,
> | > 64-bit packet counters AND 64-bit octet counters MUST be used.
> | > </quote>
> |
> | It is just easier to have everything 64-bits.
>
> I think the counterpoint is that if it were easy & safe, it would
> already be in the kernel.
>
> | > However, this is a MIB spec. It does not require a Linux
> | > (/proc) interface to support 64-bit counters.
> |
> | Agreed, however if we are going to change some counters, we should do it for
> | all of them. (Btw, /proc is not the only point where users can get stats....
> | there is also /sys and something else...I can't remember now...)
>
> Right, I was just saying that the kernel interface doesn't have
> to support 64-bit counters in lots of cases. That can often be
> done in userspace.

I've been watching this discussion for several months. If i
may, let me summarise what i see as the salient points.

        1. Uptime is such that many 32bit counters wrap.

        2. Userspace can easily detect wrapping when
        measuring deltas. Provided it only wraps once.

        3. Some counters can wrap at intervals so small that
        userspace cannot accurately detect the wrap without
        the monitoring tool becoming a significant system
        load.

        4. 64bit counters would be sufficient. At least for
        most of these counters.

        6. Without atomicity the counters will have windows
        where they report garbage. And if the code paths
        writing the counter aren't otherwise protected they
        can likewise corrupt the counter.

        5. The locking overhead needed for atomicity of
        64bit counters on 32bit architectures is excessive
        for fast-paths.

It seems to me that what is needed is a in-kernel component
that can intermediate between internal 32bit counters and
userspace-visible 64bit (or larger) counters. This
component would need to be active often enough that the
counters don't wrap without detection and so that userspace
will see sufficiently accurate numbers.

My thought would be to use 96bits for each counter. In-kernel
code would run periodically doing something like this:

        curval = counter.in_kernel;
                        /* get it in a register for atomicity */
        if (counter.user_low < curval)
                ++counter.user_high;
        counter.user_low = curval;

This code would run every N jiffies or be in a high priority
kernel thread. As an in-kernel service it could loop over a
set of counters that have been registered with it. If
needed you could even have user_high be larger than 32 bits.

It could even be possible to make the code accessing the
userspace counter fall-back to the kernel one if the 64bit
counter is zero. That way registration could potentially be
userspace triggered.

This is just the acorn of an idea. It does mean that
userspace visible counters will not have instantaneous
resolution but it seems to me that HZ should be more than
tight enough. There are certainly other ways to achieve
this and implementation should take into account cache
effects.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

Remember Cernan and Schmitt - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Jul 31 2003 - 22:00:27 EST