RE: [patch v1, kernel version 3.2.1] rtnetlink workaround around the skb buff size issue

From: Rose, Gregory V
Date: Mon Feb 06 2012 - 12:29:13 EST


> -----Original Message-----
> From: steweg@xxxxxxxxx [mailto:steweg@xxxxxxxxx] On Behalf Of Štefan Gula
> Sent: Monday, February 06, 2012 12:53 AM
> To: Rose, Gregory V
> Cc: David Miller; linux-kernel@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx
> Subject: Re: [patch v1, kernel version 3.2.1] rtnetlink workaround around
> the skb buff size issue
>
> 2012/2/6 Rose, Gregory V <gregory.v.rose@xxxxxxxxx>:
> >
> > The patch below is what I've got so far.  Right now the bit mask array
> is global so if you enable display of VF (n) on one interface it will
> enable display of the same VF on other interfaces.  I intend to move the
> bit mask array into the net_device structure so we can set the display
> mask for each interface independently.
> >
> > The command to set the filter mask is "set only", I see no reason to add
> it to the info dump.  If other folks see it differently then I can do that
> too.
> >
> > Anyway, it will allow the user to control which VFs are getting
> displayed during the info dump.  They all default to off so initially no
> VF info gets displayed.
> >
> > I've also whipped up a patch for the iproute2 ip command.  It'll work
> like this:
> >
> > 'ip link set <dev> vf (n) filter [on|off]'
> >
> > So if you have 128 VFs on the device you could enable info dumps for
> arbitrary VFs, e.g. VFs 3, 9, 16, 21, and 31.  Only the info for those VFs
> would display.  This method has the advantage of not breaking scripts
> which parse the current VF info display.  Of course, one could also script
> up something to sequentially enable the display of a single VF, dump the
> info for it, and then move on to the next.
> As this patch will allow one to filter some information and possibly
> lower the required skb buffer size, the general idea is OK. On the
> other hand, it will not eliminate the problem, e.g.:
> - someone unaware of the limits behind it could simply enable all
> options;
> - it also doesn't address interfaces whose relevant info is bigger
> than the buffer size, e.g. my macvlan interface's MAC address list.
> If I request it, the request will eventually fail once there are
> enough records, even with filtering...
>
> So I would rather see a proper general method for requesting
> information in cycles within a single interface, such as sending a
> request to the kernel per VF for a particular device, or per MAC
> address from a macvlan's associated list. I believe this approach is
> somewhat more scalable, as it could potentially be reused for other
> types of network devices as well.
>
> My original idea was to have these methods:
> 1st, a kernel method that returns some info about the absolute number
> of cycles needed for a given interface - this can be done in the
> standard GETLINK operation with some associated IFLA_* value.
> 2nd, a user-space method that, based on that number, sends a netlink
> request per record or per reasonable page of records (e.g. 10) and
> parses the output in user space.
> - this is needed to overcome another issue: the kernel generates so
> many netlink messages with NLM_F_MULTI that the netlink socket cannot
> hold them, and further write/read calls fail.
> 3rd, a kernel method that allocates, fills, and sends the required
> info per record -> this can be done through an ops/command netlink
> association (in my proposal it is DEVDUMP).
>
> To sum up, I believe that both approaches (cycles and filtering)
> should be allowed to coexist in the kernel, but they should be
> considered separately, as they do different jobs.

Stefan,

That is exactly my approach. We currently have a *bug* in the kernel that this patch is addressing. The kernel is attempting to provide too much information for the netlink interface to handle and it's breaking things. So what I want to do is fix the immediate problem while still providing a way for folks to get the information they need. I've accomplished this by doing exactly what Dave asked me to do: provide a filter that defaults to off, then provide a way for the user to request discrete chunks of information in the dump that won't exceed the netlink buffer limits.

The patch is fairly unobtrusive and simple to understand.

I appreciate that it doesn't do all that you'd like to see done and I see no reason why you couldn't go on and develop the extended features that you would like to see, correct? There's nothing in my patch that would prevent that so far as I can tell, although I'm not that familiar with your requirements or proposals yet.

- Greg

>
> I can provide a complete code example (not just the rtnetlink part),
> if the list is interested in seeing a live example.