Re: [rfc 5/7] fs, epoll: Add procfs fdinfo helper

From: Cyrill Gorcunov
Date: Thu Jul 19 2012 - 11:03:25 EST


On Thu, Jul 19, 2012 at 07:52:41AM -0700, Matthew Helsley wrote:
> On Wed, Jun 27, 2012 at 4:01 AM, Cyrill Gorcunov <gorcunov@xxxxxxxxxx> wrote:
> > This allows us to print out the eventpoll target file descriptor,
> > events and data. The /proc/pid/fdinfo/fd file consists of
> >
> > | pos: 0
> > | flags: 02
> > | tfd: 5 events: 1d data: ffffffffffffffff
> >
> > +#if defined(CONFIG_PROC_FS) && defined(CONFIG_CHECKPOINT_RESTORE)
> > +
> > +struct epitem_fdinfo {
> > + struct epoll_event ev;
> > + int fd;
> > +};
> > +
> > +static struct epitem_fdinfo *
> > +seq_lookup_fdinfo(struct proc_fdinfo_extra *extra, struct eventpoll *ep, loff_t num)
> > +{
> > + struct epitem_fdinfo *fdinfo = extra->priv;
> > + struct epitem *epi = NULL;
> > + struct rb_node *rbp;
> > +
> > + mutex_lock(&ep->mtx);
> > + for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) {
> > + if (num-- == 0) {
> > + epi = rb_entry(rbp, struct epitem, rbn);
> > + fdinfo->fd = epi->ffd.fd;
> > + fdinfo->ev = epi->event;
> > + break;
>
> This will be incredibly slow. epoll was designed to scale to tens of
> thousands of file descriptors. This algorithm is O(N^2) because each
> time we show a new epoll item we walk through the whole rb tree again
> (we're not doing a search so it isn't O(NlogN)).

Yeah, I know, it's quadratic. I'll be reworking this series to call
seq_printf directly and print out the whole tree once the appropriate
fdinfo file gets read.
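
To make the cost difference concrete, here is a minimal userspace sketch.
It assumes a plain linked list in place of the rb tree, and every name in
it (struct item, lookup_nth, show_by_index, show_single_pass, build_list)
is hypothetical and not taken from the patch. It only counts node visits
for the two strategies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an epoll item; not the kernel's struct epitem. */
struct item {
	int fd;
	struct item *next;
};

/* Counts node visits so the two strategies can be compared. */
static unsigned long visits;

/* Same shape as the per-item lookup: restart at the head, step num times. */
static struct item *lookup_nth(struct item *head, unsigned long num)
{
	struct item *it;

	for (it = head; it; it = it->next) {
		visits++;
		if (num-- == 0)
			return it;
	}
	return NULL;
}

/*
 * Show every item through repeated lookups. Visits are 1 + 2 + ... + N,
 * plus N more for the final lookup that runs off the end of the list.
 */
static unsigned long show_by_index(struct item *head)
{
	unsigned long n = 0;

	visits = 0;
	while (lookup_nth(head, n))
		n++;
	return visits;
}

/* Show every item in one traversal: exactly N visits. */
static unsigned long show_single_pass(struct item *head)
{
	struct item *it;

	visits = 0;
	for (it = head; it; it = it->next)
		visits++;
	return visits;
}

/* Link an array of n items into a list and return its head. */
static struct item *build_list(struct item *items, size_t n)
{
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		items[i].fd = (int)i;
		items[i].next = (i + 1 < n) ? &items[i + 1] : NULL;
	}
	return &items[0];
}
```

With a list of four items, show_by_index() makes 14 visits while
show_single_pass() makes 4, and the gap grows quadratically with the
number of items.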

> Also, we could miss one or more later items if one of the earlier
> items is removed from the epoll set in between "seq_lookup_fdinfo"
> calls. This isn't a problem for checkpoint because we assume the task
> (and everything with this eventpoll file in its fd table) is frozen.
> However it means the file will be worse than useless for almost any
> other purpose because they are unlikely to realize they need to freeze
> all the task(s) to get consistent data.

Well, a lot of the data read from proc is consistent only at the moment
of reading.
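
For reference, the skipped-item case described above can be reproduced
with a small userspace sketch. It assumes a sorted array in place of the
rb tree and simulates the concurrent removal inline; struct fd_table,
lookup_index, remove_fd and read_with_removal are hypothetical names used
only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the set of watched descriptors. */
struct fd_table {
	int fds[8];
	size_t count;
};

/* Positional lookup, like asking for the num-th node of the tree. */
static int lookup_index(const struct fd_table *t, size_t idx, int *fd)
{
	if (idx >= t->count)
		return 0;
	*fd = t->fds[idx];
	return 1;
}

/* Remove one descriptor and close the gap, shifting later entries down. */
static void remove_fd(struct fd_table *t, int fd)
{
	size_t i;

	for (i = 0; i < t->count; i++) {
		if (t->fds[i] == fd) {
			memmove(&t->fds[i], &t->fds[i + 1],
				(t->count - i - 1) * sizeof(t->fds[0]));
			t->count--;
			return;
		}
	}
}

/*
 * Read the table by position. After the first entry has been reported,
 * remove_me is dropped from the table, simulating another task changing
 * the set between two lookups. Returns how many entries were reported.
 */
static size_t read_with_removal(struct fd_table *t, int remove_me,
				int *out, size_t out_len)
{
	size_t idx = 0, n = 0;
	int fd;

	assert(out != NULL);
	while (n < out_len && lookup_index(t, idx, &fd)) {
		out[n++] = fd;
		if (idx == 0)
			remove_fd(t, remove_me);
		idx++;
	}
	return n;
}
```

Starting from the descriptors 3, 4 and 5 and removing 3 after it has been
reported, the reader sees 3 and then 5. Descriptor 4 is never shown even
though it stayed in the set the whole time.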

Cyrill