Re: Problem with percpu values when bringing up second CPU?

From: Tejun Heo
Date: Tue Aug 04 2009 - 21:45:22 EST


Hello,

Jeremy Fitzhardinge wrote:
> I just tracked down a bug I was having to a change where I changed one
> of my Xen event channel variables to a percpu variable, relating to
> masking an event channel.
>
> The symptom was that shortly after bringing up the second CPU, the first
> CPU's timer events stopped arriving, apparently because they had become
> masked.

Hmmmm...

> The event channel masks are declared as:
>
> #define NR_EVENT_CHANNEL_LONGS (NR_EVENT_CHANNELS/BITS_PER_LONG)
> static DEFINE_PER_CPU(unsigned long, cpu_evtchn_mask[NR_EVENT_CHANNEL_LONGS]) =
>         { [0 ... NR_EVENT_CHANNEL_LONGS-1] = ~0ul }; /* everything masked by default */
>
> My theory about what's happening is that when the second CPU comes up,
> it allocates separate percpu areas for each CPU, but it is somehow
> failing to accurately copy CPU 0's percpu data over; either it isn't
> copying it all (ie, using the initialized values rather than the current
> values), or failing to copy the values in an interrupt-atomic way.
>
> Does this sound plausible?

Percpu areas aren't set up when the first cpu comes up. They're
allocated and copied from the master copy during early init when only
the boot cpu is running.
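
A rough user-space sketch of that ordering, not the actual kernel code:
every possible CPU gets its own area, filled from the master copy once
during early init, and nothing is copied again afterwards. All names
here (NR_CPUS_SIM, NR_MASK_LONGS, master_copy, setup_percpu_areas_sim,
per_cpu_mask_sim) are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NR_CPUS_SIM   2   /* hypothetical: number of possible CPUs */
#define NR_MASK_LONGS 4   /* hypothetical stand-in for NR_EVENT_CHANNEL_LONGS */

/* Master copy: holds only the static initializers (everything masked). */
static const unsigned long master_copy[NR_MASK_LONGS] = {
	~0ul, ~0ul, ~0ul, ~0ul
};

/* One separately allocated area per possible CPU. */
static unsigned long *percpu_area[NR_CPUS_SIM];

/*
 * Simulated early init: runs once, while only the boot CPU is running.
 * Every area is filled from the master copy, never from CPU 0's area.
 */
static int setup_percpu_areas_sim(void)
{
	for (int cpu = 0; cpu < NR_CPUS_SIM; cpu++) {
		percpu_area[cpu] = malloc(sizeof(master_copy));
		if (!percpu_area[cpu])
			return -1;
		memcpy(percpu_area[cpu], master_copy, sizeof(master_copy));
	}
	return 0;
}

/* Rough analogue of per_cpu(var, cpu): return that CPU's private copy. */
static unsigned long *per_cpu_mask_sim(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SIM);
	return percpu_area[cpu];
}
```

In this model, a change CPU 0 makes after setup_percpu_areas_sim() (say,
clearing bit 0 of its first mask word) stays in CPU 0's area. CPU 1
still sees the initial ~0ul value when it comes up later, because no
copy takes place at that point.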

> When I convert this back to an ad-hoc percpu variable (an array indexed
> by cpu number), it goes back to working. Booting with maxcpus=1 also
> works with percpu data.

Hmmm... strange. Can you try to print out the values along the boot
process and see when things go wrong?
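
One way to take such snapshots, sketched in user space: format each
CPU's mask words with a tag naming the checkpoint, so successive lines
can be compared. In the kernel the values would presumably be read with
per_cpu(cpu_evtchn_mask, cpu) and printed with printk(); the names below
(cpu_mask_sim, format_mask_snapshot) and the array sizes are
hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NR_CPUS_SIM   2   /* hypothetical: number of possible CPUs */
#define NR_MASK_LONGS 2   /* hypothetical stand-in for NR_EVENT_CHANNEL_LONGS */

/* Simulated per-CPU mask words, indexed by CPU number. */
static unsigned long cpu_mask_sim[NR_CPUS_SIM][NR_MASK_LONGS];

/*
 * Write "<tag>: cpu<N> mask = <word0> <word1> ..." (hex words) into buf.
 * Returns the number of characters the full line needs, like snprintf.
 */
static int format_mask_snapshot(char *buf, size_t len, const char *tag, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SIM);

	int n = snprintf(buf, len, "%s: cpu%d mask =", tag, cpu);

	/* Append words only while the previous output fit in the buffer. */
	for (int i = 0; i < NR_MASK_LONGS && n >= 0 && (size_t)n < len; i++)
		n += snprintf(buf + n, len - n, " %lx", cpu_mask_sim[cpu][i]);

	return n;
}
```

Calling this at a few fixed points (after percpu setup, just before and
just after the second CPU is brought up) and comparing the lines for
cpu0 should narrow down where the mask changes.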

> Also, because we don't have large pages under Xen, it always allocates
> percpu as 4k pages:
>
> PERCPU: Allocated 21 4k pages, static data 82080 bytes

I don't think the choice of first chunk allocator would make any
difference.

Thanks.

--
tejun