Re: [RFC] memory tiering: use small chunk size and more tiers

From: Huang, Ying
Date: Wed Nov 02 2022 - 04:46:32 EST


Michal Hocko <mhocko@xxxxxxxx> writes:

> On Wed 02-11-22 16:28:08, Huang, Ying wrote:
>> Michal Hocko <mhocko@xxxxxxxx> writes:
>>
>> > On Wed 02-11-22 16:02:54, Huang, Ying wrote:
>> >> Michal Hocko <mhocko@xxxxxxxx> writes:
>> >>
>> >> > On Wed 02-11-22 08:39:49, Huang, Ying wrote:
>> >> >> Michal Hocko <mhocko@xxxxxxxx> writes:
>> >> >>
>> >> >> > On Mon 31-10-22 09:33:49, Huang, Ying wrote:
>> >> >> > [...]
>> >> >> >> In the upstream implementation, 4 tiers are possible below DRAM. That's
>> >> >> >> enough for now. But in the long run, it may be better to define more.
>> >> >> >> 100 possible tiers below DRAM may be too extreme.
>> >> >> >
>> >> >> > I am just curious. Are any configurations with more than a couple of tiers
>> >> >> > even manageable? I mean, applications have been struggling even with
>> >> >> > regular NUMA systems for years and the vast majority of them are largely
>> >> >> > NUMA unaware. How are they going to configure for a more complex system
>> >> >> > when a) there is no resource access control, so whatever you aim for
>> >> >> > might not be available, and b) in which situations is there going to be a
>> >> >> > demand only for a subset of tiers (GPU memory?)?
>> >> >>
>> >> >> Sorry for the confusion. I think that there are only several (fewer than 10)
>> >> >> tiers in a system in practice. Yes, here I suggested defining 100 (10
>> >> >> in the later text) POSSIBLE tiers below DRAM. My intention isn't to
>> >> >> manage a system with tens of memory tiers. Instead, my intention is to
>> >> >> avoid putting 2 memory types into one memory tier by accident, by making
>> >> >> the abstract distance range of each memory tier as small as possible.
>> >> >> The more possible memory tiers there are, the smaller the abstract
>> >> >> distance range of each memory tier.
>> >> >
>> >> > TBH I do not really understand how tweaking ranges helps anything.
>> >> > IIUC drivers are free to assign any abstract distance, so they will clash
>> >> > without any higher-level coordination.
>> >>
>> >> Yes. That's possible. Each memory tier corresponds to one abstract
>> >> distance range. The larger the range is, the higher the possibility of
>> >> clashing is. So I suggest making the abstract distance range smaller
>> >> to reduce the possibility of clashing.
>> >
>> > I am sorry, but I really do not understand how the size of the range
>> > actually addresses the fundamental issue that each driver simply picks
>> > what it wants. Is there any enumeration defining the basic characteristics
>> > of each tier? How does a driver developer know which tier to assign its
>> > driver to?
>>
>> A smaller range size will not guarantee anything. It just tries to
>> help the default behavior.
>>
>> The drivers are expected to assign the abstract distance based on the
>> memory latency/bandwidth, etc.
>
> Would it be possible/feasible to have a canonical way to calculate the
> abstract distance from these characteristics in the core kernel so that
> drivers do not even have to fall into that trap?

Yes. That sounds like a good idea. We can provide a function to map the
memory latency/bandwidth to the abstract distance for the drivers.
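To make the idea concrete, here is a minimal sketch of such a mapping followed by the tier lookup. Everything in it is hypothetical: the helper names `perf_to_adistance()` and `tier_index()`, the reference latency/bandwidth, the DRAM abstract distance of 576, and the averaging formula are illustrative assumptions, not kernel code or kernel constants. Each tier is assumed to cover one chunk of 2^chunk_bits abstract distance units, so the sketch also shows how the chunk size affects clashing.

```python
# All values below are HYPOTHETICAL and illustrative, not the kernel's constants.
REF_LATENCY_NS = 100        # assumed DRAM-like read latency
REF_BANDWIDTH_MBPS = 20000  # assumed DRAM-like bandwidth
ADIST_DRAM = 576            # assumed abstract distance for DRAM (4.5 chunks of 128)


def perf_to_adistance(latency_ns: int, bandwidth_mbps: int) -> int:
    """Hypothetical mapping: scale ADIST_DRAM by the average of the latency
    ratio and the inverse bandwidth ratio relative to the DRAM reference.
    Higher latency or lower bandwidth yields a larger abstract distance."""
    # ADIST_DRAM * (lat/ref_lat + ref_bw/bw) / 2, kept in integer arithmetic
    num = ADIST_DRAM * (latency_ns * bandwidth_mbps
                        + REF_LATENCY_NS * REF_BANDWIDTH_MBPS)
    den = 2 * REF_LATENCY_NS * bandwidth_mbps
    return num // den


def tier_index(adistance: int, chunk_bits: int) -> int:
    """Assumed tier layout: each tier covers 2**chunk_bits distance units."""
    return adistance >> chunk_bits


# Hypothetical memory types: (latency in ns, bandwidth in MB/s)
nodes = {
    "dram": (100, 20000),
    "mem_a": (200, 10000),
    "mem_b": (250, 8000),
}

for chunk_bits in (10, 7):
    tiers = {name: tier_index(perf_to_adistance(*perf), chunk_bits)
             for name, perf in nodes.items()}
    print(chunk_bits, tiers)
# 10 {'dram': 0, 'mem_a': 1, 'mem_b': 1}
# 7 {'dram': 4, 'mem_a': 9, 'mem_b': 11}
```

With 1024-unit chunks, mem_a (distance 1152) and mem_b (distance 1440) land in the same tier; with 128-unit chunks they are separated. A smaller chunk only lowers the chance of a clash: two types whose distances fall within the same 128-unit chunk would still share a tier.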

Best Regards,
Huang, Ying