Re: [PATCH] lightnvm: pblk: Introduce hot-cold data separation

From: Igor Konopko
Date: Fri Apr 26 2019 - 09:46:44 EST




On 26.04.2019 12:04, Javier González wrote:

On 26 Apr 2019, at 11.11, Igor Konopko <igor.j.konopko@xxxxxxxxx> wrote:

On 25.04.2019 07:21, Heiner Litz wrote:
Introduce the capability to manage multiple open lines. Maintain one line
for user writes (hot) and a second line for GC writes (cold). Because user
and GC writes still share a ring buffer, in rare cases a multi-sector
write will contain both GC and user data. This is acceptable: on a
tested SSD with a minimum write size of 64KB, less than 1% of all writes
contain both hot and cold sectors.
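A minimal sketch of the hot/cold split described above. All names (struct open_line, struct line_mgmt, struct rb_entry, select_line(), write_unit_is_mixed()) are hypothetical and are not pblk's real data structures. It assumes each ring buffer entry records whether it came from GC. With 4KB sectors, a 64KB minimum write spans 16 entries, and a unit is "mixed" when those entries disagree. How the patch places a mixed unit is not described here, so the sketch only detects one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the real pblk structures. */
struct open_line {
	unsigned int seq_id;		/* order in which the line was opened */
};

struct line_mgmt {
	struct open_line *user_line;	/* hot: receives user writes */
	struct open_line *gc_line;	/* cold: receives GC writes */
};

/* One sector in the shared ring buffer, tagged with its origin. */
struct rb_entry {
	bool is_gc;
};

/* Pick the destination line for one sector based on its origin. */
static struct open_line *select_line(struct line_mgmt *lm,
				     const struct rb_entry *e)
{
	return e->is_gc ? lm->gc_line : lm->user_line;
}

/*
 * A write unit covers nr consecutive ring buffer entries (e.g. 16 for a
 * 64KB minimum write with 4KB sectors). Because user and GC data share
 * the buffer, a unit may hold both; return true if it does.
 */
static bool write_unit_is_mixed(const struct rb_entry *unit, size_t nr)
{
	size_t i;

	assert(unit != NULL && nr > 0);
	for (i = 1; i < nr; i++)
		if (unit[i].is_gc != unit[0].is_gc)
			return true;
	return false;
}
```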

Hi Heiner

Generally I really like these changes. I have been thinking about something similar for a while, so it is very good to see this patch.

I have one question about this patch, since it is not very clear to me: how do you ensure data integrity in the following scenario:
- line X is open for user data and line Y is open for GC
- GC writes LBA=N to line Y
- the user writes LBA=N to line X
- a power failure occurs before either line X or line Y has been completely written
- during pblk creation, we execute OOB metadata recovery
And here is the question: how do we determine whether LBA=N from line Y or LBA=N from line X is the valid one?
The seq_ids of lines X and Y might be in either ascending or descending order, which creates two possible scenarios as well.
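To make the ambiguity concrete, here is a simplified model of recovery. It assumes lines are replayed in ascending seq_id order and that a later mapping overwrites an earlier one. The structure and function names are hypothetical, and line IDs stand in for physical addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_LBAS 8	/* tiny hypothetical address space */

/* OOB metadata recovered from one line: the LBA held by each written sector. */
struct recovered_line {
	unsigned int seq_id;
	int line_id;		/* stands in for the physical location */
	const int *lbas;
	size_t nr_written;	/* sectors written before the power failure */
};

static int cmp_seq(const void *a, const void *b)
{
	const struct recovered_line *x = a, *y = b;

	return (x->seq_id > y->seq_id) - (x->seq_id < y->seq_id);
}

/*
 * Rebuild an LBA -> line map by replaying lines in ascending seq_id order.
 * A later line overwrites an earlier mapping, so the newest seq_id wins.
 */
static void recover_l2p(struct recovered_line *lines, size_t nr_lines,
			int l2p[NR_LBAS])
{
	size_t i, j;

	for (i = 0; i < NR_LBAS; i++)
		l2p[i] = -1;	/* unmapped */

	qsort(lines, nr_lines, sizeof(*lines), cmp_seq);
	for (i = 0; i < nr_lines; i++) {
		for (j = 0; j < lines[i].nr_written; j++) {
			int lba = lines[i].lbas[j];

			assert(lba >= 0 && lba < NR_LBAS);
			l2p[lba] = lines[i].line_id;
		}
	}
}
```

In this model, with LBA=N written once to user line X and once to GC line Y, the recovered mapping points to X only when seq_id(X) > seq_id(Y); if Y is newer, the stale GC copy wins, and the OOB metadata alone cannot tell the two cases apart.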

Thanks
Igor


You are right, I think this is possible in the current implementation.

We need an extra constraint so that we only GC lines above the GC line
ID. This way, when we order lines on recovery, we can guarantee
consistency. This potentially means that we would need several open
lines for GC to avoid padding in case this constraint forces us to choose a
line with an ID higher than the GC line ID.
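One possible reading of this constraint, sketched as a victim filter: a closed line is eligible for GC only if its ID is above the current GC line ID. The greedy "fewest valid sectors" policy and all names (struct gc_candidate, pick_victim()) are assumptions for illustration. Returning NULL marks the case where another open GC line would be needed instead of padding the current one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical closed line that is a GC candidate. */
struct gc_candidate {
	unsigned int id;	/* line ID used for recovery ordering */
	unsigned int nr_valid;	/* valid sectors left; fewer is cheaper to GC */
};

/*
 * Return the cheapest candidate whose ID is above gc_line_id, or NULL if
 * none qualifies (the caller would then need another open GC line).
 */
static const struct gc_candidate *
pick_victim(const struct gc_candidate *cands, size_t nr,
	    unsigned int gc_line_id)
{
	const struct gc_candidate *best = NULL;
	size_t i;

	assert(cands != NULL || nr == 0);
	for (i = 0; i < nr; i++) {
		if (cands[i].id <= gc_line_id)
			continue;	/* not above the GC line ID: skip */
		if (!best || cands[i].nr_valid < best->nr_valid)
			best = &cands[i];
	}
	return best;
}
```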

What do you think?

I'm not sure about your approach yet; I need to think about and analyze this a little more.

I also believe that we probably need to ensure that the current user data line's seq_id is always above the current GC line's seq_id, or something like that. We also cannot GC any data from lines that are still open, but I believe that is already the case today.


Thanks,
Javier