Re: [PATCH] binder: fix memory corruption in binder_transaction binder

From: Todd Kjos
Date: Mon Sep 11 2017 - 15:59:22 EST


Amit,

I tested with https://android-git.linaro.org/kernel/linaro-android.git/log/?h=test/hikey-llct.
I added a pr_info() above the patch's single line change and in
binder_init (so I could easily prove that I was running the correct
kernel).

First I did 10 reboots with the patch. I saw one failure to reach the
Android home screen in boot #7 (but the new line of code was never
reached, so the patch cannot be the cause)... so 9 out of 10 reboots
were fine and the failure does not point to this patch.

Then I did 10 reboots without the patch. No failures.

Then 10 more with the patch. No failures.

Then with the patch: power-on, reboot twice, no failures (repeat, no failures).

I think the issue you are seeing cannot be caused by this patch --
take a look at it and see if you think its really possible...

-Todd

On Mon, Sep 11, 2017 at 9:55 AM, Amit Pundir <amit.pundir@xxxxxxxxxx> wrote:
> Hi Todd,
>
> On 11 September 2017 at 21:10, Todd Kjos <tkjos@xxxxxxxxxx> wrote:
>> (resend in plain-text mode -- sorry about that)
>>
>> Amit,
>>
>> Are you sure this patch is the culprit? That is pretty surprising
>> since this change can only be hit in a uncommon case (the target node
>> is valid when we start creating the transaction, but dead when we
>> check right before sending it) so it is unlikely to be hit during a
>> normal boot. It also fixes a corruption -- so if you were actually
>> hitting the case, it would likely have caused issues before and not
>> now. Take a look at it and see if you think it is really possible.
>>
>> I just booted hikey to Android with this patch 10 times in a row with
>> no issues (used hikey-linaro 4.9 kernel which has this patch).
>
> Sorry for not being clear enough in the bug report. android-4.9 is
> fine, I see this issue on linux mainline tree with this patch.
>
> I can reproduce it on John's minimal Android tree for hikey hosted
> here https://git.linaro.org/people/john.stultz/android-dev.git/log/?h=dev/hikey-mainline-WIP
> and hikey-llct (android-4.9 patchset rebased to mainline) tree hosted
> here https://android-git.linaro.org/kernel/linaro-android.git/log/?h=test/hikey-llct.
> I have already reverted this patch in hikey-llct so you have to revert
> that revert to reproduce this issue on hikey-llct tree.
>
> Regards,
> Amit Pundir
>
>>
>> -Todd
>>
>>> On Mon, Sep 11, 2017 at 5:18 AM, Amit Pundir <amit.pundir@xxxxxxxxxx> wrote:
>>>>
>>>> On 5 September 2017 at 22:51, Todd Kjos <tkjos@xxxxxxxxxxx> wrote:
>>>> > From: Xu YiPing <xuyiping@xxxxxxxxxxxxx>
>>>> >
>>>> > commit 7a4408c6bd3e ("binder: make sure accesses to proc/thread are
>>>> > safe") made a change to enqueue tcomplete to thread->todo before
>>>> > enqueuing the transaction. However, in err_dead_proc_or_thread case,
>>>> > the tcomplete is directly freed, without dequeued. It may cause the
>>>> > thread->todo list to be corrupted.
>>>> >
>>>> > So, dequeue it before freeing.
>>>>
>>>> I see Android boot loops with this patch on hikey tracking
>>>> linux/master branch. 1st boot is fine but hikey runs into an
>>>> unexpected short boot loops on 2nd and successive boots.
>>>>
>>>> It takes about 3-4 iterations to finally come to sane state and boot
>>>> to UI. I don't see this behaviour if I revert this patch.
>>>>
>>>> Regards,
>>>> Amit Pundir
>>>>
>>>> >
>>>> > Signed-off-by: Xu YiPing <xuyiping@xxxxxxxxxxxxx>
>>>> > Signed-off-by: Todd Kjos <tkjos@xxxxxxxxxx>
>>>> > ---
>>>> > drivers/android/binder.c | 1 +
>>>> > 1 file changed, 1 insertion(+)
>>>> >
>>>> > diff --git a/drivers/android/binder.c b/drivers/android/binder.c
>>>> > index d055b3f2a207..96cc28afa383 100644
>>>> > --- a/drivers/android/binder.c
>>>> > +++ b/drivers/android/binder.c
>>>> > @@ -3083,6 +3083,7 @@ static void binder_transaction(struct binder_proc
>>>> > *proc,
>>>> > err_dead_proc_or_thread:
>>>> > return_error = BR_DEAD_REPLY;
>>>> > return_error_line = __LINE__;
>>>> > + binder_dequeue_work(proc, tcomplete);
>>>> > err_translate_failed:
>>>> > err_bad_object_type:
>>>> > err_bad_offset:
>>>> > --
>>>> > 2.14.1.581.gf28d330327-goog
>>>> >
>>>
>>>