Re: [RFC] security: smack: add hash table for smack for quick labelsearching

From: Casey Schaufler
Date: Fri Apr 12 2013 - 14:07:27 EST


On 4/12/2013 8:12 AM, Åukasz Stelmach wrote:
> It was <2013-04-11 czw 19:59>, when Casey Schaufler wrote:
>> On 4/11/2013 1:46 AM, Tomasz Stanislawski wrote:
>>> Hi everyone,
>>> I am a developer working on optimization of the TIZEN system.
>>> Recently, I've discovered a performance issue in SMACK subsystem.
>>> I used the PERF tool to find performance bottlenecks.
>>>
>>> The test scenario was simple. Run multiple applications and
>>> see what the system does using the following command:
>>>
>>> perf record -a -g
>>>
>>> Next, see the results with the command:
>>>
>>> perf report -s symbol -g graph,0.5
>>>
>>> Among the many lines, the following ones are especially interesting:
>>>
>>> 5.96% [k] smk_find_entry
>>> |
>>> |--5.06%-- smk_access
>>> | |
>>> | --4.99%-- smk_curacc
>>> | |
>>> | |--3.79%-- smack_ptrace_access_check
> [...]
>
>>> To sum up, the result indicates that the CPU spents circa 8% (2.16% + 5.96%)
>>> of cycles searching for a SMACK label in the smk_find_entry function.
>>> The function iterates through smack_known_list to find an entry.
>>> The further analysis showed that the size of the list can reach even 600.
>>> I measured that it takes circa 200 tries to find an entry on average.
>>> The value was computed as a total number iterations in the smk_find_entry's
>>> loop divided by the times smk_find_entry was called in a time-window of
>>> the length of 10 seconds.
>>>
>>> IMO, this is a serious performance issue which scales badly with
>>> a complexity of the system.
>>>
>>> I implemented a patch that makes a use of a hash table to quicken searching
>>> for SMACK's labels. The patch is rebased onto the latest v3.9-rc6 kernel.
>>> The code is thread-safe (I hope) because it shares the RCU mechanism
>>> and locks with smack_known_list.
> [...]
>
>>> I hope you find the measurement and the patch useful.
>>> All comments are welcome.
>> NAK
>>
>> There will be no hash tables in Smack.
>>
>> The correct solution is simple.
>>
>> In the task_smack structure there are two Smack label pointers,
>> smk_task and smk_forked. Replace these fields with pointers to
>> the smack_known structures that contain the Smack label pointers
>> used today. This will require trivial changes throughout the
>> Smack code to accommodate the type change and a few logical twists
>> around smk_import. It will eliminate the need for smk_lookup_entry.
> Allow me to join the conversation with an interesting observation.
> I don't know whether hash table is a good solution to the problem Tomasz
> has mentioned it definitely improves performance during loading rules.

Loading rules is a rare event and an uninteresting case.

> Conditions:
>
> --8<---------------cut here---------------start------------->8---
> # number of subjects
> sh-4.1# cat /etc/smack/accesses.d/* | cut -d\ -f1 | sort -u | wc -l
> 379
> # number of rules
> sh-4.1# cat /etc/smack/accesses.d/* | wc -l
> 23895
> --8<---------------cut here---------------end--------------->8---
>
> Without the patch
>
> --8<---------------cut here---------------start------------->8---
> sh-4.1# time smackctl apply
>
> real 0m1.255s
> user 0m0.115s
> sys 0m0.895s
> --8<---------------cut here---------------end--------------->8---
>
> perfs output is:
>
> --8<---------------cut here---------------start------------->8---
> # ========
> # captured on: Tue Jan 1 09:52:14 2013
> # os release : 3.4.5
> # arch : armv7l
> # nrcpus online : 4
> # nrcpus avail : 4
> # cpudesc : ARMv7 Processor rev 0 (v7l)
> # total memory : 1025456 kB
> # cmdline : /perf record -a -g -- /usr/bin/smackctl apply
> # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 45, 46, 47, 48 }
> # ========
> #
> # Events: 1K cycles
> #
> # Overhead Symbol
> # ........ ..................................
> #
> 47.49% [k] strcmp
> |
> |--41.27%-- smk_find_entry
> | |
> | |--28.87%-- smk_import_entry
> | | smk_import
> | | smk_fill_rule
> | | smk_parse_long_rule
> | | smk_write_rules_list.clone.3
> | | smk_write_load2
> | | vfs_write
> | | sys_write
> | | ret_fast_syscall
> | |
> | --12.41%-- smk_write_rules_list.clone.3
> | smk_write_load2
> | vfs_write
> | sys_write
> | ret_fast_syscall
> |
> |--4.41%-- smk_import_entry
> | smk_import
> | smk_fill_rule
> | smk_parse_long_rule
> | smk_write_rules_list.clone.3
> | smk_write_load2
> | vfs_write
> | sys_write
> | ret_fast_syscall
> |
> --1.80%-- smk_write_rules_list.clone.3
> smk_write_load2
> vfs_write
> sys_write
> ret_fast_syscall
> 19.21% [k] smk_find_entry
> |
> |--12.44%-- smk_import_entry
> | smk_import
> | smk_fill_rule
> | smk_parse_long_rule
> | smk_write_rules_list.clone.3
> | smk_write_load2
> | vfs_write
> | sys_write
> | ret_fast_syscall
> |
> --6.78%-- smk_write_rules_list.clone.3
> smk_write_load2
> vfs_write
> sys_write
> ret_fast_syscall
>
> [ less than 10% per entry ]
>
> --8<---------------cut here---------------end--------------->8---
>
> With the patch
>
> --8<---------------cut here---------------start------------->8---
> sh-4.1# time smackctl apply
>
> real 0m0.349s
> user 0m0.095s
> sys 0m0.250s
> --8<---------------cut here---------------end--------------->8---
>
> still quite long but 3.59598853868 times better.
>
> --8<---------------cut here---------------start------------->8---
> # ========
> # captured on: Tue Jan 1 10:40:45 2013
> # os release : 3.4.5
> # arch : armv7l
> # nrcpus online : 4
> # nrcpus avail : 4
> # cpudesc : ARMv7 Processor rev 0 (v7l)
> # total memory : 1025544 kB
> # cmdline : /perf record -a -g -- smackctl apply
> # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 1, 2, 3, 4 }
> # ========
> #
> # Events: 606 cycles
> #
> # Overhead Symbol
> # ........ .....................................
> #
> 12.90% [k] smk_write_rules_list.isra.9
> |
> --- smk_write_load2
> vfs_write
> sys_write
> ret_fast_syscall
> 7.86% [k] exynos_enter_idle
> |
> --- cpuidle_enter
> cpuidle_enter_state
> cpuidle_idle_call
> cpu_idle
> |
> --7.69%-- secondary_start_kernel
> 0x4066df54
> 6.95% [k] v7_dma_inv_range
> |
> |--3.70%-- __dma_page_cpu_to_dev
> | arm_dma_map_page
> | arm_dma_map_sg
> | dw_mci_pre_dma_transfer
> | dw_mci_pre_req
> | mmc_pre_req
> | mmc_start_req
> | mmc_blk_issue_rw_rq
> | mmc_blk_issue_rq
> | mmc_queue_thread
> | kthread
> | do_exit
> |
> --3.25%-- __dma_page_dev_to_cpu
> arm_dma_unmap_page
> arm_dma_unmap_sg
> dw_mci_post_req
> mmc_post_req
> mmc_start_req
> mmc_blk_issue_rw_rq
> mmc_blk_issue_rq
> mmc_queue_thread
> kthread
> do_exit
> 3.56% [k] _raw_spin_unlock_irq
> |
> |--1.40%-- mmc_blk_issue_rw_rq
> | mmc_blk_issue_rq
> | mmc_queue_thread
> | kthread
> | do_exit
> |
> --1.09%-- finish_task_switch
> __schedule
> |
> --0.75%-- schedule
>
>
> 3.45% [.] 0x0000118c
> 2.94% [.] strncpy
> 2.67% [.] vfprintf
> 2.44% [k] strcmp
> |
> --- smk_find_entry
> smk_import_entry
> smk_import
> smk_fill_rule
> smk_parse_long_rule
> smk_write_rules_list.isra.9
> smk_write_load2
> vfs_write
> sys_write
> ret_fast_syscall
>
> [ less than 2.5% per entry ]
> --8<---------------cut here---------------end--------------->8---
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/