On 28/03/2024 17:20, josh lant wrote:
On Wed, 27 Mar 2024 at 08:19, Kevin Brodsky kevin.brodsky@arm.com wrote:
On 26/03/2024 15:50, Joshua Lant wrote: > There is an alignment issue at the user/kernel boundary in xtables with > capabilities, encountered in macro XT_ALIGN, in the function > xt_check_target (with message size of (kernel) and (user) not matching). > This bug occured when running certain iptables commands in the wireguard test > script netns.sh. e.g. > iptables -t nat -A POSTROUTING -s 192.168.1.0/24 <http://192.168.1.0/24> -d 10.0.0.0/24 <http://10.0.0.0/24> -j SNAT > --to 10.0.0.1 > > Signed-off-by: Joshua Lant joshualant@gmail.com > --- > include/uapi/linux/netfilter/x_tables.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/include/uapi/linux/netfilter/x_tables.h b/include/uapi/linux/netfilter/x_tables.h > index 796af83a963a..c53b46118531 100644 > --- a/include/uapi/linux/netfilter/x_tables.h > +++ b/include/uapi/linux/netfilter/x_tables.h > @@ -95,6 +95,7 @@ struct _xt_align { > __u16 u16; > __u32 u32; > __u64 u64; > + __uintcap_t ucap; I can see how this sort of change might be necessary - if any of the netfilter structs contains user pointers. However I did not manage to find such a struct, either in those mentioned in the comment above (e.g. ipt_entry) or others under include/uapi/linux/netfilter. Do you have an idea which struct causes the issue? The error message printed by xt_check_target() should be helpful. On a separate note, we cannot use __uintcap_t unconditionally in uapi headers - they need to be fully backwards-compatible. I think using void __user * would work just fine here (we want to ensure user pointer alignment regardless of the ABI, and this achieves exactly that). Kevin > }; > > #define XT_ALIGN(s) __ALIGN_KERNEL((s), __alignof__(struct _xt_align))
Hi Kevin,
Thanks for reviewing the patches. Sorry for the long post here…
On the contrary, thank you for providing those details, more is better than less!
So the bug presents itself in iptables using this function which contains a sendmsg syscall:
mnl_nft_socket_sendmsg()
https://git.netfilter.org/iptables/tree/iptables/nft.c#n193
Forgive my ignorance, but the use of the __user annotation is a little unclear to me… Is it generally regarded as a requirement on all pointers passed between userspace and kernel space, or is it an optional way to provide additional safety to stop developers accidentally dereferencing user pointers?
In mainline, __user is an annotation that can be used by tools such as sparse, but has otherwise no effect. In Morello Linux, when PCuABI is selected, it is actually used to turn all user pointers into capabilities, and it is therefore essential to use it for all user pointers. You may find it useful to have a look at the PCuABI overview in the tree [1].
Is this issue related to the fact that the sendmsg syscall is not supported in the transitional purecap ABI? And so could present itself in other places? I.e. my patch fixes my specific problem, but not necessarily all cases that someone may encounter around sendmsg? If this is the case then what sort of work would be required? Is adding support for this a task that I could reasonably undertake?
The transitional ABI is not really relevant any more, as we are now able to test most syscalls and are not aware of regressions in PCuABI (which of course doesn't meant there isn't any!). The documentation does need updating, this is something that is on my TODO list. Like the other syscalls, sendmsg() is expected to work as normal, though our testing is by no means exhaustive and you may well have found a defect here.
Here is the backtrace from the kernel at the point where the error message is printed, along with the message:
[ 218.297499] Call trace: [ 218.299931] dump_backtrace+0xec/0x140 [ 218.303671] show_stack+0x1c/0x30 [ 218.306972] dump_stack_lvl+0x5c/0x78 [ 218.310623] dump_stack+0x14/0x20 [ 218.313925] xt_check_target+0x1dc/0x270 [ 218.317848] nft_target_init+0x1f8/0x2c0 [ 218.321759] nf_tables_newrule+0x6d0/0x920 [ 218.325849] nfnetlink_rcv+0x4b8/0x820 [ 218.329586] netlink_unicast_kernel+0xc4/0x190 [ 218.334017] netlink_unicast+0x100/0x1b0 [ 218.337927] netlink_sendmsg+0x2c4/0x3b8 [ 218.341837] ____sys_sendmsg+0x1a0/0x288 [ 218.345747] __sys_sendmsg+0x154/0x1c8 [ 218.349506] __arm64_sys_sendmsg+0x28/0x38 [ 218.353589] el0_svc_common+0xd8/0x140 [ 218.357326] do_el0_svc+0x4c/0xb8 [ 218.360629] el0_svc+0x24/0x58 [ 218.363670] el0t_64_sync_handler+0x7c/0xe8 [ 218.367839] el0t_64_sync+0x1c0/0x1c8 [ 218.371505] x_tables: ip_tables: SNAT.0 target: invalid size 24 (kernel) != (user) 32
Luckily this error allows us to find the relevant struct easily. "SNAT.0" shows that the relevant struct xt_target is this one [2]. From there we find that the corresponding uapi struct is nf_nat_ipv4_multi_range_compat [3]. Its size is 20 regardless of the ABI (only 16- and 32-bit integers), so its 8-byte aligned size is 24, as printed.
My suspicion is that the issue lies somewhere in iptables rather than in the kernel itself, considering that this struct is clearly fixed-size. It could be that the layout of the message/header that iptables uses doesn't match what the kernel expects in purecap, and as a result the size the kernel reads is not what iptables set. However, I know essentially nothing about iptables or netfilter, so I'm afraid I won't be of much help to investigate this further...
[1] https://git.morello-project.org/morello/kernel/linux/-/blob/morello/master/D... [2] https://git.morello-project.org/morello/kernel/linux/-/blob/morello/master/n... [3] https://git.morello-project.org/morello/kernel/linux/-/blob/morello/master/i... [4] https://git.morello-project.org/morello/kernel/linux/-/blob/morello/master/a...
It is unclear to me at this point what the actual message is that’s being sent (i.e. what is causing the size discrepancy). The user size is determined in nft_target_init(), by use of the nla_len() function. The len is determined from the total size-header size, so it is the payload I guess that contains an offending capability? However, I have not been able to dig enough yet to find the payload that is sent.
https://elixir.bootlin.com/linux/v6.4/source/net/netfilter/nft_compat.c#L236
(As a complete aside… In examining the backtrace, I see that this “__arm64_sys_sendmsg” function is not referenced anywhere when i grep the kernel… I don’t really understand where that has come from?)
The syscall wrapper machinery does a lot of magic indeed! [4]
In regards to: “On a separate note, we cannot use __uintcap_t unconditionally in uapi headers ”...
Apologies, I had originally made this conditional using #ifdef CONFIG_CHERI_PURECAP_UABI, but the compiler complained about “leak CONFIG_CHERI_PURECAP_UABI to user-space” so i did not include it…
Indeed, CONFIG macros cannot be used in uapi headers, because userspace cannot know their value when building against the headers.
I have just tried again with #ifdef __CHERI_PURE_CAPABILITY__ and it compiles just fine.
We cannot use that either, because the kernel is not purecap itself. That's why typing is so tricky with PCuABI in uapi headers :/
Is this preferable with using __unitcap_t? Or do you mean that by using void __user * instead it would not be required to be conditional at all?
void __user * works unconditionally. From a kernel perspective, it is always a user pointer (capability in PCuABI). From a user perspective, __user is defined to nothing, so it is a normal pointer (also a capability in purecap).
Hope that clarifies the situation a little!
Kevin
Cheers,
Josh