Hi,
This series of patches enables the use of the Wireguard VPN and all assocaited tools required for running wireguard-tools' test script. Wireguard's test script (netns.sh) runs to completion using purecap compiled:
wireguard-tools, iproute2, iputils (ping/ping6), iptables, nftables, libnftnl, libmnl, libelf, argp-standalone, musl-obstack, fts, libjansson.
Packages used in netns.sh currently not tested in purecap:
ncat, iperf3.
The bulk of the changes required are additions to the kernel config, with a fix for a bug found in iptables.
There is an alignment issue at the user/kernel boundary in xtables with capabilities, encountered in the macro XT_ALIGN, used in the function xt_check_target (with the resulting message indicating size of (kernel) and (user) not matching). This bug occurs when running certain iptables commands in the test script. e.g.
iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -d 10.0.0.0/24 -j SNAT --to 10.0.0.1
This is my first patch to the kernel so please forgive me if anything is drastically wrong. I have tried to follow the format of others on here...
Cheers,
Joshua Lant
Joshua Lant (2): morello: enable wireguard kernel config xtables: fix alignment issue
.../morello_transitional_pcuabi_defconfig | 23 +++++++++++++++++++ include/uapi/linux/netfilter/x_tables.h | 1 + 2 files changed, 24 insertions(+)
Add kernel modules to morello_transitional_pcuabi_defconfig required for wireguard-module, wireguard-tools and other userspace programs in the wireguard test script netns.sh.
Signed-off-by: Joshua Lant joshualant@gmail.com --- .../morello_transitional_pcuabi_defconfig | 23 +++++++++++++++++++ 1 file changed, 23 insertions(+)
diff --git a/arch/arm64/configs/morello_transitional_pcuabi_defconfig b/arch/arm64/configs/morello_transitional_pcuabi_defconfig index 6291231b8d9c..eb5e32b4f0c6 100644 --- a/arch/arm64/configs/morello_transitional_pcuabi_defconfig +++ b/arch/arm64/configs/morello_transitional_pcuabi_defconfig @@ -162,3 +162,26 @@ CONFIG_DEBUG_FS=y CONFIG_FTRACE_SYSCALLS=y CONFIG_CORESIGHT=y CONFIG_MEMTEST=y +CONFIG_WIREGUARD=y +CONFIG_WIREGUARD_DEBUG=y +CONFIG_DUMMY=y +CONFIG_IP_ADVANCED_ROUTER=y +CONFIG_IP_MULTIPLE_TABLES=y +CONFIG_IP_NF_MANGLE=y +CONFIG_IPV6_MULTIPLE_TABLES=y +CONFIG_NF_TABLES=y +CONFIG_NFT_COMPAT=y +CONFIG_NF_TABLES_INET=y +CONFIG_NF_TABLES_NETDEV=y +CONFIG_NFT_NAT=y +CONFIG_BRIDGE_NF_EBTABLES=y +CONFIG_NETFILTER_XT_MATCH_LENGTH=y +CONFIG_IP_NF_MATCH_LENGTH=y +CONFIG_NFT_MATCH_LENGTH=y +CONFIG_NETFILTER_ADVANCED=y +CONFIG_NETFILTER_XT_MATCH_PKTTYPE=y +CONFIG_NETFILTER_XTABLES=y +CONFIG_IP_SET=y +CONFIG_NF_TABLES_IPV4=y +CONFIG_NF_TABLES_ARP=y +CONFIG_NF_TABLES_BRIDGE=y
On 26/03/2024 15:50, Joshua Lant wrote:
Add kernel modules to morello_transitional_pcuabi_defconfig required for wireguard-module, wireguard-tools and other userspace programs in the wireguard test script netns.sh.
Signed-off-by: Joshua Lant joshualant@gmail.com
.../morello_transitional_pcuabi_defconfig | 23 +++++++++++++++++++ 1 file changed, 23 insertions(+)
diff --git a/arch/arm64/configs/morello_transitional_pcuabi_defconfig b/arch/arm64/configs/morello_transitional_pcuabi_defconfig index 6291231b8d9c..eb5e32b4f0c6 100644 --- a/arch/arm64/configs/morello_transitional_pcuabi_defconfig +++ b/arch/arm64/configs/morello_transitional_pcuabi_defconfig @@ -162,3 +162,26 @@ CONFIG_DEBUG_FS=y CONFIG_FTRACE_SYSCALLS=y CONFIG_CORESIGHT=y CONFIG_MEMTEST=y +CONFIG_WIREGUARD=y +CONFIG_WIREGUARD_DEBUG=y +CONFIG_DUMMY=y +CONFIG_IP_ADVANCED_ROUTER=y +CONFIG_IP_MULTIPLE_TABLES=y +CONFIG_IP_NF_MANGLE=y +CONFIG_IPV6_MULTIPLE_TABLES=y +CONFIG_NF_TABLES=y +CONFIG_NFT_COMPAT=y +CONFIG_NF_TABLES_INET=y +CONFIG_NF_TABLES_NETDEV=y +CONFIG_NFT_NAT=y +CONFIG_BRIDGE_NF_EBTABLES=y +CONFIG_NETFILTER_XT_MATCH_LENGTH=y +CONFIG_IP_NF_MATCH_LENGTH=y +CONFIG_NFT_MATCH_LENGTH=y +CONFIG_NETFILTER_ADVANCED=y +CONFIG_NETFILTER_XT_MATCH_PKTTYPE=y +CONFIG_NETFILTER_XTABLES=y +CONFIG_IP_SET=y +CONFIG_NF_TABLES_IPV4=y +CONFIG_NF_TABLES_ARP=y +CONFIG_NF_TABLES_BRIDGE=y
Generally speaking, we add options to this defconfig if they are enabled (or =m) in the standard arm64 defconfig (arch/arm64/configs/defconfig), or if they are somehow relevant to the Morello board specifically. I don't think this is the case for CONFIG_WIREGUARD and many of the other options added here. I would recommend writing your own config fragment with those options and merging it into this defconfig using scripts/kconfig/merge_config.sh.
Kevin
There is an alignment issue at the user/kernel boundary in xtables with capabilities, encountered in macro XT_ALIGN, in the function xt_check_target (with message size of (kernel) and (user) not matching). This bug occured when running certain iptables commands in the wireguard test script netns.sh. e.g. iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -d 10.0.0.0/24 -j SNAT --to 10.0.0.1
Signed-off-by: Joshua Lant joshualant@gmail.com --- include/uapi/linux/netfilter/x_tables.h | 1 + 1 file changed, 1 insertion(+)
diff --git a/include/uapi/linux/netfilter/x_tables.h b/include/uapi/linux/netfilter/x_tables.h index 796af83a963a..c53b46118531 100644 --- a/include/uapi/linux/netfilter/x_tables.h +++ b/include/uapi/linux/netfilter/x_tables.h @@ -95,6 +95,7 @@ struct _xt_align { __u16 u16; __u32 u32; __u64 u64; + __uintcap_t ucap; };
#define XT_ALIGN(s) __ALIGN_KERNEL((s), __alignof__(struct _xt_align))
On 26/03/2024 15:50, Joshua Lant wrote:
There is an alignment issue at the user/kernel boundary in xtables with capabilities, encountered in macro XT_ALIGN, in the function xt_check_target (with message size of (kernel) and (user) not matching). This bug occured when running certain iptables commands in the wireguard test script netns.sh. e.g. iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -d 10.0.0.0/24 -j SNAT --to 10.0.0.1
Signed-off-by: Joshua Lant joshualant@gmail.com
include/uapi/linux/netfilter/x_tables.h | 1 + 1 file changed, 1 insertion(+)
diff --git a/include/uapi/linux/netfilter/x_tables.h b/include/uapi/linux/netfilter/x_tables.h index 796af83a963a..c53b46118531 100644 --- a/include/uapi/linux/netfilter/x_tables.h +++ b/include/uapi/linux/netfilter/x_tables.h @@ -95,6 +95,7 @@ struct _xt_align { __u16 u16; __u32 u32; __u64 u64;
__uintcap_t ucap;
I can see how this sort of change might be necessary - if any of the netfilter structs contains user pointers. However I did not manage to find such a struct, either in those mentioned in the comment above (e.g. ipt_entry) or others under include/uapi/linux/netfilter. Do you have an idea which struct causes the issue? The error message printed by xt_check_target() should be helpful.
On a separate note, we cannot use __uintcap_t unconditionally in uapi headers - they need to be fully backwards-compatible. I think using void __user * would work just fine here (we want to ensure user pointer alignment regardless of the ABI, and this achieves exactly that).
Kevin
}; #define XT_ALIGN(s) __ALIGN_KERNEL((s), __alignof__(struct _xt_align))
On Wed, 27 Mar 2024 at 08:19, Kevin Brodsky kevin.brodsky@arm.com wrote:
On 26/03/2024 15:50, Joshua Lant wrote:
There is an alignment issue at the user/kernel boundary in xtables with capabilities, encountered in macro XT_ALIGN, in the function xt_check_target (with message size of (kernel) and (user) not matching). This bug occured when running certain iptables commands in the wireguard
test
script netns.sh. e.g. iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -d 10.0.0.0/24 -j SNAT --to 10.0.0.1
Signed-off-by: Joshua Lant joshualant@gmail.com
include/uapi/linux/netfilter/x_tables.h | 1 + 1 file changed, 1 insertion(+)
diff --git a/include/uapi/linux/netfilter/x_tables.h
b/include/uapi/linux/netfilter/x_tables.h
index 796af83a963a..c53b46118531 100644 --- a/include/uapi/linux/netfilter/x_tables.h +++ b/include/uapi/linux/netfilter/x_tables.h @@ -95,6 +95,7 @@ struct _xt_align { __u16 u16; __u32 u32; __u64 u64;
__uintcap_t ucap;
I can see how this sort of change might be necessary - if any of the netfilter structs contains user pointers. However I did not manage to find such a struct, either in those mentioned in the comment above (e.g. ipt_entry) or others under include/uapi/linux/netfilter. Do you have an idea which struct causes the issue? The error message printed by xt_check_target() should be helpful.
On a separate note, we cannot use __uintcap_t unconditionally in uapi headers - they need to be fully backwards-compatible. I think using void __user * would work just fine here (we want to ensure user pointer alignment regardless of the ABI, and this achieves exactly that).
Kevin
};
#define XT_ALIGN(s) __ALIGN_KERNEL((s), __alignof__(struct _xt_align))
Hi Kevin,
Thanks for reviewing the patches. Sorry for the long post here…
So the bug presents itself in iptables using this function which contains a sendmsg syscall:
mnl_nft_socket_sendmsg()
https://git.netfilter.org/iptables/tree/iptables/nft.c#n193
Forgive my ignorance, but the use of the __user annotation is a little unclear to me… Is it generally regarded as a requirement on all pointers passed between userspace and kernel space, or is it an optional way to provide additional safety to stop developers accidentally dereferencing user pointers?
Is this issue related to the fact that the sendmsg syscall is not supported in the transitional purecap ABI? And so could present itself in other places? I.e. my patch fixes my specific problem, but not necessarily all cases that someone may encounter around sendmsg? If this is the case then what sort of work would be required? Is adding support for this a task that I could reasonably undertake?
Here is the backtrace from the kernel at the point where the error message is printed, along with the message:
[ 218.297499] Call trace: [ 218.299931] dump_backtrace+0xec/0x140 [ 218.303671] show_stack+0x1c/0x30 [ 218.306972] dump_stack_lvl+0x5c/0x78 [ 218.310623] dump_stack+0x14/0x20 [ 218.313925] xt_check_target+0x1dc/0x270 [ 218.317848] nft_target_init+0x1f8/0x2c0 [ 218.321759] nf_tables_newrule+0x6d0/0x920 [ 218.325849] nfnetlink_rcv+0x4b8/0x820 [ 218.329586] netlink_unicast_kernel+0xc4/0x190 [ 218.334017] netlink_unicast+0x100/0x1b0 [ 218.337927] netlink_sendmsg+0x2c4/0x3b8 [ 218.341837] ____sys_sendmsg+0x1a0/0x288 [ 218.345747] __sys_sendmsg+0x154/0x1c8 [ 218.349506] __arm64_sys_sendmsg+0x28/0x38 [ 218.353589] el0_svc_common+0xd8/0x140 [ 218.357326] do_el0_svc+0x4c/0xb8 [ 218.360629] el0_svc+0x24/0x58 [ 218.363670] el0t_64_sync_handler+0x7c/0xe8 [ 218.367839] el0t_64_sync+0x1c0/0x1c8 [ 218.371505] x_tables: ip_tables: SNAT.0 target: invalid size 24 (kernel) != (user) 32
It is unclear to me at this point what the actual message is that’s being sent (i.e. what is causing the size discrepancy). The user size is determined in nft_target_init(), by use of the nla_len() function. The len is determined from the total size-header size, so it is the payload I guess that contains an offending capability? However, I have not been able to dig enough yet to find the payload that is sent.
https://elixir.bootlin.com/linux/v6.4/source/net/netfilter/nft_compat.c#L236
(As a complete aside… In examining the backtrace, I see that this “__arm64_sys_sendmsg” function is not referenced anywhere when i grep the kernel… I don’t really understand where that has come from?)
In regards to: “On a separate note, we cannot use __uintcap_t unconditionally in uapi headers ”...
Apologies, I had originally made this conditional using #ifdef CONFIG_CHERI_PURECAP_UABI, but the compiler complained about “leak CONFIG_CHERI_PURECAP_UABI to user-space” so i did not include it… I have just tried again with #ifdef __CHERI_PURE_CAPABILITY__ and it compiles just fine.
Is this preferable with using __unitcap_t? Or do you mean that by using void __user * instead it would not be required to be conditional at all?
Cheers,
Josh
On 28/03/2024 17:20, josh lant wrote:
On Wed, 27 Mar 2024 at 08:19, Kevin Brodsky kevin.brodsky@arm.com wrote:
On 26/03/2024 15:50, Joshua Lant wrote: > There is an alignment issue at the user/kernel boundary in xtables with > capabilities, encountered in macro XT_ALIGN, in the function > xt_check_target (with message size of (kernel) and (user) not matching). > This bug occured when running certain iptables commands in the wireguard test > script netns.sh. e.g. > iptables -t nat -A POSTROUTING -s 192.168.1.0/24 <http://192.168.1.0/24> -d 10.0.0.0/24 <http://10.0.0.0/24> -j SNAT > --to 10.0.0.1 > > Signed-off-by: Joshua Lant joshualant@gmail.com > --- > include/uapi/linux/netfilter/x_tables.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/include/uapi/linux/netfilter/x_tables.h b/include/uapi/linux/netfilter/x_tables.h > index 796af83a963a..c53b46118531 100644 > --- a/include/uapi/linux/netfilter/x_tables.h > +++ b/include/uapi/linux/netfilter/x_tables.h > @@ -95,6 +95,7 @@ struct _xt_align { > __u16 u16; > __u32 u32; > __u64 u64; > + __uintcap_t ucap; I can see how this sort of change might be necessary - if any of the netfilter structs contains user pointers. However I did not manage to find such a struct, either in those mentioned in the comment above (e.g. ipt_entry) or others under include/uapi/linux/netfilter. Do you have an idea which struct causes the issue? The error message printed by xt_check_target() should be helpful. On a separate note, we cannot use __uintcap_t unconditionally in uapi headers - they need to be fully backwards-compatible. I think using void __user * would work just fine here (we want to ensure user pointer alignment regardless of the ABI, and this achieves exactly that). Kevin > }; > > #define XT_ALIGN(s) __ALIGN_KERNEL((s), __alignof__(struct _xt_align))
Hi Kevin,
Thanks for reviewing the patches. Sorry for the long post here…
On the contrary, thank you for providing those details, more is better than less!
So the bug presents itself in iptables using this function which contains a sendmsg syscall:
mnl_nft_socket_sendmsg()
https://git.netfilter.org/iptables/tree/iptables/nft.c#n193
Forgive my ignorance, but the use of the __user annotation is a little unclear to me… Is it generally regarded as a requirement on all pointers passed between userspace and kernel space, or is it an optional way to provide additional safety to stop developers accidentally dereferencing user pointers?
In mainline, __user is an annotation that can be used by tools such as sparse, but has otherwise no effect. In Morello Linux, when PCuABI is selected, it is actually used to turn all user pointers into capabilities, and it is therefore essential to use it for all user pointers. You may find it useful to have a look at the PCuABI overview in the tree [1].
Is this issue related to the fact that the sendmsg syscall is not supported in the transitional purecap ABI? And so could present itself in other places? I.e. my patch fixes my specific problem, but not necessarily all cases that someone may encounter around sendmsg? If this is the case then what sort of work would be required? Is adding support for this a task that I could reasonably undertake?
The transitional ABI is not really relevant any more, as we are now able to test most syscalls and are not aware of regressions in PCuABI (which of course doesn't meant there isn't any!). The documentation does need updating, this is something that is on my TODO list. Like the other syscalls, sendmsg() is expected to work as normal, though our testing is by no means exhaustive and you may well have found a defect here.
Here is the backtrace from the kernel at the point where the error message is printed, along with the message:
[ 218.297499] Call trace: [ 218.299931] dump_backtrace+0xec/0x140 [ 218.303671] show_stack+0x1c/0x30 [ 218.306972] dump_stack_lvl+0x5c/0x78 [ 218.310623] dump_stack+0x14/0x20 [ 218.313925] xt_check_target+0x1dc/0x270 [ 218.317848] nft_target_init+0x1f8/0x2c0 [ 218.321759] nf_tables_newrule+0x6d0/0x920 [ 218.325849] nfnetlink_rcv+0x4b8/0x820 [ 218.329586] netlink_unicast_kernel+0xc4/0x190 [ 218.334017] netlink_unicast+0x100/0x1b0 [ 218.337927] netlink_sendmsg+0x2c4/0x3b8 [ 218.341837] ____sys_sendmsg+0x1a0/0x288 [ 218.345747] __sys_sendmsg+0x154/0x1c8 [ 218.349506] __arm64_sys_sendmsg+0x28/0x38 [ 218.353589] el0_svc_common+0xd8/0x140 [ 218.357326] do_el0_svc+0x4c/0xb8 [ 218.360629] el0_svc+0x24/0x58 [ 218.363670] el0t_64_sync_handler+0x7c/0xe8 [ 218.367839] el0t_64_sync+0x1c0/0x1c8 [ 218.371505] x_tables: ip_tables: SNAT.0 target: invalid size 24 (kernel) != (user) 32
Luckily this error allows us to find the relevant struct easily. "SNAT.0" shows that the relevant struct xt_target is this one [2]. From there we find that the corresponding uapi struct is nf_nat_ipv4_multi_range_compat [3]. Its size is 20 regardless of the ABI (only 16- and 32-bit integers), so its 8-byte aligned size is 24, as printed.
My suspicion is that the issue lies somewhere in iptables rather than in the kernel itself, considering that this struct is clearly fixed-size. It could be that the layout of the message/header that iptables uses doesn't match what the kernel expects in purecap, and as a result the size the kernel reads is not what iptables set. However, I know essentially nothing about iptables or netfilter, so I'm afraid I won't be of much help to investigate this further...
[1] https://git.morello-project.org/morello/kernel/linux/-/blob/morello/master/D... [2] https://git.morello-project.org/morello/kernel/linux/-/blob/morello/master/n... [3] https://git.morello-project.org/morello/kernel/linux/-/blob/morello/master/i... [4] https://git.morello-project.org/morello/kernel/linux/-/blob/morello/master/a...
It is unclear to me at this point what the actual message is that’s being sent (i.e. what is causing the size discrepancy). The user size is determined in nft_target_init(), by use of the nla_len() function. The len is determined from the total size-header size, so it is the payload I guess that contains an offending capability? However, I have not been able to dig enough yet to find the payload that is sent.
https://elixir.bootlin.com/linux/v6.4/source/net/netfilter/nft_compat.c#L236
(As a complete aside… In examining the backtrace, I see that this “__arm64_sys_sendmsg” function is not referenced anywhere when i grep the kernel… I don’t really understand where that has come from?)
The syscall wrapper machinery does a lot of magic indeed! [4]
In regards to: “On a separate note, we cannot use __uintcap_t unconditionally in uapi headers ”...
Apologies, I had originally made this conditional using #ifdef CONFIG_CHERI_PURECAP_UABI, but the compiler complained about “leak CONFIG_CHERI_PURECAP_UABI to user-space” so i did not include it…
Indeed, CONFIG macros cannot be used in uapi headers, because userspace cannot know their value when building against the headers.
I have just tried again with #ifdef __CHERI_PURE_CAPABILITY__ and it compiles just fine.
We cannot use that either, because the kernel is not purecap itself. That's why typing is so tricky with PCuABI in uapi headers :/
Is this preferable with using __unitcap_t? Or do you mean that by using void __user * instead it would not be required to be conditional at all?
void __user * works unconditionally. From a kernel perspective, it is always a user pointer (capability in PCuABI). From a user perspective, __user is defined to nothing, so it is a normal pointer (also a capability in purecap).
Hope that clarifies the situation a little!
Kevin
Cheers,
Josh
On Tue, 2 Apr 2024 at 13:03, Kevin Brodsky kevin.brodsky@arm.com wrote:
On 28/03/2024 17:20, josh lant wrote:
On Wed, 27 Mar 2024 at 08:19, Kevin Brodsky kevin.brodsky@arm.com
wrote:
On 26/03/2024 15:50, Joshua Lant wrote: > There is an alignment issue at the user/kernel boundary in xtables with > capabilities, encountered in macro XT_ALIGN, in the function > xt_check_target (with message size of (kernel) and (user) not matching). > This bug occured when running certain iptables commands in the wireguard test > script netns.sh. e.g. > iptables -t nat -A POSTROUTING -s 192.168.1.0/24 <http://192.168.1.0/24> -d 10.0.0.0/24 <http://10.0.0.0/24> -j SNAT > --to 10.0.0.1 > > Signed-off-by: Joshua Lant joshualant@gmail.com > --- > include/uapi/linux/netfilter/x_tables.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/include/uapi/linux/netfilter/x_tables.h b/include/uapi/linux/netfilter/x_tables.h > index 796af83a963a..c53b46118531 100644 > --- a/include/uapi/linux/netfilter/x_tables.h > +++ b/include/uapi/linux/netfilter/x_tables.h > @@ -95,6 +95,7 @@ struct _xt_align { > __u16 u16; > __u32 u32; > __u64 u64; > + __uintcap_t ucap; I can see how this sort of change might be necessary - if any of the netfilter structs contains user pointers. However I did not manage to find such a struct, either in those mentioned in the comment above (e.g. ipt_entry) or others under include/uapi/linux/netfilter. Do you have an idea which struct causes the issue? The error message printed by xt_check_target() should be helpful. On a separate note, we cannot use __uintcap_t unconditionally in uapi headers - they need to be fully backwards-compatible. I think using void __user * would work just fine here (we want to ensure user pointer alignment regardless of the ABI, and this achieves exactly that). Kevin > }; > > #define XT_ALIGN(s) __ALIGN_KERNEL((s), __alignof__(struct _xt_align))
Hi Kevin,
Thanks for reviewing the patches. Sorry for the long post here…
On the contrary, thank you for providing those details, more is better than less!
So the bug presents itself in iptables using this function which contains a sendmsg syscall:
mnl_nft_socket_sendmsg()
https://git.netfilter.org/iptables/tree/iptables/nft.c#n193
Forgive my ignorance, but the use of the __user annotation is a little unclear to me… Is it generally regarded as a requirement on all pointers passed between userspace and kernel space, or is it an optional way to provide additional safety to stop developers accidentally dereferencing user pointers?
In mainline, __user is an annotation that can be used by tools such as sparse, but has otherwise no effect. In Morello Linux, when PCuABI is selected, it is actually used to turn all user pointers into capabilities, and it is therefore essential to use it for all user pointers. You may find it useful to have a look at the PCuABI overview in the tree [1].
Is this issue related to the fact that the sendmsg syscall is not supported in the transitional purecap ABI? And so could present itself in other places? I.e. my patch fixes my specific problem, but not necessarily all cases that someone may encounter around sendmsg? If this is the case then what sort of work would be required? Is adding support for this a task that I could reasonably undertake?
The transitional ABI is not really relevant any more, as we are now able to test most syscalls and are not aware of regressions in PCuABI (which of course doesn't meant there isn't any!). The documentation does need updating, this is something that is on my TODO list. Like the other syscalls, sendmsg() is expected to work as normal, though our testing is by no means exhaustive and you may well have found a defect here.
Here is the backtrace from the kernel at the point where the error message is printed, along with the message:
[ 218.297499] Call trace: [ 218.299931] dump_backtrace+0xec/0x140 [ 218.303671] show_stack+0x1c/0x30 [ 218.306972] dump_stack_lvl+0x5c/0x78 [ 218.310623] dump_stack+0x14/0x20 [ 218.313925] xt_check_target+0x1dc/0x270 [ 218.317848] nft_target_init+0x1f8/0x2c0 [ 218.321759] nf_tables_newrule+0x6d0/0x920 [ 218.325849] nfnetlink_rcv+0x4b8/0x820 [ 218.329586] netlink_unicast_kernel+0xc4/0x190 [ 218.334017] netlink_unicast+0x100/0x1b0 [ 218.337927] netlink_sendmsg+0x2c4/0x3b8 [ 218.341837] ____sys_sendmsg+0x1a0/0x288 [ 218.345747] __sys_sendmsg+0x154/0x1c8 [ 218.349506] __arm64_sys_sendmsg+0x28/0x38 [ 218.353589] el0_svc_common+0xd8/0x140 [ 218.357326] do_el0_svc+0x4c/0xb8 [ 218.360629] el0_svc+0x24/0x58 [ 218.363670] el0t_64_sync_handler+0x7c/0xe8 [ 218.367839] el0t_64_sync+0x1c0/0x1c8 [ 218.371505] x_tables: ip_tables: SNAT.0 target: invalid size 24 (kernel) != (user) 32
Luckily this error allows us to find the relevant struct easily. "SNAT.0" shows that the relevant struct xt_target is this one [2]. From there we find that the corresponding uapi struct is nf_nat_ipv4_multi_range_compat [3]. Its size is 20 regardless of the ABI (only 16- and 32-bit integers), so its 8-byte aligned size is 24, as printed.
My suspicion is that the issue lies somewhere in iptables rather than in the kernel itself, considering that this struct is clearly fixed-size. It could be that the layout of the message/header that iptables uses doesn't match what the kernel expects in purecap, and as a result the size the kernel reads is not what iptables set. However, I know essentially nothing about iptables or netfilter, so I'm afraid I won't be of much help to investigate this further...
[1]
https://git.morello-project.org/morello/kernel/linux/-/blob/morello/master/D... [2]
https://git.morello-project.org/morello/kernel/linux/-/blob/morello/master/n... [3]
https://git.morello-project.org/morello/kernel/linux/-/blob/morello/master/i... [4]
https://git.morello-project.org/morello/kernel/linux/-/blob/morello/master/a...
It is unclear to me at this point what the actual message is that’s being sent (i.e. what is causing the size discrepancy). The user size is determined in nft_target_init(), by use of the nla_len() function. The len is determined from the total size-header size, so it is the payload I guess that contains an offending capability? However, I have not been able to dig enough yet to find the payload that is sent.
https://elixir.bootlin.com/linux/v6.4/source/net/netfilter/nft_compat.c#L236
(As a complete aside… In examining the backtrace, I see that this “__arm64_sys_sendmsg” function is not referenced anywhere when i grep the kernel… I don’t really understand where that has come from?)
The syscall wrapper machinery does a lot of magic indeed! [4]
In regards to: “On a separate note, we cannot use __uintcap_t unconditionally in uapi headers ”...
Apologies, I had originally made this conditional using #ifdef CONFIG_CHERI_PURECAP_UABI, but the compiler complained about “leak CONFIG_CHERI_PURECAP_UABI to user-space” so i did not include it…
Indeed, CONFIG macros cannot be used in uapi headers, because userspace cannot know their value when building against the headers.
I have just tried again with #ifdef __CHERI_PURE_CAPABILITY__ and it compiles just fine.
We cannot use that either, because the kernel is not purecap itself. That's why typing is so tricky with PCuABI in uapi headers :/
Is this preferable with using __unitcap_t? Or do you mean that by using void __user * instead it would not be required to be conditional at all?
void __user * works unconditionally. From a kernel perspective, it is always a user pointer (capability in PCuABI). From a user perspective, __user is defined to nothing, so it is a normal pointer (also a capability in purecap).
Hope that clarifies the situation a little!
Kevin
Cheers,
Josh
Hi Kevin,
Thanks for your input here. I appreciate you taking the time to go through it bit by bit...
I guess I need to do some more digging. So should I not submit a v2 for this patch until I am able to have a more concrete answer about the cause of the issue? Or should I do that regardless, the update with just the void __user *, and not submitting the defconfig patch above at all (I only found 1 overlapping setting between the standard arm64 config and my updated one)?
Cheers,
Josh
On 04/04/2024 11:12, josh lant wrote:
Hi Kevin,
Thanks for your input here. I appreciate you taking the time to go through it bit by bit...
I guess I need to do some more digging. So should I not submit a v2 for this patch until I am able to have a more concrete answer about the cause of the issue? Or should I do that regardless, the update with just the void __user *, and not submitting the defconfig patch above at all (I only found 1 overlapping setting between the standard arm64 config and my updated one)?
I would indeed very much appreciate if you could dig further, because at the moment I am not convinced that this patch is addressing the root cause of the issue, as opposed to hiding it. If you manage to confirm that patch 2 is the appropriate solution, please feel free to send a v2 (with an updated commit message). It is probably best to drop patch 1.
Thanks again for taking the time to investigate this!
Kevin
PS: one small thing, pure plain text emails are preferred when replying on mailing lists - it is possible to toggle this in the Gmail web UI (and classic email clients like Thunderbird).
Cheers,
Josh
I would indeed very much appreciate if you could dig further, because at the moment I am not convinced that this patch is addressing the root cause of the issue, as opposed to hiding it.
Hi Kevin,
Apologies it has taken me so long to respond to this. I thought it better to continue this discussion on this thread rather than create a new patch submission. I have found the source of the issue from the iptables side. There are several other failing test cases in the iptables tests which are fixed by this patch. These are as follows, with the corresponding dmesg output:
././testcases/ipt-save/0001loa d-dumps_0 ip_tables: TCPMSS.0 target: invalid size 8 (kernel) != (user) 16 ././testcases/ipt-save/0002load-fedora27-firewalld_0 ip_tables: CHECKSUM.0 target: invalid size 8 (kernel) != (user) 16 ././testcases/nft-only/0008-basechain-policy_0 ip_tables: CT.2 target: invalid size 72 (kernel) != (user) 80
Taking this last testcase, we can see that the CT.2 target corresponds to this xt_ct_target_info_v1 struct [1], which does indeed contain a pointer/capability (I have no idea what happens when you have that __attribute__((aligned(8)) on a pointer in the morello compiler though? I guess it is ignored or changed to 16?). The size is determined using the macro XT_ALIGN(sizeof(struct xt_ct_target_info_v1)), which gives 72 in non-purecap, and 80 in purecap.
In the example we originally discussed before, using the SNAT.0 xt_target struct, the situation is a little more complex. We see that when the netlink message is composed by iptables, this function append_range is called [2], which increases the size of the netlink attribute to be put into the message. In this function the sizeof(*info) is what causes the discrepancy between purecap and regular aarch64. *info is this struct below, which has size 64B in purecap rather than 56B in non-purecap, due to the need for 16B alignment.:
struct ipt_natinfo { struct xt_entry_target t; [3] struct nf_nat_ipv4_multi_range_compat mr; [4] };
Now the issue that I see in this example above when using the patch first proposed, is that this appends a redundant 8B of data to the end of the netlink attribute. I think that this particular problem could be fixed from the iptables side. However, it does not change the situation with the CT.2 target, and since developers are free to create their own module extensions to netfilter/iptables by adding their own matches/targets, I think that the XT_ALIGN fix I originally proposed would be necessary as long as the XT_ALIGN macro remains in the uapi and is able to be used by userspace tools. I don't know what you reckon?
Many thanks,
Josh
[1] https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/netfilter/... [2] https://git.netfilter.org/iptables/tree/extensions/libipt_SNAT.c?h=v1.8.7#n5... [3] https://git.netfilter.org/iptables/tree/include/linux/netfilter/x_tables.h?h... [4] https://git.netfilter.org/iptables/tree/include/linux/netfilter/nf_nat.h?h=v...
On Thu, 4 Apr 2024 at 10:37, Kevin Brodsky kevin.brodsky@arm.com wrote:
On 04/04/2024 11:12, josh lant wrote:
Hi Kevin,
Thanks for your input here. I appreciate you taking the time to go through it bit by bit...
I guess I need to do some more digging. So should I not submit a v2 for this patch until I am able to have a more concrete answer about the cause of the issue? Or should I do that regardless, the update with just the void __user *, and not submitting the defconfig patch above at all (I only found 1 overlapping setting between the standard arm64 config and my updated one)?
I would indeed very much appreciate if you could dig further, because at the moment I am not convinced that this patch is addressing the root cause of the issue, as opposed to hiding it. If you manage to confirm that patch 2 is the appropriate solution, please feel free to send a v2 (with an updated commit message). It is probably best to drop patch 1.
Thanks again for taking the time to investigate this!
Kevin
PS: one small thing, pure plain text emails are preferred when replying on mailing lists - it is possible to toggle this in the Gmail web UI (and classic email clients like Thunderbird).
Cheers,
Josh
On 20/06/2024 13:05, josh lant wrote:
Taking this last testcase, we can see that the CT.2 target corresponds to this xt_ct_target_info_v1 struct [1], which does indeed contain a pointer/capability (I have no idea what happens when you have that __attribute__((aligned(8)) on a pointer in the morello compiler though? I guess it is ignored or changed to 16?). The size is determined using the macro XT_ALIGN(sizeof(struct xt_ct_target_info_v1)), which gives 72 in non-purecap, and 80 in purecap.
In the example we originally discussed before, using the SNAT.0 xt_target struct, the situation is a little more complex. We see that when the netlink message is composed by iptables, this function append_range is called [2], which increases the size of the netlink attribute to be put into the message. In this function the sizeof(*info) is what causes the discrepancy between purecap and regular aarch64. *info is this struct below, which has size 64B in purecap rather than 56B in non-purecap, due to the need for 16B alignment.:
struct ipt_natinfo { struct xt_entry_target t; [3] struct nf_nat_ipv4_multi_range_compat mr; [4] };
Now the issue that I see in this example above when using the patch first proposed, is that this appends a redundant 8B of data to the end of the netlink attribute. I think that this particular problem could be fixed from the iptables side. However, it does not change the situation with the CT.2 target, and since developers are free to create their own module extensions to netfilter/iptables by adding their own matches/targets, I think that the XT_ALIGN fix I originally proposed would be necessary as long as the XT_ALIGN macro remains in the uapi and is able to be used by userspace tools. I don't know what you reckon?
Thanks a lot for having another look. You have exposed the underlying issue, and I've got to say I didn't expect it. It appears that structs like xt_entry_match or xt_ct_target_info hold *kernel* pointers, strictly for internal kernel usage. This is pretty dubious for uapi structs, honestly, but it is how it is. Now the problem is that we simply cannot have kernel pointers in uapi headers in our PCuABI implementation, because doing so results in the kernel and userspace disagreeing on the layout: the kernel "sees" a 64-bit pointer while userspace sees a capability.
This is what I meant by modifying XT_ALIGN() shadowing the underlying issue: we need to ensure that the kernel and userspace see the same layout for the uapi structs; aligning up by 16 happens to work for at least some of those structs, but that's only because we're lucky with the layout (either unions or the kernel pointer being stored at the end). What we really want to do is to prevent introducing capabilities in those structs (when seen from userspace), as they do not actually hold user pointers.
The robust way to fix this is to modify the struct definitions so that the layout is the same in the kernel and userspace. There's unfortunately no obvious way to do this. What is probably the easiest is to change all those fields that are typed as (kernel) pointers to be unsigned long instead, as this keeps the size of the field unchanged (both in 32- and 64-bit) and doesn't become a capability in purecap. The main downside is that extra casts will be needed whenever that field is accessed in the kernel.
Where it is used, the __attribute__((aligned(8))) should stay - its meaning remains unchanged. It effectively does nothing on arm64 if applied on a pointer or unsigned long (already 8-byte aligned), but it may be needed on other architectures.
Note that there are unfortunately quite a few more uapi structs that hold those internal kernel pointers on top of those you've already found, I've used this to find them: git grep 'struct \w+\s**\w' include/uapi/linux/netfilter*
I'm realising this is probably turning out to be quite a lot more work than you originally planned. If you do not expect to be able to fix all those structs (which would of course be perfectly understandable), I will at least make sure to document this issue.
Thanks again, Kevin
Hi Kevin,
The robust way to fix this is to modify the struct definitions so that the layout is the same in the kernel and userspace. There's unfortunately no obvious way to do this. What is probably the easiest is to change all those fields that are typed as (kernel) pointers to be unsigned long instead, as this keeps the size of the field unchanged (both in 32- and 64-bit) and doesn't become a capability in purecap. The main downside is that extra casts will be needed whenever that field is accessed in the kernel.
Thanks for your input on this again. This is really helpful. I had actually tried to start modifying one of these kernel pointers into unsigned longs while trying to work this problem out. I originally abandoned it because it seemed like a long rabbit hole, and I was unsure if it was actually going to fix the problem. Now I know that this is the way to go, I may make more of a concerted effort to do it.
I'm realising this is probably turning out to be quite a lot more work than you originally planned. If you do not expect to be able to fix all those structs (which would of course be perfectly understandable), I will at least make sure to document this issue.
My time to do the work might be limited, so I cannot make promises. However, having taken the time to dig into netfilter/netlink and iptables I'm in a much better position now than I was to actually tackle the problem. I will look into it and try and get a gauge on whether I have time to do it.
Cheers,
Josh
On Tue, 25 Jun 2024 at 14:23, Kevin Brodsky kevin.brodsky@arm.com wrote:
On 20/06/2024 13:05, josh lant wrote:
Taking this last testcase, we can see that the CT.2 target corresponds to this xt_ct_target_info_v1 struct [1], which does indeed contain a pointer/capability (I have no idea what happens when you have that __attribute__((aligned(8)) on a pointer in the morello compiler though? I guess it is ignored or changed to 16?). The size is determined using the macro XT_ALIGN(sizeof(struct xt_ct_target_info_v1)), which gives 72 in non-purecap, and 80 in purecap.
In the example we originally discussed before, using the SNAT.0 xt_target struct, the situation is a little more complex. We see that when the netlink message is composed by iptables, this function append_range is called [2], which increases the size of the netlink attribute to be put into the message. In this function the sizeof(*info) is what causes the discrepancy between purecap and regular aarch64. *info is this struct below, which has size 64B in purecap rather than 56B in non-purecap, due to the need for 16B alignment.:
struct ipt_natinfo { struct xt_entry_target t; [3] struct nf_nat_ipv4_multi_range_compat mr; [4] };
Now the issue that I see in this example above when using the patch first proposed, is that this appends a redundant 8B of data to the end of the netlink attribute. I think that this particular problem could be fixed from the iptables side. However, it does not change the situation with the CT.2 target, and since developers are free to create their own module extensions to netfilter/iptables by adding their own matches/targets, I think that the XT_ALIGN fix I originally proposed would be necessary as long as the XT_ALIGN macro remains in the uapi and is able to be used by userspace tools. I don't know what you reckon?
Thanks a lot for having another look. You have exposed the underlying issue, and I've got to say I didn't expect it. It appears that structs like xt_entry_match or xt_ct_target_info hold *kernel* pointers, strictly for internal kernel usage. This is pretty dubious for uapi structs, honestly, but it is how it is. Now the problem is that we simply cannot have kernel pointers in uapi headers in our PCuABI implementation, because doing so results in the kernel and userspace disagreeing on the layout: the kernel "sees" a 64-bit pointer while userspace sees a capability.
This is what I meant by modifying XT_ALIGN() shadowing the underlying issue: we need to ensure that the kernel and userspace see the same layout for the uapi structs; aligning up by 16 happens to work for at least some of those structs, but that's only because we're lucky with the layout (either unions or the kernel pointer being stored at the end). What we really want to do is to prevent introducing capabilities in those structs (when seen from userspace), as they do not actually hold user pointers.
The robust way to fix this is to modify the struct definitions so that the layout is the same in the kernel and userspace. There's unfortunately no obvious way to do this. What is probably the easiest is to change all those fields that are typed as (kernel) pointers to be unsigned long instead, as this keeps the size of the field unchanged (both in 32- and 64-bit) and doesn't become a capability in purecap. The main downside is that extra casts will be needed whenever that field is accessed in the kernel.
Where it is used, the __attribute__((aligned(8))) should stay - its meaning remains unchanged. It effectively does nothing on arm64 if applied on a pointer or unsigned long (already 8-byte aligned), but it may be needed on other architectures.
Note that there are unfortunately quite a few more uapi structs that hold those internal kernel pointers on top of those you've already found, I've used this to find them: git grep 'struct \w+\s**\w' include/uapi/linux/netfilter*
I'm realising this is probably turning out to be quite a lot more work than you originally planned. If you do not expect to be able to fix all those structs (which would of course be perfectly understandable), I will at least make sure to document this issue.
Thanks again, Kevin
On 26/03/2024 15:50, Joshua Lant wrote:
Hi,
This series of patches enables the use of the Wireguard VPN and all assocaited tools required for running wireguard-tools' test script. Wireguard's test script (netns.sh) runs to completion using purecap compiled:
wireguard-tools, iproute2, iputils (ping/ping6), iptables, nftables, libnftnl, libmnl, libelf, argp-standalone, musl-obstack, fts, libjansson.
Packages used in netns.sh currently not tested in purecap:
ncat, iperf3.
The bulk of the changes required are additions to the kernel config, with a fix for a bug found in iptables.
There is an alignment issue at the user/kernel boundary in xtables with capabilities, encountered in the macro XT_ALIGN, used in the function xt_check_target (with the resulting message indicating size of (kernel) and (user) not matching). This bug occurs when running certain iptables commands in the test script. e.g.
iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -d 10.0.0.0/24 -j SNAT --to 10.0.0.1
This is my first patch to the kernel so please forgive me if anything is drastically wrong. I have tried to follow the format of others on here...
Thank you for taking the time to put together these patches and post them on the list, this is very appreciated! The format is exactly as one would expect overall :) Just a small note, no need to add the [linux-morello] tag explicitly - it is automatically added by the list itself.
I will make further comments on each individual patch.
Kevin
Cheers,
Joshua Lant
Joshua Lant (2): morello: enable wireguard kernel config xtables: fix alignment issue
.../morello_transitional_pcuabi_defconfig | 23 +++++++++++++++++++ include/uapi/linux/netfilter/x_tables.h | 1 + 2 files changed, 24 insertions(+)
linux-morello@op-lists.linaro.org