On 20/06/2024 13:05, josh lant wrote:
Taking this last testcase, we can see that the CT.2 target corresponds to this xt_ct_target_info_v1 struct [1], which does indeed contain a pointer/capability (I have no idea what happens when you have that __attribute__((aligned(8)) on a pointer in the morello compiler though? I guess it is ignored or changed to 16?). The size is determined using the macro XT_ALIGN(sizeof(struct xt_ct_target_info_v1)), which gives 72 in non-purecap, and 80 in purecap.
In the example we originally discussed before, using the SNAT.0 xt_target struct, the situation is a little more complex. We see that when the netlink message is composed by iptables, this function append_range is called [2], which increases the size of the netlink attribute to be put into the message. In this function the sizeof(*info) is what causes the discrepancy between purecap and regular aarch64. *info is this struct below, which has size 64B in purecap rather than 56B in non-purecap, due to the need for 16B alignment.:
struct ipt_natinfo { struct xt_entry_target t; [3] struct nf_nat_ipv4_multi_range_compat mr; [4] };
Now the issue that I see in this example above when using the patch first proposed, is that this appends a redundant 8B of data to the end of the netlink attribute. I think that this particular problem could be fixed from the iptables side. However, it does not change the situation with the CT.2 target, and since developers are free to create their own module extensions to netfilter/iptables by adding their own matches/targets, I think that the XT_ALIGN fix I originally proposed would be necessary as long as the XT_ALIGN macro remains in the uapi and is able to be used by userspace tools. I don't know what you reckon?
Thanks a lot for having another look. You have exposed the underlying issue, and I've got to say I didn't expect it. It appears that structs like xt_entry_match or xt_ct_target_info hold *kernel* pointers, strictly for internal kernel usage. This is pretty dubious for uapi structs, honestly, but it is how it is. Now the problem is that we simply cannot have kernel pointers in uapi headers in our PCuABI implementation, because doing so results in the kernel and userspace disagreeing on the layout: the kernel "sees" a 64-bit pointer while userspace sees a capability.
This is what I meant by modifying XT_ALIGN() shadowing the underlying issue: we need to ensure that the kernel and userspace see the same layout for the uapi structs; aligning up by 16 happens to work for at least some of those structs, but that's only because we're lucky with the layout (either unions or the kernel pointer being stored at the end). What we really want to do is to prevent introducing capabilities in those structs (when seen from userspace), as they do not actually hold user pointers.
The robust way to fix this is to modify the struct definitions so that the layout is the same in the kernel and userspace. There's unfortunately no obvious way to do this. What is probably the easiest is to change all those fields that are typed as (kernel) pointers to be unsigned long instead, as this keeps the size of the field unchanged (both in 32- and 64-bit) and doesn't become a capability in purecap. The main downside is that extra casts will be needed whenever that field is accessed in the kernel.
Where it is used, the __attribute__((aligned(8))) should stay - its meaning remains unchanged. It effectively does nothing on arm64 if applied on a pointer or unsigned long (already 8-byte aligned), but it may be needed on other architectures.
Note that there are unfortunately quite a few more uapi structs that hold those internal kernel pointers on top of those you've already found, I've used this to find them: git grep 'struct \w+\s**\w' include/uapi/linux/netfilter*
I'm realising this is probably turning out to be quite a lot more work than you originally planned. If you do not expect to be able to fix all those structs (which would of course be perfectly understandable), I will at least make sure to document this issue.
Thanks again, Kevin