On 25/10/2023 18:57, Teo Couprie Diaz wrote:
argv and envp strings can be large enough so that a capability cannot be created to access them exclusively: their bounds would not be exactly representable and go outside of the strings.
During allocation on the stack, align the strings that need an adjusted base and pad if needed for alignment or a representable length. Handle this potential padding in fs/binfmt_elf by detecting unexpected 0s.
"unexpected" is a bit... unexpected for something we actually expect? :D Maybe "extra".
This is done irrespective of the calling binary or loaded binary ABIs. Indeed, at the point where strings are put on the stack during `execve` we can only know the ABI of the calling binary, but not that of the loaded binary. Thus the compat path will be slightly slower and the memory used by the strings will be larger, as if to create capabilities with exact bounds. However, this memory impact should stay minimal and in rare cases where argv or envp strings are more than a couple thousand characters.
Same comment as in v2.
Add a a helper function to put already created capabilities on the stack
s/a a/a/
during elf setup.
Additionally, align the newly created stack to a multiple of the maximum argument length to make sure the relocated arguments will still be respresentable with their new alignment in a PC-uABI kernel.
s/respresentable/representable/
Also I would say "in their new location", because the point is that we preserve a sufficient alignment.
A big improvement over v2 overall, well done!
Signed-off-by: Teo Couprie Diaz <teo.coupriediaz@arm.com>
 fs/binfmt_elf.c | 101 +++++++++++++++++++++++++++++++++++++++++-------
 fs/exec.c       |  58 +++++++++++++++++++++++++--
 2 files changed, 142 insertions(+), 17 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 6455313d7f36..198c1ed442a6 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -151,6 +151,84 @@ static int padzero(unsigned long elf_bss)
 	}
 	return 0;
 }
+
+/*
+ * Put a pointer or capability to a string on the stack, return the effective
+ * size of the string on the stack or an error.
+ * When building a PC-uABI kernel, this will take into account padding and
"PCuABI" without dash (just to make grepping and such more reliable).
+ * create an exactly representable capability if in purecap.
+ */
+static long put_str_ptr(char __user *ustr,
+			elf_stack_item_t __user *stack_item)
+{
+	size_t len;
+#if defined(CONFIG_CHERI_PURECAP_UABI)
+	char c;
+	void __user *str_ptr;
+
Nit: char __user *
+	len = strnlen_user(ustr, MAX_ARG_STRLEN);
+	if (!len || len > MAX_ARG_STRLEN)
+		return -EINVAL;
+	/*
+	* Check if the next arg string starts right after the current one.
+	* If not, the len was not exactly representable and there is padding.
+	*/
That comment looks misindented.
+	if (get_user(c, ustr + len))
+		return -EFAULT;
+	if (c == '\0')
+		/*
+		 * If the string has been adjusted for exact bounds representation,
+		 * its start should already be properly aligned. Get the representable
+		 * length by using the same length that was used during allocation:
+		 * the length of the original string.
+		 * This takes into account the padding due to length change after
+		 * the string, but not that for alignment. Thus we might not end up
+		 * at the start of the next arg. If not, we will need to take a slow
+		 * path to go through the padding.
+		 */
+		len = cheri_representable_length(len);
I'm realising: do we even need to do this conditionally? The representable length should always be equal to the length if there is no padding. Executing RRLEN unconditionally is definitely faster than get_user() (and of course that's simpler too).
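Something like this (untested, just to illustrate):

	len = strnlen_user(ustr, MAX_ARG_STRLEN);
	if (!len || len > MAX_ARG_STRLEN)
		return -EINVAL;
	/* Should be a no-op when no padding was added to the string. */
	len = cheri_representable_length(len);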
+#if (ELF_COMPAT == 0)
+	str_ptr = cheri_perms_and(ustr,
+			(CHERI_PERM_GLOBAL | CHERI_PERM_STORE | CHERI_PERM_LOAD));
+	str_ptr = cheri_bounds_set_exact(str_ptr, len);
+	if (put_user_ptr(str_ptr, stack_item++))
+		return -EFAULT;
+#else
+	if (elf_stack_put_user(ustr, stack_item++))
Nit: to reduce duplication some more and make it even clearer what's happening, this line could be common to the two paths, simply setting str_ptr to ustr in compat.
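Roughly, and untested (assuming elf_stack_put_user is happy with being handed the capability in the native case):

	str_ptr = ustr;
#if (ELF_COMPAT == 0)
	str_ptr = cheri_perms_and(str_ptr,
			(CHERI_PERM_GLOBAL | CHERI_PERM_STORE | CHERI_PERM_LOAD));
	str_ptr = cheri_bounds_set_exact(str_ptr, len);
#endif
	if (elf_stack_put_user(str_ptr, stack_item++))
		return -EFAULT;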
+		return -EFAULT;
+#endif
+	/*
+	 * If right after the end of the argument length we have a zero,
+	 * that means the argument alignment was adjusted in order to create a
+	 * representable capability in purecap, even if we are not loading a
+	 * purecap binary. This padding is added at the end, so find the real
+	 * end by going through the padding.
+	 */
+	if (get_user(c, ustr + len))
+		return -EFAULT;
+	if (c == '\0') {
+		size_t pad_len = 0;
+
+		while (pad_len < MAX_ARG_STRLEN && c == '\0') {
+			pad_len++;
I think there's an off-by-one, since we increment pad_len, then call get_user(), and then only after that check that pad_len < MAX_ARG_STRLEN. I was actually going to suggest using a for loop instead, which makes such issues less likely.
Also, I'm not sure we need to have that initial get_user() + if. We can directly have a loop, which may exit right away (need to be careful about limit conditions of course).
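For instance, something like this (untested, and the exact bound still needs some thought):

	size_t pad_len;

	for (pad_len = 0; pad_len < MAX_ARG_STRLEN; pad_len++) {
		if (get_user(c, ustr + len + pad_len))
			return -EFAULT;
		if (c != '\0')
			break;
	}
	ustr += pad_len;
	len += pad_len;

If the byte right after the string is not a NUL, the loop exits on the first iteration with pad_len == 0, so the initial get_user() + if is not needed.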
+			if (get_user(c, ustr + len + pad_len))
+				return -EFAULT;
+		}
+		ustr += pad_len;
+		len += pad_len;
+	}
+#else /* CONFIG_CHERI_PURECAP_UABI */
+	len = strnlen_user(ustr, MAX_ARG_STRLEN);
+	if (!len || len > MAX_ARG_STRLEN)
+		return -EINVAL;
+	if (elf_stack_put_user(ustr, stack_item++))
+		return -EFAULT;
+#endif /* CONFIG_CHERI_PURECAP_UABI */
+	return len;
+}
+
 /* Let's use some macros to make this stack manipulation a little clearer */
 #ifdef CONFIG_STACK_GROWSUP
@@ -213,6 +291,7 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
 #if defined(CONFIG_CHERI_PURECAP_UABI) && (ELF_COMPAT == 0)
 	elf_stack_item_t *mm_at_argv, *mm_at_envp;
 #endif
+	long str_ret = 0;
 
 	/*
 	 * In some cases (e.g. Hyper-Threading), we want to avoid L1
@@ -397,13 +476,10 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
 	/* In PCuABI, this derives a capability from SP pointing to arg_start */
 	ustr = sp + (mm->arg_start - user_ptr_addr(sp));
 	while (argc-- > 0) {
-		size_t len;
-		if (elf_stack_put_user(ustr, stack_item++))
-			return -EFAULT;
-		len = strnlen_user(ustr, MAX_ARG_STRLEN);
-		if (!len || len > MAX_ARG_STRLEN)
-			return -EINVAL;
-		ustr += len;
+		str_ret = put_str_ptr(ustr, stack_item++);
+		if (str_ret < 0)
+			return str_ret;
+		ustr += str_ret;
 	}
 	if (elf_stack_put_user(0, stack_item++))
 		return -EFAULT;
@@ -415,13 +491,10 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
 #endif
 	mm->env_start = user_ptr_addr(ustr);
 	while (envc-- > 0) {
-		size_t len;
-		if (elf_stack_put_user(ustr, stack_item++))
-			return -EFAULT;
-		len = strnlen_user(ustr, MAX_ARG_STRLEN);
-		if (!len || len > MAX_ARG_STRLEN)
-			return -EINVAL;
-		ustr += len;
+		str_ret = put_str_ptr(ustr, stack_item++);
+		if (str_ret < 0)
+			return str_ret;
+		ustr += str_ret;
 	}
 	if (elf_stack_put_user(0, stack_item++))
 		return -EFAULT;
diff --git a/fs/exec.c b/fs/exec.c
index 892cd911549d..7bc423a6fc9a 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -66,6 +66,7 @@
 #include <linux/coredump.h>
 #include <linux/time_namespace.h>
 #include <linux/user_events.h>
+#include <linux/cheri.h>
 #include <linux/uaccess.h>
 #include <asm/mmu_context.h>
@@ -76,6 +77,12 @@
 #include <trace/events/sched.h>
 
+#ifdef CONFIG_CHERI_PURECAP_UABI
+#define STACK_ALIGN_SIZE MAX_ARG_STRLEN
+#else
+#define STACK_ALIGN_SIZE PAGE_SIZE
+#endif
+
 static int bprm_creds_from_file(struct linux_binprm *bprm);
 
 int suid_dumpable = 0;
 
@@ -515,6 +522,39 @@ static int bprm_stack_limits(struct linux_binprm *bprm)
 	return 0;
 }
 
+#if defined(CONFIG_CHERI_PURECAP_UABI)
+/*
+ * Compute the a new value for p that allows for creating an exactly
+ * representable capability with all its padding after.
+ */
+size_t align_p_for_exact_bounds(unsigned long p, size_t len)
I would say s/exact/representable/ (it is indeed in order to be able to set exact capability bounds, but representability is the root cause).
+{
+	size_t repr_len = 0, repr_align;
+
+	repr_len = cheri_representable_length(len);
+	repr_align = ~cheri_representable_alignment_mask(len) + 1;
+	if (repr_len > len || repr_align > 0) {
Do we actually need the conditions? It feels like the calculation should work in any case. That certainly appears to be true for the length; I would expect repr_align to compute to 1 if there is no requirement, but worth checking. If it is the case, removing the conditions is not only more readable but also probably faster, as the calculation is straightforward arithmetic (better than adding a branch even if it is a no-op).
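FWIW, with the conditions dropped the body would reduce to something like this (untested, and assuming repr_align does come out as 1 when there is no requirement):

	size_t repr_len = cheri_representable_length(len);
	size_t repr_align = ~cheri_representable_alignment_mask(len) + 1;

	/* If repr_len == len and repr_align == 1, this leaves p unchanged. */
	return ALIGN_DOWN(p - repr_len, repr_align) + len;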
+		/*
+		 * We want the capability base to be aligned. As we work from the
+		 * last to the first argument, we can place the start of the string
+		 * to be aligned for representability.
+		 * All padding then goes after it, both for the representable
+		 * length and for the alignment, to fill the space we left before
+		 * the previous argument we put on the stack.
+		 *
+		 * This is slightly different than the usual way representable
+		 * capabilities are created: usually we cannot change the address
+		 * pointed by the capability, so there can be padding between the
+		 * capability base and the capability value.
+		 * Here we can decide where to place the string, so we build a
+		 * capability with equal base and value, with all padding after.
+		 */
+		p = ALIGN_DOWN(p - repr_len, repr_align) + len;
+	}
+	return p;
+}
+#endif /* CONFIG_CHERI_PURECAP_UABI */
+
 /*
  * 'copy_strings()' copies argument/environment strings from the old
  * processes's memory to the new process's stack. The call to get_user_pages()
@@ -530,7 +570,7 @@ static int copy_strings(int argc, struct user_arg_ptr argv,
 	while (argc-- > 0) {
 		const char __user *str;
-		int len;
+		size_t len;
 		unsigned long pos;
 
 		ret = -EFAULT;
@@ -542,6 +582,18 @@ static int copy_strings(int argc, struct user_arg_ptr argv,
 		if (!len)
 			goto out;
 
+#if defined(CONFIG_CHERI_PURECAP_UABI)
The nice thing with having moved everything to a function is that we can easily have a #else there, returning p unconditionally, so that we can avoid this #ifdef.
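That is, something like:

#else /* CONFIG_CHERI_PURECAP_UABI */
static inline size_t align_p_for_exact_bounds(unsigned long p, size_t len)
{
	/* No representability constraint, leave p unchanged. */
	return p;
}
#endif /* CONFIG_CHERI_PURECAP_UABI */

so that the call site below simply becomes bprm->p = align_p_for_exact_bounds(bprm->p, len); in all configurations.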
+		/*
+		 * Handle all strings as if they were for purecap binaries as we only
+		 * know the calling process compat status, which might be different
+		 * from the binary to be exec'ed.
+		 *
+		 * Start directly at the string end, with all the padding already
+		 * taken into account.
"taken into account" as in skipped? I'm not sure how to interpret this sentence.
Kevin
+		 */
+		bprm->p = align_p_for_exact_bounds(bprm->p, len);
+#endif
+
 		ret = -E2BIG;
 		if (!valid_arg_len(bprm, len))
 			goto out;
@@ -775,14 +827,14 @@ int setup_arg_pages(struct linux_binprm *bprm,
 	if (vma->vm_end - vma->vm_start > stack_base)
 		return -ENOMEM;
 
-	stack_base = PAGE_ALIGN(stack_top - stack_base);
+	stack_base = ALIGN(stack_top - stack_base, STACK_ALIGN_SIZE);
 
 	stack_shift = vma->vm_start - stack_base;
 	mm->arg_start = bprm->p - stack_shift;
 	bprm->p = vma->vm_end - stack_shift;
 #else
 	stack_top = arch_align_stack(stack_top);
-	stack_top = PAGE_ALIGN(stack_top);
+	stack_top = ALIGN(stack_top, STACK_ALIGN_SIZE);
 	if (unlikely(stack_top < mmap_min_addr) ||
 	    unlikely(vma->vm_end - vma->vm_start >= stack_top - mmap_min_addr))