Hi Kevin,
Thanks for the detailed review.
On 10/27/23 19:33, Kevin Brodsky wrote:
On 25/10/2023 07:36, Amit Daniel Kachhap wrote:
[...]
+/**
+ * reserv_vma_insert_entry() - Adds the reservation details in the VMA for the
+ *	virtual address range from start to (start + len) with perm permission as
+ *	the entry. The start address is expected to be CHERI representable base
+ *	and the length may expand to CHERI representable length without interfering
+ *	with the successive vma. This function should be called with mmap_lock
+ *	held.
+ * @vma: VMA pointer to insert the reservation entry.
+ * @start: Reservation start value.
+ * @len: Reservation length.
+ * @perm: Memory mapping permission for the reserved range.
+ *
+ * Return: 0 if reservation entry added successfully or -ERESERVATION otherwise.
+ */
+int reserv_vma_insert_entry(struct vm_area_struct *vma, unsigned long start,
+			    unsigned long len, unsigned long perm);
The name of this function doesn't really correspond to what it does any more. Since we store the reservation information directly in the VMA, there is nothing to "insert". Maybe something like reserv_vma_set_reserv()? Same comment regarding reserv_range_insert_entry().

Yes, agreed. The above name was more appropriate for v1 than for v2.
Also, as a general comment: for new code, it would be preferable to use ptraddr_t instead of unsigned long. Of course it's not always clear-cut, as we should try to be consistent with existing code too, but this being a whole new header I think using ptraddr_t throughout makes sense.
ok
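Putting both points together, the revised declaration could look something like the sketch below. This is just to illustrate the shape of the change; ptraddr_t and the VMA struct are stubbed with placeholder definitions so the fragment compiles standalone, and the final name/signature may of course differ:

```c
#include <assert.h>

/* Placeholder: in the kernel, ptraddr_t comes from the CHERI headers and
 * is the address-only (non-capability) integer type. */
typedef unsigned long ptraddr_t;

/* Placeholder: reduced stand-in for struct vm_area_struct, keeping only
 * the reservation fields this helper touches. */
struct vm_area_struct {
	ptraddr_t reserv_start;
	ptraddr_t reserv_len;
	unsigned long reserv_perm;
};

/*
 * Sketch of the renamed helper: it only *sets* the reservation fields on
 * the VMA, so reserv_vma_set_reserv() describes it better than
 * "insert_entry". Addresses and lengths use ptraddr_t throughout.
 */
int reserv_vma_set_reserv(struct vm_area_struct *vma, ptraddr_t start,
			  ptraddr_t len, unsigned long perm)
{
	vma->reserv_start = start;
	vma->reserv_len = len;
	vma->reserv_perm = perm;
	return 0;
}
```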
[...]
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 12e87f83287d..38bad6b201ca 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -571,6 +571,11 @@ struct vm_area_struct {
 	struct vma_numab_state *numab_state;	/* NUMA Balancing state */
 #endif
 	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
+#ifdef CONFIG_CHERI_PURECAP_UABI
+	unsigned long reserv_start;
+	unsigned long reserv_len;
+	unsigned long reserv_perm;
I've been wondering, are these fields automatically preserved when splitting or merging VMAs?
Yes, these are preserved, and VMAs are merged only if the reservation fields are the same.
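In other words, the mergeability check has to compare all three reservation fields. A standalone sketch of that comparison (the struct is a reduced stand-in; in the kernel this would sit in the is_mergeable_vma() path):

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long ptraddr_t;

/* Placeholder: only the reservation fields of struct vm_area_struct. */
struct vm_area_struct {
	ptraddr_t reserv_start;
	ptraddr_t reserv_len;
	unsigned long reserv_perm;
};

/* Two adjacent VMAs may only merge if they belong to the same reservation,
 * i.e. all three reservation fields match exactly. */
bool reserv_fields_match(const struct vm_area_struct *a,
			 const struct vm_area_struct *b)
{
	return a->reserv_start == b->reserv_start &&
	       a->reserv_len == b->reserv_len &&
	       a->reserv_perm == b->reserv_perm;
}
```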
By the way, the overall approach to manage reservations makes a lot more sense to me now, I think it's going in the right direction.
+#endif
 } __randomize_layout;

 #ifdef CONFIG_SCHED_MM_CID
diff --git a/mm/Makefile b/mm/Makefile
index e29afc890cde..c2ca31fe5798 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -39,7 +39,7 @@ mmu-y			:= nommu.o
 mmu-$(CONFIG_MMU)	:= highmem.o memory.o mincore.o \
			   mlock.o mmap.o mmu_gather.o mprotect.o mremap.o \
			   msync.o page_vma_mapped.o pagewalk.o \
-			   pgtable-generic.o rmap.o vmalloc.o
+			   pgtable-generic.o rmap.o vmalloc.o cap_addr_mgmt.o
 ifdef CONFIG_CROSS_MEMORY_ATTACH
diff --git a/mm/cap_addr_mgmt.c b/mm/cap_addr_mgmt.c
new file mode 100644
index 000000000000..fe318c92dc7a
--- /dev/null
+++ b/mm/cap_addr_mgmt.c
@@ -0,0 +1,222 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bug.h>
+#include <linux/cap_addr_mgmt.h>
+#include <linux/cheri.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+#ifdef CONFIG_CHERI_PURECAP_UABI
+int reserv_vma_insert_entry(struct vm_area_struct *vma, unsigned long start,
+			    unsigned long len, unsigned long perm)
+{
+	/* TODO [PCuABI] - capability permission conversion from memory permission */
+	cheri_perms_t cheri_perms = CHERI_PERMS_READ | CHERI_PERMS_WRITE |
+				    CHERI_PERMS_EXEC | CHERI_PERMS_ROOTCAP;
+
+	if (is_compat_task() || !(vma->vm_flags & VM_PCUABI_RESERVE))
I've been pondering VM_PCUABI_RESERVE. I think it's quite clear it can't be a permanent solution - in PCuABI, reservation management is required for all mappings, with no exception. I do however see the issue with vm_mmap() and friends. My feeling is that it is not worth trying to move to capability-based address space management in kernel subsystems, like binfmt_elf. The benefits are small and it seems that the hassle is quite high. On the other hand, reservation management can never be skipped. At the moment, it seems to me that the following approach would be the most practical:
- Moving all capability operations to the top-level syscall handler (or corresponding utility function, i.e. ksys_mmap_pgoff() for mmap). This way lower-level helpers that can be called from the kernel too, such as vm_mmap(), keep operating purely on addresses and we don't have to worry about their callers. That should also mean fewer changes overall.
Yes, agreed on skipping the address space checks here. In the next iteration I will try moving the capability checks into ksys_mmap_pgoff().
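The split I have in mind is roughly the following (a standalone sketch with placeholder types: user_cap, vm_mmap_addr() and ksys_mmap_sketch() are illustrative names, not the real interfaces):

```c
#include <assert.h>
#include <errno.h>

typedef unsigned long ptraddr_t;

/* Placeholder stand-in for a user capability: its bounds plus the
 * address it carries. */
struct user_cap {
	ptraddr_t base;
	ptraddr_t limit;
	ptraddr_t addr;
};

/* The lower-level helper keeps operating purely on addresses, so
 * in-kernel callers such as binfmt_elf need no capability handling. */
long vm_mmap_addr(ptraddr_t addr, unsigned long len)
{
	(void)len;
	return (long)addr;	/* pretend the mapping succeeded at addr */
}

/* Capability validation happens once, at the syscall boundary, before
 * delegating to the address-based helper. */
long ksys_mmap_sketch(const struct user_cap *cap, unsigned long len)
{
	if (cap->addr < cap->base || cap->addr + len > cap->limit)
		return -EINVAL;	/* requested range escapes capability bounds */
	return vm_mmap_addr(cap->addr, len);
}
```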
- The reservation management remains next to the mapping management, like in these patches. It is done unconditionally in PCuABI.
binfmt_elf calls vm_mmap() once per ELF segment, so the approach I am thinking of is to defer the reservation until all the ELF segments are mapped. At the end, call reserv_range_insert_entry() with the range covering the complete set of ELF segments. This API will also return a capability whose bounds match the reservation range.
- No new vma->vm_flags. However, we could reduce the #ifdef'ing + checks for compat by introducing a new mm->flags. Reservation management is a property of the address space, so it would be sensible to introduce a per-mm flag.
Any specific reason to avoid vm_flags (fewer free bits)? A vm_flags bit makes the VMA operations simple to implement, although it is somewhat redundant, since a VMA using a reservation can already be identified by its reservation fields being initialized.
Does that sound sensible?
Yes most of it.
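For illustration, with a per-mm flag the "does this address space use reservations?" question becomes a single test on the mm, instead of a vm_flags check on every VMA. A standalone sketch (the bit number and mm_sketch struct are placeholders; the real bit would be allocated alongside the other MMF_* flags):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-mm flag bit - not an allocated MMF_* value. */
#define MMF_PCUABI_RESERVE_BIT	31

/* Placeholder stand-in for struct mm_struct. */
struct mm_sketch {
	unsigned long flags;
};

/* Reservation management is a property of the whole address space, so a
 * single per-mm flag answers the question for every mapping in it. */
bool mm_uses_reservations(const struct mm_sketch *mm)
{
	return mm->flags & (1UL << MMF_PCUABI_RESERVE_BIT);
}
```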
[...]
+bool reserv_vmi_range_fully_mapped(struct vma_iterator *vmi, unsigned long start,
+				   unsigned long len)
+{
+	unsigned long align_start, align_len;
+	struct vm_area_struct *vma;
+
+	if (is_compat_task())
+		return true;
+	mas_set_range(&vmi->mas, start, start + len);
+	/* Try finding the given range */
+	vma = mas_find(&vmi->mas, start + len);
+	if (!vma || !(vma->vm_flags & VM_PCUABI_RESERVE))
+		return true;
Surely if no VMA is found (!vma) then the range is fully *unmapped* and the answer is false?
Well, agreed that the NULL vma case is not appropriately reflected. I will change this interface.
+	align_start = round_down(vma->vm_start,
+				 max(PAGE_SIZE, CHERI_REPRESENTABLE_ALIGNMENT(len)));
+	align_len = cheri_representable_length(round_up((vma->vm_end - vma->vm_start),
+							PAGE_SIZE));
+	/* Check if the range fully mapped */
+	if (start != vma->vm_start || (start + len) != vma->vm_end ||
+	    align_start != vma->reserv_start ||
+	    align_len != vma->reserv_len)
This checks that the found VMA corresponds exactly to the specified range. What we actually want though is to check that the range is fully mapped, i.e. there exists *any number* of mappings spanning every single page.
Assuming this function is always used for enforcing rules in the spec such as:
If any part of AlignedRange(addr.address, length) is not currently
mapped, then the call fails with -ENOMEM.
Then reservation bounds are irrelevant - this is purely about mappings. Also note that AlignedRange() is about page alignment, not capability representation.
I will change the implementation as the current approach is too restrictive.
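The reworked check would need to walk every VMA overlapping the page-aligned range and fail on the first gap, ignoring reservation bounds entirely. Roughly like this standalone sketch (VMAs modelled as a sorted array instead of the maple tree, with placeholder types):

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long ptraddr_t;

#define SK_PAGE_SIZE	4096UL	/* placeholder for PAGE_SIZE */

/* Placeholder: one mapping, [start, end), page aligned. */
struct vma_sk {
	ptraddr_t start;
	ptraddr_t end;
};

/* Walk the (sorted, non-overlapping) VMAs and verify that every page of
 * the page-aligned range [start, start + len) is covered by some mapping;
 * any gap means false. Reservation bounds play no part here - this is
 * purely about mappings, matching AlignedRange() in the spec. */
bool range_fully_mapped(const struct vma_sk *vmas, int n,
			ptraddr_t start, ptraddr_t len)
{
	ptraddr_t lo = start & ~(SK_PAGE_SIZE - 1);
	ptraddr_t hi = (start + len + SK_PAGE_SIZE - 1) & ~(SK_PAGE_SIZE - 1);
	ptraddr_t cursor = lo;

	for (int i = 0; i < n && cursor < hi; i++) {
		if (vmas[i].end <= cursor)
			continue;	/* mapping entirely before the cursor */
		if (vmas[i].start > cursor)
			return false;	/* unmapped gap before this VMA */
		cursor = vmas[i].end;	/* range covered up to here */
	}
	return cursor >= hi;
}
```

Note that any number of adjacent mappings may together span the range; the walk only cares that the coverage cursor never hits a hole.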
//Amit
Kevin