On 16/09/2022 12:14, Beata Michalska wrote:
Introduce new printk's pointer format extension to support capabilities, with one being denoted by specifying the 'l' length modifier alongside the 'p' type character => "%lp". In addition to the raw capability format, adding '#' flag will generate the simplified one, being slightly more verbose. In both cases, the actual result is subject to pointer hashing (capability value) unless either 'x' qualifier or no_hash_pointers command line parameter is being provided. For more details see the docs, updated here as well.
That sounds good. Could we also link to [1] (either here or in the documentation)? This provides the justification for the chosen format (i.e. we just use the accepted standard format).
[1] https://github.com/CTSRD-CHERI/cheri-c-programming/wiki/Displaying-Capabilit...
While at it, add some testcases to validate new format, alongside checkpatchpl new warning to raise the awareness of potential misuse.
Signed-off-by: Beata Michalskabeata.michalska@arm.com
Documentation/core-api/printk-formats.rst | 59 ++++++ lib/test_printf.c | 128 ++++++++++++ lib/vsprintf.c | 225 +++++++++++++++++++++- scripts/checkpatch.pl | 13 +- 4 files changed, 421 insertions(+), 4 deletions(-)
diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst index 5e89497ba314..f1e26cf41776 100644 --- a/Documentation/core-api/printk-formats.rst +++ b/Documentation/core-api/printk-formats.rst @@ -202,6 +202,65 @@ Example:: printk("test: difference between pointers: %td\n", ptr2 - ptr1); +Capabilities +------------
+::
- %lp[x] 1:ffffc00000010005:0123456789abcdef
This is a departure from [1] (where only the address is displayed in the basic format), but I agree that it makes sense to display the entire value in the context of the kernel. There's also the problem that there is no agreed specifier to display the verbose format... So I guess this is a reasonable compromise.
- %#lp[x] 0123456789abcdef[rwxRWE,0000000000000000,ffffffffffffffff]
Space before "[", and "-" between the bounds instead of "," to match the format below.
Additionally, to match [1], all values should be preceded by "0x".
+For printing CHERI model-based architectural capabilities (when supported,
I think I know what you're getting at with "CHERI model-based", but I would probably just say "CHERI" to keep it simple.
+otherwise the result of using it is undefined). +The ``l`` modifier distinguishes pointers as capabilities from C integer pointers +(when in hybrid mode, in which case capabilities and pointers are not
"when in hybrid mode" is rather confusing when we don't have any other mode at the moment, could we just say "... distinguishes capability pointers from standard (integer) pointers" without the parentheses? I would also remove references to hybrid mode in other places.
+interchangeable). Provided argument will be subject to hashing (capability vale
s/vale/value/. In fact I would recommend saying "address" rather, as there is a long-standing argument on using "value" to mean "address" - that's specific to Morello, whereas in CHERI publications (such as the ISA), "capability value" means the entire capability.
+only, with remaining bits being disregarded) before printing, unless ``x`` +modifier is being specified as well, which results in the following capability +format being printed:
Could we move the discussion on hashing after the format, as you already have a note there? Otherwise you would have to mention "no_hash_pointers" here as well and it gets confusing.
Also I think you will want to use :: in order to have verbatim formatting below, make sure to check the rendering on GitLab (for the next version it would be a good idea to push the patch to a branch to look at the rendered version).
Capability tag:Hight bits:Low bits
s/Hight/High/
------------------------------------
<tag>:<127:64>:<63:0>
+Same considerations apply here, as when printing unmodified addresses.
Not sure I get the point, is this about hashing as well?
+Specifying ``#`` modifier results in printing capabilities in a simplified format:
<address> [<permissions>,<base>-<top>] (<attr>)
+with::
- permissions - none or any combination of:
- 'r' : CHERI_PERM_LOAD
- 'w' : CHERI_PERM_STORE
- 'x' : CHERI_PERM_EXECUTE
- 'R' : CHERI_PERM_LOAD_CAP
- 'W' : CHERI_PERM_STORE_CAP
- 'E' : ARM_CAP_PERMISSION_EXECUTIVE
- base - lower bound
- top - upper bound
- attr - none or any combination of:
'invalid' - capability's tag is clear
'sentry' - capability is a sealed entry
'sealed' - capability is sealed with a type other than
the sealed entry object type.
+(the above applies to formats with either `'no_hash_pointers'` parameter being +enabled or ``x`` modifier being provided - otherwise subject to hashing.
+Currently only Morello architecture is being supported.
+Note: +In hybrid mode, capabilities are being passed as a reference.
"by reference" (otherwise it could mean something else). Is this because of complications with the hybrid PCS? That's rather unfortunate as it creates another difference with [1].
Actually reading the actual code I don't see that "pass by reference" happening, you're using a va_args(args, void * __capability) so surely this is just passed by value?
An attempt to +use this format with a non-capability type is undefined behaviour. +There are no safety nests, so extra consciousness is advised! Handle with care.
+Same applies when capabilities are not supported, as the use of either the ``#`` +flag or the ``l`` length modifier is undefined behaviour when combined with +%p.
Struct Resources
diff --git a/lib/test_printf.c b/lib/test_printf.c index 07309c45f327..e3bbb83c439e 100644 --- a/lib/test_printf.c +++ b/lib/test_printf.c @@ -24,6 +24,10 @@ #include <linux/property.h> +#ifdef __CHERI__ +#include <cheriintrin.h> +#endif
- #include "../tools/testing/selftests/kselftest_module.h"
#define BUF_SIZE 256 @@ -755,6 +759,127 @@ errptr(void) #endif } +static void __init +capability_pointer(void) +{ +#ifdef CONFIG_ARM64_MORELLO
- enum action {
CAP_AXN_NONE,
CAP_TAG_CLEAR,
CAP_NULL_DERIVE
- };
- struct __setup {
const char * const plain_fmt;
const char * const fmt;
const char * const expected;
int action;
- } const setup[] = {
{ "%lp", "%lpx", "1:da00400059ab89ab:"PTR_STR, CAP_AXN_NONE },
{ "%#lp", "%#lpx",
PTR_STR" [rw-RW-,ffff0123456789ab,ffff0123456799ab]",
CAP_AXN_NONE
},
{ "%lp", "%lpx", "0:da00400059ab89ab:"PTR_STR, CAP_TAG_CLEAR },
{ "%#lp", "%#lpx",
PTR_STR" [rw-RW-,ffff0123456789ab,ffff0123456799ab] (invalid)",
CAP_TAG_CLEAR
},
{ "%lp", "%lpx", PTR_STR, CAP_NULL_DERIVE },
{ "%#lp", "%#lpx", PTR_STR, CAP_NULL_DERIVE }
- };
- char buf[PLAIN_BUF_SIZE];
- const char * const cap_fmt[] = {"%lp", "%#lp"};
- int nchars;
- void *__capability __cap = kaddr_to_user_ptr((ptraddr_t)PTR);
- /* Expecting basic permissions to be set */
- size_t perms = CHERI_PERM_GLOBAL | CHERI_PERM_LOAD | CHERI_PERM_LOAD_CAP |
CHERI_PERM_STORE | CHERI_PERM_STORE_CAP | CHERI_PERM_SEAL;
- size_t size = 4096;
- __cap = cheri_perms_and(__cap, perms);
- /*
* Basic checks so that the actual output can be safely compared
* to the expected one
*/
- if (!cheri_tag_get(__cap) || cheri_perms_get(__cap) != perms
|| cheri_is_sealed(__cap)) {
pr_warn("Failed to create capability for testing - skipping");
skipped_tests += no_hash_pointers ? 4 + ARRAY_SIZE(setup) :
4 + ARRAY_SIZE(setup)/2;
return;
- }
- __cap = cheri_bounds_set_exact(__cap, size);
- /* Verify hashing */
- if (no_hash_pointers) {
pr_warn("Skipping capability hashing tests\n");
skipped_tests += 4;
goto non_hashed;
- }
- for (int i = 0; i < ARRAY_SIZE(cap_fmt); ++i) {
nchars = snprintf(buf, PLAIN_BUF_SIZE, cap_fmt[i], __cap);
/*
* Should be ADDR_WIDTH in this case , but hey, let's not be picky
* about it, at least not here ... not yet ...
*/
if (nchars != PTR_WIDTH) {
/* This also covers the case when the value has not been
* actually hashed as otherwise nchars would be greater than
* PTR_WIDTH
*/
pr_warn("Something went wrong with hashing capability value\n");
goto failed;
}
if (strncmp(buf, PTR_VAL_NO_CRNG, PTR_WIDTH) == 0) {
pr_warn("crng possibly not yet initialized - capability value buffer contains \"%s\"",
PTR_VAL_NO_CRNG);
} else if (strncmp(buf, ZEROS, strlen(ZEROS)) != 0) {
pr_warn("Unexpected format for supposedly hashed capability value\n");
goto failed;
}
- }
+non_hashed:
- for (int i = 0; i < ARRAY_SIZE(setup); ++i) {
void *__capability __c = __cap;
switch (setup[i].action) {
case CAP_TAG_CLEAR:
__c = cheri_tag_clear(__c);
break;
case CAP_NULL_DERIVE:
/* Null-derived capability */
__c = (void *__capability)
(__cheri_addr uintcap_t)((void *)cheri_base_get(__cap));
break;
default:
break;
}
if (no_hash_pointers)
test(setup[i].expected, setup[i].plain_fmt, __c);
else
++skipped_tests;
test(setup[i].expected, setup[i].fmt, __c);
- }
- return;
+failed:
- failed_tests++;
+#endif /* CONFIG_ARM64_MORELLO */ +}
- static void __init test_pointer(void) {
@@ -781,6 +906,9 @@ test_pointer(void) errptr(); fwnode_pointer(); fourcc_pointer(); +#ifdef __CHERI__
Considering the function is always defined, the #ifdef is probably not needed.
Aside from that I haven't got round to looking at the test yet, I might wait for v2 to look at it.
- capability_pointer();
+#endif } static void __init selftest(void) diff --git a/lib/vsprintf.c b/lib/vsprintf.c index 40d26a07a133..9dcdb364c0ba 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -51,6 +51,10 @@ #include <asm/byteorder.h> /* cpu_to_le16 */ #include <asm/unaligned.h> +#ifdef __CHERI__ +#include <cheriintrin.h> +#endif
- #include <linux/string_helpers.h> #include "kstrtox.h"
@@ -426,6 +430,9 @@ enum format_type { FORMAT_TYPE_PERCENT_CHAR, FORMAT_TYPE_INVALID, FORMAT_TYPE_LONG_LONG, +#ifdef __CHERI__
- FORMAT_TYPE_CAPABILITY,
+#endif
Nit: the position in the enum seems to be somewhat arbitrary, couldn't it be added at the end?
FORMAT_TYPE_ULONG, FORMAT_TYPE_LONG, FORMAT_TYPE_UBYTE, @@ -2401,6 +2408,7 @@ static noinline_for_stack char *pointer(const char *fmt, char *buf, char *end, void *ptr, struct printf_spec spec) {
Spurious change.
switch (*fmt) { case 'S': case 's': @@ -2488,6 +2496,203 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr, } } +#ifdef __CHERI__ +/*
- Support for printing capabilities with %[#]lp[x] format.
- It stands slightly in contradiction to kernel extensions
- for pointer printk formats as it is using the 'l' modifier
- instead of relying on extended format specifiers.
- Currently supprted formats:
- lp[x] - Hex value of full capability bits preceeded with capability tag:
<tag>:<127:64>:<63:0>
(* outcome subject to kernel pointer hashing )
- #lp[x] - Simplified format as:
<address> [permissions=[rwxRWE],<base>-<top>] (attr=[invalid,sentry,sealed])
(* outcome subject to kernel pointer hashing )
- Details at:
- Documentation/core-api/printk-formats.rst
- (keep both updated case making any changes)
- only the capability value (aka adress) undergoes hashing,
s/adress/address/ and same comment on "value" as above.
remaining bits are being disregarded
- */
+static noinline_for_stack +char *capability(const char *fmt, char *buf, char *end, void *__capability cap,
Nit: space between * and __capability, that's what we've used so far. Same in another place below.
struct printf_spec spec)
The indentation doesn't match.
+{ +/*
- Write to the buffer only when there is a space for it, otherwise
- just advance the buffer marker to account for the space needed to
- store full value here
- */
+#define update_buf_single(__b, __e, __c) \
Do we really need to prefix arguments with __ in an internal macro?
- do { if ((__b) < (__e)) *(__b)++ = (__c); else ++__b; } while (0)
There's some inconsistency on the usage of parentheses (the last __b doesn't have them). OTOH if you pass anything but a variable name as __b things can go horribly wrong as it's being evaluated multiple times. It's an internal macro so it doesn't matter, but maybe it's even better *not* to parenthesise __b so that it's clear that it's absolutely not safe to pass anything non-trivial as __b.
+#define show_tag(__cap) \
This is only used once so no need to make it a macro.
- do { \
update_buf_single(buf, end, cheri_tag_get(__cap) ? '1' : '0'); \
update_buf_single(buf, end, ':'); \
- } while (0)
- /*
* For null-derived capabilities switch to basic format
* (address only).
* Same applies when hashing is active.
*/
- if ((!cheri_tag_get(cap) &&
+#if __has_builtin(__builtin_cheri_copy_from_high)
We already use it unconditionally in some kselftest so I don't think we need to complicate things here.
!__builtin_cheri_copy_from_high(cap))
+#else
!((*((ptraddr_t *)&cap + 1)))
+#endif
|| (likely(!no_hash_pointers) && *fmt != 'x')) {
/* Avoid adding prefix */
spec.flags &= ~SPECIAL;
return pointer(fmt, buf, end, (__cheri_fromcap void*)(cap), spec);
- }
- if (spec.flags & SPECIAL) { /* Simplified format for capabilities */
unsigned long long __data = cheri_perms_get(cap);
In terms of readability I think we would win by declaring here one local variable for each "element" (base, top, perms), using the most appropriate type (ptraddr_t, cheri_perms_t).
int field_width = spec.field_width;
I originally thought this was just a convenience alias, but then realised it is used to restore the original value. Something like "orig_field_width" would make that clearer. Same comment for flags.
const char* start = buf;
The position of * is rather inconsistent in multiple places. The kernel style is always "char *p", i.e. a space before but not after.
char * attr_start;
unsigned int i;
int attrib = 0;
int flags;
/* Note: order matters to match the format required */
I wouldn't say "required" as [1] doesn't say anything about the order. "expected" would be more accurate (the user does expect rwxRWE to appear in that order).
struct {
unsigned long long cperm; const char id;
s/const char/char/ (the whole array is const already)
} static const __perms[] = {
{ CHERI_PERM_LOAD, 'r' },
{ CHERI_PERM_STORE, 'w' },
{ CHERI_PERM_EXECUTE, 'x' },
{ CHERI_PERM_LOAD_CAP, 'R' },
{ CHERI_PERM_STORE_CAP, 'W' },
+#ifdef CONFIG_ARM64_MORELLO
{ ARM_CAP_PERMISSION_EXECUTIVE, 'E' }
+#endif
};
/*
* Clear the 'SPECIAL' flag as that one results in adding '0x'
* prefix which should be skiped here to align with the
* way pointers are being handled.
*/
spec.flags &= ~SPECIAL;
/*
* Things get slightly confusing here when it comes to
* precision. The standard format specification claims that
* precision might cause truncation of the actual output,
* contrary to width.
* CHERI on the other hand suggests, that precision used
* with the 'p' specifier should determine the width of
* addresses though following current formating for pointer types,
"formatting"
* unless explicitely stated, the width will be enforced, leaving
* precision being ineffective. Furthermore, according to the guide,
* in theory recision should not affect the way the Raw and Simplified
"precision"
* formats are being handled (the non-address related parts).
* Stick with that one.
* Note: Using precision with pointers/capabilities with
* active hashing might lead to slightly unexpected output.
* That's the result of applying the precision while
* respecting the field width.
*
* Width in this case refers to the width of final string generated for
* the simplified format
*/
spec.field_width = -1;
flags = spec.flags;
spec.flags &= ~(ZEROPAD | LEFT);
buf = pointer(fmt, buf, end, (__cheri_fromcap void*)cap, spec);
update_buf_single(buf, end, ' ');
update_buf_single(buf, end, '[');
for (i = 0; i < ARRAY_SIZE(__perms); ++i) {
update_buf_single(buf, end,
(__data & __perms[i].cperm ? __perms[i].id : '-'));
I understand the motivation, have a fixed-width permission field, but I think we should stick to [1] and only print something for the permissions that are present.
}
update_buf_single(buf, end, ',');
__data = cheri_base_get(cap);
buf = pointer(fmt, buf, end, (void *)__data, spec);
update_buf_single(buf, end, ',');
__data += cheri_length_get(cap);
buf = pointer(fmt, buf, end, (void *)__data, spec);
update_buf_single(buf, end, ']');
/* Attributes */
/* Reset precision to output full attribute identifiers here */
spec.precision = -1;
/* Keep track of the attribute start section to format
* it properly in case there are attributes to be reported.
* Otherwise simply rolling on a buffer might lead to overflowing
* past the terminating character if the buffer is big enough.
Smart, I like this, "speculative write" without temporary buffer :D
*/
attr_start = buf;
buf +=2;
if (!cheri_tag_get(cap)) {
buf = string_nocheck(buf, end, "invalid", spec);
++attrib;
}
if (cheri_is_sentry(cap)) {
if (attrib++)
update_buf_single(buf, end, ',');
buf = string_nocheck(buf, end, "sentry", spec);
}
if (cheri_is_sealed(cap)) {
if (attrib++)
update_buf_single(buf, end, ',');
buf = string_nocheck(buf, end, "sealed", spec);
}
if (attrib) {
update_buf_single(attr_start, end, ' ');
update_buf_single(attr_start, end, '(');
update_buf_single(buf, end, ')');
} else {
buf = attr_start; /* Rollback on space and opening bracket */
}
/* Restore the originally requested width */
spec.field_width = field_width;
spec.flags |= flags;
return widen_string(buf, buf - start, end, spec);
- }
- /*
* Raw format: hex dump of full capability
*/
- show_tag(cap);
+#if __has_builtin(__builtin_cheri_copy_from_high)
- buf = pointer_string(buf, end,
(void *) __builtin_cheri_copy_from_high(cap),
spec);
+#else
- buf = pointer_string(buf, end,
(void *)(*((ptraddr_t *)&cap + 1)), spec);
+#endif
- update_buf_single(buf, end, ':');
- return pointer_string(buf, end, (__cheri_fromcap void*)(cap),
spec);
+#undef show_tag +#undef update_buf_single +} +#endif /* __CHERI__ */
- /*
- Helper function to decode printf style format.
- Each call decode a token from the format and return the
@@ -2623,7 +2828,11 @@ int format_decode(const char *fmt, struct printf_spec *spec)
As this is also used in vbin_printf() and bstr_printf(), I am somewhat concerned as to what happens there. I don't see a need to support %lp there but do things explode if you try to? Considering that the latter is accessible from BPF we might have to be a bit careful...
return ++fmt - start;
case 'p':
spec->type = FORMAT_TYPE_PTR;
spec->type =
+#ifdef __CHERI__
(qualifier == 'l') ? FORMAT_TYPE_CAPABILITY :
+#endif
Nit: not a big fan of that kind of #ifdef'ing, how about leaving the original line unchanged and overriding spec->type inside #ifdef __CHERI__? Alternatively this would also work:
#ifdef __CHERI__ if (...) spec->type = FORMAT_TYPE_CAPABILITY; else #endif spec->type = FORMAT_TYPE_PTR;
(but not quite as nice because of the dangling indentation).
return ++fmt - start;FORMAT_TYPE_PTR;
case '%': @@ -2717,6 +2926,7 @@ set_precision(struct printf_spec *spec, int prec)
- ``%n`` is unsupported
- ``%p*`` is handled by pointer()
- ``%lp[x]`` and ``%#lp[x]`` is handled by capability()
- See pointer() or Documentation/core-api/printk-formats.rst for more
- extensive description.
@@ -2818,7 +3028,18 @@ int vsnprintf(char *buf, size_t size, const char *fmt, va_list args) *str = '%'; ++str; break;
+#ifdef __CHERI__
case FORMAT_TYPE_CAPABILITY:
if (IS_BUILTIN(CONFIG_ARM64_MORELLO)) {
Do we actually need this? capability() is defined as long as __CHERI__ is, and while it might not be perfect for non-Morello architectures it seems better than nothing. Besides if we can reduce the number of references to arch configs in generic code it's always a win.
str = capability(fmt, str, end,
va_arg(args, void *__capability),
spec);
while (isalnum(*fmt))
fmt++;
break;
}
fallthrough;
+#endif case FORMAT_TYPE_INVALID: /* * Presumably the arguments passed gcc's type diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 577e02998701..0c5468f0894c 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -6760,7 +6760,7 @@ sub process { my $fmt = get_quoted_string($lines[$count - 1], raw_line($count, 0)); $fmt =~ s/%%//g;
while ($fmt =~ /(\%[\*\d\.]*p(\w)(\w*))/g) {
while ($fmt =~ /(\%[\*\d\.#l]*p(\w)(\w*))/g) { $specifier = $1; $extension = $2; $qualifier = $3;
@@ -6768,7 +6768,8 @@ sub process { ($extension eq "f" && defined $qualifier && $qualifier !~ /^w/) || ($extension eq "4" &&
defined $qualifier && $qualifier !~ /^cc/)) {
defined $qualifier && $qualifier !~ /^cc/) ||
($specifier =~ /lp/ && $extension ne "x")) {
Should be at the previous level of indentation, otherwise it looks nested under $extension == 4.
$bad_specifier = $specifier; last; }
@@ -6780,6 +6781,14 @@ sub process { "Using vsprintf specifier '%px' potentially exposes the kernel memory layout, if you don't really need the address please consider using '%p'.\n" . "$here\n$stat_real\n"); } }
if ($fmt =~ /\%[\*\d\.#]*l{1}p/) {
Rather confused as to what {1} is supposed to do here, if it's what I think it would mean that the previous character is there exactly once, which is the same as not having {1} at all?
Kevin
my $stat_real = get_stat_real($linenr, $lc);
WARN("VSPRINTF_POINTER_EXTENSION",
"Using vsprintf specifier '\%[#]lp[x]' is undefined behaviour when used with non-capability types in hybrid-mode. Make sure you got it right.\n" . "$here\n$stat_real\n");
}
if ($bad_specifier ne "") { my $stat_real = get_stat_real($linenr, $lc); my $ext_type = "Invalid";