This symbol is defined in a binary when there is a segment which
contains both the file header and the program header. The symbol points
at the file header. The point of this symbol is to allow the program to
robustly examine its own output.
Glibc uses this symbol. This symbol is currently not marked as a
linker or linker script defined symbol, and hence does not get its
bounds adjusted. The symbol is given zero size, and consequently any
capability initialised as a relocation to this symbol is given zero
bounds.
In order to allow read access to the headers this symbol points at,
this patch adds a size to the symbol.
We do not believe that the size of this symbol is used for anything
other than CHERI bounds, so we believe that this is a safe change to
make. Setting the size of the symbol means that c64_fixup_frag uses
that size as the bounds to apply to a capability relocation pointing at
that symbol. This allows access to the file and program headers loaded
into memory.
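As a rough illustration of the size being set, the sketch below computes
the number of bytes covered by a 64-bit ELF file header followed by its
program headers. The constants are the standard ELF64 structure sizes;
the helper name `ehdr_start_size' is hypothetical and not part of BFD,
where the value comes from `bed->s->sizeof_ehdr' and
`elf_program_header_size (abfd)' (which may not equal the sum below if
the linker reserves extra program header space).

```c
#include <assert.h>
#include <stdint.h>

/* Standard ELF64 sizes: sizeof (Elf64_Ehdr) and sizeof (Elf64_Phdr).  */
#define ELF64_EHDR_SIZE 64u
#define ELF64_PHDR_SIZE 56u

/* Hypothetical helper (not a BFD function): bytes covered by the file
   header followed immediately by PHNUM program headers.  */
static uint64_t
ehdr_start_size (unsigned int phnum)
{
  assert (phnum <= 0xffffu);	/* e_phnum is a 16-bit field.  */
  return ELF64_EHDR_SIZE + (uint64_t) phnum * ELF64_PHDR_SIZE;
}
```

Under these assumptions a binary with 7 program headers would give
`__ehdr_start' a size of 64 + 7 * 56 bytes.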
An alternative approach would be to *not* set the size of the symbol,
but only change the bounds of the relocation generated. This would be
done by checking for the `__ehdr_start' name in c64_fixup_frag and
setting the size according to the `sizeof_ehdr' and
`elf_program_header_size' values stored on the output BFD object.
We chose to set the size on the symbol for code-aesthetic reasons, in
the belief that having this size on the symbol in the final binary is a
slight readability benefit for a user and has no downside.
I do not believe that Morello lld sets the bounds of a capability to this
symbol correctly. That issue has been raised separately.
############### Attachment also inlined for ease of reply ###############
diff --git a/bfd/elf.c b/bfd/elf.c
index 9a5b472cda2c953cae40d2dfcbce493048ed8168..279f15a3aefaa6e9048ee93a54575a2a06004ae1 100644
--- a/bfd/elf.c
+++ b/bfd/elf.c
@@ -6028,6 +6028,7 @@ assign_file_positions_for_load_sections (bfd *abfd,
hash->root.u.def.section = bfd_abs_section_ptr;
}
+ hash->size = bed->s->sizeof_ehdr + elf_program_header_size (abfd);
hash->root.type = bfd_link_hash_defined;
hash->def_regular = 1;
hash->non_elf = 0;
diff --git a/ld/testsuite/ld-aarch64/aarch64-elf.exp b/ld/testsuite/ld-aarch64/aarch64-elf.exp
index f0d2048efc37c2df3f4dfea304c7f93f8fe3a169..adb0081720a48c984e246b838d6ccd0922fa1306 100644
--- a/ld/testsuite/ld-aarch64/aarch64-elf.exp
+++ b/ld/testsuite/ld-aarch64/aarch64-elf.exp
@@ -258,6 +258,7 @@ run_dump_test_lp64 "morello-sizeless-local-syms"
run_dump_test_lp64 "morello-sizeless-global-syms"
run_dump_test_lp64 "morello-sizeless-got-syms"
run_dump_test_lp64 "morello-disallow-merged-binaries"
+run_dump_test_lp64 "c64-ehdr-sized-reloc"
run_dump_test_lp64 "morello-capinit"
run_dump_test_lp64 "morello-stubs"
diff --git a/ld/testsuite/ld-aarch64/c64-ehdr-sized-reloc.d b/ld/testsuite/ld-aarch64/c64-ehdr-sized-reloc.d
new file mode 100644
index 0000000000000000000000000000000000000000..41c3cffa32f6bc494e1ef282319331e794b07abe
--- /dev/null
+++ b/ld/testsuite/ld-aarch64/c64-ehdr-sized-reloc.d
@@ -0,0 +1,18 @@
+#as: -march=morello+c64
+#ld: -shared
+#objdump: -dR -j .data
+
+.*: file format .*
+
+
+Disassembly of section \.data:
+
+00000000000[[:xdigit:]]* <val>:
+ ...
+ [[:xdigit:]]*: R_MORELLO_RELATIVE \*ABS\*
+# Want to check that the size is non-zero.
+# Check that using a negative line match against a zero size.
+# In fact, when this size is zero objdump doesn't even print a line here, but
+# that just adds extra robustness to our check.
+! .*: 00000000 \.word 0x00000000
+ .*: 01000000 .*
diff --git a/ld/testsuite/ld-aarch64/c64-ehdr-sized-reloc.s b/ld/testsuite/ld-aarch64/c64-ehdr-sized-reloc.s
new file mode 100644
index 0000000000000000000000000000000000000000..3b750cfbd499d9babd25d1a8fb0b1d573726a855
--- /dev/null
+++ b/ld/testsuite/ld-aarch64/c64-ehdr-sized-reloc.s
@@ -0,0 +1,11 @@
+ .data
+ .global val
+val:
+ .chericap __ehdr_start
+ .size val, .-val
+
+ .align 4
+ .text
+ .global _start
+_start:
+ ldr c0, [c0, :got_lo12:val]
This was newly failing because it was checking for a value *without* the
LSB set. In a recent commit we have fixed the bug which lost the LSB,
and that caused this test to fail.
Here we use the new testsuite implementation to test for "one plus the
location" rather than "one of the values A, B, C, ...", which is a
better representation of what we're trying to check.
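For readers less familiar with Tcl, the "one plus the location"
computation done by the `#check: INDIRECT_POS' line can be sketched in C
as follows. This is only an illustration of the arithmetic; the helper
name `indirect_pos' is hypothetical and nothing like it exists in the
testsuite.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper mirroring the Tcl
   `format %x [expr "0x$INDIRECT_LOC + 1"]': parse the recorded
   hexadecimal location and write location + 1 as lower-case hex into
   OUT.  Returns the value returned by snprintf.  */
static int
indirect_pos (const char *recorded_hex, char *out, size_t out_len)
{
  assert (recorded_hex != NULL && out != NULL);
  unsigned long loc = strtoul (recorded_hex, NULL, 16);
  return snprintf (out, out_len, "%lx", loc + 1);
}
```

A recorded location of "1a0" would therefore be matched as "1a1" in the
`add' instruction, whatever address the location ends up at.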
############### Attachment also inlined for ease of reply ###############
diff --git a/ld/testsuite/ld-aarch64/c64-ifunc-2.d b/ld/testsuite/ld-aarch64/c64-ifunc-2.d
index b87908b27a4366a181ddee8b7e4b7343ebd3f4b7..2954f67b0c50ece30e2e7a002fc7e03f2afb9f4c 100644
--- a/ld/testsuite/ld-aarch64/c64-ifunc-2.d
+++ b/ld/testsuite/ld-aarch64/c64-ifunc-2.d
@@ -4,10 +4,17 @@
#objdump: -dw
#source: ifunc-2.s
+#record: INDIRECT_LOC FOO_LOCATION
#...
-0+(130|1a0|1c8|1e0) <foo>:
+0+([0-9a-f]{3}).*0x([0-9a-f]{3})@plt>:
#...
-[ \t0-9a-f]+:[ \t0-9a-f]+bl[ \t0-9a-f]+<\*ABS\*\+0x(130|1a0|1c8|1e0)@plt>
+Disassembly of section \.text:
+
+#check: FOO_LOC string tolower $FOO_LOCATION
+#check: INDIRECT_POS format %x [expr "0x$INDIRECT_LOC + 1"]
+0+FOO_LOC <foo>:
+#...
+[ \t0-9a-f]+:[ \t0-9a-f]+bl[ \t0-9a-f]+<\*ABS\*\+0xFOO_LOC@plt>
[ \t0-9a-f]+:[ \t0-9a-f]+adrp[ \t]+c0, 0 <.*>
-[ \t0-9a-f]+:[ \t0-9a-f]+add[ \t]+c0, c0, #0x(120|190|1b8|1d0)
+[ \t0-9a-f]+:[ \t0-9a-f]+add[ \t]+c0, c0, #0xINDIRECT_POS
#pass
I really don't like that we use hard-coded addresses. There are examples
in the existing testsuite that match against a choice of hard-coded
addresses, but I want something more general that lets us test exactly
the things we need to test.
Here we add an initial implementation to do such a thing.
This initial implementation has quite a lot of problems, but its main
benefit is that we can write testcases which should work across
different setups.
Hopefully we can work out the problems with use (or maybe identify that
the problems don't actually matter very much in practice) and eventually
upstream something better.
To document the problems:
- The implementation means that recording something actually puts that
into the regexp_diff namespace which could shadow existing variables.
- We don't have a way to say "the previous value", but always have to
write some TCL procedure to return that previous value.
############### Attachment also inlined for ease of reply ###############
diff --git a/binutils/testsuite/lib/binutils-common.exp b/binutils/testsuite/lib/binutils-common.exp
index b9a1e6e4bc0c8644a3273a8532088ed05eb4fcea..efa476ee7a215fd3fd20b5995d2f64bc4d03ddbf 100644
--- a/binutils/testsuite/lib/binutils-common.exp
+++ b/binutils/testsuite/lib/binutils-common.exp
@@ -363,6 +363,23 @@ proc check_relro_support { } {
# Optionally match REGEXP against line from FILE_1. If the REGEXP
# does not match then the next line from FILE_2 is tried.
#
+# #record: <names>
+# Sets names under which to record matched subexpressions in the regexp
+# on the next line. Use all uppercase variable names to avoid
+# interacting with local variables in the given function.
+# N.b. PREVMATCH is always set to the *entirety* of the previous match
+# (whether or not said match was directly after a #record line).
+#
+# #check: <name> <substitution>
+# Replaces any occurrence of <name> in the following regexp lines with the
+# result of evaluating the string <substitution> in TCL. Often used in
+# combination with #record to set variables for future use in the
+# <substitution> field.
+#
+# #clearcheck
+# Clears all extra substitutions added with #check for future regexp
+# lines.
+#
# Other # lines are comments. Regexp lines starting with the `!' character
# specify inverse matching (use `\!' for literal matching against a leading
# `!'). Skip empty lines in both files.
@@ -383,6 +400,12 @@ proc regexp_diff { file_1 file_2 args } {
set diff_pass 0
set fail_if_match 0
set ref_subst ""
+
+ set PREVMATCH ""
+ set extra_vars ""
+ # set STRPOS "uninitialised"
+ set extra_subst ""
+
if { [llength $args] > 0 } {
set ref_subst [lindex $args 0]
}
@@ -440,14 +463,19 @@ proc regexp_diff { file_1 file_2 args } {
foreach {name value} $ref_subst {
regsub -- $name $line_bx $value line_bx
}
+ foreach {name value} $extra_subst {
+ set value [eval $value];
+ regsub -- $name $line_bx $value line_bx
+ }
verbose "looking for $n\"^$line_bx$\"" 3
- while { [expr [regexp "^$line_bx$" "$line_a"] == $negated] } {
+ while { [expr [regexp "^$line_bx$" "$line_a" PREVMATCH {*}$extra_vars] == $negated] } {
verbose "skipping \"$line_a\"" 3
if { [gets $file_a line_a] == $eof } {
set end_1 1
break
}
}
+ set extra_vars ""
break
} elseif { [string match "#\\?*" $line_b] } {
if { ! $end_1 } {
@@ -459,17 +487,46 @@ proc regexp_diff { file_1 file_2 args } {
foreach {name value} $ref_subst {
regsub -- $name $line_bx $value line_bx
}
+ foreach {name value} $extra_subst {
+ set value [eval $value];
+ regsub -- $name $line_bx $value line_bx
+ }
verbose "optional match for $n\"^$line_bx$\"" 3
- if { [expr [regexp "^$line_bx$" "$line_a"] != $negated] } {
+ if { [expr [regexp "^$line_bx$" "$line_a" PREVMATCH {*}$extra_vars] != $negated] } {
+ # Choice here between having #?<regexp> *always* clear
+ # the extra_vars, or only clear the extra_vars if it
+ # actually matched. Right now have no use case for
+ # extra_vars combined with the #?<regexp> pattern.
+ # Currently choosing to have this only clear the
+ # extra_vars if the line actually matched, but could
+ # happily change later on if needs be.
+ set extra_vars ""
break
}
}
+ } elseif { [string match "#record: *" $line_b] } {
+ if { ! $end_1 } {
+ set extra_vars [concat [string range $line_b 9 end]]
+ }
+ } elseif { [string match "#clearcheck*" $line_b] } {
+ if { ! $end_1 } {
+ set extra_subst ""
+ }
+ } elseif { [string match "#check: *" $line_b] } {
+ if { ! $end_1 } {
+ set value [lindex [regexp -inline "#check: (\\S+).*" $line_b] 1]
+ lappend extra_subst $value
+ lappend extra_subst [string range $line_b [expr 8+[string length $value]] end]
+ }
+ # send_user "extra_subst is now: $extra_subst\n"
}
if { [gets $file_b line_b] == $eof } {
set end_2 1
break
}
}
+ # send_user "STRPOS is $STRPOS\n"
+ # send_user "$line_b\n"
if { $diff_pass } {
break
@@ -494,13 +551,21 @@ proc regexp_diff { file_1 file_2 args } {
foreach {name value} $ref_subst {
regsub -- $name $line_bx $value line_bx
}
+ foreach {name value} $extra_subst {
+ set value [eval $value];
+ # send_user "match: $name\n"
+ # send_user "replacement: $value\n"
+ regsub -- $name $line_bx $value line_bx
+ }
+ # send_user "checking against $line_bx\n"
verbose "regexp $n\"^$line_bx$\"\nline \"$line_a\"" 3
- if { [expr [regexp "^$line_bx$" "$line_a"] == $negated] } {
+ if { [expr [regexp "^$line_bx$" "$line_a" PREVMATCH {*}$extra_vars] == $negated] } {
send_log "regexp_diff match failure\n"
send_log "regexp $n\"^$line_bx$\"\nline $s\"$line_a\"\n"
verbose "regexp_diff match failure\n" 3
set differences 1
}
+ set extra_vars ""
}
}
diff --git a/ld/testsuite/ld-aarch64/emit-relocs-morello-2.d b/ld/testsuite/ld-aarch64/emit-relocs-morello-2.d
index fe59dee85f7bbbc8a11ca36168068ae3dfbd1564..c5eebec4e1af2e1d6003138da91f912c0db6ac60 100644
--- a/ld/testsuite/ld-aarch64/emit-relocs-morello-2.d
+++ b/ld/testsuite/ld-aarch64/emit-relocs-morello-2.d
@@ -23,7 +23,8 @@ Disassembly of section .got:
Disassembly of section .data:
-0000000000010360 <str>:
+#record: STRPOS
+(0000000000010360|0000000000010380) <str>:
.*: 6c6c6548 .*
.*: 6874206f .*
.*: 20657265 .*
@@ -34,11 +35,12 @@ Disassembly of section .data:
.*: R_AARCH64_RELATIVE \*ABS\*\+.*
.* <ptr>:
-.*: 00010360 .*
+#check: SHORTSTR string range $STRPOS end-7 end
+.*: SHORTSTR .*
...
.* <cap>:
-.*: 00010360 .*
+.*: SHORTSTR .*
.*: R_MORELLO_RELATIVE \*ABS\*
.*: 00000000 .*
.*: 0000001b .*
There is special handling to ensure that symbols which look like they
are supposed to point at the start of a section are given a size to span
that entire section.
GNU ld has special `start_stop` symbols which are automatically provided
by the linker for sections where the output section and input section
share a name and that name is representable as a C identifier.
(see commit cbd0eecf2)
These special symbols represent the start and end address of the output
section. These special symbols are used in much the same way in source
code as section-start symbols provided by the linker script. Glibc uses
these for the __libc_atexit section containing pointers for functions to
run at exit.
This change accounts for these `start_stop` symbols by giving them the
size of the "remaining" range of the output section in the same way as
linker script defined symbols. This means that the `start` symbols get
section-spanning bounds and the `stop` symbols get bounds of zero.
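The "remaining range" rule described above can be sketched in C as
below. This is a simplified model rather than the BFD code: the
`out_section' structure and the `remaining_range' helper are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an output section covering [vma, vma + size).  */
struct out_section
{
  uint64_t vma;
  uint64_t size;
};

/* Size given to a symbol at ADDR inside SEC: the number of bytes from
   ADDR to the end of the section.  */
static uint64_t
remaining_range (const struct out_section *sec, uint64_t addr)
{
  assert (addr >= sec->vma && addr <= sec->vma + sec->size);
  return sec->vma + sec->size - addr;
}
```

With an 8 byte `__libc_atexit' section, as in the testcase added here,
the `start' symbol at the section's first address gets a size of 8 and
the `stop' symbol at its end address gets a size of 0.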
N.b. We will have to also account for these symbols in the
`resize_sections` function, but that's not done yet.
############### Attachment also inlined for ease of reply ###############
diff --git a/bfd/elfnn-aarch64.c b/bfd/elfnn-aarch64.c
index a33aea0eab02ac97cb605cac677d8ac27a475967..9a9bd46f4579d6ad1ec0d2a7c8df3b7d7ba1000a 100644
--- a/bfd/elfnn-aarch64.c
+++ b/bfd/elfnn-aarch64.c
@@ -6456,6 +6456,15 @@ c64_symbol_section_adjustment (struct elf_link_hash_entry *h, bfd_vma value,
}
return C64_SYM_LDSCRIPT_DEF;
}
+
+ if (h->start_stop)
+ {
+ asection *s = h->u2.start_stop_section->output_section;
+ BFD_ASSERT (s != NULL);
+ *ret_sec = s;
+ return C64_SYM_LDSCRIPT_DEF;
+ }
+
return C64_SYM_STANDARD;
}
diff --git a/ld/testsuite/ld-aarch64/aarch64-elf.exp b/ld/testsuite/ld-aarch64/aarch64-elf.exp
index 49b2e70adca947b61294bf1d589ce366b81da569..f0d2048efc37c2df3f4dfea304c7f93f8fe3a169 100644
--- a/ld/testsuite/ld-aarch64/aarch64-elf.exp
+++ b/ld/testsuite/ld-aarch64/aarch64-elf.exp
@@ -271,6 +271,7 @@ run_dump_test_lp64 "morello-sec-round-include-relro"
run_dump_test_lp64 "morello-pcc-bounds-include-readonly"
run_dump_test_lp64 "morello-sec-round-choose-linker-syms"
run_dump_test_lp64 "morello-entry-point"
+run_dump_test_lp64 "morello-sec-start_stop-round"
run_dump_test_lp64 "morello-tlsdesc"
run_dump_test_lp64 "morello-tlsdesc-static"
run_dump_test_lp64 "morello-tlsdesc-staticpie"
diff --git a/ld/testsuite/ld-aarch64/morello-sec-start_stop-round.d b/ld/testsuite/ld-aarch64/morello-sec-start_stop-round.d
new file mode 100644
index 0000000000000000000000000000000000000000..3987696e5ab864a4ef1f57016eb9a87f2849f8a2
--- /dev/null
+++ b/ld/testsuite/ld-aarch64/morello-sec-start_stop-round.d
@@ -0,0 +1,26 @@
+#as: -march=morello+c64
+#ld: -static
+#objdump: -d -j .data -j __libc_atexit
+
+.*: file format .*
+
+
+Disassembly of section \.data:
+
+[0-9a-f]+ <__data_start>:
+#record: START_LIBC_ADDR
+.*: ([0-9a-f]+) .*
+.*: 00000000 .*
+.*: 00000008 .*
+.*: 02000000 .*
+
+Disassembly of section __libc_atexit:
+
+# Use `string tolower` because we know we only have a number so it won't change
+# anything. That's needed because the current record/check implementation
+# doesn't have a way to define a replacement which is just the existing
+# variable.
+#check: START_LIBC string tolower $START_LIBC_ADDR
+00000000START_LIBC <__start___libc_atexit>:
+.*: 0000002a .*
+.*: 00000000 .*
diff --git a/ld/testsuite/ld-aarch64/morello-sec-start_stop-round.s b/ld/testsuite/ld-aarch64/morello-sec-start_stop-round.s
new file mode 100644
index 0000000000000000000000000000000000000000..b89273e82460c05cadfd2e3365be671411558e93
--- /dev/null
+++ b/ld/testsuite/ld-aarch64/morello-sec-start_stop-round.s
@@ -0,0 +1,10 @@
+.section __libc_atexit,"aw"
+ .xword 42
+.data
+atexit_location:
+ .chericap __start___libc_atexit
+.text
+.globl _start
+.type _start STT_FUNC
+_start:
+ add c0, c0, :lo12:atexit_location
The LSB on STT_FUNC symbols was missed in a few different places.
1) Absolute relocations coming from .xword, .word, and .hword
directives, and the MOVW relocations for the lowest bits, did not
account for the LSB at all.
2) Relocations for the ADR instruction only added the LSB on local
symbols.
Here we account for these by adding the LSB in each clause in
elfNN_aarch64_final_link_relocate.
The change under the BFD_RELOC_AARCH64_NN clause handles absolute 64-bit
relocations, the change for BFD_RELOC_AARCH64_ADR_LO21_PCREL handles the
relocation on ADR instructions, and the extra relocations checked
against in the clause including BFD_RELOC_AARCH64_ADD_LO12 are the
remaining items.
N.b. we noticed the MOVW relocation problem because glibc's start.S was
using these direct MOV relocations to access the value of `main`. Since
`main` is a function we need to include the LSB in the resulting
relocation value. These relocations did not include the LSB from
STT_FUNC symbols.
Others were found from inspection of each relocation in turn.
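The rule being enforced can be summarised by the sketch below. It is a
simplified model and not the code in elfNN_aarch64_final_link_relocate:
the `reloc_kind' enumeration and the `apply_c64_lsb' helper are
hypothetical, and the split into "address value" and "PC-relative
branch" relocations follows the comment in the new testcase.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification, not BFD's relocation enumeration.  */
enum reloc_kind
{
  RELOC_ADDRESS_VALUE,		/* Loads an address (data words, ADR, ADD, MOVW).  */
  RELOC_PC_RELATIVE_BRANCH	/* Jumps to a PC-relative target, e.g. `bl'.  */
};

/* Return VALUE with the LSB set when the relocation materialises the
   address of a C64 function; branches keep the plain address.  */
static uint64_t
apply_c64_lsb (uint64_t value, bool to_c64, enum reloc_kind kind)
{
  if (to_c64 && kind == RELOC_ADDRESS_VALUE)
    value |= 1;
  return value;
}
```

Using an or-equal rather than an addition keeps the operation safe even
if the LSB has already been set on the value.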
############### Attachment also inlined for ease of reply ###############
diff --git a/bfd/elfnn-aarch64.c b/bfd/elfnn-aarch64.c
index cb9e5f132cca2d360b8bd2cf2e2eb1fbbf1695f0..a33aea0eab02ac97cb605cac677d8ac27a475967 100644
--- a/bfd/elfnn-aarch64.c
+++ b/bfd/elfnn-aarch64.c
@@ -6900,6 +6900,11 @@ elfNN_aarch64_final_link_relocate (reloc_howto_type *howto,
return bfd_reloc_ok;
case BFD_RELOC_AARCH64_NN:
+ /* If we are relocating against a C64 symbol, then the value can't
+ already have the LSB set (since STT_FUNC symbols are code labels and
+ they will be aligned). Hence it's safe just to or-equal in order
+ to ensure the LSB is set in that case. */
+ value |= to_c64 ? 1 : 0;
/* When generating a shared object or relocatable executable, these
relocations are copied into the output file to be resolved at
@@ -7115,8 +7120,7 @@ elfNN_aarch64_final_link_relocate (reloc_howto_type *howto,
signed_addend,
weak_undef_p);
- if (bfd_r_type == BFD_RELOC_AARCH64_ADR_LO21_PCREL && isym != NULL
- && isym->st_target_internal & ST_BRANCH_TO_C64)
+ if (bfd_r_type == BFD_RELOC_AARCH64_ADR_LO21_PCREL && to_c64)
value |= 1;
break;
@@ -7172,8 +7176,13 @@ elfNN_aarch64_final_link_relocate (reloc_howto_type *howto,
value = _bfd_aarch64_elf_resolve_relocation (input_bfd, bfd_r_type,
place, value,
signed_addend, weak_undef_p);
- if (bfd_r_type == BFD_RELOC_AARCH64_ADD_LO12 && isym != NULL
- && isym->st_target_internal & ST_BRANCH_TO_C64)
+ if ((bfd_r_type == BFD_RELOC_AARCH64_ADD_LO12
+ || bfd_r_type == BFD_RELOC_AARCH64_MOVW_G0
+ || bfd_r_type == BFD_RELOC_AARCH64_MOVW_G0_S
+ || bfd_r_type == BFD_RELOC_AARCH64_MOVW_G0_NC
+ || bfd_r_type == BFD_RELOC_AARCH64_32
+ || bfd_r_type == BFD_RELOC_AARCH64_16)
+ && to_c64)
value |= 1;
break;
diff --git a/ld/testsuite/ld-aarch64/aarch64-elf.exp b/ld/testsuite/ld-aarch64/aarch64-elf.exp
index 9352db42e9913021d17a8e5dd8d7d8cf2125fe29..49b2e70adca947b61294bf1d589ce366b81da569 100644
--- a/ld/testsuite/ld-aarch64/aarch64-elf.exp
+++ b/ld/testsuite/ld-aarch64/aarch64-elf.exp
@@ -250,6 +250,7 @@ run_dump_test_lp64 "emit-relocs-morello-6"
run_dump_test_lp64 "emit-relocs-morello-6b"
run_dump_test_lp64 "emit-relocs-morello-7"
run_dump_test_lp64 "emit-relocs-morello-8"
+run_dump_test_lp64 "emit-relocs-morello-9"
run_dump_test_lp64 "emit-morello-reloc-markers-1"
run_dump_test_lp64 "emit-morello-reloc-markers-2"
run_dump_test_lp64 "emit-morello-reloc-markers-3"
diff --git a/ld/testsuite/ld-aarch64/emit-relocs-morello-9.d b/ld/testsuite/ld-aarch64/emit-relocs-morello-9.d
new file mode 100644
index 0000000000000000000000000000000000000000..a9e1c3f37485df9d0ea9f2a52edab97cc9edf355
--- /dev/null
+++ b/ld/testsuite/ld-aarch64/emit-relocs-morello-9.d
@@ -0,0 +1,33 @@
+#source: emit-relocs-morello-9.s
+#as: -march=morello+c64
+#ld: -static -Ttext-segment 0x0
+#objdump: -d -j .data -j .text
+
+.*: file format .*
+
+
+Disassembly of section \.text:
+
+0000000000000000 <_start>:
+ 0: f2800020 movk x0, #0x1
+ 4: f2800020 movk x0, #0x1
+ 8: 30ffffc0 adr c0, 1 <_start\+0x1>
+ c: 30ffffa0 adr c0, 1 <_start\+0x1>
+ 10: 02000400 add c0, c0, #0x1
+ 14: 02000400 add c0, c0, #0x1
+ 18: d2800020 mov x0, #0x1 // #1
+ 1c: d2800020 mov x0, #0x1 // #1
+ 20: f2800020 movk x0, #0x1
+ 24: f2800020 movk x0, #0x1
+
+Disassembly of section \.data:
+
+.* <val>:
+ .*: 00000001 .word 0x00000001
+ .*: 00000001 .word 0x00000001
+ .*: 00000001 .word 0x00000001
+ .*: 00000000 .word 0x00000000
+ .*: 00000001 .word 0x00000001
+ .*: 00000001 .word 0x00000001
+ .*: 00000001 .word 0x00000001
+ .*: 00000000 .word 0x00000000
diff --git a/ld/testsuite/ld-aarch64/emit-relocs-morello-9.s b/ld/testsuite/ld-aarch64/emit-relocs-morello-9.s
new file mode 100644
index 0000000000000000000000000000000000000000..854482cd027dd853aecd41465c1b2238bbbedb7d
--- /dev/null
+++ b/ld/testsuite/ld-aarch64/emit-relocs-morello-9.s
@@ -0,0 +1,42 @@
+# Attempting to check that the LSB is set on all relocations to a function
+# symbol.
+#
+# This should only happen for those relocations which load an address into a
+# register, since relocations that jump to a PC relative address like `bl`
+# should not include the LSB.
+.text
+.global _start
+.type _start,@function
+.type otherstart,@function
+_start:
+otherstart:
+ movk x0, #:abs_g0_nc:_start
+ movk x0, #:abs_g0_nc:otherstart
+ adr c0, _start
+ adr c0, otherstart
+ add c0, c0, :lo12:_start
+ add c0, c0, :lo12:otherstart
+ # The below are not as much of a worry if they go wrong since they
+ # check overflow, and the likelihood of there being a function which
+ # fits in the lowest 16 bits of an address is low. However, we can
+ # still test it in our testsuite with arguments to the linker, so we
+ # still get to check this edge case.
+ movz x0, #:abs_g0_s:_start
+ movz x0, #:abs_g0_s:otherstart
+ movk x0, #:abs_g0:_start
+ movk x0, #:abs_g0:otherstart
+.data
+.align 4
+.global val
+val:
+ # LSB should be included in the value of function symbols even if they
+ # are just added via absolute relocations.
+ .hword _start
+ .hword 0
+ .word _start
+ .xword _start
+ .hword otherstart
+ .hword 0
+ .word otherstart
+ .xword otherstart
+ .size val, .-val
The previous code was not actually using the size of a symbol when the
symbol was in the hash table. This meant that our TLS relaxations
created an instruction sequence with bounds of zero so that the GCC TLS
instruction sequence eventually ended up giving a length-zero
capability.
Also handle extra size of pointers in TCB for c64. For purecap we have
16 byte pointers. Hence the TCB is 32 bytes. This was not yet handled
in our relaxations.
Here we determine whether to use a 32 or 16 byte TCB based on the flags
of the current BFD (i.e. whether this is a purecap binary that we're
creating).
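The effect of the larger TCB on the offset of the TLS block can be
sketched as follows. This is a simplified model of the computation in
tpoff_base, not the BFD code itself: the helper names are hypothetical,
and `align_up_power' only stands in for BFD's `align_power'.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pointer sizes in bytes: a 64-bit AArch64 pointer and a Morello
   capability.  */
#define POINTER_BYTES 8u
#define CAPABILITY_BYTES 16u

/* The TCB is defined to be two pointers.  */
static uint64_t
tcb_size (bool purecap)
{
  return 2u * (purecap ? CAPABILITY_BYTES : POINTER_BYTES);
}

/* Round VALUE up to a multiple of 2**POWER.  */
static uint64_t
align_up_power (uint64_t value, unsigned int power)
{
  uint64_t mask = ((uint64_t) 1 << power) - 1;
  return (value + mask) & ~mask;
}

/* Offset from the thread pointer to the start of the TLS block for
   modules loaded at startup time.  */
static uint64_t
tls_block_offset (bool purecap, unsigned int tls_alignment_power)
{
  return align_up_power (tcb_size (purecap), tls_alignment_power);
}
```

Under these assumptions a TLS section with small alignment moves from an
offset of 16 to an offset of 32 in a purecap binary, which is consistent
with the testcase update from #0x10 to #0x20.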
Testcases are updated to account for the fact that the length
of the capability to the symbol itself is now sometimes non-zero and for
the different offset required into the TLS block for modules loaded at
startup time.
############### Attachment also inlined for ease of reply ###############
diff --git a/bfd/elfnn-aarch64.c b/bfd/elfnn-aarch64.c
index 1f6d83041eecf01f87494b0699e59647ddc07293..cb9e5f132cca2d360b8bd2cf2e2eb1fbbf1695f0 100644
--- a/bfd/elfnn-aarch64.c
+++ b/bfd/elfnn-aarch64.c
@@ -2929,7 +2929,8 @@ c64_value_p (asection *section, unsigned int value)
}
/* The size of the thread control block which is defined to be two
pointers. */
-#define TCB_SIZE (ARCH_SIZE/8)*2
+#define TCB_SIZE(cur_bfd) \
+ ((elf_elfheader (cur_bfd)->e_flags & EF_AARCH64_CHERI_PURECAP) ? 32 : (ARCH_SIZE/8)*2)
struct elf_aarch64_local_symbol
{
@@ -6043,7 +6044,7 @@ tpoff_base (struct bfd_link_info *info)
/* If tls_sec is NULL, we should have signalled an error already. */
BFD_ASSERT (htab->tls_sec != NULL);
- bfd_vma base = align_power ((bfd_vma) TCB_SIZE,
+ bfd_vma base = align_power ((bfd_vma) TCB_SIZE (info->output_bfd),
htab->tls_sec->alignment_power);
return htab->tls_sec->vma - base;
}
@@ -7700,7 +7701,7 @@ elfNN_aarch64_tls_relax (bfd *input_bfd, struct bfd_link_info *info,
BFD_ASSERT (globals && input_bfd && contents && rel);
- if (is_local)
+ if (is_local || !bfd_link_pic (info))
{
if (h != NULL)
sym_size = h->size;
@@ -8106,7 +8107,7 @@ set_nop:
/* No need of CALL26 relocation for tls_get_addr. */
rel[1].r_info = ELFNN_R_INFO (STN_UNDEF, R_AARCH64_NONE);
bfd_putl32 (0xd53bd040, contents + rel->r_offset + 0);
- bfd_putl32 (add_R0_R0 | (TCB_SIZE << 10),
+ bfd_putl32 (add_R0_R0 | (TCB_SIZE (input_bfd) << 10),
contents + rel->r_offset + 4);
return bfd_reloc_ok;
}
@@ -8135,7 +8136,7 @@ set_nop:
BFD_ASSERT (ELFNN_R_TYPE (rel[1].r_info) == AARCH64_R (CALL26));
/* No need of CALL26 relocation for tls_get_addr. */
rel[1].r_info = ELFNN_R_INFO (STN_UNDEF, R_AARCH64_NONE);
- bfd_putl32 (add_R0_R0 | (TCB_SIZE << 10),
+ bfd_putl32 (add_R0_R0 | (TCB_SIZE (input_bfd) << 10),
contents + rel->r_offset + 0);
bfd_putl32 (INSN_NOP, contents + rel->r_offset + 4);
return bfd_reloc_ok;
diff --git a/ld/testsuite/ld-aarch64/morello-tlsdesc-static.d b/ld/testsuite/ld-aarch64/morello-tlsdesc-static.d
index 372f369e7a23625238a14aa2505f5a1de7d286ed..9026f14115b996ea90628325eb51e1773f08828e 100644
--- a/ld/testsuite/ld-aarch64/morello-tlsdesc-static.d
+++ b/ld/testsuite/ld-aarch64/morello-tlsdesc-static.d
@@ -14,8 +14,8 @@ Disassembly of section .text:
.*: c29bd042 mrs c2, ctpidr_el0
.*: d2a00001 movz x1, #0x0, lsl #16
.*: d2a00000 movz x0, #0x0, lsl #16
-.*: f2800200 movk x0, #0x10
-.*: f2800001 movk x1, #0x0
+.*: f2800400 movk x0, #0x20
+.*: f2800081 movk x1, #0x4
.*: c2a06040 add c0, c2, x0, uxtx
.*: c2c10000 scbnds c0, c0, x1
@@ -23,7 +23,7 @@ Disassembly of section .text:
.*: c29bd042 mrs c2, ctpidr_el0
.*: d2a00001 movz x1, #0x0, lsl #16
.*: d2a00000 movz x0, #0x0, lsl #16
-.*: f2800280 movk x0, #0x14
+.*: f2800480 movk x0, #0x24
.*: f2800281 movk x1, #0x14
.*: c2a06040 add c0, c2, x0, uxtx
.*: c2c10000 scbnds c0, c0, x1
diff --git a/ld/testsuite/ld-aarch64/morello-tlsdesc-staticpie.d b/ld/testsuite/ld-aarch64/morello-tlsdesc-staticpie.d
index e391d86962c1eb6c4b79aead7d6d67f817e970e1..bf2b67aa4574db66373a1932acb572609a793d69 100644
--- a/ld/testsuite/ld-aarch64/morello-tlsdesc-staticpie.d
+++ b/ld/testsuite/ld-aarch64/morello-tlsdesc-staticpie.d
@@ -42,7 +42,7 @@ Disassembly of section .text:
.*: c29bd042 mrs c2, ctpidr_el0
.*: d2a00001 movz x1, #0x0, lsl #16
.*: d2a00000 movz x0, #0x0, lsl #16
-.*: f2800280 movk x0, #0x14
+.*: f2800480 movk x0, #0x24
.*: f2800281 movk x1, #0x14
.*: c2....40 add c0, c2, x0, uxtx
.*: c2c10000 scbnds c0, c0, x1
The handling is done by putting the value that we want in a buffer and
using that as the entry_symbol.name which lang_end picks up.
Another option would be to find the entry symbol *after* lang_end has
finished (e.g. in elfNN_aarch64_init_file_header) and add the LSB to it
if that symbol is a C64 symbol.
This approach was mainly chosen in order to match more closely what
Thumb has done.
N.b. we set the LSB based on the LSB of the entry point symbol.
If the entry point symbol is in c64 code but is not an STT_FUNC (e.g.
it is an STT_NOTYPE) then the LSB will not be set.
This matches Morello clang behaviour.
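The buffer-based approach can be sketched in isolation as below. This is
an illustration only: `format_entry' is a hypothetical helper, and it
uses snprintf where the patch uses BFD's sprintf_vma.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write the entry address into BUF as a "0x..."
   string, setting the LSB first when the entry symbol is a C64
   function.  Returns the value returned by snprintf.  */
static int
format_entry (char *buf, size_t len, uint64_t addr, bool c64_func)
{
  if (c64_func)
    addr |= 1;
  return snprintf (buf, len, "0x%016" PRIx64, addr);
}
```

The resulting string is what would be stored in entry_symbol.name, so
that the generic code parses it as a numeric entry address rather than
looking the symbol up again.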
############### Attachment also inlined for ease of reply ###############
diff --git a/ld/emultempl/aarch64elf.em b/ld/emultempl/aarch64elf.em
index 8a123106e3df3a0236cf818051430b5ef27eca8e..11512a127db039066f6c11e6132f7089d0528994 100644
--- a/ld/emultempl/aarch64elf.em
+++ b/ld/emultempl/aarch64elf.em
@@ -330,6 +330,43 @@ gld${EMULATION_NAME}_finish (void)
}
finish_default ();
+
+ struct bfd_link_hash_entry * h;
+ struct elf_link_hash_entry * eh;
+
+ if (!entry_symbol.name)
+ return;
+
+ h = bfd_link_hash_lookup (link_info.hash, entry_symbol.name,
+ FALSE, FALSE, TRUE);
+ eh = (struct elf_link_hash_entry *)h;
+ if (!h || !(eh->target_internal & ST_BRANCH_TO_C64))
+ return;
+ if (h->type != bfd_link_hash_defined
+ && h->type != bfd_link_hash_defweak)
+ return;
+ if (h->u.def.section->output_section == NULL)
+ return;
+
+ static char buffer[67];
+ bfd_vma val;
+
+ /* Special processing is required for a C64 entry symbol. The
+ bottom bit of its address must be set. */
+ val = (h->u.def.value
+ + bfd_section_vma (h->u.def.section->output_section)
+ + h->u.def.section->output_offset);
+
+ val |= 1;
+
+ /* Now convert this value into a string and store it in entry_symbol
+ where the lang_end() function will pick it up. */
+ buffer[0] = '0';
+ buffer[1] = 'x';
+
+ sprintf_vma (buffer + 2, val);
+
+ entry_symbol.name = buffer;
}
/* This is a convenient point to tell BFD about target specific flags.
diff --git a/ld/testsuite/ld-aarch64/aarch64-elf.exp b/ld/testsuite/ld-aarch64/aarch64-elf.exp
index 721d16e09bc1392fbc5e7920a080962bb4b374a2..9352db42e9913021d17a8e5dd8d7d8cf2125fe29 100644
--- a/ld/testsuite/ld-aarch64/aarch64-elf.exp
+++ b/ld/testsuite/ld-aarch64/aarch64-elf.exp
@@ -269,6 +269,7 @@ run_dump_test_lp64 "morello-sec-round-data-only"
run_dump_test_lp64 "morello-sec-round-include-relro"
run_dump_test_lp64 "morello-pcc-bounds-include-readonly"
run_dump_test_lp64 "morello-sec-round-choose-linker-syms"
+run_dump_test_lp64 "morello-entry-point"
run_dump_test_lp64 "morello-tlsdesc"
run_dump_test_lp64 "morello-tlsdesc-static"
run_dump_test_lp64 "morello-tlsdesc-staticpie"
diff --git a/ld/testsuite/ld-aarch64/morello-entry-point.d b/ld/testsuite/ld-aarch64/morello-entry-point.d
new file mode 100644
index 0000000000000000000000000000000000000000..29fb431199b977f1714da9d792f9da79e182339a
--- /dev/null
+++ b/ld/testsuite/ld-aarch64/morello-entry-point.d
@@ -0,0 +1,10 @@
+# Checking that the entry point address of a binary with a c64 function symbol
+# as the entry address is odd (i.e. has the LSB set).
+#source: emit-relocs-morello-2.s
+#as: -march=morello+c64
+#ld: -static
+#readelf: --file-header
+
+#...
+ Entry point address: 0x.*[13579]
+#pass