I am very excited about the binary release of gcc-toolchain, but I ran into problems trying to compile something as simple as:
int main()
{
return 0;
}
$ ~/morello-gnu/bin/aarch64-none-elf-gcc -march=morello -O0 -nostdlib hello.c
/home/lord/morello-gnu/bin/../lib/gcc/aarch64-none-elf/10.1.0/../../../../aarch64-none-elf/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000400000
$ ~/morello-gnu/bin/aarch64-none-elf-gcc -march=morello -O0 hello.c
/home/lord/morello-gnu/bin/../lib/gcc/aarch64-none-elf/10.1.0/../../../../aarch64-none-elf/bin/ld: /home/lord/morello-gnu/bin/../lib/gcc/aarch64-none-elf/10.1.0/../../../../aarch64-none-elf/lib/hybridcap/libc.a(lib_a-exit.o): in function `exit':
/data/jenkins/workspace/GNU-toolchain/morello-trunk/src/newlib-cygwin/newlib/libc/stdlib/exit.c:70: undefined reference to `_exit'
/home/lord/morello-gnu/bin/../lib/gcc/aarch64-none-elf/10.1.0/../../../../aarch64-none-elf/bin/ld: /home/lord/morello-gnu/bin/../lib/gcc/aarch64-none-elf/10.1.0/../../../../aarch64-none-elf/lib/hybridcap/crt0.o: in function `_start':
/data/jenkins/workspace/GNU-toolchain/morello-trunk/src/newlib-cygwin/libgloss/aarch64/crt0.S:360: undefined reference to `__init_global_caps'
/home/lord/morello-gnu/bin/../lib/gcc/aarch64-none-elf/10.1.0/../../../../aarch64-none-elf/bin/ld: /data/jenkins/workspace/GNU-toolchain/morello-trunk/src/newlib-cygwin/libgloss/aarch64/crt0.S:361: undefined reference to `__processRelocs'
/home/lord/morello-gnu/bin/../lib/gcc/aarch64-none-elf/10.1.0/../../../../aarch64-none-elf/bin/ld: /data/jenkins/workspace/GNU-toolchain/morello-trunk/src/newlib-cygwin/libgloss/aarch64/crt0.S:365: undefined reference to `initialise_monitor_handles'
collect2: error: ld returned 1 exit status
Any suggestions on what I am doing wrong?
Also, I think it mentions that pure and hybrid capability modes are supported, but I could not find an option to specify which one to use.
Thanks!
Vadim
--
Senior Research Associate
Department of Computer Science and Technology
University of Cambridge
http://zaliva.org/
Capabilities pointing to symbols in SEC_CODE sections are given the
bounds of the entire PCC. We ensure that the PCC bounds are padded and
aligned as needed in the linker.
Capabilities pointing to other symbols (e.g. in data sections) are given
the bounds of the symbol that they point to. It is the responsibility
of the assembly generator (i.e. usually the compiler) to ensure these
bounds are correctly aligned and padded as necessary.
We emit a warning for imprecise bounds in the second case. Until this
patch, that warning was also emitted for the first case. This was a
mistake and is rectified in this commit.
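
As a rough illustration of the rule described above (a sketch only, not
the code in bfd/elfnn-aarch64.c), the decision can be modelled as
follows. The names choose_bounds, cap_bounds and toy_range_is_precise
are hypothetical, and the precision check is a deliberately simplified
stand-in that does not reproduce the real Morello bounds encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the linker's representability check.
   This is NOT the real Morello encoding: ranges shorter than 4096 bytes
   are always treated as precise, and longer ranges must have base and
   limit aligned to a power of two that grows with the length.  */
bool
toy_range_is_precise (uint64_t base, uint64_t limit)
{
  uint64_t len = limit - base;
  uint64_t align = 1;
  while (len / align >= 4096)
    align <<= 1;
  return base % align == 0 && limit % align == 0;
}

enum bounds_kind { BOUNDS_PCC, BOUNDS_SYMBOL };

struct cap_bounds
{
  enum bounds_kind kind;
  uint64_t base;
  uint64_t limit;
  bool warn;                    /* Would an imprecise-bounds warning fire?  */
};

/* Symbols in code sections get the bounds of the whole PCC span and are
   never warned about.  Other symbols get their own bounds, and only
   those are checked for precision.  */
struct cap_bounds
choose_bounds (bool in_code_section, uint64_t sym_addr, uint64_t sym_size,
               uint64_t pcc_low, uint64_t pcc_high)
{
  struct cap_bounds b;
  if (in_code_section)
    {
      b.kind = BOUNDS_PCC;
      b.base = pcc_low;
      b.limit = pcc_high;
      b.warn = false;
    }
  else
    {
      b.kind = BOUNDS_SYMBOL;
      b.base = sym_addr;
      b.limit = sym_addr + sym_size;
      b.warn = !toy_range_is_precise (b.base, b.limit);
    }
  return b;
}
```

With this model, a 0x8004-byte function (the shape used by the new
testcase) produces no warning, whereas a data object of the same size
and address would.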
############### Attachment also inlined for ease of reply ###############
diff --git a/bfd/elfnn-aarch64.c b/bfd/elfnn-aarch64.c
index 3e0562b72bc69a9c43f01c80c604b85c582f4f7b..da3c629fcdbfbceed633d9af7c5751dd358492fc 100644
--- a/bfd/elfnn-aarch64.c
+++ b/bfd/elfnn-aarch64.c
@@ -6965,17 +6965,6 @@ c64_fixup_frag (bfd *input_bfd, struct bfd_link_info *info,
bfd_vma base = value, limit = value + size;
unsigned align = 0;
- if (!bounds_ok && !c64_valid_cap_range (&base, &limit, &align))
- {
- /* Just warn about this. It's not a requirement that bounds on
- objects should be precise, so there's no reason to error out on
- such an object. */
- /* xgettext:c-format */
- _bfd_error_handler
- (_("%pB: capability range for '%s' may exceed object bounds"),
- input_bfd, sym_name);
- }
-
if (perm_sec && perm_sec->flags & SEC_CODE)
{
/* Any symbol pointing into an executable section gets bounds according
@@ -6994,6 +6983,16 @@ c64_fixup_frag (bfd *input_bfd, struct bfd_link_info *info,
data or jump to other functions. */
size = pcc_high - pcc_low;
}
+ else if (!bounds_ok && !c64_valid_cap_range (&base, &limit, &align))
+ {
+ /* Just warn about this. It's not a requirement that bounds on
+ objects should be precise, so there's no reason to error out on
+ such an object. */
+ /* xgettext:c-format */
+ _bfd_error_handler
+ (_("%pB: capability range for '%s' may exceed object bounds"),
+ input_bfd, sym_name);
+ }
if (perm_sec != NULL)
{
diff --git a/ld/testsuite/ld-aarch64/aarch64-elf.exp b/ld/testsuite/ld-aarch64/aarch64-elf.exp
index 4200c1d17e97ac9b54f2a0a051f15f0d51d47c06..a07a6c50bb8b601761cf38607afbc0a4dcc9ef0c 100644
--- a/ld/testsuite/ld-aarch64/aarch64-elf.exp
+++ b/ld/testsuite/ld-aarch64/aarch64-elf.exp
@@ -390,6 +390,8 @@ run_dump_test_lp64 "morello-illegal-tls"
run_dump_test_lp64 "morello-illegal-tls-pie"
run_dump_test_lp64 "morello-illegal-tls-shared"
+run_dump_test_lp64 "morello-large-function"
+
run_dump_test "no-morello-syms-static"
run_dump_test "reloc-overflow-bad"
diff --git a/ld/testsuite/ld-aarch64/morello-large-function.d b/ld/testsuite/ld-aarch64/morello-large-function.d
new file mode 100644
index 0000000000000000000000000000000000000000..dc0af92e12f9ab7203a400997ded14c8bdf4172c
--- /dev/null
+++ b/ld/testsuite/ld-aarch64/morello-large-function.d
@@ -0,0 +1,23 @@
+# Mainly here to check that this actually links.
+# This testcase used to complain that the function capability may have
+# imprecise bounds, but since such capabilities are given PCC bounds that error
+# was invalid.
+#
+# Even though the only point is to check that the testcase links, we still
+# ensure that the dump of the .data section contains a relocation with the
+# correct permissions.
+#as: -march=morello+c64
+#ld: -pie -static
+#objdump: -DR -j .data
+
+.*: file format .*
+
+
+Disassembly of section \.data:
+
+[0-9a-f]+ <__data_start>:
+ *[0-9a-f]+: .*
+ .*: R_MORELLO_RELATIVE \*ABS\*\+.*
+ *[0-9a-f]+: .* udf #0
+ *[0-9a-f]+: .*
+ *[0-9a-f]+: 04000000 .*
diff --git a/ld/testsuite/ld-aarch64/morello-large-function.s b/ld/testsuite/ld-aarch64/morello-large-function.s
new file mode 100644
index 0000000000000000000000000000000000000000..46cb51e0c68cba982e8bf2b83b696399a6225f21
--- /dev/null
+++ b/ld/testsuite/ld-aarch64/morello-large-function.s
@@ -0,0 +1,9 @@
+.data
+ .chericap _start
+.text
+ .globl _start
+ .type _start,@function
+_start:
+ ret
+ .zero 0x8000
+ .size _start, .-_start
1) Enable having a CAPINIT relocation against an IFUNC.
We update the `final_link_relocate` switch case around IFUNCs to
also handle CAPINIT relocations. The handling of CAPINIT relocations
is slightly different than for AARCH64_NN (i.e. ABS64) relocations
since we generally need to emit a dynamic relocation.
Handling this relocation also needs to manage the PDE case when a
hard-coded address has been put into code to satisfy something like
an `adrp`. In these cases the canonical address of the IFUNC becomes
its PLT stub rather than the result of the resolver. We then need to
use a RELATIVE relocation rather than an IRELATIVE one.
N.b. unlike the ABS64 relocation, since a CAPINIT will always emit a
dynamic relocation we do not require pointer equality adjustments on
a symbol from having seen a CAPINIT. That means we do not need to
request that the PLT stub of an IFUNC is treated as the canonical
address just from having seen a CAPINIT relocation.
A CAPINIT relocation against an IFUNC needs to be recorded internally
so that _bfd_elf_allocate_ifunc_dyn_relocs does not garbage collect
the PLT stub and associated IRELATIVE relocation.
See changes in the CAPINIT case of the IFUNC switch of
elfNN_aarch64_final_link_relocate, and in the CAPINIT case of
elfNN_aarch64_check_relocs.
2) Ensure that GOT relocations against an IFUNC have their fragment
populated with the LSB set.
For GOT relocations against a capability IFUNC we need to introduce a
relocation for the runtime to provide us with a valid capability.
See changes in the GOT cases of the IFUNC switch of
elfNN_aarch64_final_link_relocate, changes in the
elfNN_aarch64_allocate_ifunc_dynrelocs function, and changes around
handling an IFUNC GOT entry in elfNN_aarch64_finish_dynamic_symbol.
3) Ensure that mapping symbols are emitted for the .iplt. Without this
many of the testcases here are disassembled incorrectly.
See changes in elfNN_aarch64_output_arch_local_syms.
4) IRELATIVE relocations are against symbols which are not in the
dynamic symbol table, hence they need their fragment populated to
inform the dynamic linker of the bounds and permissions with which to
call the associated resolver.
See part of the CAPINIT IFUNC handling in
elfNN_aarch64_final_link_relocate, and the IRELATIVE handling in
elfNN_aarch64_create_small_pltn_entry.
5) Disallow an ABS64 relocation against a purecap IFUNC. Such a
relocation is expecting a 64-bit value but the function will return a
capability. Some handling could be implemented by some communication
method to the dynamic linker that this particular value should be
64-bit (maybe by emitting an AARCH64_IRELATIVE relocation rather than
a MORELLO_IRELATIVE one), but as yet GCC doesn't generate such a
relocation and we believe it's unlikely to be needed.
See new error check in AARCH64_NN clause of
elfNN_aarch64_final_link_relocate.
6) Ensure that for statically linked PDEs, we segregate IRELATIVE and
RELATIVE relocations. IRELATIVE relocs should be in the .rela.iplt
section, while RELATIVE relocs should be in the .rela.dyn section.
Correspondingly all RELATIVE relocations should be between the
__rela_dyn_{start,end} symbols, and all IRELATIVE relocations should
be between the __rela_iplt_{start,end} symbols.
This segregation is made based on the dynamic relocation type rather
than the static relocation that generates it. The segregation allows
the static libc to more easily handle relocations.
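
To illustrate why the segregation helps, here is a minimal sketch of
how startup code might walk the two ranges without inspecting the type
of each entry. Everything here is hypothetical: struct toy_rela is not
the ELF Rela layout, the resolver is stored as a plain function pointer
for simplicity, and the functions are not taken from any libc.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*toy_resolver) (void);

/* Hypothetical, simplified relocation record.  */
struct toy_rela
{
  uint64_t *where;        /* Location to patch.  */
  uint64_t addend;        /* Used by RELATIVE entries.  */
  toy_resolver resolver;  /* Used by IRELATIVE entries.  */
};

/* Example resolver used only for demonstration.  */
uint64_t
toy_example_resolver (void)
{
  return 0x4000;
}

/* Entries between the (hypothetical) __rela_dyn_{start,end} markers:
   every entry is known to be RELATIVE, so no type check is needed.  */
size_t
apply_relative_range (const struct toy_rela *start,
                      const struct toy_rela *end, uint64_t load_base)
{
  size_t n = 0;
  for (const struct toy_rela *r = start; r != end; r++, n++)
    *r->where = load_base + r->addend;
  return n;
}

/* Entries between the (hypothetical) __rela_iplt_{start,end} markers:
   every entry is known to be IRELATIVE, so we always call the resolver.  */
size_t
apply_irelative_range (const struct toy_rela *start,
                       const struct toy_rela *end)
{
  size_t n = 0;
  for (const struct toy_rela *r = start; r != end; r++, n++)
    *r->where = r->resolver ();
  return n;
}
```

If the two kinds were mixed in one range, each loop would instead have
to check the relocation type of every entry.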
Update testcases accordingly.
We introduce some new testcases. morello-ifunc.s contains uses of an
IFUNC which has been referenced directly in code. When compiling a PDE
this triggers the pointer equality requirement and hence the canonical
address for this symbol becomes the PLT stub rather than the result of
the resolver.
morello-ifunc1.s does not use the IFUNC directly in code so that the
address used everywhere is the result of the resolver.
Both of these have testcases assembled and linked for static,
dynamically linked PDE, and PIE. The testcase without a hard-coded
access also has a testcase for -shared.
morello-ifunc2.s is written to check that a CAPINIT relocation does
indeed stop the garbage collection of an IFUNC's PLT and IRELATIVE
relocation.
morello-ifunc3.s tests that we error on an ABS64 relocation against a
C64 IFUNC.
morello-ifunc-dynlink.s tests that a CAPINIT relocation against an IFUNC
symbol defined in a shared library behaves the same way as one against a
FUNC symbol defined in a shared library.
Implementation note:
When segregating IRELATIVE and RELATIVE relocs the change for
relocations against IFUNC symbols populated in the GOT is
straightforward.
For CAPINIT relocations the change is not as straightforward. The
problem is that on sight of CAPINIT relocations in check_relocs we
immediately allocate space in the srelcaps section. In trying to
satisfy the above we need to know whether we're going to be emitting an
IRELATIVE relocation or RELATIVE one in order to know which section it
should go in. The determining factor between these two kinds of
relocations is whether there is a text relocation to this IFUNC symbol,
since that determines whether we need to make this CAPINIT relocation
a RELATIVE relocation pointing to the PLT stub (in order to satisfy
pointer equality) or an IRELATIVE relocation pointing to the resolver.
Whether such a relocation occurs is recorded against each symbol in the
pointer_equality_needed member. This can only be known after all
relocations have been seen in check_relocs. Hence, when coming across a
CAPINIT relocation in check_relocs we do not in general know whether
this CAPINIT relocation should end up as an IRELATIVE or RELATIVE
relocation.
This patch postpones the decision by recording the number of CAPINIT
relocations against a given symbol in a hash table while going through
check_relocs and allocating the relevant space in the required section
in size_dynamic_sections.
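
The two-pass idea can be sketched as below for the static PDE case.
This is an illustration under simplifying assumptions, not the patch
itself: the per-symbol count lives in a plain struct field rather than
a hash table, and all toy_* names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-symbol record for an IFUNC.  */
struct toy_ifunc_sym
{
  const char *name;
  bool pointer_equality_needed;  /* Set when a text relocation is seen.  */
  unsigned capinit_count;        /* CAPINIT relocations seen so far.  */
};

struct toy_sizes
{
  size_t rela_iplt;  /* Bytes reserved for IRELATIVE entries.  */
  size_t rela_dyn;   /* Bytes reserved for RELATIVE entries.  */
};

/* Pass 1 (the "check_relocs" stage): only record what was seen.  No
   space is reserved yet because the final relocation kind is unknown.  */
void
toy_note_capinit (struct toy_ifunc_sym *sym)
{
  sym->capinit_count++;
}

void
toy_note_text_reloc (struct toy_ifunc_sym *sym)
{
  sym->pointer_equality_needed = true;
}

/* Pass 2 (the "size_dynamic_sections" stage): every relocation has now
   been seen, so pointer_equality_needed is final and the space can be
   reserved in the right section.  */
struct toy_sizes
toy_allocate (const struct toy_ifunc_sym *syms, size_t nsyms,
              size_t entry_size)
{
  struct toy_sizes s = { 0, 0 };
  for (size_t i = 0; i < nsyms; i++)
    {
      size_t bytes = syms[i].capinit_count * entry_size;
      if (syms[i].pointer_equality_needed)
        s.rela_dyn += bytes;
      else
        s.rela_iplt += bytes;
    }
  return s;
}
```

Note that the order in which CAPINIT and text relocations are seen in
pass 1 does not affect the result, which is the point of deferring the
allocation.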
N.b. this is similar in purpose to the dyn_relocs linked list on a
symbol. We do not use that existing member which is on every symbol
since the structure does not allow any indication of what kind of
relocation triggered the need. Moreover the structure is used for
different purposes throughout the linker and disentangling the new
meaning from the existing ones seems overly confusing.
Overall, the decisions about which sections relocations against an IFUNC
should go in are:
CAPINIT relocations:
If this is a static PDE link, and the symbol does not need pointer
equality handling, then this should emit an IRELATIVE relocation and
that should go in the .rela.iplt section.
If this is a PIC link, then this should go in the .rela.ifunc
section (along with all other dynamic relocations against the IFUNC,
as commented in _bfd_elf_allocate_ifunc_dyn_relocs).
Otherwise this relocation should go in the srelcaps section (which
goes in .rela.dyn).
GOT relocations:
If this is a static PDE link, and the symbol does not need pointer
equality, then this should emit an IRELATIVE relocation into the
.rela.iplt section.
If this is a static PDE link and the symbol does need pointer
equality, then this should emit a RELATIVE relocation and that should
go in the srelcaps section (which is in .rela.dyn).
Otherwise this should go in the .rela.got section.
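
The decisions listed above can be summarised as a small function. This
is only a sketch of the table, with hypothetical enum and function
names. It returns the output section name and treats srelcaps as
.rela.dyn.

```c
#include <stdbool.h>

enum toy_link_kind { TOY_STATIC_PDE, TOY_DYNAMIC_PDE, TOY_PIC };
enum toy_reloc_src { TOY_CAPINIT, TOY_GOT };

/* Return the section that a dynamic relocation against an IFUNC should
   be placed in, following the rules listed above.  */
const char *
toy_ifunc_reloc_section (enum toy_reloc_src src, enum toy_link_kind link,
                         bool pointer_equality_needed)
{
  /* Both kinds: a static PDE without pointer equality gets an IRELATIVE
     relocation in .rela.iplt.  */
  if (link == TOY_STATIC_PDE && !pointer_equality_needed)
    return ".rela.iplt";

  if (src == TOY_CAPINIT)
    /* PIC links keep all IFUNC dynamic relocations together; everything
       else goes through srelcaps, i.e. .rela.dyn.  */
    return link == TOY_PIC ? ".rela.ifunc" : ".rela.dyn";

  /* GOT relocation: a static PDE that needs pointer equality gets a
     RELATIVE relocation in .rela.dyn, otherwise use .rela.got.  */
  return link == TOY_STATIC_PDE ? ".rela.dyn" : ".rela.got";
}
```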
Hi,
This patch deals with the interaction between the code that attempts to
make bounds precise (for both the PCC bounds and for some individual
sections) and the code that adds stubs (e.g. long-branch veneers and
interworking stubs) in the AArch64 backend.
We aim to set precise bounds for the PCC span and some individual
sections in elfNN_c64_resize_sections. However, it transpires that
elfNN_aarch64_size_stubs can change the layout in ways that extend
sections that should be covered under the PCC span outside of the bounds
set in elfNN_c64_resize_sections. The introduction of stubs can also
change (even reduce) the amount of padding required to make the bounds
on any given section precise.
To address this problem, we move the core logic from
elfNN_c64_resize_sections into a new function, c64_resize_sections, that
is safe to be called repeatedly. Similarly, we move the core logic from
elfNN_aarch64_size_stubs into a new function aarch64_size_stubs which
again can be called repeatedly.
We then adjust elfNN_aarch64_size_stubs to call aarch64_size_stubs and
c64_resize_sections in a loop, stopping when c64_resize_sections no
longer makes any changes to the layout.
An important observation made above is that the introduction of stubs
can change the amount of padding needed to make bounds precise. Likewise,
introducing padding can in theory necessitate the introduction of stubs
(e.g. if the change in layout necessitates a long-branch veneer). This
is why we run the resizing/stubs code in a loop until no further changes
are necessary.
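
A toy model of this fixed-point iteration is sketched below. The
layout struct, the constants and the toy_* functions are all
hypothetical. The model assumes a single stub that is never removed
once added, and uses a plain alignment requirement in place of the real
precise-bounds computation.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_ALIGN        64      /* Stand-in for the precise-bounds rule.  */
#define TOY_BRANCH_RANGE 0x1000  /* Size at which a veneer is needed.  */
#define TOY_STUB_BYTES   16

struct toy_layout
{
  uint64_t text_size;  /* Code size excluding stubs.  */
  uint64_t stub_size;  /* Bytes of stubs added so far.  */
  uint64_t padding;    /* Padding currently installed.  */
};

/* Add a stub once the padded section reaches the toy branch range.
   Stubs are never removed in this model.  Returns true on change.  */
bool
toy_size_stubs (struct toy_layout *l)
{
  if (l->stub_size == 0 && l->text_size + l->padding >= TOY_BRANCH_RANGE)
    {
      l->stub_size = TOY_STUB_BYTES;
      return true;
    }
  return false;
}

/* Recompute the padding needed for the current contents.  Safe to call
   repeatedly: it overwrites the previous amount rather than adding to
   it.  Returns true if the padding changed.  */
bool
toy_resize_sections (struct toy_layout *l)
{
  uint64_t used = l->text_size + l->stub_size;
  uint64_t want = (TOY_ALIGN - used % TOY_ALIGN) % TOY_ALIGN;
  bool changed = want != l->padding;
  l->padding = want;
  return changed;
}

/* Alternate the two steps until resizing makes no further change.
   Returns the number of iterations taken.  */
unsigned
toy_layout_fixed_point (struct toy_layout *l)
{
  unsigned iterations = 0;
  bool changed;
  do
    {
      toy_size_stubs (l);
      changed = toy_resize_sections (l);
      iterations++;
    }
  while (changed);
  return iterations;
}
```

Starting from text_size = 0xff0, the first pass adds 16 bytes of
padding, which pushes the section to the branch range and triggers a
stub in the second pass. The stub in turn removes the need for padding,
and the third pass confirms that nothing changes.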
Since the amount of padding needed to achieve precise bounds for a
section can change (indeed reduce) with the introduction of stubs, we
need a mechanism to update the amount of padding applied to a section in
a subsequent iteration of c64_resize_sections. We achieve this by
introducing a new interface in ld/emultempl/aarch64elf.em. We have the
functions:
static void
c64_set_section_padding (asection *osec, bfd_vma padding, void **cookie);
static bfd_vma
c64_get_section_padding (void *cookie);
Here, the "cookie" value is, to consumers of this interface (i.e.
bfd/elfnn-aarch64.c), an opaque handle used to refer to the padding that
was introduced for a given section. The consuming code then passes back
the cookie to later query the amount of padding already installed or to
update the amount of padding.
Internally, within aarch64elf.em, the "cookie" is just a pointer to the
node in the ldexp tree containing the integer amount of padding
inserted.
In the AArch64 ELF backend, we then maintain a (lazily-allocated)
mapping between output sections and cookies in order to be able to
update the padding we installed in subsequent iterations of
c64_resize_sections.
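
The opaque-handle pattern can be sketched as follows. This is not the
code in aarch64elf.em: the toy_* names are hypothetical, the output
section argument is omitted, the node is a standalone heap allocation
rather than part of the ldexp tree, and the getter is assumed to return
the stored amount.

```c
#include <stdint.h>
#include <stdlib.h>

/* Provider side: the real type behind the cookie.  Consumers never see
   this definition and only pass the pointer around as a void *.  */
struct toy_pad_node
{
  uint64_t amount;
};

/* Install or update the padding.  On first use *COOKIE is NULL and a
   node is allocated; later calls update the existing node in place.  */
void
toy_set_section_padding (uint64_t padding, void **cookie)
{
  if (*cookie == NULL)
    *cookie = calloc (1, sizeof (struct toy_pad_node));
  if (*cookie != NULL)
    ((struct toy_pad_node *) *cookie)->amount = padding;
}

/* Query the padding currently installed; no cookie means no padding.  */
uint64_t
toy_get_section_padding (void *cookie)
{
  return cookie != NULL ? ((struct toy_pad_node *) cookie)->amount : 0;
}
```

A consumer would keep one such cookie per output section and hand it
back on each iteration, which is what allows the padding to shrink as
well as grow.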
While working on this patch, an edge case became apparent: the case
where pcc_high_sec requires precise bounds (i.e. where we call
ensure_precisely_bounded_section on pcc_high_sec). As it stands, in this
case, the code to ensure precise PCC bounds may in fact make the bounds
on pcc_high_sec itself no longer representable (even if we previously
ensured this by calling ensure_precisely_bounded_section). In general,
it is not always possible to choose an amount of padding to add to the
end of pcc_high_sec to make both pcc_high_sec and the PCC span itself
have precise bounds (without introducing an unreasonably large alignment
requirement on pcc_high_sec).
To handle the edge case above, we decouple these two problems by adding
a separate amount of padding *after* pcc_high_sec to make the PCC bounds
precise. If pcc_high_sec is required to have precise bounds, then that
can be done in the usual way by adding padding to pcc_high_sec in
ensure_precisely_bounded_section. The new mechanism for adding padding
after an output section is implemented in
aarch64elf.em:c64_pad_after_section.
To avoid having to add yet another mechanism to update the padding
*after* pcc_high_sec, we avoid adding this padding until all other
resizing / bounds-setting work is done. This is not possible for
individual sections since padding introduced there may have a knock-on
effect requiring further work, but we believe this isn't the case for
the padding added after pcc_high_sec to make the PCC bounds precise.
This patch also reveals a pre-existing issue whereby we end up calling
ensure_precisely_bounded_section on the *ABS* section. Without a further
change to prevent this, this can lead to a null pointer dereference in
ensure_precisely_bounded_section, since the "owner" field on the *ABS*
section is NULL, and we use this field to obtain a pointer to the output
BFD in the new c64_get_section_padding_info function.
Of course, it doesn't make sense for ensure_precisely_bounded_section to
be called on the *ABS* section in the first place. This can happen when
there are relocations against ldscript-defined symbols which are defined
at the top level of the ldscript (i.e. not in a particular output
section). Those symbols initially have their output section set to the
*ABS* section. Later, we resolve such symbols to their correct output
section in ldexp_finalize_syms, but the code in c64_resize_sections is
running in ldemul_after_allocation, which comes before the call to
ldexp_finalize_syms in the lang_process flow.
For now, we just skip such symbols when looking for sections that need
precise bounds in c64_resize_sections, but this issue will later need
fixing properly. We choose to avoid fixing the pre-existing issue in
this patch to avoid over-complicating an already complex change.
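
The interim workaround amounts to a guard like the one sketched here.
The toy_* types and function are hypothetical simplifications and do
not mirror the real asection or bfd structures.

```c
#include <stddef.h>

struct toy_bfd
{
  const char *name;
};

struct toy_section
{
  const char *name;
  const struct toy_bfd *owner;  /* NULL for the *ABS* section.  */
};

/* Count the sections that would be considered for precise bounds,
   skipping any section with no owner (such as *ABS*) so that the owner
   pointer is never dereferenced when it is NULL.  */
size_t
toy_count_boundable (const struct toy_section *const *secs, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (secs[i] == NULL || secs[i]->owner == NULL)
        continue;
      count++;
    }
  return count;
}
```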
Tested on aarch64-none-elf and aarch64-none-linux-gnu, OK for the
Morello branch?
Thanks,
Alex