On Tue, 13 Oct 2020, Alex Bennée wrote:
This seems to be the same bug in GRUB noticed by Masami, where GRUB is adding strings to device tree at runtime without '\0' at the end. Julien posted a potential (untested) fix for GRUB:
What version of GRUB are you using?
2.04-11 - so The current Debian Testing package with your patch applied. I'd avoided build grub from scratch to avoid screwing up the system but if it's just a drop in .efi binary then I can build from the vanilla tree.
That's understandable. Before switching to GRUB upstream, I would like to understand which compatible is likely not NUL-terminated.
Would you mind to apply this patch in Xen and post the full log?
diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c index e107c6f89f1b..1c88fef4d3af 100644 --- a/xen/common/device_tree.c +++ b/xen/common/device_tree.c @@ -212,11 +212,28 @@ bool_t dt_device_is_compatible(const struct dt_device_node *device, const char *compat) { const char* cp;
- u32 cplen, l;
u32 cplen, l, i;
printk("Check compatible for node %s\n",
device->full_name); cp = dt_get_property(device, "compatible", &cplen);
if ( cp == NULL ) return 0;
printk("cplen %u\n", cplen);
for ( i = 0; i < cplen; i++ )
{
if ( cp[i] == '\0' )
printk("\n");
else
printk("%c", cp[i]);
}
if ( cp[cplen - 1] != '\0')
printk("\n /!\\ The compatible property is not NULL terminated\n");
while ( cplen > 0 ) { if ( dt_compat_cmp(cp, compat) == 0 )
This patch will printk the compatible for each call to dt_device_is_compatible() and print a message if one property is not NUL-terminated.
Full log further bellow but it looks like it crashes while going through the a806 config space. Is this something that grub is likely to mess with or does it point to a problem with the EDK2 build? Does the build have the tables embedded in it or does it extract it from some ROM device on the board?
(XEN) Check compatible for node /ap806/config-space@f0000000/system-controller@6f4000/pinctrl (XEN) cplen 22 (XEN) marvell,ap806-pinctrl (XEN) Check compatible for node /a(XEN) Data Abort Trap. Syndrome=0x5
This is interesting: it doesn't look like the string is not NULL terminated. It looks like the FDT size wasn't reported correctly. Otherwise it would have at least printed "cplen XX" for the last node. (It might be worth re-running the same experiment passing sync_console on the Xen command line to be extra safe we aren't missing any console output.)
Basically, there is a dt_for_each_device_node loop which is iterating for each device node:
for ( dn = dt_host; dn != NULL; dn = dn->allnext )
For the node following /ap806/config-space@f0000000/system-controller@6f4000/pinctrl Xen crashes trying to access dn->full_name.
dn->full_name is set by unflatten_dt_node when unflattening the FDT.
I am guessing Grub is modifying the FDT incorrectly or doesn't update the FDT size correctly.