Hi Lorenzo, All,
Before potentially sending out an ECR for ACPI to require the _DSM that prevents PCI-related resource reassignment (in the GI case), I thought I would do some experiments to see if there is another potential solution (this obviously doesn't work for the RMR case!).
Can we cache the state (BDF of each device) that the firmware has configured, to build a hierarchical representation that we can then use to figure out the association after re-enumeration? My assumption is that this should be possible, and the vIOMMU spec of JPB (which has the same BDF issue) kind of suggests this solution. https://jpbrucker.net/virtio-iommu/viot/viot-v9.pdf "This identifier corresponds to the one observed by the operating system when parsing the PCI configuration space for the first time after boot"
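A minimal sketch of what the cached hierarchical representation could look like, in Python for brevity. The idea is to key each function by its path of (device, function) hops from the root bus rather than by its bus number, since the path survives bus renumbering while the BDF does not. Everything here is an assumption for illustration: the snapshot format and the names `Func`, `paths` and `remap` are hypothetical (not kernel APIs), it covers a single segment with root bus 0, and it assumes a well-formed topology.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple

BDF = Tuple[int, int, int]          # (bus, device, function)
Path = Tuple[Tuple[int, int], ...]  # (device, function) hops down from the root bus


@dataclass(frozen=True)
class Func:
    """One PCI function as recorded in a snapshot (hypothetical format)."""
    bdf: BDF
    secondary: Optional[int] = None  # secondary bus number if this is a bridge


def paths(snapshot: Sequence[Func], root_bus: int = 0) -> Dict[BDF, Path]:
    """Describe every function by its bus-number-free path from the root bus."""
    # Which bridge forwards to each secondary bus number.
    parent = {f.secondary: f.bdf for f in snapshot if f.secondary is not None}
    result = {}
    for f in snapshot:
        bus, dev, fn = f.bdf
        hops = [(dev, fn)]
        # Walk up through the bridges until we reach the root bus.
        while bus != root_bus:
            bus, dev, fn = parent[bus]
            hops.append((dev, fn))
        result[f.bdf] = tuple(reversed(hops))
    return result


def remap(boot: Sequence[Func], current: Sequence[Func],
          root_bus: int = 0) -> Dict[BDF, BDF]:
    """Map each firmware-assigned BDF to its BDF after re-enumeration."""
    now = {p: bdf for bdf, p in paths(current, root_bus).items()}
    return {old: now[p] for old, p in paths(boot, root_bus).items() if p in now}


if __name__ == "__main__":
    # Firmware view: bridge 00:01.0 -> bus 1, endpoint at 01:00.0.
    boot = [Func((0, 1, 0), secondary=1), Func((1, 0, 0))]
    # Later view: a new bridge 00:00.0 (with a device behind it) takes bus 1,
    # so the original endpoint is renumbered onto bus 2.
    current = [Func((0, 0, 0), secondary=1), Func((1, 0, 0)),
               Func((0, 1, 0), secondary=2), Func((2, 0, 0))]
    print(remap(boot, current))  # {(0, 1, 0): (0, 1, 0), (1, 0, 0): (2, 0, 0)}
```

The path is only a stable key if device and function numbers themselves are unchanged; anything that reinterprets them (for example toggling ARI, where the device number is folded into an 8-bit function number) would break the mapping.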
So the problem I've run into is that I can't actually build a test setup where the BDFs of devices correctly configured by the firmware are changed. Note this is all in QEMU with various hacks in EDK2, including enablement of root port padding from OVMF (I'll post that for merging into EDK2 at some point because it's useful). Whilst QEMU doesn't seem to allow hotplug of a PCIe switch, I can hotplug a pcie-pci-bridge to exercise pretty much the same code paths.
As an alternative, if we stop EDK2 from enabling a switch upstream port, the kernel will attempt that later and it can go horribly wrong. Side question: do we want to look at hardening that code against random broken situations? It's not high on my priority list, but I might post an RFC on that at some stage, as it can take a working partial PCI topology and end up with nothing at all working because it uses the same bus numbers multiple times.
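The failure mode here (the same bus number handed out more than once) is at least easy to detect. A minimal sketch of such a sanity check in Python, over a hypothetical list of bridge bus windows; it is not what the kernel's bus-numbering code does, only the invariant a consistent assignment has to satisfy. The function name and input format are illustrative assumptions.

```python
from typing import List, Tuple

BDF = Tuple[int, int, int]     # (bus, device, function)
Bridge = Tuple[BDF, int, int]  # (bridge BDF, secondary bus, subordinate bus)


def bus_number_conflicts(bridges: List[Bridge]) -> List[Tuple[BDF, BDF]]:
    """Return pairs of bridges whose bus-number windows cannot coexist.

    In a consistent hierarchy, any two [secondary, subordinate] windows are
    either disjoint or strictly nested, and a nested bridge must itself sit on
    a bus inside the enclosing window. Anything else means a bus number is
    being used more than once.
    """
    conflicts = []
    for i, (a, a_sec, a_sub) in enumerate(bridges):
        for b, b_sec, b_sub in bridges[i + 1:]:
            if a_sec < b_sec and b_sub <= a_sub:      # b nested inside a
                ok = a_sec <= b[0] <= a_sub
            elif b_sec < a_sec and a_sub <= b_sub:    # a nested inside b
                ok = b_sec <= a[0] <= b_sub
            else:                                     # must not overlap at all
                ok = a_sub < b_sec or b_sub < a_sec
            if not ok:
                conflicts.append((a, b))
    return conflicts


if __name__ == "__main__":
    # Root port -> switch upstream port -> downstream port, plus a second root port.
    good = [((0, 1, 0), 1, 3), ((1, 0, 0), 2, 3), ((2, 0, 0), 3, 3),
            ((0, 2, 0), 4, 4)]
    bad = good[:3] + [((0, 2, 0), 3, 4)]  # second root port reuses bus 3
    print(bus_number_conflicts(good))      # []
    print(len(bus_number_conflicts(bad)))  # 3
```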
But if the configuration is valid, I can't seem to actually build a setup where the BDF changes. It's relatively easy to get the various other resources to change (e.g. the ongoing discussion on adding the _DSM to QEMU's ACPI tables has some examples), but not the BDF. https://lore.kernel.org/qemu-devel/20201217132926.4812-1-cenjiahui@huawei.co...
Obviously, nothing stops Linux or another OS doing this in future if the _DSM isn't provided.
To a certain extent I don't really care whether Linux re-enumerates the bus numbers, but being unable to make it happen does make it rather tricky to build a PoC of the caching approach.
Any thoughts, or known configurations which will change existing bus number assignments? The code is fiddly, so there are a few places where it looks like it will, but then it turns out not to because of a sanity check elsewhere.
Jonathan
p.s. No rush on this as I'll be off until January from end of today.
On Fri, Dec 18, 2020 at 12:31:43PM +0000, Jonathan Cameron wrote:
Can we cache the state (BDF of each device) that the firmware has configured, to build a hierarchical representation that we can then use to figure out the association after re-enumeration? My assumption is that this should be possible, and the vIOMMU spec of JPB (which has the same BDF issue) kind of suggests this solution. https://jpbrucker.net/virtio-iommu/viot/viot-v9.pdf "This identifier corresponds to the one observed by the operating system when parsing the PCI configuration space for the first time after boot"
I've also been making that assumption on the devicetree side, because we don't really have a better way than BDF for identifying devices. See badd9f19f199 ("dt-bindings: Add "external-facing" PCIe port property") in particular, and 6c9e92ef8bdd ("dt-bindings: virtio: Add virtio-pci-iommu node") for the DT equivalent to VIOT.
No one objected on the list, but there was some internal discussion about whether it was safe long term, with concerns about implementing PCIe FPB, or the firmware leaving ARI or SR-IOV enabled. I'll look at the details after my holiday, but we concluded it was fine.
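One way to see why SR-IOV and ARI come up here: VF Routing IDs are not handed out by enumeration but derived from the PF's Routing ID plus the First VF Offset and VF Stride fields of the SR-IOV capability, and those fields may change when NumVFs or the ARI Capable Hierarchy bit is written. So the VF BDFs observed at first boot depend on how the firmware left those settings. A small sketch of that arithmetic in Python; the function names are illustrative and the capability values in the example are made up.

```python
from typing import List, Tuple


def rid_to_bdf(rid: int) -> Tuple[int, int, int]:
    """Split a 16-bit Routing ID into (bus, device, function), non-ARI view."""
    return rid >> 8, (rid >> 3) & 0x1F, rid & 0x7


def vf_routing_ids(pf_rid: int, first_vf_offset: int, vf_stride: int,
                   num_vfs: int) -> List[int]:
    """Routing IDs of VFs 1..num_vfs: PF RID + First VF Offset + (n - 1) * VF Stride."""
    return [pf_rid + first_vf_offset + (n - 1) * vf_stride
            for n in range(1, num_vfs + 1)]


if __name__ == "__main__":
    pf = 0x0300  # PF at 03:00.0
    # Made-up capability values; a real device reports its own, and they can
    # change when NumVFs or the ARI Capable Hierarchy bit is written.
    for rid in vf_routing_ids(pf, first_vf_offset=0xFE, vf_stride=2, num_vfs=2):
        print("%02x:%02x.%x" % rid_to_bdf(rid))  # 03:1f.6, then 04:00.0
```

Note that the second VF lands on bus 4, beyond the PF's own bus, so the bus numbers reserved behind the upstream bridge also depend on these values.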
Thanks, Jean
linaro-open-discussions@op-lists.linaro.org