Hi
I believe there is a hidden problem in the virtio implementation of Qemu (up to 6.1.0) in calculating the offset of the "used" split vring and the spec need some clarifications. Should anyone decide it deserve upstream/spec changes, feel free to do so.
Cheers
FF
According to the specification in 2.6.2:
#define ALIGN(x) (((x) + qalign) & ~qalign) static inline unsigned virtq_size(unsigned int qsz) { return ALIGN(sizeof(struct virtq_desc)*qsz + sizeof(u16)*(*3* + qsz)) }
And more specifically, "used" starts after "avail" as defined in 2.6.6
struct virtq_avail { #define VIRTQ_AVAIL_F_NO_INTERRUPT 1 le16 flags; le16 idx; le16 ring[ /* Queue Size */ ]; *le16 used_event; /* Only if VIRTIO_F_EVENT_IDX */* };
Linux and kvmtool calculates the offset with the formula: LINUX: vring_init @ include/uapi/linux/virtio_ring.h
vr->avail = (struct vring_avail *)((char *)p + num * sizeof(struct vring_desc)); vr->used = (void *)(((uintptr_t)&vr->avail->ring[num] *+ sizeof(__virtio16)* + align-1) & ~(align - 1));
The "+ sizeof(__virtio16)" properly accounts for the "used_event" in struct virtue_avail.
Hypervisor ACRN uses a similar scheme: virtio_vq_init @ /devicemodel/hw/pci/virtio/virtio.c
vq->avail = (struct vring_avail *)vb; vb += (2 + vq->qsize *+ 1*) * sizeof(uint16_t); /* Then it's rounded up to the next page... */ vb = (char *)roundup2((uintptr_t)vb, VIRTIO_PCI_VRING_ALIGN); /* ... and the last page(s) are the used ring. */ vq->used = (struct vring_used *)vb;
But Qemu uses: QEMU: virtio_queue_update_rings @ hw/virtio/virtio.c
vring->used = vring_align(vring->avail + offsetof(VRingAvail, ring[vring->num]), vring->align);
Linux alignment policies end up having vring->align values either 4096 (for MMIO) or 64 (PCI), and thus there are no visible issues.
If you use a different OS that choses an alignment of 4 (valid as per section 2.6) then Qemu does not calculate the same location for used and virtio does not work anymore. The OS actually works fine with the alignment of 4 with kvmtool and ACRN.
There are two other problems:
on the spec, the comment "/* Only if VIRTIO_F_EVENT_IDX */" on the avail structure is not clear wether if: - the field is always there but its content are only valid if... - the field may be absent altogether. Inferring from calculation formulae the field is always present, but some language would help clarifying this.
On 2.6.2, the alignment formula "#define ALIGN(x) (((x) + qalign) & ~qalign)" is true if "qalign" is actually a mask as in many parts of the spec, align is referred to as a power of 2. It may be good to change the text with something like: #define ALIGN(x) (((x) + (qalign - 1)) & ~(qalign -1)) /* where qalign is a power of Z */