Using PCI Express and Conventional PCI Devices with Q35 Virtual Machines


Summary

This article describes how to use PCI Express and conventional PCI devices with Intel Q35-based virtual machines. This article is relevant to Red Hat Virtualization, Red Hat Enterprise Linux, and Red Hat OpenStack Platform.

Q35 is a native PCI Express chipset. Most of the devices attached to a guest using this chipset will be PCI Express. There is, however, limited support for conventional PCI devices where no equivalent PCI Express device is available.

For more information, see the following external resources:

1. Device Support

The Q35 chipset supports the following devices:

Type              Devices
PCI Express       Most devices, including Virtio devices, devices that use the e1000e driver, and USB3 (xHCI) controllers
Conventional PCI  Devices that either have no PCI Express equivalent or are usually integrated devices, such as graphics cards, sound cards, and pvpanic devices



If a NIC is required, use a Virtio NIC. If the guest OS does not support the Virtio NIC, use the e1000e NIC, because it is the only other PCI Express NIC available.

Use the qemu-xhci USB controller unless the guest OS is too old to support USB3. Only the USB3 controllers (qemu-xhci and nec-xhci) are PCI Express. All USB1 and USB2 controllers are conventional PCI and, if they are used at all, should be connected to the root bus.

Use the ich9 sound controller model. libvirt automatically connects this device to slot 0x1b of the root bus, which is preferable to assigning a PCI address manually. (libvirt uses this address because real Q35 hardware has an ich9 sound device at 00:1b.) QEMU provides only conventional PCI sound devices; there are no PCI Express sound devices. A conventional PCI sound device can be used in a Q35 domain as long as it is forced to connect to the root bus as an integrated conventional PCI device.

Only a subset of the available PCI controllers is supported. See section 2.2, PCI Express and Conventional PCI Device Support, for details.

2. Device Placement

Device placement requires some care when the Q35 chipset is in use. By default, libvirt follows the recommendations described below, so in most cases no user interaction is required. Hot-plugging, however, must be planned in advance.

2.1. PCI Devices and Slots

QEMU does not strictly match devices to slot types: it allows both conventional PCI and PCI Express devices to be plugged into any PCI slot. Plugging a conventional PCI device into a PCI Express slot might not always work, however, because the physical differences between the connectors make this arrangement impossible on bare metal. Conversely, plugging a PCI Express device into a conventional PCI slot hides the device's extended configuration space. Mixing conventional PCI and PCI Express devices and slots is therefore not recommended.

The recommended approach is to separate the PCI Express and the conventional PCI hierarchies: PCI Express devices should only be plugged into PCI Express root ports.

libvirt follows these recommendations and avoids plugging devices into inappropriate slots. If required, however, the user can override libvirt's choices by providing an explicit PCI address:

<interface type='network'>
  <model type='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</interface>

This configuration plugs the Virtio network adapter directly into the default root complex, as an integrated endpoint, rather than into a PCI Express root port.

2.2. PCI Express and Conventional PCI Device Support

PCI Express and conventional PCI devices can be connected to a machine that uses the Q35 chipset, either directly into the root complex, as PCI Express integrated endpoints, or into the generic PCI Express root port, as in the following example:

  <controller type='pci' model='pcie-root-port'>
    <model name='pcie-root-port'/>
  </controller>

Other PCI controllers are available but not supported. libvirt does not try to use these controllers for supported devices.

2.3. Default Root Complex

Only the following device types should be placed directly on the default root complex (pcie.0):

  • Conventional PCI devices (e.g., network cards and graphics cards), which will be integrated endpoints:

    <video>
      <model type='qxl'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </video>
    
    • Use Virtio or e1000e network cards. Use other network cards only if neither the Virtio nor the e1000e network card works with your guest.

    • libvirt places a virtio-gpu video device on a PCI Express root port. All other video devices are placed in slot 1 of bus 0.

    • Use a SATA or SCSI disk controller with the Q35 chipset. IDE controllers are not supported.

    Note: Integrated endpoints are not hot-pluggable.

    You can use PCI Express devices as integrated endpoints, but most existing hardware uses PCI devices in the root complex as integrated endpoints.

  • PCI Express root ports, for starting exclusively PCI Express hierarchies:

    <controller type='pci' model='pcie-root-port'/>
    
  • Additional root complexes, if multiple PCI Express root buses are required:

    <controller type='pci' model='pcie-expander-bus'/>
    

    Note: pcie-expander-bus is a Technology Preview.

[Figure: Root Bus Devices]

2.4. PCI Express-only Hierarchy

Only PCI Express root ports may be used to start PCI Express hierarchies.

The Q35 PCI Express root bus (pcie.0) has 30 empty slots, 1 to 0x1e. Slot 0 is used by the root complex and slot 0x1f is used by default devices that are part of the chipset and cannot be removed (e.g., the integrated SATA controller). All 32 slots of a PCI Express Expander controller are usable.

Because each PCI Express root port occupies a single function, and each slot can hold up to 8 functions, the maximum possible number of PCI Express root ports per PCI Express root bus is 240 for pcie.0 (30 slots × 8) and 256 for a pcie-expander-bus (32 slots × 8).
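The arithmetic above can be reproduced with a short sketch. It assumes only what this section states: slots 0x01 to 0x1e are free on pcie.0, all 32 slots are free on a pcie-expander-bus, and each pcie-root-port uses one of a slot's 8 functions. The names are illustrative, not part of any libvirt or QEMU API.

```python
# Capacity of one PCI Express root bus, assuming one pcie-root-port per
# PCI function and at most 8 functions per slot.
FUNCTIONS_PER_SLOT = 8

# pcie.0: slot 0x00 (root complex) and slot 0x1f (chipset devices) are taken,
# leaving slots 0x01..0x1e.
PCIE0_USABLE_SLOTS = 0x1E - 0x01 + 1   # 30
# pcie-expander-bus: all 32 slots are usable.
EXPANDER_USABLE_SLOTS = 32


def max_root_ports(usable_slots: int) -> int:
    """Return how many pcie-root-ports fit on a bus with `usable_slots` free slots."""
    return usable_slots * FUNCTIONS_PER_SLOT


print("pcie.0:", max_root_ports(PCIE0_USABLE_SLOTS))                # 240
print("pcie-expander-bus:", max_root_ports(EXPANDER_USABLE_SLOTS))  # 256
```

Every root port also consumes a PCI bus number, so the 256-bus limit described in section 7 can become the limiting factor before these per-bus figures are reached.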

It is preferable to group PCI Express root ports into multi-function devices to maintain a simple, flat hierarchy that is sufficient for most scenarios. The configuration will look like this:

<controller type='pci' index='1' model='pcie-root-port'>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
</controller>
<controller type='pci' index='2' model='pcie-root-port'>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
</controller>
<controller type='pci' index='3' model='pcie-root-port'>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
</controller>
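libvirt normally creates these controllers automatically. If you generate domain XML from a script instead, a helper like the hypothetical root_port_group below can emit the same multi-function layout. This is a sketch using Python's standard xml.etree.ElementTree module; it places up to 8 root ports in one slot of bus 0x00 and sets multifunction='on' only on function 0, as in the hand-written example. ElementTree writes attributes with double quotes, which is equally valid XML for libvirt.

```python
import xml.etree.ElementTree as ET


def root_port_group(first_index: int, slot: int, count: int) -> list[ET.Element]:
    """Build `count` pcie-root-port controllers sharing one slot of pcie.0.

    Hypothetical helper: controller indexes start at `first_index`, and
    function 0 carries multifunction='on'.
    """
    if not 1 <= count <= 8:
        raise ValueError("a PCI slot has at most 8 functions")
    controllers = []
    for function in range(count):
        ctrl = ET.Element("controller", type="pci",
                          index=str(first_index + function),
                          model="pcie-root-port")
        addr = ET.SubElement(ctrl, "address", type="pci", domain="0x0000",
                             bus="0x00", slot=f"0x{slot:02x}",
                             function=f"0x{function:x}")
        if function == 0:
            # Only function 0 of a multi-function device declares multifunction.
            addr.set("multifunction", "on")
        controllers.append(ctrl)
    return controllers


# Reproduce the three-port example above (indexes 1-3, slot 0x02).
for ctrl in root_port_group(first_index=1, slot=0x02, count=3):
    print(ET.tostring(ctrl, encoding="unicode"))
```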

libvirt automatically adds as many PCI Express root ports as are required to connect all the PCI Express devices in the guest, and automatically aggregates multiple pcie-root-ports into a single slot when possible, to take full advantage of the available root complex slots. Since up to 240 devices can be connected through pcie-root-ports on pcie.0 alone, this is sufficient in all but the most extreme cases.

[Figure: PCI Express-Only Hierarchy]

2.5. Conventional PCI-only Hierarchy

Conventional PCI devices can only be plugged into the PCI Express root complex as integrated endpoints. Only devices that have no PCI Express equivalent or are usually integrated devices (for example, graphics cards, sound cards, and pvpanic devices) should be used this way.

3. Hot-Plugging

The PCI Express root buses (pcie.0 and the buses exposed by pcie-expander-bus controllers) do not support hot-plugging, so the following devices, which are plugged directly into root complexes, cannot be hot-plugged or hot-unplugged:

  • PCI Express integrated endpoints
  • PCI Express root ports
  • pcie-expander-bus (Technology Preview)

PCI Express devices can be natively hot-plugged into PCI Express root ports.

3.1. Planning for Hot-Plugging

  • PCI Express hierarchy: you must leave a sufficient number of PCI Express root ports empty. Group the PCI Express root ports into multi-function slots (up to 8 ports per slot) on the root complex(es) to keep the hierarchy as flat as possible and to conserve PCI bus numbers.

    When a Q35 domain is first created, libvirt automatically adds enough pcie-root-ports for all PCI Express devices, plus an extra, unused pcie-root-port that can be used for hot-plugging. After this point, libvirt does not make any attempt to keep extra pcie-root-ports available. If you want to ensure that you always have a slot available for hot-plugging, you must explicitly add another pcie-root-port after adding a new PCI Express device. If you add the extra pcie-root-port at the same time as the new device, libvirt will use the pcie-root-port you just added to connect the new device, instead of adding an additional pcie-root-port.

    Note that pcie-root-ports themselves cannot be hot-plugged. Therefore, unlike with the i440fx, the user or management application must plan ahead for hot-plugging, making sure that enough unused pcie-root-ports are available before starting the virtual machine. Adding a new pcie-root-port to the domain configuration is straightforward, because in most cases you will not need to change the default values. To add a pcie-root-port to the <devices> section, add the following:

      <controller type='pci' model='pcie-root-port'/>
    
  • Conventional PCI hierarchy: hot-plugging is not supported for conventional PCI devices.
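Before starting a guest, it can help to check how many pcie-root-ports are still empty and therefore available for hot-plugging. The sketch below is one way to do that under stated assumptions: the domain XML already contains explicit PCI addresses (for example, the output of virsh dumpxml after libvirt has assigned them), and a device is attached to the controller whose index equals the bus attribute of the device's address, as described in section 7. free_root_ports is a hypothetical helper, not a libvirt API.

```python
import xml.etree.ElementTree as ET


def free_root_ports(domain_xml: str) -> list[int]:
    """Return the indexes of pcie-root-port controllers with nothing attached."""
    devices = ET.fromstring(domain_xml).find("devices")
    if devices is None:
        return []
    root_ports = {
        int(c.get("index"))
        for c in devices.iter("controller")
        if c.get("type") == "pci" and c.get("model") == "pcie-root-port"
    }
    used_buses = set()
    for dev in devices:
        # Direct <address> child only, so <source><address> of host devices is skipped.
        addr = dev.find("address")
        if addr is not None and addr.get("type") == "pci" and addr.get("bus"):
            used_buses.add(int(addr.get("bus"), 0))  # "0x01" -> 1
    return sorted(root_ports - used_buses)


EXAMPLE = """<domain type='kvm'>
  <devices>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <interface type='network'>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
  </devices>
</domain>"""

print(free_root_ports(EXAMPLE))  # [2]: root port 1 holds the NIC, root port 2 is empty
```

If the result is empty, add another <controller type='pci' model='pcie-root-port'/> element before starting the virtual machine.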

4. Device Assignment

Most host devices are PCI Express devices and should be plugged only into PCI Express root ports.

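To determine whether a device is PCI Express, run the following command as root, replacing 03:00.0 with the address of the device:

# lspci -s 03:00.0 -v

The output looks similar to the following: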

03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 83)
Subsystem: Intel Corporation Dual Band Wireless-AC 7260
Flags: bus master, fast devsel, latency 0, IRQ 50
Memory at f0400000 (64-bit, non-prefetchable) [size=8K]
Capabilities: [c8] Power Management version 3
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [40] Express Endpoint, MSI 00
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Device Serial Number 7c-7a-91-ff-ff-90-db-20
Capabilities: [14c] Latency Tolerance Reporting
Capabilities: [154] Vendor Specific Information: ID=cafe Rev=1 Len=014

If you see the Express Endpoint capability in the output, the device is PCI Express.
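When many devices need checking, the same test can be scripted. This is a sketch, not a supported tool: it searches lspci -v output for any capability whose name begins with Express, which also matches other PCI Express capability types (for example, Express (v2) Legacy Endpoint), not only Express Endpoint. It assumes lspci was run as root, because without root privileges lspci prints Capabilities: <access denied> instead of the capability list. is_pci_express is a hypothetical name.

```python
import re

# Matches PCI Express capability lines such as
# "Capabilities: [40] Express Endpoint, MSI 00".
EXPRESS_CAP = re.compile(r"^\s*Capabilities: \[[0-9a-f]+\] Express\b", re.MULTILINE)


def is_pci_express(lspci_verbose_output: str) -> bool:
    """Return True if `lspci -v` output lists a PCI Express capability."""
    return EXPRESS_CAP.search(lspci_verbose_output) is not None


SAMPLE = """03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 83)
\tCapabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
\tCapabilities: [40] Express Endpoint, MSI 00
\tCapabilities: [100] Advanced Error Reporting
"""

print(is_pci_express(SAMPLE))  # True
```

The input text could be collected, for example, with subprocess.run(["lspci", "-s", "03:00.0", "-v"], capture_output=True, text=True).stdout.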

5. Virtio Devices

Virtio devices plugged into the root complex as PCI Express integrated endpoints appear as conventional PCI devices and have transitional behavior by default.

Transitional Virtio devices work in both IO and MMIO modes, depending on guest support. The guest firmware assigns both IO and MMIO resources to transitional Virtio devices.

Virtio devices plugged into PCI Express root ports appear as PCI Express devices and have modern (Virtio 1.0) behavior by default, without IO support. This configuration results in better utilization of the IO space. Given the limited availability of IO space (see section 6, IO Space Issues), any other configuration may quickly lead to resource exhaustion and is therefore strongly discouraged.

6. IO Space Issues

The firmware and guest operating system see PCI Express root ports as PCI-PCI bridges. Although some PCI Express devices may request IO space, they are required to work properly using only MMIO. According to the PCI specification, each PCI-PCI bridge requires an IO range of 4 KB (4096 bytes), even though only one (multi-function) device can be plugged into a single port. This results in poor IO space utilization.

The firmware used by QEMU (SeaBIOS/OVMF) may try further optimizations by not allocating IO space for each PCI Express root port if:

  • The port is empty, or
  • The device behind the port has no IO BARs (base address registers)

The IO space is very limited (65,536 byte-wide IO ports) and may be further fragmented by fixed IO ports owned by platform devices. As a result, if devices with IO BARs are used in the PCI Express hierarchy, there can be at most 10 PCI Express root ports per system. The device placement strategy described in this article avoids this issue by placing only PCI Express devices, which must work without IO space, in a PCI Express hierarchy.
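A rough calculation shows where the limit comes from. The sketch assumes a 4 KB IO window per PCI-PCI bridge and the 65,536-port IO space stated above; reserved_bytes is a hypothetical stand-in for ranges held by platform devices, and treating it as one contiguous block ignores fragmentation, so real layouts can lose more windows than this estimate. The practical figure of about 10 root ports quoted above comes from those platform reservations, not from this formula.

```python
# Rough IO-space budget for PCI-PCI bridges (PCI Express root ports).
IO_SPACE_BYTES = 0x10000          # 65,536 byte-wide IO ports
BRIDGE_IO_WINDOW_BYTES = 0x1000   # 4 KB IO window per bridge


def max_bridges_with_io(reserved_bytes: int = 0) -> int:
    """Upper bound on bridges that can each receive an IO window.

    `reserved_bytes` (hypothetical) models platform-owned IO as a single
    contiguous block; fragmented reservations can waste more space.
    """
    return (IO_SPACE_BYTES - reserved_bytes) // BRIDGE_IO_WINDOW_BYTES


print(max_bridges_with_io())        # 16: ceiling with no platform devices at all
print(max_bridges_with_io(0x1000))  # 15: one 4 KB block reserved
```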

The PCI Express specification requires that PCI Express devices work properly without using IO ports. The PCI hierarchy has no such limitations, so a conventional PCI device may require IO space. This is another reason why conventional PCI devices are not recommended.

7. Bus Numbering Issues

Each Q35 machine can have a maximum of 256 PCI buses. Each PCI controller counts as a bus, whether it is pcie.0, pcie-root-port, pcie-expander-bus, or an unsupported PCI controller.

Each element of the PCI Express hierarchy (root complexes, PCI Express root ports) uses one bus number. Since only one (multi-function) device can be attached to a PCI Express root port, it is advisable to plan in advance for the expected number of devices, to prevent bus number exhaustion.
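Because every PCI controller consumes one bus number, the remaining budget can be estimated by counting <controller type='pci'> elements in the domain XML. This minimal sketch assumes all PCI controllers, including pcie.0 (model pcie-root), appear explicitly in the XML, as they do after libvirt has added them; pci_buses_used and buses_left are illustrative names.

```python
import xml.etree.ElementTree as ET

MAX_PCI_BUSES = 256  # bus numbers 0..255 per Q35 machine


def pci_buses_used(domain_xml: str) -> int:
    """Count bus numbers consumed: one per <controller type='pci'> element."""
    devices = ET.fromstring(domain_xml).find("devices")
    if devices is None:
        return 0
    return sum(1 for c in devices.iter("controller") if c.get("type") == "pci")


def buses_left(domain_xml: str) -> int:
    """Bus numbers still available for new PCI controllers."""
    return MAX_PCI_BUSES - pci_buses_used(domain_xml)


XML = """<domain type='kvm'><devices>
  <controller type='pci' index='0' model='pcie-root'/>
  <controller type='pci' index='1' model='pcie-root-port'/>
  <controller type='pci' index='2' model='pcie-root-port'/>
  <controller type='usb' index='0' model='qemu-xhci'/>
</devices></domain>"""

print(pci_buses_used(XML), buses_left(XML))  # 3 253 (the USB controller is not counted)
```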

The 0..255 bus number space can be partitioned using the busNr attribute of pcie-expander-bus controllers:

<controller type='pci' model='pcie-expander-bus'>
  <target busNr='250'/>
</controller>

In this example, additional buses attached to pcie.0 have a busNr between 1 and 249, the pcie-expander-bus itself uses bus number 250, and buses attached to it have a busNr from 251 to 255. Note that busNr is unrelated to the controller index used by libvirt and to the bus attribute in the PCI addresses of devices. The bus attribute of a device's PCI address must be set to the index of the controller the device is attached to, not to that controller's busNr.

All bus numbers must comply with the partitioning.
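The partitioning in the example above can be expressed as a small calculation. This is a sketch that assumes each root bus owns its own bus number plus every number below the next root bus's busNr, with pcie.0 starting at 0; extending this to several pcie-expander-bus controllers is an assumption that generalizes the single-expander example. bus_ranges is a hypothetical helper.

```python
def bus_ranges(expander_bus_nrs: list[int]) -> dict[str, range]:
    """Split bus numbers 0..255 among pcie.0 and pcie-expander-bus controllers.

    Each root bus owns its own number plus every number up to (but not
    including) the next root bus's busNr; pcie.0 always starts at 0.
    """
    starts = sorted(expander_bus_nrs)
    if len(set(starts)) != len(starts) or any(not 0 < n <= 255 for n in starts):
        raise ValueError("busNr values must be unique and in 1..255")
    bounds = [0] + starts + [256]
    names = ["pcie.0"] + [f"expander(busNr={n})" for n in starts]
    return {name: range(lo, hi) for name, lo, hi in zip(names, bounds, bounds[1:])}


for name, buses in bus_ranges([250]).items():
    # First number is the root bus itself; the rest are for buses attached to it.
    print(f"{name}: {buses.start}..{buses.stop - 1}")
```

For busNr='250', this reports 0..249 for pcie.0 and 250..255 for the expander, matching the ranges described above.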
