PCIe For Hackers: Link Anatomy

Aug 20, 2023

Last time, we looked over diffpairs – their basics, routing rules, and the notoriously tight tolerances PCIe demands of them. Now, let’s take a look at the exact signals that make PCIe tick, as well as give you an overview of which sockets you can get PCIe on.

I separate PCIe sockets into two categories – desktop sockets, which are the usual x1, x16, or perhaps x4 PCIe sockets we see on desktop motherboards, and mobile sockets: mPCIe and M.2. There are also sockets like ExpressCard that are still found on some older laptops, but they have definitely fallen out of relevance. On mobile, M.2 is the way forward when it comes to PCIe – if you’re interested, I’ve written a short series talking about everything M.2 before.

On these sockets, most signals will be optional and some signals will be socket-specific, but there are some signals required in any PCIe device. Of course, the first group of required signals is the PCIe diffpairs themselves.

Any PCIe interface has two kinds of diffpairs – TX and RX; for an x1 link, you will have one TX diffpair and one RX diffpair, and an x16 link needs sixteen of each. As you might expect, RX on one end connects to TX on the other end, and vice-versa – it’s just like UART, but spicy. Of course, if you’re doing a PCIe extender or socket-to-socket adapter of some kind, you don’t need to do any swapping. If you’re designing a board for one of the PCIe-carrying extension standards, the usual naming is host-side. For example, say you’re putting a PCIe Ethernet controller IC onto a PCB that plugs into a desktop PCIe socket – connect the TX and RX pins of the card edge to your Ethernet chip’s RX and TX respectively.

If there’s a strong indication that this convention has been violated, don’t fret – compare to an existing host or device, and see if you can determine whether the naming is as expected or indeed, unexpectedly, flipped. Again, it’s just like UART! That said, there are also AC coupling capacitors placed in series in each direction of the link. Specifically in the case of PCIe extension cards, be it M.2 or desktop PCIe, the convention is to place them on the device-TX (host-RX) lines on the device board, and on the host-TX (device-RX) lines on the host board. This is one pleasant divergence from UART when it comes to design and reverse-engineering – the capacitors tell you which end transmits on a given pair.

The third kind of diffpair, one that’s invariably present, is REFCLK – a 100 MHz clock diffpair. It’s a speedy clock, and I’m not sure about its full purpose, but apparently, it plays a part in decoding the data on TX and RX – specifically, clock recovery – staying vital despite being orders of magnitude slower than the gigabit-range data rates of the PCIe TX and RX pairs. Unlike with RX and TX, you cannot swap the REFCLK polarity willy-nilly. On the other hand, it won’t hurt as much if REFCLK has to pass through an extra via or two in case you need to switch layers – it’s just 100 MHz. Remember to respect your diffpairs, and you’ll be good.

REFCLK is required – except, theoretically, there are cases where you can omit it, requiring both ends of the PCIe link to either recover this clock from the data passing through the RX and TX pairs, or generate their own REFCLK. It sounds plausible enough, yet REFCLK-less PCIe implementations aren’t really out there. Even SD Express, a recent standard that brings PCIe to SD cards for blazing fast access speeds, didn’t go this route – the only consumer-available REFCLK-less implementation out in the wild is SATA Express, a standard that was made obsolete by M.2 almost as soon as motherboards with SATA Express ports hit the store shelves. As it stands, if you want your PCIe link to work, you absolutely need REFCLK.

And now a bit more about the capacitors. You need them on both TX and RX for the link to work, with values ranging from 75 nF to 265 nF depending on the PCIe link generation: 200 nF is a good all-around value and 100 nF will work in a pinch. Other than that, go for an 0402 or 0201 package, and that’s basically it. REFCLK doesn’t typically need the capacitors, but they shouldn’t hurt either, and certain devices might require them. Also, here’s something puzzling – there’s been at least one documented case of a cheap mass-manufactured WD SSD seemingly omitting its TX-side capacitors, failing to initialize with a custom PCIe host board until extra capacitors were added on the mainboard side. It’s not something you should do by default – merely a peculiarity that one manufacturer had the gall to create.
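
As a quick sanity check, here’s a tiny sketch based on the range quoted above – the helper is purely illustrative, and the exact per-generation limits come from the base spec, so treat this as the rough envelope rather than gospel:

```python
# Purely illustrative AC coupling capacitor check, using the 75-265 nF
# envelope mentioned above; per-generation limits are narrower - consult the spec.
def check_ac_coupling(value_nf: float) -> str:
    if 75 <= value_nf <= 265:
        return f"{value_nf} nF fits the 75-265 nF envelope"
    return f"{value_nf} nF is out of range - 200 nF is a safe all-around pick"

print(check_ac_coupling(200))  # fine
print(check_ac_coupling(47))   # too small for PCIe AC coupling
```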

Then, there’s a trio of single-ended control signals – CLKREQ, PEWAKE, and PERST. CLKREQ is used for power saving – the device usually sets CLKREQ low to start receiving REFCLK, and can de-assert CLKREQ (set it high) to signal that the host can stop providing REFCLK, saving power. The CLKREQ signal is peculiar – it’s present on mobile devices, but isn’t used on desktops, to the point where desktop PCIe slots didn’t even initially have a dedicated pin for it. As a result, the miner “GPU riser” boards that repurpose USB 3 cables do not pass CLKREQ through – only PEWAKE and PERST. If your host has a CLKREQ pin but the device doesn’t, ground it; if your device has a CLKREQ pin but the host doesn’t, ignore it; otherwise, connect them together.
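
To make that hookup rule concrete, here’s a minimal Python sketch – the helper name and structure are made up for illustration, the logic is just the rule above restated:

```python
# Hypothetical helper restating the CLKREQ wiring rule from the paragraph above.
def clkreq_hookup(host_has_clkreq: bool, device_has_clkreq: bool) -> str:
    if host_has_clkreq and device_has_clkreq:
        return "connect the two CLKREQ pins together"
    if host_has_clkreq:
        return "ground the host's CLKREQ pin so REFCLK is always provided"
    if device_has_clkreq:
        return "leave the device's CLKREQ pin unconnected"
    return "nothing to wire up"

print(clkreq_hookup(host_has_clkreq=True, device_has_clkreq=False))
```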

Well, if the corner-cutting miners bother to pass PEWAKE and PERST signals through, then they must be important, right? Let’s see. PERST is the reset pin for the PCIe link, and the host holds it low until all the clocks (including REFCLK) and power rails are stable and other requirements are met for the host to start talking PCIe; it’s also used for some low-power mode management. PEWAKE is used for power management just like CLKREQ, and does seem to be required as well – this is also a device-side-controlled signal. These two signals are a bit mysterious to me when it comes to ways you can omit them, so I recommend having both of them connected unless they’re outright unavailable on one of the ends.

Desktop slots also have PRSNT signals – depending on the link width, you short PRSNT#1 to one of the PRSNT#2 signals. Strictly speaking, this isn’t required – PCIe is more than capable of detecting link width on its own; in practice, however, desktop motherboards often do expect it. It seems to be used for hotplug – something that PCIe is technically capable of, and which will even work if you use an OS like Linux; Windows might just refuse to process PCIe hotplug unless you dance around your PC chanting magic words.

There are I2C signals on both desktop PCIe slots and mobile sockets like mPCIe or M.2. However, those signals are rarely ever connected on the mainboard – when they are, it seems to be either the mainboard manufacturer’s whim, or provisions for features like vPro. On server boards, they are connected, causing fun things like some server-origin SAS controllers not operating on certain desktop boards that have I2C wired up to their desktop-size slots. If you are wiring up a host-side PCIe slot, you don’t really need to wire I2C up, and if you’re wiring up a device, don’t expect I2C to reliably be there. The same goes for JTAG signals – they’re used in debugging or server environments, and are best left unconnected unless you’re certain your implementation benefits from them in the place where you’ll use it.

Mobile PCIe devices usually operate on 3.3 V power, whereas desktop devices can run from either 3.3 V or 12 V, for less or more power-hungry devices respectively. As a result, if you’re adding a desktop-sized PCIe socket and you don’t have 12 V handy, some desktop cards might simply not work. x1 slots are expected to provide less power on 12 V than larger slots, and desktop motherboard manufacturers often take advantage of this leeway – if you’re designing an x1 card, don’t expect the slot to provide the full 75 W (66 W of which is on 12 V), and if you’re designing a motherboard with an x1 slot, the good news is that you don’t need to provide the full 75 W.
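
For reference, here’s where the 75 W figure comes from – a quick back-of-the-envelope calculation for a full-size x16 graphics slot. The per-rail current limits here are what I recall from the connector spec, so treat the exact amp figures as an assumption to verify:

```python
# Rough x16 slot power budget; current limits are assumed from memory,
# not quoted from the spec - verify before relying on them.
rails = {
    "12V": (12.0, 5.5),   # 12 V rail, ~5.5 A allowed
    "3.3V": (3.3, 3.0),   # 3.3 V rail, ~3 A allowed
}
total_w = sum(volts * amps for volts, amps in rails.values())
print(f"{total_w:.1f} W")  # ~75.9 W - the familiar "75 W from the slot"
```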

The host side and the device side can have different link widths – the choices are x1, x2, x4, x8 and x16, where the number indicates the amount of lanes (TX and RX pair sets) being used; due to the way PCIe links function at the low level, these have to be powers of two. As such, don’t expect an x3 link to work if you somehow connect three lanes from a host to a device – it will downgrade to x2. PCIe is flexible with regard to link widths – the host and the device settle on the widest width they both support. No matter if your host has an x4 link and your device is x16, or your host is x16 and your device is x4, they will settle on an x4 link width, using four pairs, not bothered that one of the ends has twelve more to spare. There are very rare exceptions to this – namely, some server-purpose cards; that said, even those are cases of “x8 card won’t function with an x4 connection”, and an x1 link will still work as a fallback.
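
Here’s a tiny sketch of that negotiation outcome in Python – the function name is made up, but it captures the “widest common width, powers of two only” behaviour described above:

```python
# Illustrative model of the width negotiation outcome described above:
# both ends end up using the widest width they have in common.
def negotiated_width(host_lanes: int, device_lanes: int) -> int:
    width = min(host_lanes, device_lanes)
    # Widths are powers of two (x1/x2/x4/x8/x16), so an odd hookup
    # like three connected lanes degrades to the next power of two down.
    while width & (width - 1):
        width -= 1
    return width

print(negotiated_width(16, 4))  # 4 - x16 host with an x4 device runs at x4
print(negotiated_width(3, 16))  # 2 - an "x3" hookup falls back to x2
```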

This adds quite a bit of flexibility – if either side doesn’t have enough lanes to match the other, you can always expect at least an x1 link to work nevertheless. For instance, you can put an x16 GPU into a desktop x1 slot by cutting a notch in the slot, and perhaps shorting the PRSNT pins on the slot – though, make sure that your card doesn’t want more slot-provided power than your x1 slot is willing to provide, which might very well be the case if you’re plugging an extra GPU into a cut-up x1 slot directly! When it comes to desktop PCIe slots and cards, you will see x1, x4, and x16 physical slot widths, with x8 slots and cards being mostly a server occurrence. There are some low-end GPUs that are actually x8 cards, but they use the x16 physical card edge width, and the same goes for x2 cards – they use the x4 card edge width. You will rarely, if ever, see physical x2 and x8 cards or slots.

When doing a multi-lane link, you must make sure that you’re connecting lane 0 to lane 0, lane 1 to lane 1, and so on. There’s apparently an ability to reverse the entire link in case that helps you with your layout – say, in the case of an x4 link, you could connect lane 0 to 3, 1 to 2, 2 to 1, and 3 to 0. However, it’s not clear how widely that’s supported – it seems to be optional and not automagic.
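
As a picture of what “reversing the link” means, here’s a hypothetical helper – it’s just the mapping, nothing more:

```python
# Hypothetical helper illustrating lane reversal as described above:
# lane i on one end maps to lane (width - 1 - i) on the other.
def reversed_lane_map(width: int) -> dict[int, int]:
    return {i: width - 1 - i for i in range(width)}

print(reversed_lane_map(4))  # {0: 3, 1: 2, 2: 1, 3: 0}
```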

Just like Ethernet, PCIe link speeds have been increasing with each generation, through gradual data rate increases, encoding changes and other incremental improvements. It’s the same story as with link widths – there’s great forwards and backwards compatibility: a Gen1 device will work in a Gen5 slot, and a Gen5 device will work in a Gen1 slot, again settling on the fastest generation both ends support. Of course, the link might end up slower than you expected – if you plug a Gen3 x1 USB3 controller card into the Gen1 PCIe x1 slot of an old motherboard, expect the USB3 port throughput to be limited (and of course, the same goes for connecting GPUs through an x1 link). However, this does mean that, typically, you don’t need to worry about PCIe generations unless you’re aiming for a certain speed target.
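
If you want to put rough numbers on that, here’s a back-of-the-envelope per-lane throughput sketch – the transfer rates and encodings are listed to the best of my recollection, so double-check them against the spec before relying on them:

```python
# Hedged per-lane throughput estimate; rates and encodings are as I recall them.
GENS = {
    1: (2.5e9, 8 / 10),     # 2.5 GT/s, 8b/10b encoding
    2: (5.0e9, 8 / 10),     # 5 GT/s, 8b/10b
    3: (8.0e9, 128 / 130),  # 8 GT/s, 128b/130b
    4: (16.0e9, 128 / 130),
    5: (32.0e9, 128 / 130),
}

def bandwidth_mb_s(gen: int, lanes: int = 1) -> float:
    rate, efficiency = GENS[gen]
    return rate * efficiency / 8 / 1e6 * lanes

# A Gen3 x1 card stuck in a Gen1 x1 slot drops from ~985 MB/s to ~250 MB/s:
print(bandwidth_mb_s(3), bandwidth_mb_s(1))
```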

Thankfully, if you have any PCIe questions that go seriously in-depth, there’s a lot of information online, and specifications for even the last-gen PCIe 5.0 are quite easy to find. The spec can be a bit dense compared to what you’re reading right now, but when you go in deep, it will help. Next time, let’s complete your mental map of PCIe, so that whenever you set out to stretch its limits, you know how to approach it!