Comments on: CXL Borgs IBM’s OpenCAPI, Weaves Memory Fabrics With 3.0 Spec

By: Timothy Prickett Morgan

Timothy Prickett Morgan — Mon, 15 Aug 2022 03:07:51 +0000

In reply to JayN. That seems pretty accurate... hopefully in a good way.

By: JayN

JayN — Mon, 15 Aug 2022 02:25:55 +0000

I recall comments that CXL was intended to be a simpler solution, easy to implement in GPUs or other accelerators. I wonder how that simplicity can be maintained while the OpenCAPI and Gen-Z functions added in CXL 3.0 are being glommed on.

Is it effectively all just a free-for-all after taking control of the PCIe lanes?

By: Hubert

Hubert — Fri, 12 Aug 2022 09:28:36 +0000

In reply to Siamak Tavallaei. That makes perfect sense; thanks for the clarification!

By: HuMo

HuMo — Fri, 12 Aug 2022 08:55:31 +0000

In reply to Timothy Prickett Morgan. Thank you very much! I'm here all weak (he-he-he)!

By: Siamak Tavallaei

Siamak Tavallaei — Thu, 11 Aug 2022 20:20:07 +0000

In reply to Hubert. Note that each CXL 3.0 Fabric supports up to 4096 Devices, and each Root Port (RP) of a CPU may belong to a different CXL 3.0 Fabric. In other words, we are not limited to 4096 Devices in a “large” ensemble of CXL-connected platforms/Fabrics.
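
The arithmetic behind this point can be sketched as follows. This is only an illustration: it assumes every Root Port attaches to its own separate Fabric, and the Root Port and CPU counts are hypothetical values chosen for the example, not figures from the CXL specification.

```python
# Per-Fabric device limit cited in the comment above for CXL 3.0.
MAX_DEVICES_PER_FABRIC = 4096


def max_addressable_devices(root_ports_per_cpu: int, cpus: int = 1) -> int:
    """Upper bound on devices if every Root Port joins a separate Fabric.

    Assumption: no Fabric is shared between Root Ports. Real topologies
    may share Fabrics and land well below this bound.
    """
    if root_ports_per_cpu < 0 or cpus < 0:
        raise ValueError("counts must be non-negative")
    return root_ports_per_cpu * cpus * MAX_DEVICES_PER_FABRIC


# Hypothetical platform with 4 Root Ports per CPU.
print(max_addressable_devices(root_ports_per_cpu=4))          # 16384
print(max_addressable_devices(root_ports_per_cpu=4, cpus=2))  # 32768
```

The result is a ceiling, not a prediction: it only shows that the 4096 limit applies per Fabric rather than per platform.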

By: Just a Realist

Just a Realist — Thu, 11 Aug 2022 20:12:08 +0000

In reply to Timothy Prickett Morgan.

IIRC, PCI Express / CXL / CCIX do not support transparent end-to-end transaction retransmission, which means that if a link or switch fails, everything below it will be inaccessible, and a good implementation will trigger containment. This is really no worse than DDR in one sense, but keep in mind that DDR DIMMs sit right next to the processor on the same motherboard and are not customer accessible. This is not the case in composable solutions, where I/O, memory, and storage modules are customer accessible and are typically provisioned in independent cable-attached enclosures that contain independent power, cooling, and management domains.

OpenCAPI was limited to point-to-point topologies within a single enclosure, so an intelligent implementation with multiple links could support end-to-end transaction retransmission. Gen-Z supported transparent link and end-to-end transaction retransmission, so multi-link components can transparently survive link and switch failure (surprise cable disconnects occur at a much higher rate than you might realize). It is unclear if CXL can or will adopt any of Gen-Z’s resiliency capabilities, as the volume processor companies have always opposed them, stating that these are “high-end” features. Such capabilities are best built into the architecture core from the start (retrofits rarely work, and even when they do, they come with a lot of caveats). However, I suspect many customers will be reluctant to deploy composable infrastructures en masse without such features. Also, keep in mind that cloud providers build solutions where they assume everything can and will fail; hence they don’t care about such resiliency and are highly unlikely to deploy composable infrastructure, as it does not fit their cost and operating / execution models.
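
To make the idea of end-to-end retransmission across multiple links concrete, here is a toy model. It is a sketch only and does not reflect how Gen-Z or OpenCAPI actually implement it: the `Link` class, the `LinkDown` exception, and `send_end_to_end` are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from typing import List


class LinkDown(Exception):
    """Raised when a (simulated) link cannot deliver a transaction."""


@dataclass
class Link:
    name: str
    up: bool = True

    def send(self, payload: str) -> str:
        # A failed link, e.g. after a surprise cable disconnect, delivers nothing.
        if not self.up:
            raise LinkDown(self.name)
        return f"ack:{payload} via {self.name}"


def send_end_to_end(links: List[Link], payload: str) -> str:
    """Retry the same transaction on the next link when one fails.

    The requester never sees the failure unless every path is down.
    """
    failed = []
    for link in links:
        try:
            return link.send(payload)
        except LinkDown:
            failed.append(link.name)
    raise LinkDown(f"all paths failed: {failed}")


# Hypothetical two-link component whose first link has failed.
links = [Link("link0", up=False), Link("link1")]
print(send_end_to_end(links, "read 0x1000"))  # ack:read 0x1000 via link1
```

With a single link there is no alternate path, so the same failure surfaces to the requester, which is the containment case described in the first paragraph of this comment.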

By: Timothy Prickett Morgan

Timothy Prickett Morgan — Thu, 11 Aug 2022 11:39:13 +0000

In reply to HuMo.

You are very funny.

But seriously, from the GPU’s point of view, the CPU is just a serial accelerator that handles some I/O housekeeping tasks and has a huge block of cheap and slow memory that acts more or less like an L4 cache for its HBM...

By: HuMo

HuMo — Thu, 11 Aug 2022 04:25:18 +0000

In reply to Just a Realist. I wonder if a device that combines 80 GB or 128 GB of HBM with a fast processing unit could be considered computational memory (these would be the A100 and MI250X)? The processor writes data to it, writes code to it, and then reads the results back from it. Maybe it requires too much power, or is too small relative to expected conventional memory size, to be considered computational memory (?). I wonder (and don't know the answer to this) where the line is drawn between computational memory and a "large" chunk of memory fronted by a processor within a larger computational system.

By: Dave Simonson

Dave Simonson — Thu, 11 Aug 2022 03:18:22 +0000

In reply to Timothy Prickett Morgan. So you could have a whole big block of remote memory suddenly taken offline due to a PCIe failure. Ouch!

By: Timothy Prickett Morgan

Timothy Prickett Morgan — Wed, 10 Aug 2022 22:53:50 +0000

In reply to Just a Realist. Ouch. Good points, though. Hope is cheap, I know, and execution is expensive, I know. But maybe this time really will be different. Sometimes, it is.