Comments on: Just How Bad Is CXL Memory Latency?
https://www.nextplatform.com/2022/12/05/just-how-bad-is-cxl-memory-latency/

By: Matthias Urlichs (Thu, 23 May 2024 09:35:45 +0000)
https://www.nextplatform.com/2022/12/05/just-how-bad-is-cxl-memory-latency/#comment-224687

"Complimenting these servers" — well, duh. Complementing.

By: Just a Realist (Sat, 24 Dec 2022 15:28:39 +0000)
https://www.nextplatform.com/2022/12/05/just-how-bad-is-cxl-memory-latency/#comment-202589
In reply to Timothy Prickett Morgan.

Memory performance is gated primarily by memory device design, organization, and access, not by the interconnect used to reach it. Increasing interconnect speed provides only modest incremental improvement, which is why processor vendors have focused on adding DDR memory channels at great cost and increased system complexity. That cost and complexity are the primary reasons for using serial interconnects to augment, and perhaps one day replace, DDR.

Unlike DDR, serial interconnects rely on a media controller to abstract the memory: the underlying device type, device count, access method, and so on. In essence, a media controller splits off the lower half of a memory controller, which allows memory devices, added functionality (caching, acceleration, etc.), and mechanical module packaging to be optimized in ways a processor's memory controller could never support. As such, media controller innovation will ultimately drive system performance, and it will enable processor simplification as the large memory-channel pin counts and memory controller complexity are eliminated along with their accompanying power consumption. Ultimately, the interconnect itself won't matter that much, because all data access is once again simplified to basic reads and writes.
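To make the "basic reads and writes" point concrete: under Linux a CXL memory expander is typically exposed as a CPU-less NUMA node, so an application simply places a buffer on that node and touches it with ordinary loads and stores. Below is a minimal libnuma sketch; the node number is an assumption about the local topology, so check `numactl --hardware` first.

```c
/* Minimal sketch, assuming the CXL expander shows up as a CPU-less NUMA
 * node (node 1 here is a guess; check `numactl --hardware`). Once the
 * buffer is placed there, access really is just ordinary loads and stores.
 * Build: gcc cxl_node.c -lnuma
 */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA support not available\n");
        return 1;
    }

    const int cxl_node = 1;                 /* assumed node ID of the expander */
    const size_t len = 64UL * 1024 * 1024;  /* 64 MiB */

    unsigned char *buf = numa_alloc_onnode(len, cxl_node);
    if (!buf) {
        perror("numa_alloc_onnode");
        return 1;
    }

    memset(buf, 0xA5, len);                 /* ordinary stores ...   */
    printf("first byte: 0x%02x\n", buf[0]); /* ... and ordinary loads */

    numa_free(buf, len);
    return 0;
}
```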

By: Timothy Prickett Morgan (Thu, 15 Dec 2022 19:08:54 +0000)
https://www.nextplatform.com/2022/12/05/just-how-bad-is-cxl-memory-latency/#comment-202196
In reply to JayN.

We might need local DDR until PCI-Express 6 or 7. But in a funny way, I think flash has to be the new tape, and we need to figure out a way to add a lot more memory to the CPUs. A lot more.

By: JayN (Thu, 15 Dec 2022 04:35:30 +0000)
https://www.nextplatform.com/2022/12/05/just-how-bad-is-cxl-memory-latency/#comment-202163

Looks like on-chip HBM + off-chip CXL memory pools + a DSA or DPU could be orchestrated to prefetch anticipated memory blocks, perhaps under direct user control, or perhaps using some AI, analogous to branch predictors.
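The "direct user control" flavor of this can be sketched in plain C: a helper thread stands in for the DSA/DPU and stages the next anticipated block into a near buffer while the main thread works on the current one. The malloc'd far pool and static near buffers are stand-ins for a CXL pool and HBM, not real tier bindings.

```c
/* Sketch of user-controlled tier prefetch: a helper thread plays the
 * DSA/DPU role and stages the next anticipated 1 MiB block from a far
 * buffer into a near staging buffer while the main thread consumes the
 * current one. Plain malloc stands in for the CXL pool and a static array
 * for HBM; real code would bind these to the actual tiers.
 * Build: gcc -O2 -pthread prefetch.c
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK   (1 << 20)                   /* 1 MiB per block */
#define NBLOCKS 64

static uint8_t *far_pool;                   /* stand-in for CXL-attached pool */
static uint8_t  near_buf[2][BLOCK];         /* stand-in for HBM staging       */

struct job { int block; int slot; };

static void *prefetch(void *arg)            /* the "DSA/DPU" worker */
{
    struct job *j = arg;
    memcpy(near_buf[j->slot], far_pool + (size_t)j->block * BLOCK, BLOCK);
    return NULL;
}

int main(void)
{
    far_pool = malloc((size_t)NBLOCKS * BLOCK);
    if (!far_pool) return 1;
    memset(far_pool, 1, (size_t)NBLOCKS * BLOCK);

    struct job first = { .block = 0, .slot = 0 };
    prefetch(&first);                       /* stage block 0 up front */

    uint64_t sum = 0;
    for (int b = 0; b < NBLOCKS; b++) {
        int cur = b & 1;                    /* block b lives in slot b & 1 */
        pthread_t t;
        struct job next = { .block = b + 1, .slot = cur ^ 1 };
        int have_next = (b + 1 < NBLOCKS);
        int started = have_next &&
                      pthread_create(&t, NULL, prefetch, &next) == 0;
        if (have_next && !started)
            prefetch(&next);                /* fall back to a synchronous copy */

        for (size_t i = 0; i < BLOCK; i++)  /* "compute" on the staged block */
            sum += near_buf[cur][i];

        if (started)
            pthread_join(t, NULL);          /* next block is now near */
    }

    printf("checksum = %llu\n", (unsigned long long)sum);
    free(far_pool);
    return 0;
}
```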

By: RobYoung (Sun, 11 Dec 2022 18:33:53 +0000)
https://www.nextplatform.com/2022/12/05/just-how-bad-is-cxl-memory-latency/#comment-202001

You get around to mentioning IBM in passing, and we see that a memory hop to the "other" socket is about 3x to 4x better on Power than CXL memory: https://www.nextplatform.com/?s=Memory+area+network+power10 … I guess with CXL-based memory trays in a disaggregated future, be aware that things will surely run slower if you move latency-sensitive apps to such a solution. Sound about right?
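One way to put rough numbers on that hop yourself is a dependent-load pointer chase bound to whichever tier you want to test (local DDR, the far socket, or a CXL node). The sketch below is illustrative only, not the article's methodology; the numactl node IDs are assumptions about your topology.

```c
/* Rough dependent-load latency probe. Bind it to the memory tier under
 * test, e.g. numactl --cpunodebind=0 --membind=2 ./chase
 * (node numbers are assumptions about your topology). The chain is a
 * single random cycle (Sattolo shuffle) so hardware prefetchers cannot
 * hide the latency of each hop.
 * Build: gcc -O2 -o chase chase.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (64UL * 1024 * 1024 / sizeof(size_t))   /* 64 MiB of pointers */

int main(void)
{
    size_t *chain = malloc(N * sizeof *chain);
    if (!chain) return 1;

    for (size_t i = 0; i < N; i++) chain[i] = i;

    srandom(1);
    for (size_t i = N - 1; i > 0; i--) {          /* Sattolo: one big cycle */
        size_t j = (size_t)random() % i;
        size_t t = chain[i]; chain[i] = chain[j]; chain[j] = t;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    size_t idx = 0;
    const size_t hops = 20 * 1000 * 1000;
    for (size_t i = 0; i < hops; i++)
        idx = chain[idx];                         /* each hop depends on the last */

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
    printf("~%.1f ns per dependent load (end index %zu)\n", ns / hops, idx);

    free(chain);
    return 0;
}
```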

By: Anton Gavriliuk (Tue, 06 Dec 2022 22:34:57 +0000)
https://www.nextplatform.com/2022/12/05/just-how-bad-is-cxl-memory-latency/#comment-201804

Memory tiering with MGLRU support will solve this.

By: Mark Hahn (Tue, 06 Dec 2022 21:27:12 +0000)
https://www.nextplatform.com/2022/12/05/just-how-bad-is-cxl-memory-latency/#comment-201803

OK, so latency is only a single-digit multiple higher. Now, if you're incredibly latency tolerant and embrace the disaggregated concept, more than a few CPUs and GPUs will be contending for access to the aggregated memory. How will latency react to contention? Is it hard to build a CXL switching complex that can sustain tens of GB/s from numerous hosts at the same time?
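There is no measurement of that in the thread, but a back-of-the-envelope M/M/1 queueing sketch (an assumed model, not data from any real CXL switch) shows the expected shape: once utilization of the shared link climbs past roughly 80 to 90 percent, queueing delay starts to dominate the unloaded latency.

```c
/* Back-of-the-envelope only: model a shared CXL link as an M/M/1 queue to
 * see how latency reacts to contention. The 32 GB/s link and 64-byte
 * transaction size are assumptions, not measurements of any real switch.
 */
#include <stdio.h>

int main(void)
{
    const double link_bw = 32e9;            /* bytes/s on the shared link (assumed) */
    const double xfer    = 64.0;            /* bytes per memory transaction */
    const double mu      = link_bw / xfer;  /* service rate, transactions/s */

    const double utils[] = { 0.10, 0.50, 0.80, 0.90, 0.95, 0.99 };
    for (size_t i = 0; i < sizeof utils / sizeof utils[0]; i++) {
        double lambda = utils[i] * mu;          /* aggregate load from all hosts */
        double wait_s = 1.0 / (mu - lambda);    /* M/M/1 mean time in system */
        printf("utilization %4.0f%% -> ~%6.1f ns per transaction (queue + service)\n",
               utils[i] * 100.0, wait_s * 1e9);
    }
    return 0;
}
```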
