Comments on: Just How Bad Is CXL Memory Latency?
https://www.nextplatform.com/2022/12/05/just-how-bad-is-cxl-memory-latency/

By: Matthias Urlichs (Thu, 23 May 2024 09:35:45 +0000)
https://www.nextplatform.com/2022/12/05/just-how-bad-is-cxl-memory-latency/#comment-224687

"Complimenting these servers" — well, duh. Complementing.

By: Just a Realist (Sat, 24 Dec 2022 15:28:39 +0000)
https://www.nextplatform.com/2022/12/05/just-how-bad-is-cxl-memory-latency/#comment-202589
In reply to Timothy Prickett Morgan.

Memory performance is gated primarily by memory device design, organization, and access, not by the interconnect used to reach it. Increasing interconnect speed provides only modest incremental improvement, which is why processor vendors have focused on adding DDR memory channels at great cost and increased system complexity. That cost and complexity are the primary reasons for using serial interconnects to augment, and perhaps one day replace, DDR.

Unlike DDR, serial interconnects rely on a media controller to abstract the memory: the underlying device type, device count, access method, and so on. In essence, a media controller splits off the lower half of a memory controller, which allows memory devices, added functionality (caching, acceleration, etc.), and mechanical module packaging to be optimized in ways a processor's memory controller could never support. As such, media controller innovation will ultimately drive system performance, and it will enable processor simplification as the large memory-channel pin counts and memory controller complexity are eliminated along with their accompanying power consumption. Ultimately, the interconnect itself won't matter that much, because all data access is once again simplified to basic reads and writes.
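To make the "basic reads and writes" point concrete: under Linux a CXL memory expander is typically exposed as a CPU-less NUMA node, so an application simply places a buffer on that node and touches it with ordinary loads and stores. Below is a minimal libnuma sketch; the node number is an assumption about the local topology, so check `numactl --hardware` first.

```c
/* Minimal sketch, assuming the CXL expander shows up as a CPU-less NUMA
 * node (node 1 here is a guess; check `numactl --hardware`). Once the
 * buffer is placed there, access really is just ordinary loads and stores.
 * Build: gcc cxl_node.c -lnuma
 */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA support not available\n");
        return 1;
    }

    const int cxl_node = 1;                 /* assumed node ID of the expander */
    const size_t len = 64UL * 1024 * 1024;  /* 64 MiB */

    unsigned char *buf = numa_alloc_onnode(len, cxl_node);
    if (!buf) {
        perror("numa_alloc_onnode");
        return 1;
    }

    memset(buf, 0xA5, len);                 /* ordinary stores ...   */
    printf("first byte: 0x%02x\n", buf[0]); /* ... and ordinary loads */

    numa_free(buf, len);
    return 0;
}
```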

By: Timothy Prickett Morgan (Thu, 15 Dec 2022 19:08:54 +0000)
https://www.nextplatform.com/2022/12/05/just-how-bad-is-cxl-memory-latency/#comment-202196
In reply to JayN.

We might need local DDR until PCI-Express 6 or 7. But in a funny way, I think flash has to be the new tape, and we need to figure out a way to add a lot more memory to the CPUs. A lot more.

By: JayN (Thu, 15 Dec 2022 04:35:30 +0000)
https://www.nextplatform.com/2022/12/05/just-how-bad-is-cxl-memory-latency/#comment-202163

Looks like on-chip HBM + off-chip CXL memory pools + a DSA or DPU could be orchestrated to prefetch anticipated memory blocks, perhaps under direct user control, or perhaps using some AI, analogous to branch predictors.
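The "direct user control" flavor of this can be sketched in plain C: a helper thread stands in for the DSA/DPU and stages the next anticipated block into a near buffer while the main thread works on the current one. The malloc'd far pool and static near buffers are stand-ins for a CXL pool and HBM, not real tier bindings.

```c
/* Sketch of user-controlled tier prefetch: a helper thread plays the
 * DSA/DPU role and stages the next anticipated 1 MiB block from a far
 * buffer into a near staging buffer while the main thread consumes the
 * current one. Plain malloc stands in for the CXL pool and a static array
 * for HBM; real code would bind these to the actual tiers.
 * Build: gcc -O2 -pthread prefetch.c
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK   (1 << 20)                   /* 1 MiB per block */
#define NBLOCKS 64

static uint8_t *far_pool;                   /* stand-in for CXL-attached pool */
static uint8_t  near_buf[2][BLOCK];         /* stand-in for HBM staging       */

struct job { int block; int slot; };

static void *prefetch(void *arg)            /* the "DSA/DPU" worker */
{
    struct job *j = arg;
    memcpy(near_buf[j->slot], far_pool + (size_t)j->block * BLOCK, BLOCK);
    return NULL;
}

int main(void)
{
    far_pool = malloc((size_t)NBLOCKS * BLOCK);
    if (!far_pool) return 1;
    memset(far_pool, 1, (size_t)NBLOCKS * BLOCK);

    struct job first = { .block = 0, .slot = 0 };
    prefetch(&first);                       /* stage block 0 up front */

    uint64_t sum = 0;
    for (int b = 0; b < NBLOCKS; b++) {
        int cur = b & 1;                    /* block b lives in slot b & 1 */
        pthread_t t;
        struct job next = { .block = b + 1, .slot = cur ^ 1 };
        int have_next = (b + 1 < NBLOCKS);
        int started = have_next &&
                      pthread_create(&t, NULL, prefetch, &next) == 0;
        if (have_next && !started)
            prefetch(&next);                /* fall back to a synchronous copy */

        for (size_t i = 0; i < BLOCK; i++)  /* "compute" on the staged block */
            sum += near_buf[cur][i];

        if (started)
            pthread_join(t, NULL);          /* next block is now near */
    }

    printf("checksum = %llu\n", (unsigned long long)sum);
    free(far_pool);
    return 0;
}
```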

By: RobYoung (Sun, 11 Dec 2022 18:33:53 +0000)
https://www.nextplatform.com/2022/12/05/just-how-bad-is-cxl-memory-latency/#comment-202001

You get around to mentioning IBM in passing, and we see that a memory hop to the "other" socket is about 3x to 4x better on Power than CXL memory: https://www.nextplatform.com/?s=Memory+area+network+power10 … I guess with CXL-based memory trays in a disaggregated future, be aware that things will surely run slower if you move latency-sensitive apps to such a solution. Sound about right?
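One way to put rough numbers on that hop yourself is a dependent-load pointer chase bound to whichever tier you want to test (local DDR, the far socket, or a CXL node). The sketch below is illustrative only, not the article's methodology; the numactl node IDs are assumptions about your topology.

```c
/* Rough dependent-load latency probe. Bind it to the memory tier under
 * test, e.g. numactl --cpunodebind=0 --membind=2 ./chase
 * (node numbers are assumptions about your topology). The chain is a
 * single random cycle (Sattolo shuffle) so hardware prefetchers cannot
 * hide the latency of each hop.
 * Build: gcc -O2 -o chase chase.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (64UL * 1024 * 1024 / sizeof(size_t))   /* 64 MiB of pointers */

int main(void)
{
    size_t *chain = malloc(N * sizeof *chain);
    if (!chain) return 1;

    for (size_t i = 0; i < N; i++) chain[i] = i;

    srandom(1);
    for (size_t i = N - 1; i > 0; i--) {          /* Sattolo: one big cycle */
        size_t j = (size_t)random() % i;
        size_t t = chain[i]; chain[i] = chain[j]; chain[j] = t;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    size_t idx = 0;
    const size_t hops = 20 * 1000 * 1000;
    for (size_t i = 0; i < hops; i++)
        idx = chain[idx];                         /* each hop depends on the last */

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
    printf("~%.1f ns per dependent load (end index %zu)\n", ns / hops, idx);

    free(chain);
    return 0;
}
```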

By: Anton Gavriliuk (Tue, 06 Dec 2022 22:34:57 +0000)
https://www.nextplatform.com/2022/12/05/just-how-bad-is-cxl-memory-latency/#comment-201804

Memory tiering with MGLRU support will solve this.

By: Mark Hahn (Tue, 06 Dec 2022 21:27:12 +0000)
https://www.nextplatform.com/2022/12/05/just-how-bad-is-cxl-memory-latency/#comment-201803

OK, so latency is only a single-digit multiple higher. Now, if you're incredibly latency tolerant and embrace the disaggregated concept, more than a few CPUs and GPUs will be contending for access to the aggregated memory. How will latency react to contention? Is it hard to build a CXL switching complex that can sustain tens of GB/s from numerous hosts at the same time?
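There is no measurement of that in the thread, but a back-of-the-envelope M/M/1 queueing sketch (an assumed model, not data from any real CXL switch) shows the expected shape: once utilization of the shared link climbs past roughly 80 to 90 percent, queueing delay starts to dominate the unloaded latency.

```c
/* Back-of-the-envelope only: model a shared CXL link as an M/M/1 queue to
 * see how latency reacts to contention. The 32 GB/s link and 64-byte
 * transaction size are assumptions, not measurements of any real switch.
 */
#include <stdio.h>

int main(void)
{
    const double link_bw = 32e9;            /* bytes/s on the shared link (assumed) */
    const double xfer    = 64.0;            /* bytes per memory transaction */
    const double mu      = link_bw / xfer;  /* service rate, transactions/s */

    const double utils[] = { 0.10, 0.50, 0.80, 0.90, 0.95, 0.99 };
    for (size_t i = 0; i < sizeof utils / sizeof utils[0]; i++) {
        double lambda = utils[i] * mu;          /* aggregate load from all hosts */
        double wait_s = 1.0 / (mu - lambda);    /* M/M/1 mean time in system */
        printf("utilization %4.0f%% -> ~%6.1f ns per transaction (queue + service)\n",
               utils[i] * 100.0, wait_s * 1e9);
    }
    return 0;
}
```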
