Comments on: How Nvidia Blackwell Systems Attack 1 Trillion Parameter AI Models https://www.nextplatform.com/2024/03/19/how-nvidia-blackwell-systems-attack-1-trillion-parameter-ai-models/ In-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds. Tue, 28 May 2024 15:50:50 +0000 hourly 1 https://wordpress.org/?v=6.7.1 By: Timothy Prickett Morgan https://www.nextplatform.com/2024/03/19/how-nvidia-blackwell-systems-attack-1-trillion-parameter-ai-models/#comment-222886 Mon, 08 Apr 2024 00:57:18 +0000 https://www.nextplatform.com/?p=143842#comment-222886 In reply to TooOldToDoItAgain.

I have been told in the past that I over-value coherence, and I have tried to not think the coherency domain should be the size of a datacenter. Which would be my natural inclination. (Smile) Very good points above, and I will take this as an assignment.

]]>
By: TooOldToDoItAgain https://www.nextplatform.com/2024/03/19/how-nvidia-blackwell-systems-attack-1-trillion-parameter-ai-models/#comment-222843 Sat, 06 Apr 2024 23:29:03 +0000 https://www.nextplatform.com/?p=143842#comment-222843 About memory coherence, this has been an industry debate (fight?) for more years than I care to remember. Mellanox (pre-acquisition) has always been marketing the benefits, but always fell short on the specifics of why it mattered to (standard) software. Sure, simplification of the programming model for pointer chasing – but ignorance of the physical topology in commercial (or research for that matter) has time and time again shown the limits.
Perhaps now that NLV72 is more of an appliance with somewhat of a singular workload (vs. trying to optimize for DB / Web / HPC….) coherence will showcase the market pull use case vs. the technology market push inventions. Are we saying the in-net reduction operations (“collections” as Jensen referred to them) are the killer app?

TPM does the absolute best in his writing, and asks the hard questions that hold technology peddlers proverbial feet to the fire to explain why their proposed value matters to customers. I was involved in CAPI, CCIX, but thankfully not Gen-Z. I hold patents in cloud and RDMA technology inventions. This is coming from a place of intimate knowledge.

Timothy, I humbly submit to you a request to deep dive on coherence (which to some extent would be an update from Nov 23, 2021 “FINALLY, A COHERENT INTERCONNECT STRATEGY: CXL ABSORBS GEN-Z”). I think the consumers of this tech needs the answers from a knowledgeable neutral party that can interrogate vendors, software developers/ISV’s on why memory coherence matters. I hope the answer isn’t “because CSP’s requested it to help their memory disaggregation dreams come true” as that dog just doesn’t TCO hunt.

Respect.

]]>
By: Calamity Jim https://www.nextplatform.com/2024/03/19/how-nvidia-blackwell-systems-attack-1-trillion-parameter-ai-models/#comment-222039 Wed, 20 Mar 2024 13:00:34 +0000 https://www.nextplatform.com/?p=143842#comment-222039 Happy Spring Equinox (and Cherry Blossoms) to all!

With these upgrades, It looks like they can now pretty-much make an MS Eagle-equivalent machine (Top500 #3) with an 8×8 array of DXG Grace-Blackwell GB200 NVL72 racks … which is impressively compact! No wonder Jensen can bake them in his home’s kitchen oven! (eh-eh-eh!)

]]>