Comments on: How Facebook Architects Around Silent Data Corruption
https://www.nextplatform.com/2021/03/01/facebook-architects-around-silent-data-corruption/

By: patrick convert, Sat, 14 May 2022 22:29:11 +0000

OK, Facebook is reinventing the wheel.
Nothing new. Google did the same analysis of RAM corruption 10 years ago and concluded that the main cause was poor board-level engineering of the memory subsystem (noise on the memory bus, ground bounce in random corner cases, and missing or misplaced power-supply capacitors), not single-event upsets (SEUs), at least at sea level in data centers.
Several well-known techniques exist to avoid miscalculations, whether systematic or random. The main one is redundancy with diversification: either in hardware, using 2-out-of-2 schemes, or by performing the same operation with a different algorithm. The others are coherence checks combined with retries or fallback values. These techniques are very common in real-time applications, where they make the whole system robust to random, short-term interface failures.
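A minimal sketch of the algorithmic-diversification idea, assuming integer multiplication as a stand-in workload: the same result is computed by two different algorithms, the results are compared (a coherence check), the computation is retried on mismatch, and a fallback value is returned if the results never agree. The function names `mul_direct`, `mul_shift_add`, and `checked_call` are hypothetical, and this is not presented as Facebook's or Google's method. Both paths run on the same CPU here, so it shows only algorithmic diversity, not the hardware 2-out-of-2 variant.

```python
from typing import Callable, Optional, Sequence


def mul_direct(a: int, b: int) -> int:
    # Primary path: native multiplication.
    return a * b


def mul_shift_add(a: int, b: int) -> int:
    # Diverse path (hypothetical example): shift-and-add multiplication,
    # a different algorithm that should produce the same result.
    sign = -1 if (a < 0) ^ (b < 0) else 1
    a, b = abs(a), abs(b)
    result = 0
    while b:
        if b & 1:
            result += a
        a <<= 1
        b >>= 1
    return sign * result


def checked_call(
    primary: Callable[..., int],
    diverse: Callable[..., int],
    args: Sequence[int],
    fallback: Optional[int],
    retries: int = 2,
) -> Optional[int]:
    # Run both paths and accept the result only if they agree.
    # On disagreement, retry (to ride out transient faults);
    # if they never agree, return the fallback value instead.
    for _ in range(retries + 1):
        r1 = primary(*args)
        r2 = diverse(*args)
        if r1 == r2:
            return r1
    return fallback


if __name__ == "__main__":
    print(checked_call(mul_direct, mul_shift_add, (123, -45), fallback=None))  # prints -5535
```

The retry loop handles random, short-lived faults, while the fallback covers a persistent (systematic) disagreement. Which value is a safe fallback depends entirely on the application.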
