
Bug in AMD EPYC "Rome" Processors Puts Them to Sleep After 34 Months of Uptime

Joined
May 30, 2015
Messages
1,924 (0.56/day)
Location
Seattle, WA
AMD recently published an erratum for its second-generation EPYC processors based on Zen 2, which states that "A core will fail to exit CC6 after about 1044 days after the last system reset." 1044 days is roughly 34 months, or just shy of 3 years of total uptime, and is actually an overestimate according to some sysadmin sleuths on Reddit and Twitter who did the math and found that the actual time is about 1042 days and 12 hours. The problem occurs because the CPU REFCLK counts 10 ns ticks in a 54-bit signed integer; count just over 9 quadrillion of these ticks (2^53) and the counter overflows at 1042.4999 days. Once this overflow occurs, the cores are stuck forever in a zombie state and will not take any external interrupt requests. Well, forever until you flip the power switch off and back on again, which will reset the counter.
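
The overflow figure can be checked with a few lines of arithmetic. This is a rough sketch that takes the article's numbers at face value (a 54-bit signed counter, one tick per 10 ns); the function names are made up for illustration and nothing here comes from AMD's erratum itself.

```python
# Assumptions from the article, not from AMD documentation:
# the counter is a 54-bit signed integer and REFCLK ticks every 10 ns.
COUNTER_BITS = 54
TICKS_PER_SECOND = 100_000_000  # one tick per 10 ns
SECONDS_PER_DAY = 86_400


def ticks_to_overflow(bits: int = COUNTER_BITS) -> int:
    """Number of ticks after which a signed counter of this width wraps."""
    return 2 ** (bits - 1)


def overflow_days(bits: int = COUNTER_BITS) -> float:
    """Days of uptime until the counter wraps, under the assumptions above."""
    return ticks_to_overflow(bits) / TICKS_PER_SECOND / SECONDS_PER_DAY


print(ticks_to_overflow())        # 9007199254740992, just over 9 quadrillion
print(round(overflow_days(), 4))  # 1042.4999
```

That lands about a day and a half short of the 1044 days quoted in the erratum, which matches the "1042 days and 12 hours" figure.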

It's certainly impressive that this problem was discovered at all, as it suggests that more than one system has been running for almost three years straight without a single restart. Though this does put EPYC "Rome" out of the running for any longest-running-system awards, it may serve as a reminder to apply system updates or patches for other vulnerabilities that have been discovered in the four years since that generation of processors first launched. AMD does not plan to issue a fix for the CC6 bug, instead recommending that administrators either disable CC6 to keep the cores from entering the zombified state, or simply restart every once in a while before the time limit expires.
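
For admins who go the reboot route, a small check like the one below could flag machines getting close to the limit. It is only a sketch: it assumes the 2^53-tick limit worked out by the Reddit and Twitter posters, and it assumes OS-reported uptime is a fair stand-in for time since the last system reset, which may not hold on every platform. The function names and the 30-day margin are made up for illustration.

```python
# Limit in seconds, assuming a 54-bit signed counter of 10 ns ticks.
LIMIT_SECONDS = 2 ** 53 / 100_000_000
SECONDS_PER_DAY = 86_400


def days_until_overflow(uptime_seconds: float) -> float:
    """Days left before the assumed limit; negative once it has passed."""
    return (LIMIT_SECONDS - uptime_seconds) / SECONDS_PER_DAY


def needs_reboot(uptime_seconds: float, margin_days: float = 30.0) -> bool:
    """True when the remaining time is inside the (arbitrary) safety margin."""
    return days_until_overflow(uptime_seconds) <= margin_days


def read_linux_uptime(path: str = "/proc/uptime") -> float:
    """Read uptime in seconds from Linux's /proc/uptime (first field)."""
    with open(path) as f:
        return float(f.read().split()[0])
```

On a Linux box, `needs_reboot(read_linux_uptime())` would give a yes/no answer that could feed a cron job or a monitoring check.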



 
Joined
Jan 29, 2012
Messages
6,873 (1.47/day)
Location
Florida
System Name natr0n-PC
Processor Ryzen 5950x-5600x | 9600k
Motherboard B450 AORUS M | Z390 UD
Cooling EK AIO 360 - 6 fan action | AIO
Memory Patriot - Viper Steel DDR4 (B-Die)(4x8GB) | Samsung DDR4 (4x8GB)
Video Card(s) EVGA 3070ti FTW
Storage Various
Display(s) Pixio PX279 Prime
Case Thermaltake Level 20 VT | Black bench
Audio Device(s) LOXJIE D10 + Kinter Amp + 6 Bookshelf Speakers Sony+JVC+Sony
Power Supply Super Flower Leadex III ARGB 80+ Gold 650W | EVGA 700 Gold
Software XP/7/8.1/10
Benchmark Scores http://valid.x86.fr/79kuh6
reminds me of y2k times
 
Joined
Jul 15, 2020
Messages
1,017 (0.65/day)
System Name Dirt Sheep | Silent Sheep
Processor i5-2400 | 13900K (-0.02mV offset)
Motherboard Asus P8H67-M LE | Gigabyte AERO Z690-G, bios F29e Intel baseline
Cooling Scythe Katana Type 1 | Noctua NH-U12A chromax.black
Memory G-skill 2*8GB DDR3 | Corsair Vengeance 4*32GB DDR5 5200Mhz C40 @4000MHz
Video Card(s) Gigabyte 970GTX Mini | NV 1080TI FE (cap at 50%, 800mV)
Storage 2*SN850 1TB, 230S 4TB, 840EVO 128GB, WD green 2TB HDD, IronWolf 6TB, 2*HC550 18TB in RAID1
Display(s) LG 21` FHD W2261VP | Lenovo 27` 4K Qreator 27
Case Thermaltake V3 Black|Define 7 Solid, stock 3*14 fans+ 2*12 front&buttom+ out 1*8 (on expansion slot)
Audio Device(s) Beyerdynamic DT 990 (or the screen speakers when I'm too lazy)
Power Supply Enermax Pro82+ 525W | Corsair RM650x (2021)
Mouse Logitech Master 3
Keyboard Roccat Isku FX
VR HMD Nop.
Software WIN 10 | WIN 11
Benchmark Scores CB23 SC: i5-2400=641 | i9-13900k=2325-2281 MC: i5-2400=i9 13900k SC | i9-13900k=37240-35500

Bug Feature in AMD EPYC "Rome" Processors Puts Them to Sleep After 34 Months of Uptime

 
Joined
May 19, 2009
Messages
1,860 (0.33/day)
Location
Latvia
System Name Personal \\ Work - HP EliteBook 840 G6
Processor 7700X \\ i7-8565U
Motherboard Asrock X670E PG Lightning
Cooling Noctua DH-15
Memory G.SKILL Trident Z5 RGB Black 32GB 6000MHz CL36 \\ 16GB DDR4-2400
Video Card(s) ASUS RoG Strix 1070 Ti \\ Intel UHD Graphics 620
Storage 2x KC3000 2TB, Samsung 970 EVO 512GB \\ OEM 256GB NVMe SSD
Display(s) BenQ XL2411Z \\ FullHD + 2x HP Z24i external screens via docking station
Case Fractal Design Define Arc Midi R2 with window
Audio Device(s) Realtek ALC1150 with Logitech Z533
Power Supply Corsair AX860i
Mouse Logitech G502
Keyboard Corsair K55 RGB PRO
Software Windows 11 \\ Windows 10
While the bug is stupid and should not have happened, 3 years of uptime gives me shivers.
Patch and reboot your damn stuff, FFS.
 
Joined
Sep 6, 2013
Messages
3,307 (0.81/day)
Location
Athens, Greece
System Name 3 desktop systems: Gaming / Internet / HTPC
Processor Ryzen 5 5500 / Ryzen 5 4600G / FX 6300 (12 years latter got to see how bad Bulldozer is)
Motherboard MSI X470 Gaming Plus Max (1) / MSI X470 Gaming Plus Max (2) / Gigabyte GA-990XA-UD3
Cooling Νoctua U12S / Segotep T4 / Snowman M-T6
Memory 32GB - 16GB G.Skill RIPJAWS 3600+16GB G.Skill Aegis 3200 / 16GB JUHOR / 16GB Kingston 2400MHz (DDR3)
Video Card(s) ASRock RX 6600 + GT 710 (PhysX)/ Vega 7 integrated / Radeon RX 580
Storage NVMes, ONLY NVMes/ NVMes, SATA Storage / NVMe boot(Clover), SATA storage
Display(s) Philips 43PUS8857/12 UHD TV (120Hz, HDR, FreeSync Premium) ---- 19'' HP monitor + BlitzWolf BW-V5
Case Sharkoon Rebel 12 / CoolerMaster Elite 361 / Xigmatek Midguard
Audio Device(s) onboard
Power Supply Chieftec 850W / Silver Power 400W / Sharkoon 650W
Mouse CoolerMaster Devastator III Plus / CoolerMaster Devastator / Logitech
Keyboard CoolerMaster Devastator III Plus / CoolerMaster Devastator / Logitech
Software Windows 10 / Windows 10&Windows 11 / Windows 10
I read some comments somewhere about Linux being able to update itself without rebooting, so this does present a kind of problem for anyone who needs a non-stop server running until the Apocalypse.
 
Joined
Nov 6, 2014
Messages
113 (0.03/day)
Processor Intel i7 13700K
Motherboard ASUS PROArt Z690 Creator WiFi
Cooling Liquid Freezer II - 280
Memory Kingston 32GB DDR5 @ 6200 MT/s
Video Card(s) Palit RTX3070 GamingPRO
Storage TrueNAS CORE
Case Phanteks ECLIPSE P600S
Audio Device(s) Creative Sound Blaster AE-5
Power Supply SEASONIC CONNECT 750W
> While the bug is stupid and should not have happened, 3 years of uptime gives me shivers.
> Patch and reboot your damn stuff, FFS.
Live patching exists in GNU/Linux, so the servers don't need to be rebooted.
 
Joined
Apr 12, 2013
Messages
7,473 (1.77/day)
Yeah, well, it's a good idea to reboot your damn systems at least once in 3 years. Though depending on what they're used for, or who they're shared by, it won't be easy.

> reminds me of y2k times
You mean the end times? Or 2012 :laugh:

Nostra, Incans(?) & Mayans sure did a number on millions at the time :slap:
 
Joined
Jan 29, 2012
Messages
6,873 (1.47/day)
> Yeah, well, it's a good idea to reboot your damn systems at least once in 3 years. Though depending on what they're used for, or who they're shared by, it won't be easy.
>
> You mean the end times? Or 2012 :laugh:
>
> Nostra, Incans(?) & Mayans sure did a number on millions at the time :slap:
The year 2000 BIOS bug/limitation with older PCs.
 
Joined
Apr 12, 2013
Messages
7,473 (1.77/day)
Yes, I meant there were also lots of weird predictions around those years, like I mentioned: 99/2k, and 2012 more recently. I wonder when the next world-ending event is supposed to come!
 
Joined
Jan 29, 2012
Messages
6,873 (1.47/day)
> Yes, I meant there were also lots of weird predictions around those years, like I mentioned: 99/2k, and 2012 more recently. I wonder when the next world-ending event is supposed to come!
Well, no more actual world-ending stuff. Lots of plagues and shit, according to prophecy and what a person's beliefs are. If waters turned to blood I would panic if I didn't know about it.
 
Joined
Jan 3, 2021
Messages
3,436 (2.46/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
> Yes, I meant there were also lots of weird predictions around those years, like I mentioned: 99/2k, and 2012 more recently. I wonder when the next world-ending event is supposed to come!
That's easy to answer, unless IBM, Intel and everyone else rip the remaining 32-bit abilities out of their processors by then.

> I read some comments somewhere about Linux being able to update itself without rebooting, so this does present a kind of problem for anyone who needs a non-stop server running until the Apocalypse.
To reach that goal, you'd better have servers with hot-swappable CPUs too. I don't know much about that ability, but Wikipedia says it's "common". So you pull out a CPU and put it back in, and hopefully that timer will be reset.
 
Low quality post by P4-630
Joined
Jan 5, 2006
Messages
18,585 (2.70/day)
System Name AlderLake
Processor Intel i7 12700K P-Cores @ 5Ghz
Motherboard Gigabyte Z690 Aorus Master
Cooling Noctua NH-U12A 2 fans + Thermal Grizzly Kryonaut Extreme + 5 case fans
Memory 32GB DDR5 Corsair Dominator Platinum RGB 6000MT/s CL36
Video Card(s) MSI RTX 2070 Super Gaming X Trio
Storage Samsung 980 Pro 1TB + 970 Evo 500GB + 850 Pro 512GB + 860 Evo 1TB x2
Display(s) 23.8" Dell S2417DG 165Hz G-Sync 1440p
Case Be quiet! Silent Base 600 - Window
Audio Device(s) Panasonic SA-PMX94 / Realtek onboard + B&O speaker system / Harman Kardon Go + Play / Logitech G533
Power Supply Seasonic Focus Plus Gold 750W
Mouse Logitech MX Anywhere 2 Laser wireless
Keyboard RAPOO E9270P Black 5GHz wireless
Software Windows 11
Benchmark Scores Cinebench R23 (Single Core) 1936 @ stock Cinebench R23 (Multi Core) 23006 @ stock
Rest in peace zombies!...
 
Joined
Oct 1, 2006
Messages
4,930 (0.75/day)
Location
Hong Kong
Processor Core i7-12700k
Motherboard Z690 Aero G D4
Cooling Custom loop water, 3x 420 Rad
Video Card(s) RX 7900 XTX Phantom Gaming
Storage Plextor M10P 2TB
Display(s) InnoCN 27M2V
Case Thermaltake Level 20 XT
Audio Device(s) Soundblaster AE-5 Plus
Power Supply FSP Aurum PT 1200W
Software Windows 11 Pro 64-bit
Joined
Jan 9, 2023
Messages
293 (0.44/day)
In fact, many server boards have it off by default.
Why this isn't such a widespread problem... Because of this.
Restarting your server every once in a while is dumb; there are many scenarios where rebooting is undesired.
Why CC6 isn't off by default on servers however...
 
Joined
Aug 30, 2006
Messages
7,217 (1.09/day)
System Name ICE-QUAD // ICE-CRUNCH
Processor Q6600 // 2x Xeon 5472
Memory 2GB DDR // 8GB FB-DIMM
Video Card(s) HD3850-AGP // FireGL 3400
Display(s) 2 x Samsung 204Ts = 3200x1200
Audio Device(s) Audigy 2
Software Windows Server 2003 R2 as a Workstation now migrated to W10 with regrets.
Fascinating read at https://en.m.wikipedia.org/wiki/Leap_second

I'm amazed that such a modern processor as EPYC has an n-bit limit clock problem. Were the design teams asleep?

Since EPYC is a server processor, rebooting should not have to be part of the standard operating procedures, and I can imagine many use cases where this clock problem can cause chaos, especially in futures markets.
 
Joined
Dec 25, 2020
Messages
6,521 (4.63/day)
Location
São Paulo, Brazil
System Name "Icy Resurrection"
Processor 13th Gen Intel Core i9-13900KS Special Edition
Motherboard ASUS ROG MAXIMUS Z790 APEX ENCORE
Cooling Noctua NH-D15S upgraded with 2x NF-F12 iPPC-3000 fans and Honeywell PTM7950 TIM
Memory 32 GB G.SKILL Trident Z5 RGB F5-6800J3445G16GX2-TZ5RK @ 7600 MT/s 36-44-44-52-96 1.4V
Video Card(s) ASUS ROG Strix GeForce RTX™ 4080 16GB GDDR6X White OC Edition
Storage 500 GB WD Black SN750 SE NVMe SSD + 4 TB WD Red Plus WD40EFPX HDD
Display(s) 55-inch LG G3 OLED
Case Pichau Mancer CV500 White Edition
Power Supply EVGA 1300 G2 1.3kW 80+ Gold
Mouse Microsoft Classic Intellimouse
Keyboard Generic PS/2
Software Windows 11 IoT Enterprise LTSC 24H2
Benchmark Scores I pulled a Qiqi~
> I'm amazed that such a modern processor as EPYC has an n-bit limit clock problem. Were the design teams asleep?

I wouldn't be surprised if Xeon had a similar issue, just on a larger (much larger?) time scale.

I bet that the exact same issue would manifest on a consumer-grade Ryzen Threadripper processor, too, maybe even the socket AM4 counterparts.

On the other hand, this is some proper uptime...

 
Low quality post by SomeOne99h
Joined
Oct 3, 2015
Messages
468 (0.14/day)
System Name Specs Last Update: 8/April/2024
Processor Intel Core i5 9400f 2.9GHz/4.0 Turbo (NoOC)
Motherboard Gigabyte Z370M D3H rev. 1.0
Cooling be quite! Dark Rock Slim 180W TDP (The Silent Wings 3 120mm Fan)
Memory Corsair Red Line 8x2 16GB 3000MHz (NoOC) DDR4-3000 15-17-17-35 (CMK16GX4M2B3000C15R) V1.35 ver 4.24
Video Card(s) NVIDIA GeForce MSI 980 Ti Golden Edition (NoOC)| Spare: GTX 650 Ti 1 GB
Storage Samsung 870 EVO 4 TB | Samsung 860 EVO 1 TB | Cold Backup: WDC Black 930 GiB WD1003FZEX
Display(s) Asus VG248QZ 1920x1080 144hz 24" (Current: 60hz)
Case Corsair Air 540
Audio Device(s) Realtek ALC892
Power Supply Corsair 850W RMi
Mouse Logitech M187 wireless (First day of use 30-9-2021)
Keyboard Logitech K270 wireless
Software Windows 10 21H2 LTSC 2021 / Linux: Candidates: Bazzite - Linux MX - Tuxedo - Kubuntu
Using Bing AI Image creation with "AMD CPU, transforming to a zombie, depth, 3D", I got these four drawings :D :

AMD_Zombie.PNG
 
Joined
Sep 1, 2020
Messages
2,309 (1.52/day)
Location
Bulgaria
Nothing, just planned problems that, when accumulated over time, will force you to buy new hardware. Not a total wreck where you can sue them for compensation even after the product's warranty is over, but annoying.
 

eidairaman1

The Exiled Airman
Joined
Jul 2, 2007
Messages
41,839 (6.61/day)
Location
Republic of Texas (True Patriot)
System Name PCGOD
Processor AMD FX 8350@ 5.0GHz
Motherboard Asus TUF 990FX Sabertooth R2 2901 Bios
Cooling Scythe Ashura, 2×BitFenix 230mm Spectre Pro LED (Blue,Green), 2x BitFenix 140mm Spectre Pro LED
Memory 16 GB Gskill Ripjaws X 2133 (2400 OC, 10-10-12-20-20, 1T, 1.65V)
Video Card(s) AMD Radeon 290 Sapphire Vapor-X
Storage Samsung 840 Pro 256GB, WD Velociraptor 1TB
Display(s) NEC Multisync LCD 1700V (Display Port Adapter)
Case AeroCool Xpredator Evil Blue Edition
Audio Device(s) Creative Labs Sound Blaster ZxR
Power Supply Seasonic 1250 XM2 Series (XP3)
Mouse Roccat Kone XTD
Keyboard Roccat Ryos MK Pro
Software Windows 7 Pro 64
Engineer
> Fascinating read at https://en.m.wikipedia.org/wiki/Leap_second
>
> I'm amazed that such a modern processor as EPYC has an n-bit limit clock problem. Were the design teams asleep?
>
> Since EPYC is a server processor, rebooting should not have to be part of the standard operating procedures, and I can imagine many use cases where this clock problem can cause chaos, especially in futures markets.
Engineers aren't perfect.
 
Joined
Mar 10, 2010
Messages
11,878 (2.22/day)
Location
Manchester uk
System Name RyzenGtEvo/ Asus strix scar II
Processor Amd R5 5900X/ Intel 8750H
Motherboard Crosshair hero8 impact/Asus
Cooling 360EK extreme rad+ 360$EK slim all push, cpu ek suprim Gpu full cover all EK
Memory Corsair Vengeance Rgb pro 3600cas14 16Gb in four sticks./16Gb/16GB
Video Card(s) Powercolour RX7900XT Reference/Rtx 2060
Storage Silicon power 2TB nvme/8Tb external/1Tb samsung Evo nvme 2Tb sata ssd/1Tb nvme
Display(s) Samsung UAE28"850R 4k freesync.dell shiter
Case Lianli 011 dynamic/strix scar2
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup, corsair void pro headset
Power Supply corsair 1200Hxi/Asus stock
Mouse Roccat Kova/ Logitech G wireless
Keyboard Roccat Aimo 120
VR HMD Oculus rift
Software Win 10 Pro
Benchmark Scores 8726 vega 3dmark timespy/ laptop Timespy 6506
> Why this isn't such a widespread problem... Because of this.
> Restarting your server every once in a while is dumb; there are many scenarios where rebooting is undesired.
> Why CC6 isn't off by default on servers however...
While I agree, I would be surprised if a server in a normal IT department went three years without some sort of upgrade, maintenance or check. Filters need cleaning, and three years is quite long to use the same server hardware (considers own IT dept... maybe, though). :D :)

It's a surprising oversight for a server part.

@JAKra get a job at Cyberdyne, these firms need this kind of thought :)
 
Joined
Dec 29, 2010
Messages
3,788 (0.75/day)
Processor AMD 5900x
Motherboard Asus x570 Strix-E
Cooling Hardware Labs
Memory G.Skill 4000c17 2x16gb
Video Card(s) RTX 3090
Storage Sabrent
Display(s) Samsung G9
Case Phanteks 719
Audio Device(s) Fiio K5 Pro
Power Supply EVGA 1000 P2
Mouse Logitech G600
Keyboard Corsair K95
Yea, this is more about lazy admins. If you hit the bug you get fired lol.
 
Joined
Jan 14, 2019
Messages
12,184 (5.75/day)
Location
Midlands, UK
System Name Nebulon B
Processor AMD Ryzen 7 7800X3D
Motherboard MSi PRO B650M-A WiFi
Cooling be quiet! Dark Rock 4
Memory 2x 24 GB Corsair Vengeance DDR5-4800
Video Card(s) AMD Radeon RX 6750 XT 12 GB
Storage 2 TB Corsair MP600 GS, 2 TB Corsair MP600 R2
Display(s) Dell S3422DWG, 7" Waveshare touchscreen
Case Kolink Citadel Mesh black
Audio Device(s) Logitech Z333 2.1 speakers, AKG Y50 headphones
Power Supply Seasonic Prime GX-750
Mouse Logitech MX Master 2S
Keyboard Logitech G413 SE
Software Windows 10 Pro
Even computers get tired sometimes.
 