Alright .. finally .. first of all, this outage wasn't caused by anything external, no DDoS, no hacking.
What happened was:
- I wanted to run a database query on our banner impressions logs.
- That table contains A LOT of rows, one for each banner impression shown over roughly a year
- So I wanted to reduce the working set to just August by copying that month into a separate table: INSERT INTO .. (SELECT .. FROM .. WHERE timestamp > "2024-08-01-01") (see the sketch below)
- The query still took forever to run. I worked on something else in the meantime (Zen 5 memory scaling, SSD review, GPU review), but after about an hour I got impatient and decided to solve it differently, so I used KILL to kill the query (see below)
- As soon as the query was killed, MySQL started executing a rollback to undo the rows it had inserted into the new table (I probably somehow thought I was running a SELECT, not an INSERT, so I wasn't expecting a rollback)
- At this point I realized that I had mistyped the timestamp (2x "-01"), so it was actually copying ~70 GB of small rows into the new table, and was now rolling them back one by one
- We're running a 3-node Galera cluster, so this caused extra load across the whole cluster: network, disk, CPU
- At some point one of the DB nodes crashed.. 2 out of 3 is still a good cluster size
- The crashed node got auto-restarted, but was unable to rejoin the cluster, because the other nodes were still busy doing the rollback
- I also saw log messages related to DDL statements, which acquire an exclusive lock on the whole cluster. New plan: "have you tried turning it off and on again?"
- I took down the whole DB cluster and tried to manually bring up a single node as primary, to add the other nodes afterwards
- When I did that, MySQL insisted that it had to finish rolling back the transactions first, so I let it .. it took like an hour, and I still wasn't sure if this would solve the problem
- At this point I started digging up our DB backups and thought about options to restore in case of total failure
- For the past months I've been working on a migration to Kubernetes and MySQL with Group Replication (no more Galera)
- I had a 3-node Group Replication cluster running in production with a subset of our database, so I started restoring the backup to that cluster
- On the main DB cluster, things were still moving slooooowly ..
- Now that I had a rough ETA, I updated our 502 "Servers Down" message, so that people would stop trying to reach out to me with "hey wizz, are you aware that TPU is down?"
- Once the rollback completed, I still couldn't get the single MySQL Galera node into primary mode, it always came up read-only (the usual bootstrap commands are below)
- I tried everything, no go, so I decided to focus on restoring from backup
- This went mostly smoothly, except for some minor issues because Oracle MySQL 8.x isn't exactly 100% compatible with MariaDB 10.x (see below for an example of the kind of mismatch)
- Fixed them all, site is back up
- Ads are still disabled because the backup for that huge ads log table is still restoring
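
For anyone curious, here's roughly what the copy query should have looked like with a proper date range. Table and column names are just placeholders, not our actual schema:

  -- copy only August into a separate table (names are placeholders)
  INSERT INTO banner_impressions_aug
    SELECT * FROM banner_impressions
    WHERE timestamp >= '2024-08-01' AND timestamp < '2024-09-01';

Spelling out both bounds makes it explicit that only August gets copied.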
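
The KILL part itself is simple enough (the connection ID is made up for the example):

  SHOW FULL PROCESSLIST;   -- find the Id of the long-running statement
  KILL QUERY 12345;        -- aborts just that statement, keeps the connection
  -- KILL 12345;           -- this would drop the whole connection instead

The catch, as I relearned the hard way, is that killing a half-finished INSERT .. SELECT on InnoDB doesn't just stop it, it rolls back every row already written, which can easily take as long as the insert itself did.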
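
For reference, this is the usual way to force a single surviving Galera node back into primary, which is what I was attempting (it refused to work in my case, the node stayed read-only):

  SHOW STATUS LIKE 'wsrep_cluster_status';                 -- 'non-Primary' = node refuses writes
  SET GLOBAL wsrep_provider_options = 'pc.bootstrap=YES';  -- declare this node a new primary component
  SHOW STATUS LIKE 'wsrep_cluster_status';                 -- should now report 'Primary'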
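
And one example of the kind of MariaDB-vs-MySQL mismatch I mean (illustrative only, not necessarily the exact issues I hit): MySQL 8 defaults new tables to the utf8mb4_0900_ai_ci collation, which MariaDB doesn't use, so joins between freshly created tables and restored utf8mb4_general_ci tables can fail with "Illegal mix of collations". Forcing one collation per table clears it up:

  ALTER TABLE example_table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;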
Our download servers are spread around the globe, but the main infrastructure (that creates the pages in your browser) is in NYC, because that's geographically closest to the average location of our audience. Backups are multi-site, multi-continent.