
AMD Applies for CPU Design Patent Featuring Core-Integrated FPGA Elements

Raevenlord

News Editor
Joined
Aug 12, 2016
Messages
3,755 (1.25/day)
Location
Portugal
System Name The Ryzening
Processor AMD Ryzen 9 5900X
Motherboard MSI X570 MAG TOMAHAWK
Cooling Lian Li Galahad 360mm AIO
Memory 32 GB G.Skill Trident Z F4-3733 (4x 8 GB)
Video Card(s) Gigabyte RTX 3070 Ti
Storage Boot: Transcend MTE220S 2TB, Kingston A2000 1TB, Seagate Firewolf Pro 14 TB
Display(s) Acer Nitro VG270UP (1440p 144 Hz IPS)
Case Lian Li O11DX Dynamic White
Audio Device(s) iFi Audio Zen DAC
Power Supply Seasonic Focus+ 750 W
Mouse Cooler Master Masterkeys Lite L
Keyboard Cooler Master Masterkeys Lite L
Software Windows 10 x64
AMD has applied for a United States patent that describes a CPU design with FPGA (Field-Programmable Gate Array) elements integrated into its core design. Titled "Method and Apparatus for Efficient Programmable Instructions in Computer Systems", the patent application describes a CPU with FPGA elements built into its very core design, where the FPGA elements share CPU resources, such as registers, with the floating-point and integer execution units. The application comes in the wake of AMD's announced Xilinx acquisition plans, and takes the marriage of FPGA and CPU to a whole other level. FPGAs, as the name implies, are hardware constructs that can be reconfigured according to predetermined tables (which can also be updated) to execute specific, desired functions.

Intel have themselves already shipped a CPU + FPGA combo in the same package; the company's Xeon Gold 6138P, for example, includes an Arria 10 GX 1150 FPGA on-package, offering 1,150,000 logic elements. However, this is simply a CPU and an FPGA on the same substrate, not a native, core-integrated FPGA design. Intel's product pays significant performance and latency penalties because complex operations destined for the FPGA have to be moved out of the CPU, processed in the FPGA, and have their results returned to the CPU. AMD's design does away with that round trip, and should thus allow for much higher performance.





Some of the more interesting claims in the patent application are listed below:

  • Processor includes one or more reprogrammable execution units (PEUs) which can be programmed to execute different types of customized instructions
  • When a processor loads a program, it also loads a bitfile associated with the program which programs the PEU to execute the customized instruction
  • Decode and dispatch unit of the CPU automatically dispatches the specialized instructions to the proper PEUs
  • PEU shares registers with the FP and Int EUs.
  • PEU can accelerate Int or FP workloads as well if speedup is desired
  • PEU can be virtualized while still using system security features
  • Each PEU can be programmed differently from other PEUs in the system
  • PEUs can operate on data formats that are not typical FP32/FP64 (e.g. Bfloat16, FP16, Sparse FP16, whatever else they want to come up with) to accelerate machine learning, without needing to wait for new silicon to be made to process those data types.
  • PEUs can be reprogrammed on-the-fly (during runtime)
  • PEUs can be tuned to maximize performance based on the workload
  • PEUs can massively increase IPC by doing more complex work in a single cycle
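
To make the claimed flow a bit more concrete, here is a toy Python model of it: a program arrives with a "bitfile", the bitfile programs a PEU, and the dispatcher routes the customized instruction to that PEU, which reads and writes the same registers as the fixed units. Everything in it is an illustrative assumption rather than something taken from the patent text: `ToyCore`, `load_program`, the opcode names, and the use of a Python callable to stand in for a bitfile are all hypothetical. The custom instruction multiplies in bfloat16, emulated here by simple truncation of FP32 to its top 16 bits (real hardware would typically round instead).

```python
import struct
from typing import Callable, Dict, List

BinaryOp = Callable[[float, float], float]


def to_bfloat16(x: float) -> float:
    """Truncate x to bfloat16 precision by keeping the top 16 bits of its FP32 encoding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


class ToyCore:
    """Hypothetical core: fixed integer/FP units plus one programmable execution unit (PEU)."""

    # Fixed-function units, always available.
    FIXED: Dict[str, BinaryOp] = {
        "add": lambda a, b: a + b,
        "mul": lambda a, b: a * b,
    }

    def __init__(self, num_regs: int = 8) -> None:
        self.regs: List[float] = [0.0] * num_regs  # register file shared by all units
        self.peu: Dict[str, BinaryOp] = {}         # opcodes the PEU currently implements

    def load_program(self, bitfile: Dict[str, BinaryOp]) -> None:
        """Loading a program also (re)programs the PEU from the program's bitfile."""
        self.peu = dict(bitfile)

    def execute(self, op: str, dst: int, a: int, b: int) -> None:
        """Dispatch op to a fixed unit if one exists, otherwise to the PEU."""
        unit = self.FIXED.get(op) or self.peu.get(op)
        if unit is None:
            raise ValueError(f"illegal instruction: {op}")
        self.regs[dst] = unit(self.regs[a], self.regs[b])


# A stand-in "bitfile": one customized instruction that multiplies in bfloat16.
bf16_bitfile = {"bf16_mul": lambda a, b: to_bfloat16(to_bfloat16(a) * to_bfloat16(b))}

core = ToyCore()
core.regs[0], core.regs[1] = 1.5, 2.25
core.load_program(bf16_bitfile)
core.execute("bf16_mul", 2, 0, 1)   # handled by the PEU
core.execute("add", 3, 2, 0)        # handled by a fixed unit, same registers
print(core.regs[2], core.regs[3])   # 3.375 4.875
```

On a core that has not loaded the bitfile, `bf16_mul` raises an illegal-instruction error in this model, which is one simple way to picture why the bitfile has to travel with the program.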

As it stands, this sort of design would allow, in theory, for an updatable CPU that might never need to be upgraded when it comes to new instruction support: since an FPGA is programmable hardware logic, a simple firmware update could allow the CPU to reconfigure its FPGA array so as to be able to process new, exotic instructions as they are released. Another argument for this integration is that some of the fixed-function silicon that is found in today's CPUs and that serves to support legacy x86 instructions could be left out of the die, with those instructions taken care of by the FPGA elements themselves - enabling a still-on-board hardware accelerator for when (and if) these instructions are required.

This would also allow AMD to trim the CPU of the "dark silicon" that is currently present - essentially, highly specialized hardware acceleration blocks that sit idle, wasting die space, when not in use. The bottom line is this: CPUs with less die space reserved for highly specialized operations, and thus more die area available for other resources (such as more cores), and with integrated, per-core FPGA elements that would reconfigure themselves on the fly according to processing needs. And if no exotic operations are required (such as AI inferencing and acceleration, AVX, video hardware acceleration, or other workloads), then the FPGA elements can just be reconfigured to "turbo" the CPU's own floating point and integer units, increasing available resources. An interesting patent application, for sure.

 
Joined
Oct 15, 2010
Messages
208 (0.04/day)
Can't imagine why no one has thought of this before. I've dreamed of on-the-fly reprogrammable hardware units inside chips since like forever, and it is only now that I see one of the great CPU makers on the planet going this route.
This is the future: a full CPU with on-the-fly reprogrammable units. They "morph" into the shape that is the most efficient for the calculation that needs to be done. It's as if you have a billion CPUs in one. This is the future for sure.
Imagine having a 512 or 1024 core CPU like this on 0.5 nanometer; it would blast current high-end CPUs like the 5950X into oblivion like they were nothing. In all kinds of workloads.
 
Joined
Nov 11, 2019
Messages
62 (0.03/day)
Location
Germany
Processor Ryzen 5 3600
Motherboard MSI B450M Gaming Plus
Cooling EK Supremacy EVO, Bykski N-GV1080TIG1-X (Gigabyte 1080TI Turbo) [280mm front, 240mm top, 120mm back]
Memory 16GiB 3600Mhz CL16 Patriot Viper
Video Card(s) Gigabyte GTX 1080Ti Turbo
Storage 4TiB Seagate Baracuda + 256 GiB Samsung 970 Evo Plus (StoreMI) & 500GiB Intenso SSD
Display(s) MSI Optix MAG271CR
Case CoolerMaster NR600
Power Supply Seagate Focus Plus 650 Watt GOLD
Mouse Sharkoon SHARK Force
Keyboard ReIDEA KM06
Can't imagine why no one has thought of this before. I've dreamed of on-the-fly reprogrammable hardware units inside chips since like forever, and it is only now that I see one of the great CPU makers on the planet going this route.
This is the future: a full CPU with on-the-fly reprogrammable units. They "morph" into the shape that is the most efficient for the calculation that needs to be done. It's as if you have a billion CPUs in one. This is the future for sure.
Imagine having a 512 or 1024 core CPU like this on 0.5 nanometer; it would blast current high-end CPUs like the 5950X into oblivion like they were nothing. In all kinds of workloads.
Do FPGAs not have a performance penalty? I always thought it was a trade-off (highly specialized & faster <--> highly generalized & slower)
 
Joined
Sep 6, 2013
Messages
3,318 (0.81/day)
Location
Athens, Greece
System Name 3 desktop systems: Gaming / Internet / HTPC
Processor Ryzen 5 5500 / Ryzen 5 4600G / FX 6300 (12 years later got to see how bad Bulldozer is)
Motherboard MSI X470 Gaming Plus Max (1) / MSI X470 Gaming Plus Max (2) / Gigabyte GA-990XA-UD3
Cooling Νoctua U12S / Segotep T4 / Snowman M-T6
Memory 32GB - 16GB G.Skill RIPJAWS 3600+16GB G.Skill Aegis 3200 / 16GB JUHOR / 16GB Kingston 2400MHz (DDR3)
Video Card(s) ASRock RX 6600 + GT 710 (PhysX)/ Vega 7 integrated / Radeon RX 580
Storage NVMes, ONLY NVMes/ NVMes, SATA Storage / NVMe boot(Clover), SATA storage
Display(s) Philips 43PUS8857/12 UHD TV (120Hz, HDR, FreeSync Premium) ---- 19'' HP monitor + BlitzWolf BW-V5
Case Sharkoon Rebel 12 / CoolerMaster Elite 361 / Xigmatek Midguard
Audio Device(s) onboard
Power Supply Chieftec 850W / Silver Power 400W / Sharkoon 650W
Mouse CoolerMaster Devastator III Plus / CoolerMaster Devastator / Logitech
Keyboard CoolerMaster Devastator III Plus / CoolerMaster Devastator / Logitech
Software Windows 10 / Windows 10&Windows 11 / Windows 10
Noob question.


Can those "FPGA parts" emulate old stuff? Like 32-bit code? I mean, that could probably help AMD clean up all the old stuff in their designs that is there for compatibility purposes, and help streamline their future cores?
 

Raevenlord

News Editor
Noob question.


Can those "FPGA parts" emulate old stuff? Like 32-bit code? I mean, that could probably help AMD clean up all the old stuff in their designs that is there for compatibility purposes, and help streamline their future cores?

Yes. That's part of what dark silicon in the article refers to.
 
Joined
Mar 16, 2017
Messages
231 (0.08/day)
Location
behind you
Processor Threadripper 1950X
Motherboard ASRock X399 Professional Gaming
Cooling IceGiant ProSiphon Elite
Memory 48GB DDR4 2934MHz
Video Card(s) MSI GTX 1080
Storage 4TB Crucial P3 Plus NVMe, 1TB Samsung 980 NVMe, 1TB Inland NVMe, 2TB Western Digital HDD
Display(s) 2x 4K60
Power Supply Cooler Master Silent Pro M (1000W)
Mouse Corsair Ironclaw Wireless
Keyboard Corsair K70 MK.2
VR HMD HTC Vive Pro
Software Windows 10, QubesOS
Do FPGAs not have a performance penalty? I always thought it was a trade-off (highly specialized & faster <--> highly generalized & slower)
They do, yes. An FPGA implementation will always be slower and take more die space than dedicated hardware. However, if there is more than one accelerator present and only one is used at a time, then an FPGA implementation can emulate both while taking up less space. Further, an FPGA solution is almost always faster than performing the function in software on a traditional CPU.

In summary, FPGAs are much more flexible than dedicated hardware while generally faster than performing a function in software. They're sort of a middle ground.
 
Joined
Jun 19, 2010
Messages
409 (0.08/day)
Location
Germany
Processor Ryzen 5600X
Motherboard MSI A520
Cooling Thermalright ARO-M14 orange
Memory 2x 8GB 3200
Video Card(s) RTX 3050 (ROG Strix Bios)
Storage SATA SSD
Display(s) UltraHD TV
Case Sharkoon AM5 Window red
Audio Device(s) Headset
Power Supply beQuiet 400W
Mouse Mountain Makalu 67
Keyboard MS Sidewinder X4
Software Windows, Vivaldi, Thunderbird, LibreOffice, Games, etc.
Maybe it can handle AVX512 and Intel's new AMX stuff; that would be great.
 

Raevenlord

News Editor
They do, yes. An FPGA implementation will always be slower and take more die space than dedicated hardware. However, if there is more than one accelerator present and only one is used at a time, then an FPGA implementation can emulate both while taking up less space. Further, an FPGA solution is almost always faster than performing the function in software on a traditional CPU.

In summary, FPGAs are much more flexible than dedicated hardware while generally faster than performing a function in software. They're sort of a middle ground.

This. Of course, one also has to take into consideration the relation between the amount of die space reserved for the FPGA (more die space means more units, which means more performance for any given task) and also AMD's intention of having these take advantage of already-existing core resources.

Also, perhaps we could actually see improved performance on tasks currently performed by fixed-function hardware. I suppose in theory, if one can shave 3x 20 mm² (pulling this out of my proverbial, as an example) of fixed-function hardware for three specific tasks, and replace those with 60 mm² of FPGA, perhaps those 60 mm² of FPGA resources will be faster at executing one of those tasks than the previous 20 mm² of fixed-function hardware?
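
The 3x 20 mm² versus 60 mm² thought experiment can be turned into a back-of-the-envelope model. This is only a sketch under loudly stated assumptions: throughput is taken to scale linearly with the amount of equivalent logic, and the FPGA's area overhead and clock ratio relative to fixed-function logic (`area_overhead`, `clock_ratio`) are placeholder parameters, not figures from AMD, the patent, or any measurement. Both function names are made up for this example.

```python
def fpga_relative_throughput(fpga_area_mm2: float, fixed_area_mm2: float,
                             area_overhead: float, clock_ratio: float) -> float:
    """Throughput of an FPGA block relative to one fixed-function block.

    Assumptions (not measurements): throughput scales linearly with equivalent
    logic, FPGA fabric needs `area_overhead` times the area of fixed logic for
    the same function, and it runs at `clock_ratio` times the fixed clock.
    """
    if fixed_area_mm2 <= 0 or area_overhead <= 0:
        raise ValueError("areas and overhead must be positive")
    equivalent_fixed_area = fpga_area_mm2 / area_overhead
    return (equivalent_fixed_area / fixed_area_mm2) * clock_ratio


def break_even_overhead(fpga_area_mm2: float, fixed_area_mm2: float,
                        clock_ratio: float) -> float:
    """Largest area overhead at which the FPGA still matches the fixed block."""
    return (fpga_area_mm2 / fixed_area_mm2) * clock_ratio


# Hypothetical inputs: 60 mm2 of FPGA versus one 20 mm2 fixed-function block.
print(fpga_relative_throughput(60, 20, area_overhead=2.0, clock_ratio=1.0))  # 1.5
print(break_even_overhead(60, 20, clock_ratio=1.0))                          # 3.0
```

Under these assumptions the 60 mm² of FPGA only beats the 20 mm² block if the fabric's combined area-and-clock penalty stays below 3x; whether real fabric gets anywhere near that is exactly the open question.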
 
Joined
Jul 7, 2019
Messages
908 (0.47/day)
This could be another way AMD keeps up with ARM and RISC-V, by making their CPUs flexible enough to use software/hardware intended for ARM/RISC devices while still retaining exclusive (1 of 3? companies, IIRC) x86 legacy support. As opposed to shifting entirely over to ARM (though I can still see AMD using a K12 successor to enter the ARM ecosystem proper, more so given their RDNA joint effort with Samsung to integrate ARM and RDNA for mobile).
 
Joined
Jan 3, 2021
Messages
3,453 (2.46/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
The idea is exciting but doubts remain.
* Manufacturing tech. For example, the process for making CPUs is significantly different from those for DRAM and NAND - two of those can't be combined on the same die efficiently. So, is the process that produces the best CPU logic also the best for FPGA logic?
* Can a CPU really take full advantage of flexible execution units if all other parts remain fixed, like decode logic, out-of-order logic, etc.? For 32-bit emulation, I believe that programmable decoders would be the key, not programmable EUs.
* Context switching might require reprogramming the FPGA logic every time, or often, depending on the load. How much time does it take? Not very good if it's many microseconds.
* The poor guy that writes highly optimized C/C++ code, will he (or she) have to become a VHDL expert too? (or will that be left up to the other poor guy, the one that maintains the optimizing compiler?)
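
The context-switching worry can be put into numbers with a simple amortization model. This is a rough sketch, not a description of how the patent handles it: it assumes the whole time slice is work the PEU can accelerate and that the PEU has to be reprogrammed once at the start of every slice. The function names and all input values are hypothetical.

```python
def net_speedup(timeslice_us: float, peu_speedup: float, reconfig_us: float) -> float:
    """Speedup over a plain core for one time slice of work.

    Assumes the PEU is reprogrammed once per slice (costing `reconfig_us`) and
    then runs the slice's work `peu_speedup` times faster than the plain core.
    """
    if timeslice_us <= 0 or peu_speedup <= 0 or reconfig_us < 0:
        raise ValueError("invalid timing parameters")
    accelerated_us = timeslice_us / peu_speedup + reconfig_us
    return timeslice_us / accelerated_us


def max_reconfig_us(timeslice_us: float, peu_speedup: float) -> float:
    """Longest reconfiguration time for which the net speedup is still >= 1."""
    return timeslice_us * (1.0 - 1.0 / peu_speedup)


# Hypothetical numbers: 1000 us slice, PEU gives 2x, reprogramming takes 50 us.
print(round(net_speedup(1000, 2.0, 50), 3))  # 1.818
print(max_reconfig_us(1000, 2.0))            # 500.0
```

In this model a reconfiguration cost of "many microseconds" is tolerable when slices are long and the speedup is large, and it erases the gain entirely once it reaches `max_reconfig_us`.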

If all this ever sees the light of day, I imagine AMD will make various FPGA functions available as downloads, for a fee of course. Maybe they won't let just anyone develop or sell new ones.
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,162 (2.82/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
Do FPGAs not have a performance penalty? I always thought it was a trade-off (highly specialized & faster <--> highly generalized & slower)
The advantage of an FPGA is being able to make changes after the device has already been fabricated. The cost really comes down to how the FPGA is programmed and the implementation itself. Being able to make changes after the device has been built is definitely an advantage, particularly if you consider some of the security flaws we've been seeing.
 
Joined
Jul 16, 2014
Messages
8,197 (2.17/day)
Location
SE Michigan
System Name Dumbass
Processor AMD Ryzen 7800X3D
Motherboard ASUS TUF gaming B650
Cooling Artic Liquid Freezer 2 - 420mm
Memory G.Skill Sniper 32gb DDR5 6000
Video Card(s) GreenTeam 4070 ti super 16gb
Storage Samsung EVO 500gb & 1Tb, 2tb HDD, 500gb WD Black
Display(s) 1x Nixeus NX_EDG27, 2x Dell S2440L (16:9)
Case Phanteks Enthoo Primo w/8 140mm SP Fans
Audio Device(s) onboard (realtek?) - SPKRS:Logitech Z623 200w 2.1
Power Supply Corsair HX1000i
Mouse Steeseries Esports Wireless
Keyboard Corsair K100
Software windows 10 H
Benchmark Scores https://i.imgur.com/aoz3vWY.jpg?2
other workloads, then the FPGA elements can just be reconfigured to "turbo" the CPU's own floating point and integer units, increasing available resources.

FPGA: I'm bored
CPU: you could help me
FPGA: eh?
CPU: Lazy bum!
FPGA: Ok fine here, Haz some Red Bull, it'll make you run faster.
CPU: I don't have legs!
FPGA: better I stick a Falcon 9 up your rear?
CPU: Zoom! ZOOM!
 

Mussels

Freshwater Moderator
Joined
Oct 6, 2004
Messages
58,413 (7.96/day)
Location
Oystralia
System Name Rainbow Sparkles (Power efficient, <350W gaming load)
Processor Ryzen R7 5800x3D (Undervolted, 4.45GHz all core)
Motherboard Asus x570-F (BIOS Modded)
Cooling Alphacool Apex UV - Alphacool Eisblock XPX Aurora + EK Quantum ARGB 3090 w/ active backplate
Memory 2x32GB DDR4 3600 Corsair Vengeance RGB @3866 C18-22-22-22-42 TRFC704 (1.4V Hynix MJR - SoC 1.15V)
Video Card(s) Galax RTX 3090 SG 24GB: Underclocked to 1700Mhz 0.750v (375W down to 250W))
Storage 2TB WD SN850 NVME + 1TB Sasmsung 970 Pro NVME + 1TB Intel 6000P NVME USB 3.2
Display(s) Phillips 32 32M1N5800A (4k144), LG 32" (4K60) | Gigabyte G32QC (2k165) | Phillips 328m6fjrmb (2K144)
Case Fractal Design R6
Audio Device(s) Logitech G560 | Corsair Void pro RGB |Blue Yeti mic
Power Supply Fractal Ion+ 2 860W (Platinum) (This thing is God-tier. Silent and TINY)
Mouse Logitech G Pro wireless + Steelseries Prisma XL
Keyboard Razer Huntsman TE ( Sexy white keycaps)
VR HMD Oculus Rift S + Quest 2
Software Windows 11 pro x64 (Yes, it's genuinely a good OS) OpenRGB - ditch the branded bloatware!
Benchmark Scores Nyooom.
Can someone simplify this for me? I'm in way over my head on CPU designs

Is this what it seems like, with AMD making their CPU's hardware functions reprogrammable so they can just change the damn architecture and feature set on a whim?
 
Joined
Mar 21, 2016
Messages
2,508 (0.79/day)
So many implications from this. I've long been a fan of the flexibility FPGAs promise. I kind of felt this was coming down the pike for a long, long while with CPU instruction sets and chiplets. Bravo to AMD on this if it proves to be efficient and functionally sound. What I really like is the prospect of using FPGA tech to significantly accelerate instruction sets, for example compression/decompression/encryption/decryption algorithms, then swap them in and out quickly on the fly as required, maximizing the die space available as opposed to having them all fixed in hardware and occupying more combined space. This is ideal for any instruction set that sits unused a certain percentage of the time. Instead of one compromised, less brute-force instruction set, they can hopefully have several brute-force ones that they can quickly interchange to bump up overall efficiency. Perhaps you need some profiles to load them dynamically and it takes a moment or two, but if the result once configured is a pronounced speed-up, that's still a great compromise, and if it happens behind the scenes and quickly enough in the first place, that's excellent. AI-accelerated FPGAs, if you will, and this is what will lead to better on-the-fly cognitive, neuron-like chips.
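
One way to picture the "swap them in and out as required" idea is a small manager that keeps only a few accelerator configurations loaded and evicts the least recently used one when a new one is needed. This is purely an illustrative sketch: `PeuSlots`, its `use` method, and the job names are hypothetical, and least-recently-used eviction is my assumption here, not a policy described in the article.

```python
from collections import OrderedDict


class PeuSlots:
    """Hypothetical manager that keeps at most `capacity` accelerator configs loaded."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.loaded: "OrderedDict[str, str]" = OrderedDict()  # least recently used first
        self.reconfigurations = 0

    def use(self, name: str) -> bool:
        """Return True if `name` was already loaded, False if it had to be swapped in."""
        if name in self.loaded:
            self.loaded.move_to_end(name)        # mark as most recently used
            return True
        if len(self.loaded) >= self.capacity:
            self.loaded.popitem(last=False)      # evict the least recently used config
        self.loaded[name] = f"<bitfile for {name}>"
        self.reconfigurations += 1
        return False


slots = PeuSlots(capacity=2)
for job in ["compress", "encrypt", "compress", "decompress", "compress", "encrypt"]:
    slots.use(job)
print(list(slots.loaded), slots.reconfigurations)  # ['compress', 'encrypt'] 4
```

With two slots, six requests cost four reconfigurations in this run; how expensive each of those is decides whether the die space saved is worth it.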
Can someone simplify this for me? I'm in way over my head on CPU designs

Is this what it seems like, with AMD making their CPU's hardware functions reprogrammable so they can just change the damn architecture and feature set on a whim?

Think of it a bit like a sound DSP chip that can reconfigure itself on the fly for the sound environment, to maximize the effect and realism of the sound in relation to the 3D surroundings. In essence, they could reconfigure, optimize, and recalibrate AI-assisted, instruction-responsive compute algorithms on a chiplet. Depending on how much FPGA tech is on the chiplet, how many instructions were removed, and how quickly the FPGA can be reconfigured, there is a lot of potential upside in the form of die space that could be better allocated. Not only that, this gives them a good idea of which dedicated instructions to retain on the CPU chiplets and which more legacy ones they can trim or adjust and supplement with FPGA tech, to yield better IPC as a whole within the die space and heat tolerance constraints.

Do FPGAs not have a performance penalty? I always thought it was a trade-off (highly specialized & faster <--> highly generalized & slower)

I believe that's a bit of a misconception with FPGAs. They aren't as optimal as an ASIC designed for a given task, but they aren't one-trick ponies confined to that task indefinitely either, and that is a key difference. What AMD is aiming to do here is remove some "fixed instruction sets" of less vital importance through clever use of FPGAs, "ideally" interchanging instruction sets in quick fashion, and re-purposing or transforming the die space freed by the removed instruction sets however is optimal. Optimus, roll out, AMD Autobots.

This. Of course, one also has to take into consideration the relation between the amount of die space reserved for the FPGA (more die space means more units, which means more performance for any given task) and also AMD's intention of having these take advantage of already-existing core resources.

Also, perhaps we could actually see improved performance on tasks currently performed by fixed-function hardware. I suppose in theory, if one can shave 3x 20 mm² (pulling this out of my proverbial, as an example) of fixed-function hardware for three specific tasks, and replace those with 60 mm² of FPGA, perhaps those 60 mm² of FPGA resources will be faster at executing one of those tasks than the previous 20 mm² of fixed-function hardware?

Quite a bit like polyphony and timbrality for music and sequencing. Really, with an FPGA AMD could adjust many aspects in many ways at any point by changing the FPGA programming. Don't need compression/decompression/encryption/decryption at the moment, or only a select amount of it? Dynamically junk it and reconfigure. Don't need certain instruction sets for the task? Goodbye. Basically this is a bit like Precision Boost all over again, on a whole other level of refinement and IPC efficiency uplift, in a roundabout sense and in theory, if done efficiently and well. I can see it taking a few generations of refinement, but much like other tech it should see nice improvements as it's perfected.

The idea is exciting but doubts remain.
* Manufacturing tech. For example, the process for making CPUs is significantly different from those for DRAM and NAND - two of those can't be combined on the same die efficiently. So, is the process that produces the best CPU logic also the best for FPGA logic?
* Can a CPU really take full advantage of flexible execution units if all other parts remain fixed, like decode logic, out-of-order logic, etc.? For 32-bit emulation, I believe that programmable decoders would be the key, not programmable EUs.
* Context switching might require reprogramming the FPGA logic every time, or often, depending on the load. How much time does it take? Not very good if it's many microseconds.
* The poor guy that writes highly optimized C/C++ code, will he (or she) have to become a VHDL expert too? (or will that be left up to the other poor guy, the one that maintains the optimizing compiler?)

If all this ever sees the light of day, I imagine AMD will make various FPGA functions available as downloads, for a fee of course. Maybe they won't let just anyone develop or sell new ones.

The way I see it, AMD could utilize chiplets cleverly. Example: a 4-chiplet design. The first chiplet is a pure multi-core CPU design, the second a pure FPGA design, the third a pure APU/GPU design. As for the fourth chiplet, perhaps it's 1/4 of each, plus the Infinity Fabric between the other three, which it controls. The whole thing could also effectively be seen as one large monolithic chip. Now think about that prospect: suddenly yields, die defects, and laser-cutting off the bad portions of a chip to salvage what they can aren't as big an issue in the overall chip design.
 
Joined
Oct 12, 2005
Messages
704 (0.10/day)
Can someone simplify this for me? I'm in way over my head on CPU designs

Is this what it seems like, with AMD making their CPU's hardware functions reprogrammable so they can just change the damn architecture and feature set on a whim?
From what I read in the article, they will only use it for some stuff, and the main part of the CPU will remain a traditional CPU. This might help to increase performance in the future, but better performance comes mostly from the increase in transistors, and not so much from better optimizing them.

Of the 3 parts of a CPU, they want to be able to use these programmable transistors to replace 2:
The legacy support (to support old x86 apps that use instructions that modern apps no longer use).
Accelerators like AI, image processing, video decoding/encoding, encryption/decryption, etc.

The core of the CPU will remain normal static transistors.

But the goal would be to use the space saved, since legacy instructions/code and accelerators won't require as much space and as many transistors, for something that might provide better performance. Like more cores, larger cores, more cache, etc...
 
Joined
Apr 8, 2008
Messages
339 (0.06/day)
System Name Xajel Main
Processor AMD Ryzen 7 5800X
Motherboard ASRock X570M Steel Legend
Cooling Corsair H100i PRO
Memory G.Skill DDR4 3600 32GB (2x16GB)
Video Card(s) ZOTAC GAMING GeForce RTX 3080 Ti AMP Holo
Storage (OS) Gigabyte AORUS NVMe Gen4 1TB + (Personal) WD Black SN850X 2TB + (Store) WD 8TB HDD
Display(s) LG 38WN95C Ultrawide 3840x1600 144Hz
Case Cooler Master CM690 III
Audio Device(s) Built-in Audio + Yamaha SR-C20 Soundbar
Power Supply Thermaltake 750W
Mouse Logitech MK710 Combo
Keyboard Logitech MK710 Combo (M705)
Software Windows 11 Pro
AMD, when asked about AVX512, said they're more interested in better silicon usage that can do multiple things rather than wasting die space on a specialised workload that only a few can benefit from. The same goes for their ray tracing on GPUs: they said they're leaning toward making more general-purpose cores that can do RT calculations faster rather than having dedicated silicon only for RT.

I guess this is how AMD is seeing things like AVX512, AI and other stuff: just put an FPGA there. Developers can just program it and do their magic. But I don't know how much it can do compared to specialised ASICs, like how Intel is doing with AVX512, or how the FPGA will work with multiple applications each trying to do its own thing. The patent I saw was like each x86 core has a small FPGA beside it and both share resources (like how an x86 core has integer and FP units, now we will have an FPGA unit as well). So each core can have its own FPGA, and each core can program its FPGA to do a specific task (or combine more than one core with their FPGAs to have more FPGA power).

Maybe in the future, a single x86 core can have multiple FPGA execution units, like how they do with integer and FP units. And maybe AMD can differentiate server and consumer Zen core dies by how many FPGA units there are per core/die, as I don't think consumers will need that much in the near future.
 
Joined
Mar 21, 2016
Messages
2,508 (0.79/day)
Programming things and doing magic is pretty great; take ReShade for example. Would you look at the god rays in that Grim Dawn! Fake it til you make it!
Grim Fake.jpg


As far as the FPGA matter is concerned, I feel AMD fully intends to expand FPGA power over time with refinement, and also to use it to help work out which fixed-function instruction sets are ideal to keep in place and which less important instruction set algorithms and other chip aspects could be shifted from fixed-function logic per core to FPGA silicon die space instead. It could lead to something like a chiplet with 8 cores, where each could potentially have its own unique fixed instruction set, with the saved space used for programmable FPGA space. It could lead to a chip where the first core has some FPGA parts and all the instruction sets you'd want, and each additional core down the line drops an instruction set and replaces that space with additional FPGA space. That would enable a fair degree of programmable micro-adjustments, a lot like Precision Boost with voltages. How Windows Task Scheduler handles it is another matter, but it'll work itself out over time, I'm sure.

To touch on what I said a few months back: "I think bigLITTLE is something to think about, and perhaps some FPGA tech being applied to designs. I wonder if perhaps the MB chipset will be turned into an FPGA or incorporate some of that tech, same with CPU/GPU: just re-route some new designs and/or reconfigure them a bit depending on need. They are wonderfully flexible in a great way. Perfect, no, but they'll certainly improve and be even more useful. Unused USB/PCI-E/M.2 slots? Cool, I'll be reusing that for X or Y. I think eventually it could get to that point, hopefully, and if it can be done efficiently, that's cool as hell." That's something I feel is another aspect of FPGAs being integrated and fused with CPUs. CPUs these days have fixed hardware to handle a lot of stuff, even things like direct CPU-based PCIe connections. What happens with fixed-function hardware is that if it isn't being fully utilized, it's effectively wasted die space, is it not!!?

Now, with FPGAs handling some of those things, and depending on how much extra space is required to do so, you can actually bypass that downfall of fixed-function hardware design: when something isn't being utilized, repurpose the die space for something you're trying to do, like sorcery. The traditional chipset could be eliminated entirely in the future, potentially replaced by an FPGA, or at least by more of a twin-socket CPU design with no chipset and an Infinity Cache and Infinity Fabric connection between the two. They could even behave more like the human brain across a motherboard: the left one handles the memory channels on the left side and the other those on the right side, along with peripherals like PCIe lanes, with shorter traces from making the PCIe distance to the CPU sockets more symmetric. That could be part of the issue with mGPU as well: the PCIe trace lengths differ a fair amount because slots sit nearer to or further from the CPU. That's certainly an area that could be improved in practice.

Another part to touch on is something I mentioned a few months back and feel is quite true. Eventually we need even more fixed-function ASIC blocks integrated into chip dies, or FPGAs, because the low-hanging fruit on node shrinks is eroding. Where FPGAs come into the equation is that die space is limited but their programmability isn't, though the extent of it certainly is. That said, you can't keep putting new fixed ASIC instructions into a chip die forever, with the laws of physics diminishing the prospects of node shrinks at some stage or another. We either need a cost-effective quantum computing breakthrough of some sort, or FPGAs being cleverly repurposed alongside a delicate balance of critical fixed-function instruction sets.
"I still really feel FPGAs could be the best all-around solution, outside of combining a variety of ASICs to really specifically maximize and prioritize a handful of the individual's use cases. Eventually this will be one of the few low-hanging fruits left to leverage, so it has to happen eventually for both Intel and AMD, not to mention Nvidia on the GPU side. This is how it is going to be moving forward one way or another without a breakthrough on the manufacturing side or quantum computers really taking a foothold."

From like 3 years ago... the tides have turned, the future is now!
"I hate to say it because I wish AMD luck in the future as they are a great company, but I strongly feel that Intel's FPGA is an enormous sleeping giant. FPGAs in general have so much potential to me, as they can be configured appropriately to specific needs. I'm not sure why we don't have a CPU surrounded by like 8 FPGAs that interconnect with it. You'd have tons of surface area for cooling and enormous amounts of reconfigurable power at hand, especially if you had that with something like a very potent APU at its center with lots of AI machine learning capability to adapt to a user's usage and needs."
 
Joined
Oct 12, 2019
Messages
128 (0.07/day)
The same goes for their ray tracing on GPUs: they said they're leaning toward making more general-purpose cores that can do RT calculations faster rather than having dedicated silicon only for RT.
NVIDIA's RT cores are quite close to dark silicon. RT math is boring, repetitive and highly uninteresting. It's different from rasterization math, though not by that much, but cores are optimized either for rasterization or for *partial* RT (and are likely either idling or doing something very inefficiently when no RT is needed).

In short, I don't like NVIDIA's RT solution at all: it's wasteful (and consumers pay for it), it's incomplete and it's doomed to become obsolete in time.

If some of the cores are capable of switching quickly from rasterization to (any) RT or vice versa, it's a very flexible and elegant solution.

The same goes for various other stuff, say anti-aliasing: some games clearly don't need it at all (though they usually don't need high computing power either, but let's think small stuff, like APUs). And a number of others. Not needed? Use it for something else. Especially in APUs and the low end.

Also, reasonably future-proof. The bloody TFLOPs might actually get some meaning... Stuff just works, until there are fundamental changes or the compute power just becomes too small.

It will be interesting to see what NVIDIA does, because if AMD gets Xilinx, and Intel already has Altera, that's 85% of the FPGA world, and since NVIDIA is 'coexisting peacefully' with both...

Perhaps it's worth mentioning that the Intel guys probably aren't just sitting on their collective ass: they are using interposers and have announced a big.LITTLE design (could it be FPGA-related? This big.LITTLE stuff has puzzled me greatly since it was announced), so they could be more ready than we think they are...

One last thing, probably just a feeeeeeling, but the whole thing smells like datacenter stuff, so something we won't see for quite a while (except those who work with datacenters and web servers and whathaveyou).
 
Joined
May 19, 2009
Messages
223 (0.04/day)
What are all the 'accelerators' on Intel's CPUs at the moment? Apart from the iGPU, maybe. Are they ASICs?
The stuff that allows performance jumps like this:
 

Attachments

  • intel-ai-performance (1).jpg
    565.5 KB · Views: 181
Joined
Oct 27, 2009
Messages
1,176 (0.21/day)
Location
Republic of Texas
System Name [H]arbringer
Processor 4x 61XX ES @3.5Ghz (48cores)
Motherboard SM GL
Cooling 3x xspc rx360, rx240, 4x DT G34 snipers, D5 pump.
Memory 16x gskill DDR3 1600 cas6 2gb
Video Card(s) blah bigadv folder no gfx needed
Storage 32GB Sammy SSD
Display(s) headless
Case Xigmatek Elysium (whats left of it)
Audio Device(s) yawn
Power Supply Antec 1200w HCP
Software Ubuntu 10.10
Benchmark Scores http://valid.canardpc.com/show_oc.php?id=1780855 http://www.hwbot.org/submission/2158678 http://ww
FPGAs are good at inference, so think bfloat16 and smaller, INT4 to INT8.

Having an FPGA chiplet to handle inference would allow normal calculations to run at the same time as the specialized ones.
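
For anyone wondering what those low-precision formats look like in practice, here's a small Python sketch. It's only an illustration, not how any particular FPGA or CPU implements it: bfloat16 is shown as the top 16 bits of a float32 (plain truncation, no rounding), and the INT8 part is a simple symmetric quantization where the `scale` value is something I picked for the example.

```python
import struct


def float_to_bfloat16_bits(x: float) -> int:
    """Keep the top 16 bits of the IEEE 754 float32 encoding
    (sign, 8 exponent bits, 7 mantissa bits). Truncates, no rounding."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16


def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand 16 bfloat16 bits back to a float by zero-filling the low 16 bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return x


def quantize_int8(values, scale):
    """Symmetric INT8 quantization: round(v / scale), clamped to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


bits = float_to_bfloat16_bits(3.14159)
approx = bfloat16_bits_to_float(bits)        # 3.140625: only ~3 decimal digits survive
q = quantize_int8([1.0, -3.0, 500.0], 0.5)   # the last value saturates at 127
```

The point is that both formats throw away precision that inference usually doesn't need, which is why narrow multipliers in programmable logic are a reasonable fit for it.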


What are all the 'accelerators' on Intel's CPUs at the moment? Apart from the iGPU, maybe. Are they ASICs?
The stuff that allows performance jumps like this:
Intel has bfloat16 support as one of their CPU extensions (AVX-512 BF16), alongside VNNI for low-precision integer dot products.
 

r9

Joined
Jul 28, 2008
Messages
3,300 (0.55/day)
System Name Primary|Secondary|Poweredge r410|Dell XPS|SteamDeck
Processor i7 11700k|i7 9700k|2 x E5620 |i5 5500U|Zen 2 4c/8t
Memory 32GB DDR4|16GB DDR4|16GB DDR4|32GB ECC DDR3|8GB DDR4|16GB LPDDR5
Video Card(s) RX 7800xt|RX 6700xt |On-Board|On-Board|8 RDNA 2 CUs
Storage 2TB m.2|512GB SSD+1TB SSD|2x256GBSSD 2x2TBGB|256GB sata|512GB nvme
Display(s) 50" 4k TV | Dell 27" |22" |3.3"|7"
VR HMD Samsung Odyssey+ | Oculus Quest 2
Software Windows 11 Pro|Windows 10 Pro|Windows 10 Home| Server 2012 r2|Windows 10 Pro
At the Intel meeting: "How come we didn't come up with this?" ... complete silence ...
 

JonGMan

New Member
Joined
Mar 18, 2021
Messages
2 (0.00/day)
Take a look at patent US9471519B2 https://patents.justia.com/patent/9779051
It clearly shows a parallel-series chip that incorporates an FPGA. Later patents include ASICs and GPUs, just like the Ryzen chip.
This patent was shown to Microsoft, and there was a 3-month engagement concerning its use in their Project Olympus.
They took the split-motherboard, all-accelerated design details outlined in the patent family and used them verbatim in their Xbox Series X.

Jonathan Glickman is the real inventor and architect of the parallel-series Adaptable Computing Machine that can dynamically reconfigure itself to suit any compute job.
Things like pipelining, cascading, child processing, streaming and so on can all be done with conventional computers, which pass results from one node to another.
A parallel-series Adaptable Cluster prefetches data inside an application-embedded accelerated controller (à la Netezza) and then passes the results to another application-embedded accelerated controller, be it storage, network, memory or GPU. That is exactly what the new Xbox Series X is doing. It's really a supercomputer like no other; it's not a gaming machine but rather a gaming appliance in the same lineage as Netezza. Soon all PCs will utilize this technology.
 

JonGMan

New Member
Joined
Mar 18, 2021
Messages
2 (0.00/day)
All-accelerated split-motherboard design including CPU, GPU, SSD, network devices, memory...
This was shown to Microsoft; see if any of you can guess what happened next.
When one accelerated component passes results to another accelerated component, this is dubbed a series connection, which differs from pipelining, cascading, streaming... which have all been done with conventional computers for ages now.
The advantage of this parallel-series all-accelerated architecture is that it doesn't need to load balance or be elastic, since it can balance from within by reconfiguring itself dynamically.
Foxconn tried to design around it and failed, and guess who showed it to Microsoft.

Also hardware acceleration can be done by embedding GPU, ASICs and other components.
Really any technology that allows for an application to be embedded into a device controller will work
 