Friday, May 3rd 2019
Possible Listings of AMD Ryzen 9 3800X, Ryzen 7 3700X, Ryzen 5 3600X Surface in Online Stores
Remember to bring your osmosis process to the table here, as a good deal of salt is present in this story's environment. Some online stores from Vietnam and Turkey have started listing AMD's Ryzen 3000 series CPUs based on the Zen 2 architecture. The lineup so far consists of a Ryzen 9 3800X, Ryzen 7 3700X, and Ryzen 5 3600X, and the specs on these are... incredible, to say the least.
The Ryzen 9 3800X is being listed with 32 threads, meaning a 16-core processor. Clock speeds are reported as 3.9 GHz base with up to 4.7 GHz Boost on both the Turkish and the Vietnamese e-tailer's webpages. The Turkish store then stands alone in listing AMD's Ryzen 7 3700X CPU, which is reported as having 12 cores, 24 threads, and operating at an extremely impressive 4.2 GHz base and 5.0 GHz Boost clocks. Another listing by the same website, for the Ryzen 5 3600X, details the processor as having 8 physical cores and running at 4.0 GHz base and 4.8 GHz Boost clocks.
Sources:
TPU Forums @Thread starter R0H1T, nguyencongpc.vn, ebrarbilgisayar.com
242 Comments on Possible Listings of AMD Ryzen 9 3800X, Ryzen 7 3700X, Ryzen 5 3600X Surface in Online Stores
You also seem to think that Zen suffers from high latencies, which is absolutely wrong. AFAIK the IF itself is (arguably) the biggest bottleneck in their memory subsystem; in fact, Zen+ does beat Intel in L2/L3 cache latencies ~ www.anandtech.com/show/12625/amd-second-generation-ryzen-7-2700x-2700-ryzen-5-2600x-2600/3
Intel's much better with memory latency, though that may change slightly with IF2 & Zen 2.
An enthusiast is someone who is enthusiastic about PCs. He likes to talk about them, he likes to read reviews, he likes to spend more than needed.
It has absolutely nothing to do with how you use a PC. You don't know much about how servers are used, do you - mister "enthusiast"? ;-) Nope. Single-thread performance is improving slowly and will hit a wall soon. If one wants more processing power, he is forced to buy more cores.
But having more cores doesn't automatically mean software will run faster. Someone has to write the software to use them (assuming it's possible in the first place).
That's the main advantage of increasing single-core performance. Not everyone is a programmer and not everyone has the comfort you do. Data processing is by definition the best case possible for multi-threaded computing.
I know it may be difficult, but you have to consider that coding today is also done by analysts, and it has to happen fast. I also write code as a day job (and as a hobby). But I also train analysts to use stuff like R or VBA. They need high single-core performance. And yes, they work on Xeons.
And as usual, I have to repeat the fundamental fact: some programs are sequential - no matter how well you code and what language you use. It can't be helped. Well exactly! The balance between writing and running a program is the key issue.
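To make the sequential point concrete, here is a minimal, hypothetical example (the logistic-map recurrence is my stand-in, not anything from the thread): each iteration needs the previous result, so no language choice or core count can turn the loop into parallel work.

```cpp
// Hypothetical example of an inherently sequential computation: each
// iterate of the logistic map depends on the one before it, so the loop
// cannot be split across cores no matter how it is written.
#include <cstdio>

int main() {
    double x = 0.5;
    for (long i = 0; i < 100000000; ++i)
        x = 3.9 * x * (1.0 - x);   // the next value requires the current one
    std::printf("%f\n", x);
}
```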
Some people write ETLs and spend hours optimizing code. It's their job.
But other people have other needs. You have to open a bit and try to understand them as well.
I was doing physics once, now I'm an analyst. A common fact: a lot of the programs are going to be used a few times at best, often just once.
There's really no point in spending a lot of time optimizing (and often you don't have the time to do it). I don't know what you mean. Graphene exists, and PoC transistors were made a while ago.
Yes, it is a distant future for personal computers, but that's the whole point of technological advancement - we have to plan many years ahead. And that's what makes science and engineering interesting.
When graphene CPUs arrive in laptops, they'll be very well controlled and boring. And the reality is that some applications would benefit from very fast cores, not dozens of them. That's all I'm saying. To be honest, I don't really care when such CPUs will be available for PCs. I'm interested in when they'll arrive in datacenters. I always hoped it would happen before quantum processors, but who knows?
50 GHz will produce a lot of heat. In fact, the whole idea of GaN in processors is that it can sustain much higher temperatures than silicon.
Quantum computers need extreme cooling solutions just to actually work.
It's always good to look back at the progress computers have made in the last decade. Just how much faster have cores become? And why stop now?
Moreover, can you imagine the discussions people had in the early 50s? Transistors already existed. PoC transistorized computers as well. And still, many didn't believe transistors would be stable enough to make the idea feasible. So it wasn't very different from the situation we have with GaN and graphene in 2019.
A few years later, transistorized computers were already mass produced. When I was born ~30 years later, computers were as normal and omnipresent as meat mincers. I have absolutely nothing against MCM. It's a very good idea. I'm mentioning the 2990WX precisely because it represents a scenario similar to how Zen 2 will work.
And not just the 16-core variant. Every Zen2 processor will have to make an additional "hop" because the whole communication with memory will be done via the I/O die. No direct wires.
Some of this will be mitigated by huge cache, but in many loads the disadvantage will be obvious. You'll see soon enough.
Also, I understand many people are waiting for Zen2 APUs (8 cores + Navi or whatever). Graphics memory will also have to be accessed via the I/O die. Good luck with that. Just don't bet your life savings on that. ;-)
The perspective of the number is wrong.
It has also been mentioned by a couple of others who fail to grasp the concept of a placeholder.
I've spent many years working in online retail/wholesale, so I'm well aware of what placeholders are. It's simply another indicator that the entire thing could be fake. Given some of the name formatting, it's even more likely.
I didn't see any other examples posted so I put it up. Relax huh...
The bottom line is that people who are hard-headed about this just because it's AMD rather than Intel are being foolish. It shouldn't really be any harder for AMD to improve its IPC than it is for Intel to copy its chiplet approach, the way I see it. The one clear difference is that Intel obviously has a larger budget to work from, but AMD today isn't the same cash-strapped, mismanaged company from a decade ago, which also had to contend with Intel's anti-competitive behavior on top of all that at the time. I've got a lot of faith that 7 nm Ryzen will be great overall, and the closest thing to AMD64-era performance and competitiveness out of AMD on the CPU side since that point in time. I'm sure Intel will bounce back, and do so aggressively, but we could see a good boxing match between the two companies over the next 5-6 years if I had to guess.
This is TR 2 ~
The I/O die is strategically placed between the Zen 2 dies, and there is no additional hop. Admittedly, we don't know how TR3 will look, but I'd be seriously disappointed if AMD redid this: disabling entire memory channels for a couple of dies!
Yep. AMD offer you that engineering job yet?
To use all cores, you have to configure it as NUMA, basically adding a layer (a "hop") that centralizes memory access.
To limit latency and get better performance in interactive software (like games), you could have run it in "game mode", which uses just 8 cores.
Zen2 Ryzen may be subject to similar treatment.
And another thing is the ratio of cores to memory channels, which could give the 16-core Ryzen problems similar to those the 32-core Threadripper had. I don't know why you keep writing this (and why AMD this time? I preferred Intel!). What's the point?
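For the curious, the "game mode" idea discussed above boils down to pinning a process to the cores of one die. A minimal, Linux-only sketch of the mechanism follows; the assumption that logical CPUs 0-7 share a die is mine, and a real tool would read the topology from the OS instead.

```cpp
// Hypothetical sketch of "game mode"-style pinning: restrict the calling
// process to the first 8 logical CPUs so its memory traffic stays local
// and cross-die hops are avoided.
#include <sched.h>
#include <cstdio>

int main() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    for (int cpu = 0; cpu < 8; ++cpu)
        CPU_SET(cpu, &mask);                    // assumed: CPUs 0-7 sit on one die

    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {  // pid 0 = this process
        std::perror("sched_setaffinity");
        return 1;
    }
    std::puts("pinned to CPUs 0-7");
}
```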
I'd be shocked if AMD doesn't make a post-process die for scaling/denoise and other stuff at some point. Essentially, the RTX/tensor cores in Turing are basically those two things. It wouldn't be a bad idea for AMD to have a die that can do those things on the fly, quickly and efficiently, for its APUs, and even for TR/EPYC, which were actually pretty quick at ray tracing and could be quicker with specialized instruction sets or dies to do some of those things better.
Zen(1): Die -> Memory (best case or single die)
Zen(1): Die -> Die -> Memory (worst case)
Zen 2: Die -> IO controller -> Memory
Zen 2 should at least be more consistent, and benchmarks will reveal the actual latencies and performance penalties. But thinking that Zen 2 will have no such issues is naive.
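Since the benchmarks in question are latency benchmarks, here is a minimal, hypothetical pointer-chasing sketch of the kind that would expose an extra memory hop; the buffer size and iteration count are arbitrary choices of mine.

```cpp
// Hypothetical pointer-chasing microbenchmark: every load depends on the
// previous one, so the time per step approximates memory latency rather
// than bandwidth; shuffling defeats the hardware prefetcher.
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

int main() {
    const std::size_t n = 1 << 24;             // 16M indices (~128 MiB), well past the caches
    std::vector<std::size_t> next(n);
    std::iota(next.begin(), next.end(), 0);
    std::shuffle(next.begin(), next.end(), std::mt19937_64{42});

    std::size_t i = 0;
    const auto t0 = std::chrono::steady_clock::now();
    for (std::size_t step = 0; step < n; ++step)
        i = next[i];                           // serialized, dependent loads
    const auto t1 = std::chrono::steady_clock::now();

    const double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    std::printf("%.1f ns per hop (i=%zu)\n", ns / n, i);  // print i so the loop isn't optimized out
}
```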
At 8 minutes in:
In most cases it doesn't matter how much of the code is parallel or not, but how much of the execution time is spent on which part of the code, i.e. in some cases 99% of the execution time is spent in 1% of the code.
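That observation is just Amdahl's law stated in terms of execution time; a minimal statement of the formula, for reference:

```latex
% Amdahl's law: p is the parallelizable fraction of *execution time*
% (not of the code), and n is the number of cores.
S(n) = \frac{1}{(1 - p) + p/n}
% e.g. p = 0.99 caps the speedup at S(\infty) = 100, however many cores exist.
```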
A much better way of thinking of it (even for non-coders) is how many tasks/subtasks/work chunks can be done independently, because you can scale into hundreds if not thousands of threads as long as each thread doesn't need to be synchronized; it's the synchronization between cores which kills your performance. A good example of a workload which scales this way is a web server which spawns a thread per request, or a software renderer which splits parts of the scene up into separate worker threads. Workloads like this can scale well with very high core counts, but they do so because each thread essentially works on its own subtask, which is also why Amdahl's law is irrelevant here; if anything, it has to be applied at this level, not the application level. A sketch of the pattern follows.
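This is a minimal, hypothetical illustration of that chunked pattern (the square-root loop stands in for real per-element work): each worker owns a non-overlapping slice of the output, so the only synchronization is the final join.

```cpp
// Hypothetical sketch: independent work chunks with no cross-thread
// synchronization until the single join point at the end.
#include <algorithm>
#include <cmath>
#include <thread>
#include <vector>

int main() {
    const std::size_t n = 1 << 24;
    const unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    std::vector<double> out(n);

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            // Contiguous, non-overlapping slice per thread.
            const std::size_t lo = n * w / workers;
            const std::size_t hi = n * (w + 1) / workers;
            for (std::size_t i = lo; i < hi; ++i)
                out[i] = std::sqrt(static_cast<double>(i));  // stand-in per-element work
        });
    }
    for (auto& t : pool) t.join();  // the only synchronization point
}
```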
Most real-world applications are highly synchronized, and it has little to do with the skills or willingness of the developers, but with the nature of the task the application solves and, as I will get back to, the overall structure of the codebase. The "heavy" part of most applications is usually an algorithm where the application is stuck in a loop before it proceeds, but most applications are written in overly complex and abstracted codebases, making it nearly impossible to separate out the algorithms, and the CPU also spends most cycles stuck idling due to the bloat. The first step in optimizing the code is always to make it more dense: remove all possible abstractions and make it cache-optimized. Then it usually becomes obvious at which level the task can be split into subtasks and potentially even multiple threads. I usually refer to this tendency among developers and "software architects" to overcomplicate and abstract things as a "disease".
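As a hypothetical before/after of that densification step (both loops compute the same sum; sizes and names are my choices): the boxed version hides a trivially chunkable loop behind a dependent load per element, while the dense version is prefetch-friendly and obviously splittable.

```cpp
// Hypothetical "abstracted vs. dense" comparison: same result, very
// different memory behavior.
#include <cstdio>
#include <memory>
#include <vector>

int main() {
    const std::size_t n = 1 << 20;

    // Abstracted: every element behind its own heap allocation.
    std::vector<std::unique_ptr<int>> boxed;
    for (std::size_t i = 0; i < n; ++i)
        boxed.push_back(std::make_unique<int>(static_cast<int>(i)));
    long long a = 0;
    for (const auto& p : boxed) a += *p;   // one dependent load per element

    // Dense: contiguous data, cache-friendly, obviously chunkable.
    std::vector<int> flat(n);
    for (std::size_t i = 0; i < n; ++i) flat[i] = static_cast<int>(i);
    long long b = 0;
    for (int v : flat) b += v;

    std::printf("%lld %lld\n", a, b);      // identical sums
}
```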
-
Back to the video you referred to:
Well, it will be "impossible" to take an existing thread and split it up across multiple cores at the OS scheduling level; by "impossible" I mean impossible in real time and without a slowdown of 10,000x or more.
What this video appears to be showing is just "smarter" OS scheduling. Even if you write your glorious single-threaded application, it probably relies on system calls or libraries which span multiple threads. If your application relies heavily on this kind of interaction, your slowdown may actually be OS scheduling overhead, not overhead within your application. Most desktops and even mobile devices these days run hundreds of tiny background threads which constantly "disturb" the scheduler with "unnecessary" scheduling overhead. If the OS could better prioritize which threads are waiting for each other and so on, you could get a huge improvement in performance for certain use cases. But this, as with anything, is just removing a bottleneck, not actually making code more parallel, so the scaling here will also decline with core count.
But tweaking kernel scheduling is not new; it's well known in the industry. In Linux you can choose between various schedulers which have their pros/cons depending on workload. One example is the "low latency" kernel, optional in some Linux distributions, which increases the scheduling frequency and is more aggressive in prioritizing threads; this has a huge impact on latencies in some thread-heavy workloads. There is probably more potential for smarter schedulers which use more statistics for thread allocation, or "AI" as they call it these days.
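Related to that aggressive prioritization: on Linux, an individual process can also request a real-time scheduling policy itself. A minimal, hypothetical sketch (the priority value 10 is an arbitrary choice of mine, and the call normally requires elevated privileges):

```cpp
// Hypothetical sketch: request the SCHED_FIFO real-time policy for the
// calling process, so it preempts normal threads until it blocks.
#include <sched.h>
#include <cstdio>

int main() {
    sched_param param{};
    param.sched_priority = 10;                    // valid range is 1-99 for SCHED_FIFO
    if (sched_setscheduler(0, SCHED_FIFO, &param) != 0) {  // pid 0 = this process
        std::perror("sched_setscheduler");        // typically EPERM without privileges
        return 1;
    }
    std::puts("running under SCHED_FIFO");
}
```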
As for optimizations in hardware, CPUs already do instruction-level parallelism, and Intel CPUs since the first Pentium have been superscalar. The automatic optimizations today are, however, very limited due to branching in code. Even with branch prediction, CPUs are pretty much guaranteed a stall after just a few branching instructions, which is why most applications are actually stalled 95-99% of the time. If, however, the CPU were given more context and able to distinguish branching which only affects the "local scope(s)" (which is probably what they mean by "threadlets" in the video) from branching which affects the control flow of the program, then we could see huge improvements in performance; 2-3x is quite possible in the long term.
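The cost of unpredictable branches is easy to see for yourself. A minimal, hypothetical demo (sizes and threshold are my choices; an optimizer may convert the branch to a conditional move and flatten the difference):

```cpp
// Hypothetical demo: the same loop over the same values runs much faster
// once the data is sorted, because the branch becomes predictable.
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

static long long sum_over_threshold(const std::vector<int>& v) {
    long long s = 0;
    for (int x : v)
        if (x >= 128) s += x;         // hard to predict on random data
    return s;
}

int main() {
    std::vector<int> v(1 << 24);
    std::mt19937 rng{1};
    for (int& x : v) x = static_cast<int>(rng() % 256);

    auto time_ms = [&] {
        const auto t0 = std::chrono::steady_clock::now();
        const long long s = sum_over_threshold(v);
        const auto t1 = std::chrono::steady_clock::now();
        std::printf("sum=%lld ", s);  // use the result so it isn't optimized away
        return std::chrono::duration<double, std::milli>(t1 - t0).count();
    };

    const double unsorted = time_ms();
    std::sort(v.begin(), v.end());
    const double sorted = time_ms();
    std::printf("\nunsorted: %.1f ms, sorted: %.1f ms\n", unsorted, sorted);
}
```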
With the Ryzen 3800X & ECC UDIMMs, we can easily build a 16C/32T workstation superior to everything I mentioned above.
Not to mention these are very different CPUs on very different platforms. No forced crossover between them is going to happen.
When you add price on top, it really becomes ugly to invest in a 10-core Intel part, for example, or even an i9-9900K.
More will pop up, I am sure.
Nevertheless, no AM4 Zen CPU to date is ECC-certified.
IMO we'll see AM4 EPYC CPUs - competitors for Xeon E (LGA1151). There was some talk about an EPYC APU as well.
That would be rather nice.