Any large workload can be parallelized. If it's big enough, it can be broken into pieces that can be dealt with separately.
Everything has a cost. If you divide the work into chunks that are too small, you'll end up with too much overhead. There are also dependencies which require things to be executed in sequence (like a pipeline), limiting how large the work chunks can be before you have to synchronize. Another implication is mutation of shared data, and those who think that throwing in mutexes everywhere will solve it are mistaken; you'll quickly end up with stalled threads and even deadlocks. Additionally, there is OS scheduling overhead, which can easily add latency of 1 ms or more. That becomes significant when your entire workload lives within a window of a few ms and you're trying to sync up hundreds or thousands of times per frame, but it is not significant if you have giant work chunks taking several seconds or even minutes.
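To put rough numbers on the chunk-size trade-off, here is a minimal sketch of a cost model. The function name `frame_time_ms` and the 0.05 ms per-chunk overhead are assumptions picked for illustration, not measurements of any real scheduler.

```python
import math

def frame_time_ms(total_work_ms, chunks, threads, overhead_per_chunk_ms):
    """Estimated wall time when the work is split into `chunks` equal pieces.

    Assumed model: every chunk pays a fixed dispatch/sync cost, and the
    chunks are handed to the threads in rounds.
    """
    work_per_chunk = total_work_ms / chunks
    rounds = math.ceil(chunks / threads)
    return rounds * (work_per_chunk + overhead_per_chunk_ms)

if __name__ == "__main__":
    # A 16 ms frame on 8 threads: more chunks means more fixed overhead.
    for chunks in (8, 64, 512, 4096):
        print("frame ", chunks, round(frame_time_ms(16.0, chunks, 8, 0.05), 3))
    # A 60 s batch job: the same overhead is lost in the noise.
    for chunks in (8, 4096):
        print("batch ", chunks, round(frame_time_ms(60000.0, chunks, 8, 0.05), 3))
```

With these assumed numbers the 16 ms frame goes from about 2.05 ms at 8 chunks to about 27.6 ms at 4096 chunks, while the 60 second job barely notices the difference.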
Are you really trying to say that the PS5 will get by with doing most of the work on one 2GHz core?
What? I've never said anything like that.
I pointed out that we've had 8-core consoles for nearly 7 years now. Many predicted this would cause "fully multithreaded games" within "a couple of years" back when Xbox One and PS4 launched, just like you are doing now. But it didn't happen, because rendering doesn't work that way.
I think what you do not understand is that having all the calls to the graphics API come from a single thread doesn't at all mean that this thread is doing all the computing.
I understand fully, and I never claimed so either.
As I said, having multiple threads building a single queue makes no sense, which is why the number of threads interfacing with the GPU is limited to the distinct tasks (rendering passes or compute workloads) that can be separated and potentially parallelized.
Additionally, the driver uses up to several threads on its side, but that's out of our control.
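As a rough sketch of what "one thread per distinct task" looks like, the snippet below gives each pass its own worker and its own command list, then lets a single thread stitch them together in a fixed order. The pass names, the draw counts and the string "commands" are all made up; no real graphics API is involved.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical render passes; names and draw counts are made up for illustration.
PASSES = {"shadow": 3, "gbuffer": 4, "lighting": 2}

def record_pass(name):
    """Build one pass's command list on its own thread; no shared state is touched."""
    return [f"{name}:draw{i}" for i in range(PASSES[name])]

def build_frame():
    # One worker per independent pass, not one worker per draw call.
    with ThreadPoolExecutor(max_workers=len(PASSES)) as pool:
        per_pass = list(pool.map(record_pass, PASSES))  # map() keeps pass order
    # A single thread concatenates and "submits" the lists in a fixed order.
    return [cmd for commands in per_pass for cmd in commands]

if __name__ == "__main__":
    print(build_frame())
```

The thread count here is bounded by the number of separable passes, so adding cores beyond that does nothing for this part of the frame.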
The rest is more blurry, and it depends entirely on what direction the gaming industry will take from now on. At present, we are GPU bound in most games with high settings.
Well, that's the goal.
Ideally, all games should be GPU bound. That way you get the graphics performance you paid for.
Developers could take the approach of shifting more load towards the CPU and massively parallelizing their code, or could choose to do the minimum necessary to see performance improvements. Since I don't have my crystal ball, I have no way to predict which will occur.
This clearly illustrates that you don't understand how games work.
GPUs have specialized hardware to do all kinds of rendering tasks, like creating vertices, tessellation, rasterization, texture mapping, etc. While all of these can technically be emulated in software, the performance would be terrible.
Offloading work from the GPU to the CPU would be a step backwards. And what would be the purpose? GPUs are massively powerful at dense math, while CPUs are better at logic. Games typically do a lot of the logic on the CPU while building a queue, then send batches of data to the GPU, which does what it does best.
I wouldn't advise anyone to base anything on their crystal ball, but rather to understand the field before attempting to make qualified predictions. But I can list some of the current trends:
* GPUs are becoming more flexible, and the rendering pipeline is increasingly programmable. Some games will take advantage of this, and thus become less CPU bottlenecked.
* Most games are using a few generalized game engines. These engines are increasingly bloated, and will to some extent counteract the improvements on the GPU side. Some of these may spawn a lot of threads, most of which do fairly little work at all. The ever-increasing bloat of these engines is also responsible for the lack of benefits from lower API overhead in DirectX 12.
* Resource streaming will be more utilized, but slowly.
* GPU accelerated audio will probably become common eventually.
But technically, it is perfectly possible to code an application in such a way that it runs faster on a 16 core with 85% clock speed versus on a 10 core at 100% clock speed. It has already been done for many applications and it can be done for the game too.
I'm sorry, but this is where you go wrong. An arbitrary piece of code can't be parallelized any way you want, see my first paragraph.
A workload which mostly consists of large work chunks that can be processed mostly async can scale to almost any core count. The need for synchronization increases the overhead of each additional thread, so the more synchronization a workload needs, the more its returns diminish with increasing thread count. Games are among the workloads on the "worst" end of this scale.
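The diminishing return can be sketched with Amdahl's law plus a synchronization term. The linear per-thread sync cost, the 90% parallel fraction and the 0.005 cost factor are assumptions chosen to show the shape of the curve, not figures from any real game.

```python
def speedup(threads, parallel_fraction, sync_cost_per_thread):
    """Speedup over one thread, with run time normalized to 1.0 on one thread.

    Assumed model: Amdahl's law plus a sync cost that grows linearly
    with every extra thread.
    """
    serial = 1.0 - parallel_fraction
    time = serial + parallel_fraction / threads + sync_cost_per_thread * (threads - 1)
    return 1.0 / time

if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32):
        print(n, round(speedup(n, 0.9, 0.005), 2))
```

Under these assumptions, going from 4 to 8 threads gains far more than going from 8 to 16, and 32 threads end up slower than 16 because the sync term starts to dominate.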
I am quite sure future improvements of processor performance will rely much more on core counts than on IPC, and clock frequencies will stagnate at best. As both Intel and AMD continue to shrink their nodes, we will have 64 cores in a few years for home computers, but we will never reach 6GHz.
You're right about clock speeds stagnating (at least until we have different types of semiconductors), but the way forward is a balanced approach between more cores, more superscalar and more SIMD. Many are forgetting that the performance per core is the base scaling factor for multithreaded performance. Since most non-server workloads do not scale linearly, having faster cores actually helps you scale to more cores and suffer less from OS overhead. The key here is to strike the right balance between core count and core speed for your workload.
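To make the core count versus core speed balance concrete, here is a small sketch using plain Amdahl's law with a clock factor, applied to the 16 cores at 85% clock versus 10 cores at 100% clock case mentioned earlier. The function name and the parallel fractions are illustrative assumptions, and the model ignores sync and OS overhead entirely, which flatters the higher core count.

```python
def relative_throughput(parallel_fraction, cores, clock):
    """Throughput relative to one core at clock 1.0, using plain Amdahl's law.

    Per-core speed scales both the serial and the parallel part.
    """
    serial = 1.0 - parallel_fraction
    return clock / (serial + parallel_fraction / cores)

if __name__ == "__main__":
    for p in (0.50, 0.80, 0.90, 0.95):
        many_slow = relative_throughput(p, 16, 0.85)
        few_fast = relative_throughput(p, 10, 1.00)
        winner = "16 cores @ 85%" if many_slow > few_fast else "10 cores @ 100%"
        print(p, round(many_slow, 2), round(few_fast, 2), winner)
```

In this simplified model the 16-core chip only pulls ahead once roughly 87% or more of the work is parallel; below that, the faster cores win.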