Those are actually very good questions.
Firstly, it's important to understand that the utilization shown in Windows Task Manager is not actual CPU load, but rather how much of the scheduling interval was allocated to threads. Games usually have multiple threads waiting for events or queues; these often run in a loop constantly checking for work, so to the OS they appear to keep a core at 100% utilization. There are several reasons to code this way: firstly, it reduces latency and increases timing precision; secondly, Windows is not a realtime OS, so the best way to ensure a thread keeps priority is to make sure it never sleeps; thirdly, a thread busy-waiting on IO (HDD, SSD, etc.) will likewise show 100% utilization while the data is still on its way. The key point is that the "100% utilization" of these threads is
not a sign of a CPU bottleneck.
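To make that concrete, here is a minimal sketch (not taken from any real engine, all names are made up) of the two ways a worker thread can wait for work. The spinning version is what Task Manager reports as ~100% core utilization even when it does almost nothing useful; the blocking version drops to ~0% but pays whatever wake-up latency the OS scheduler adds:

```cpp
// Minimal sketch, not from any real engine: two ways a worker thread can wait.
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

std::queue<std::function<void()>> jobs;
std::mutex jobMutex;
std::condition_variable jobReady;
std::atomic<bool> running{true};

// Busy-wait / polling loop: lowest latency, never yields the core voluntarily.
void spinningWorker() {
    while (running.load(std::memory_order_relaxed)) {
        std::function<void()> job;
        {
            std::lock_guard<std::mutex> lock(jobMutex);
            if (!jobs.empty()) { job = std::move(jobs.front()); jobs.pop(); }
        }
        if (job) job();   // real work
        // else: loop again immediately -- this spinning is what shows up as "load"
    }
}

// Blocking loop: the thread sleeps until notified, so its utilization is ~0%.
void blockingWorker() {
    while (true) {
        std::unique_lock<std::mutex> lock(jobMutex);
        jobReady.wait(lock, [] { return !jobs.empty() || !running; });
        if (jobs.empty()) break;          // woken only to shut down
        auto job = std::move(jobs.front());
        jobs.pop();
        lock.unlock();
        job();
    }
}

int main() {
    std::thread worker(blockingWorker);   // swap in spinningWorker to compare in Task Manager
    {
        std::lock_guard<std::mutex> lock(jobMutex);
        jobs.push([] { std::cout << "job done\n"; });
    }
    jobReady.notify_one();

    std::this_thread::sleep_for(std::chrono::seconds(1));
    running = false;
    jobReady.notify_all();
    worker.join();
}
```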
Secondly, game engines do a lot of things that are strictly speaking not rendering and don't impact rendering performance unless they "disturb" the rendering thread(s).
This is a rough illustration I made in five minutes (I apologize for my poor drawing):
View attachment 109032
Some of these tasks may be executed by the same thread, and some advanced game engines scale the thread count dynamically. Even if a game uses 8 threads on one machine and 5 on another, that doesn't mean there will be an impact on performance. Don't forget the driver itself can add up to ~four threads on top of this.
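Just as a hypothetical sketch of that dynamic scaling (the names reservedThreads and workerCount and the numbers are my own invention, not how any particular engine does it), an engine might size its worker pool to the machine roughly like this:

```cpp
// Hypothetical sketch of sizing a worker pool per machine; names and numbers
// are invented for illustration only.
#include <iostream>
#include <thread>
#include <vector>

int main() {
    unsigned hwThreads = std::thread::hardware_concurrency();  // may return 0 if unknown
    if (hwThreads == 0) hwThreads = 4;                         // conservative fallback

    // Leave headroom for the render thread, game loop, event loop and the
    // driver's own threads (the "~four on top" mentioned above).
    const unsigned reservedThreads = 4;
    unsigned workerCount = (hwThreads > reservedThreads) ? hwThreads - reservedThreads : 1;

    std::cout << "spawning " << workerCount << " worker thread(s)\n";

    std::vector<std::thread> workers;
    for (unsigned i = 0; i < workerCount; ++i)
        workers.emplace_back([] { /* pull jobs from a shared queue here */ });

    for (auto& w : workers) w.join();
}
```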
Most decent games these days have at least a dedicated rendering thread, and many also have dedicated threads for the game loop and the event loop. These usually show 100% utilization, even though the
true load of the event loop is typically around 1%. Modern games may also spawn a number of "worker threads" for asset loading; that doesn't mean you need a dedicated core for each of them, since they spend most of their time just waiting on IO. I could go on, but you get the point.
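As a rough example of what such asset-loading workers look like (the file names and the loadAsset() helper are made up for illustration), note that nearly all of each thread's wall-clock time is spent inside the blocking read waiting on the disk, not actually using the core:

```cpp
// Hedged sketch of asset-loading worker threads; file names and loadAsset()
// are invented. Each thread mostly blocks on disk IO rather than burning CPU.
#include <fstream>
#include <iterator>
#include <string>
#include <thread>
#include <vector>

void loadAsset(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    std::vector<char> data((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());   // thread blocks here on IO
    // ...decode, then hand the result to the render thread for GPU upload...
}

int main() {
    // Hypothetical asset list.
    std::vector<std::string> assets = {"tex0.dds", "tex1.dds", "mesh0.bin", "music.ogg"};

    std::vector<std::thread> loaders;
    for (const auto& path : assets)
        loaders.emplace_back(loadAsset, path);

    for (auto& t : loaders) t.join();
}
```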
There are exceptions to this, such as "cheaply made" games like Euro Truck Simulator 2, which does rendering, the game loop, the event loop and asset loading all in the same thread, which of course gives terrible stutter during gameplay.
So you might think it's advantageous to have as many threads as possible? Well, it depends. Adding more threads that need to synchronize with each other adds latency, so a thread should only be given a workload it can complete independently before syncing back up, or even better, hand results over through an async queue. At 60 FPS we're talking about a frame window of 16.67 ms, and in compute terms that's not a lot if most of it is spent on synchronization.
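Here's a rough sketch of that "hand off through an async queue" idea; the FrameData struct and the hard-coded 16.67 ms budget are illustrative only. The game thread pushes finished frame data and moves on, and the render thread drains the queue at its own pace, so neither blocks the other every frame:

```cpp
// Rough sketch of decoupling game and render threads via a queue.
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

struct FrameData { /* transforms, draw lists, ... */ };

std::queue<FrameData> frameQueue;
std::mutex queueMutex;
std::condition_variable frameReady;
bool done = false;

void gameThread() {
    using clock = std::chrono::steady_clock;
    for (int frame = 0; frame < 60; ++frame) {              // ~1 second at 60 FPS
        auto start = clock::now();
        FrameData fd{};                                      // simulate, animate, etc.
        {
            std::lock_guard<std::mutex> lock(queueMutex);
            frameQueue.push(fd);                             // hand off, don't wait for the renderer
        }
        frameReady.notify_one();
        // Use whatever is left of the 16.67 ms budget; no hard sync with rendering.
        std::this_thread::sleep_until(start + std::chrono::microseconds(16667));
    }
    { std::lock_guard<std::mutex> lock(queueMutex); done = true; }
    frameReady.notify_one();
}

void renderThread() {
    while (true) {
        std::unique_lock<std::mutex> lock(queueMutex);
        frameReady.wait(lock, [] { return !frameQueue.empty() || done; });
        if (frameQueue.empty()) break;                       // game thread finished
        FrameData fd = frameQueue.front();
        frameQueue.pop();
        lock.unlock();
        // ...build and submit draw calls for fd...
    }
}

int main() {
    std::thread game(gameThread), render(renderThread);
    game.join();
    render.join();
}
```

In a real engine the queue would be bounded (double or triple buffering) so the game thread can't run arbitrarily far ahead of the renderer, but the principle is the same.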