Of course not, you will never be able to do that, that's not what I meant.
I was thinking of branching logic inside a single scope, like a lot of ifs in a loop. Compilers already turn some of these into branchless alternatives, but I'm sure there is more potential here, especially if the ISA could express dependencies so the CPU could do things out of order more efficiently and hopefully some day limit the stalls in the CPU. As you know, with ever more superscalar CPUs, the relative cost of a cache miss or branch misprediction is growing.
Ideally code should be free of unnecessary branching, and there are a lot of clever tricks with and without AVX, which I believe we have discussed previously.
Possibly, I think we have talked about this issue before.
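To make the "lot of ifs in a loop" case concrete, here is a minimal sketch in C. The function names and the threshold example are made up for illustration, and whether the second form is actually faster depends on the data and on what the compiler already does with the first one (it may well emit a cmov or vectorize it on its own).

```c
#include <stddef.h>
#include <stdint.h>

/* Branchy version: one conditional jump per element. With unpredictable
   data, each misprediction costs a pipeline flush. */
int64_t sum_above_branchy(const int32_t *v, size_t n, int32_t threshold) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > threshold)
            sum += v[i];
    }
    return sum;
}

/* Branchless version: the comparison yields 0 or 1, which is negated into
   an all-zeros or all-ones mask, so the loop body has no data-dependent
   jump for the predictor to guess. */
int64_t sum_above_branchless(const int32_t *v, size_t n, int32_t threshold) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t mask = -(int64_t)(v[i] > threshold);
        sum += (int64_t)v[i] & mask;
    }
    return sum;
}
```

Both functions return the same result; the second just trades the branch for a couple of extra ALU operations per element.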
Dependency management on today's CPUs and compilers is a well-solved problem: "xor rax, rax" cuts a dependency, gets a fresh register through register renaming, and starts a parallel calculation that takes advantage of superscalar CPUs. It's a dirty hack, but it works, and it works surprisingly well. I'm not convinced that a new machine-code format (with more explicit dependency matching) is needed for speed.
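A source-level analogue of cutting dependency chains, sketched in C (this is not the xor idiom itself, just the same idea one level up; the function names are hypothetical). With one accumulator every add has to wait for the previous one; with four, the out-of-order core has four independent chains it can overlap.

```c
#include <stddef.h>

/* Single accumulator: one long dependency chain, each add waits on the
   result of the one before it. */
double sum_one_chain(const double *v, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Four accumulators: four independent chains the CPU can run in parallel.
   Without something like -ffast-math the compiler generally may not do
   this reassociation itself for floating point. */
double sum_four_chains(const double *v, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++)  /* leftover elements when n is not a multiple of 4 */
        s0 += v[i];
    return (s0 + s1) + (s2 + s3);
}
```

Note that the two versions add in a different order, so for general floating-point input the results can differ in the last bits.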
I think the main advantage of a potential "dependency-graph representation" would be power consumption and core size. Code with more explicit dependencies encoded could have smaller decoders that use less power, leveraging information that the compiler already calculated (instead of re-calculating it from scratch, so to speak).
Modern ROBs are 200+ entries long already, meaning the CPU can search ~200 instructions looking for instruction-level parallelism. And apparently these reorder buffers are only going to get bigger (300+ for Ice Lake).