You are talking about peak FLOP/s, which is computational power,
not rendering performance.
For an AMD GPU to scale as well as Pascal, AMD needs to overcome the following:
1) Saturate the GPU
Computational power is useless unless the scheduler can keep it fed: it has to analyze data dependencies and avoid stalls. Nvidia is excellent at this, while GCN is not. Nothing in Mantle, Direct3D 12 or Vulkan exposes these features, so none of these APIs will have any impact on this.
2) Efficient rendering avoiding bottlenecks
One of the clearest examples of Nvidia choosing a more efficient path is rasterizing and fragment processing. AMD processes fragments in screen space, which means the same data has to travel back and forth between GPU memory and L2 cache multiple times while rendering one frame, so memory bandwidth, cache misses and data dependencies become an issue. Nvidia, on the other hand, has since Maxwell rasterized and processed fragments in regions/tiles, so the data can mostly be kept in L2 cache until it's done. That keeps the GPU at peak performance throughout rasterizing and fragment processing, which after all is most of the load when rendering.
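To make the cache argument in point 2 concrete, here is a toy model, not a description of either vendor's actual hardware. It assumes a small LRU cache that holds framebuffer tiles, random small primitives, and made-up sizes (`CACHE_TILES`, the 32x32 tile grid and the fragment counts are all hypothetical). It compares touching tiles in primitive submission order against processing all fragments of one tile together.

```python
import random
from collections import OrderedDict

def count_misses(tile_accesses, cache_tiles):
    """Count misses for a sequence of tile ids using an LRU cache of `cache_tiles` entries."""
    cache = OrderedDict()
    misses = 0
    for tile in tile_accesses:
        if tile in cache:
            cache.move_to_end(tile)        # mark as most recently used
        else:
            misses += 1
            cache[tile] = True
            if len(cache) > cache_tiles:
                cache.popitem(last=False)  # evict least recently used tile
    return misses

def make_fragments(num_prims, frags_per_prim, tiles_x, tiles_y, seed=0):
    """Generate tile ids touched by random primitives, in submission order."""
    rng = random.Random(seed)
    tiles = []
    for _ in range(num_prims):
        # Each hypothetical primitive covers a 2x2 neighbourhood of tiles.
        bx = rng.randrange(tiles_x - 1)
        by = rng.randrange(tiles_y - 1)
        for _ in range(frags_per_prim):
            tx = bx + rng.randrange(2)
            ty = by + rng.randrange(2)
            tiles.append(ty * tiles_x + tx)
    return tiles

CACHE_TILES = 64  # assumed cache capacity, in tiles

submission_order = make_fragments(num_prims=2000, frags_per_prim=16,
                                  tiles_x=32, tiles_y=32)
# Binning: all fragments belonging to one tile are processed back to back.
binned_order = sorted(submission_order)

immediate_misses = count_misses(submission_order, CACHE_TILES)
tiled_misses = count_misses(binned_order, CACHE_TILES)
distinct_tiles = len(set(submission_order))

print("distinct tiles touched:", distinct_tiles)
print("misses, submission order:", immediate_misses)
print("misses, binned by tile:", tiled_misses)
```

In this model the binned order loads each tile exactly once, while submission order reloads tiles whenever the working set exceeds the cache. Real hardware also has to preserve primitive order within a tile, which this sketch ignores.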
If AMD were to reach their peak computational power during rendering, they would need to overhaul their architecture. Only then can this performance level be achieved. It doesn't matter if you have the most theoretical power in the world if you are not able to utilize it.
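Point 1, keeping the execution units fed, can be sketched with a very simplified issue model. It is an assumption for illustration only and not how any real GPU scheduler works: one instruction issues per cycle, every instruction in a work group depends on the previous one, and a result takes `latency` cycles to arrive. The function name and parameters are hypothetical.

```python
def utilization(num_groups, instrs_per_group, latency):
    """Fraction of cycles in which the (toy) scheduler manages to issue an instruction."""
    ready_at = [0] * num_groups              # earliest cycle each group may issue again
    remaining = [instrs_per_group] * num_groups
    total = num_groups * instrs_per_group
    issued = 0
    cycle = 0
    while issued < total:
        # Issue from the first group whose previous result is available.
        for g in range(num_groups):
            if remaining[g] > 0 and ready_at[g] <= cycle:
                remaining[g] -= 1
                ready_at[g] = cycle + latency  # next instruction waits on this result
                issued += 1
                break
        cycle += 1                             # a cycle passes whether or not we issued
    return issued / cycle

for groups in (1, 2, 4, 8):
    print(groups, "independent groups:", round(utilization(groups, 100, latency=4), 3))
```

With a single dependent chain the unit sits idle most of the time; once there are at least as many independent groups as the latency, the stalls are hidden. The peak FLOP/s figure is the same in every case, only the delivered work changes.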
So the RX 480 will always perform close to the GTX 1060; it will never rise above it.
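For reference, the peak FLOP/s figure this whole argument is about is just shader count times clock times two (one fused multiply-add per core per clock). The core counts and boost clocks below are taken from the public spec sheets as I remember them and are an assumption, not something stated earlier in this thread; the helper name is made up.

```python
def peak_tflops(shader_cores, clock_mhz, flops_per_core_per_clock=2):
    """Theoretical single-precision peak in TFLOP/s (an FMA counts as 2 FLOPs)."""
    return shader_cores * clock_mhz * 1e6 * flops_per_core_per_clock / 1e12

# Assumed spec-sheet values: shader count and boost clock in MHz.
rx480 = peak_tflops(2304, 1266)
gtx1060 = peak_tflops(1280, 1708)

print(f"RX 480 peak:   {rx480:.2f} TFLOP/s")
print(f"GTX 1060 peak: {gtx1060:.2f} TFLOP/s")
# If both cards deliver similar frame rates, this is the fraction of the
# GTX 1060's utilization that the RX 480 is achieving, assuming frame rate
# tracks delivered FLOP/s (a simplification).
print(f"relative utilization at equal performance: {gtx1060 / rx480:.2f}")
```

On paper that is about 5.8 versus about 4.4 TFLOP/s, so two cards that land close together in games are clearly not being limited by the peak number.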
Both Maxwell and Pascal have more complete Direct3D 12 support than any other architecture. Stop spinning the lie of a "missing feature" when everybody knows it has been proven that Nvidia supports it.
The problem is that you are clearly misguided and biased when discussing the subject. A person can own something and still be biased against it.