I know that.
But let me tell you something. My goal is to get a fast AGP solution on ANY 64-bit OS. In that case the only way out is the AM2NF3. Sure, you could say: "Oh, man! There are several mobos running on the Intel 865. You can drop in a Pentium 4 and get the same speed".
But you would be absolutely wrong. Even with a Pentium D or Core 2 Duo such a build sucks,
'cause using DDR1 is a real bottleneck. I mean, all the power of the CPU is wasted on the slow RAM transfer rates. Furthermore, data transfer between the RAM and the AGP 8x card is also significantly reduced with DDR1.
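To put rough numbers on the bandwidth point, here is a minimal sketch of the usual peak-rate arithmetic (transfers per second times bus width times channels). The speed grades DDR-400 and DDR2-800 and the dual-channel setup are my assumptions for illustration, not figures from a specific board, and these are theoretical peaks only: RAM is shared between the CPU and AGP traffic, so real throughput is lower.

```python
def peak_mb_per_s(mega_transfers_per_s, bus_width_bits, channels=1):
    """Theoretical peak bandwidth in MB/s (1 MB = 10**6 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return mega_transfers_per_s * bytes_per_transfer * channels


# Assumed, nominal configurations (illustrative only).
ddr400_dual = peak_mb_per_s(400, 64, channels=2)    # DDR-400, 64-bit, dual channel
ddr2_800_dual = peak_mb_per_s(800, 64, channels=2)  # DDR2-800, 64-bit, dual channel
agp8x = peak_mb_per_s(533, 32)                      # AGP 8x: ~66 MHz x 8, 32-bit bus

if __name__ == "__main__":
    for name, rate in [("DDR-400 dual channel", ddr400_dual),
                       ("DDR2-800 dual channel", ddr2_800_dual),
                       ("AGP 8x", agp8x)]:
        print(f"{name}: {rate} MB/s")
```

With these assumed grades, DDR2 has twice the peak headroom to share between the CPU and the AGP card.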
More about the P4.
The biggest weakness of the P4 is its pipeline: it has 20 to 31 stages (20 on Willamette/Northwood, 31 on Prescott), which means horrible latency. Any mispredicted branch in the code flushes the pipeline and forces a complete reload. It takes no less than 127 clk to restart the pipeline!
And don't forget the high TDP of the P4, which is also a significant disadvantage.
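A back-of-the-envelope way to see how this hurts: extra cycles are roughly branches times miss rate times restart penalty. The sketch below uses the 127 clk figure from this post as the penalty; the branch count, miss rate and base cycle count are made-up example inputs, not measurements.

```python
def misprediction_overhead(branches, miss_rate, penalty_clk):
    """Extra cycles lost to pipeline restarts (simple linear model)."""
    return branches * miss_rate * penalty_clk


def slowdown(base_clk, branches, miss_rate, penalty_clk):
    """Ratio of run time with mispredictions to run time without them."""
    extra = misprediction_overhead(branches, miss_rate, penalty_clk)
    return (base_clk + extra) / base_clk


if __name__ == "__main__":
    # Hypothetical workload: 1M branches, 5% mispredicted, 10M cycles otherwise.
    extra = misprediction_overhead(1_000_000, 0.05, 127)
    print(f"extra cycles: {extra:.0f}")
    print(f"slowdown: {slowdown(10_000_000, 1_000_000, 0.05, 127):.3f}x")
```

The model ignores overlap with other stalls, so treat it as an upper-bound style estimate rather than a prediction.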
6 years ago I had to write a critical piece of code in asm for a coding/decoding algorithm. It had to be universal across quite a wide range of Intel CPUs, starting from the P4. The only processor we had problems with was the P4... because the code had to be specifically rewritten to avoid branch mispredictions. Bad days.
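For anyone wondering what "rewriting to avoid branch mispredictions" looks like, here is a generic sketch of the idea, not the actual code from that project: a data-dependent branch is replaced by mask arithmetic so there is nothing to mispredict. It is shown in Python only to make the transformation readable; the interpreter itself still branches, so the benefit only appears when the same trick is applied in asm or C. The function names are hypothetical.

```python
def branchy_min(a, b):
    """Minimum with a data-dependent branch (hard to predict on random data)."""
    if a < b:
        return a
    return b


def branchless_min(a, b):
    """Same result using a mask instead of a conditional jump."""
    mask = -(a < b)              # -1 (all bits set) when a < b, otherwise 0
    return b ^ ((a ^ b) & mask)  # picks a when mask is -1, b when mask is 0


if __name__ == "__main__":
    for a, b in [(3, 7), (7, 3), (-5, 2), (4, 4)]:
        print(a, b, branchy_min(a, b), branchless_min(a, b))
```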
So, getting back to the topic: a fast AGP solution is one that runs DDR2.