Monday, July 1st 2019

AMD Patent Shines Raytraced Light on Post-Navi Plans

Jul 1st, 2019 13:44

An AMD patent may have just shown the company's hand when it comes to how it plans to implement raytracing on graphics cards. The patent, titled "Texture Processor Based Ray Tracing Acceleration Method and System", describes a hybrid, software-hardware approach to raytracing. AMD says this approach improves upon solely hardware-based solutions:

"The hybrid approach (doing fixed function acceleration for a single node of the bounded volume hierarchy (BVH) tree and using a shader unit to schedule the processing) addresses the issues with solely hardware based and/or solely software based solutions. Flexibility is preserved since the shader unit can still control the overall calculation and can bypass the fixed function hardware where needed and still get the performance advantage of the fixed function hardware. In addition, by utilizing the texture processor infrastructure, large buffers for ray storage and BVH caching are eliminated that are typically required in a hardware raytracing solution as the existing vector general purpose register (VGPRs) and texture cache can be used in its place, which substantially saves area and complexity of the hardware solution."

Essentially, AMD describes what it calls a "fixed function ray intersection engine", which is specialized hardware that only handles BVH intersection. Processing BVH calculations in a stream processor solely via software isn't a pretty option, since execution divergence means that a number of error corrections are required, which makes the process time- and resource-intensive. This fixed function hardware (which is nothing like NVIDIA's RT cores and is much simpler) is added in parallel to the texture filter pipeline in the GPU's texture processor.

The idea is that the fixed-function raytracing hardware can now use the texture system's already existing memory buffers instead of having to store raytracing-specific data locally, which would add to die area and chip complexity. Additionally, since there is no software to allocate resources and schedule work for the fixed-function hardware, pure hardware solutions require an additional hardware scheduler only for RT-specific workloads. AMD claims its implementation bypasses this: the shader processor sends raytracing data down the texture processing path for the fixed-function hardware to process, saving even more die space that would be used in a "classical" hardware solution.

It's pretty well-known that both Sony's and Microsoft's next-gen consoles will support raytracing, and will be AMD Navi-based in nature. It's likely these custom chips have some more of the special dust from AMD's RDNA architecture (which is only sprinkled on consumer, PC-level Navi), and these special components certainly pertain (even if not completely) to both consoles' raytracing capabilities. While the patent was submitted a year and a half ago, this is the time to reap fruits from such a hybrid design. Some highlights on AMD's approach that have been taken from the patent can be seen below, but if you fancy a read of the whole patent, follow the source link.

The system includes a shader, texture processor (TP) and cache, which are interconnected. The TP includes a texture address unit (TA), a texture cache processor (TCP), a filter pipeline unit and a ray intersection engine. The shader sends a texture instruction which contains ray data and a pointer to a bounded volume hierarchy (BVH) node to the TA. The TCP uses an address provided by the TA to fetch BVH node data from the cache. The ray intersection engine performs ray-BVH node type intersection testing using the ray data and the BVH node data. The intersection testing results and indications for BVH traversal are returned to the shader via a texture data return path. The shader reviews the intersection results and the indications to decide how to traverse to the next BVH node.

(...)

A texture processor based ray tracing acceleration method and system are described herein. A fixed function BVH intersection testing and traversal (a common and expensive operation in ray tracers) logic is implemented on texture processors. This enables the performance and power efficiency of the ray tracing to be substantially improved without expanding high area and effort costs. High bandwidth paths within the texture processor and shader units that are used for texture processing are reused for BVH intersection testing and traversal. In general, a texture processor receives an instruction from the shader unit that includes ray data and BVH node pointer information. The texture processor fetches the BVH node data from memory using, for example, 16 double word (DW) block loads. The texture processor performs four ray-box intersections and children sorting for box nodes and 1 ray-triangle intersection for triangle nodes. The intersection results are returned to the shader unit.
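The box-node step described above (up to four ray-box tests followed by child sorting) can be sketched in Python. The patent excerpt does not say which intersection algorithm the hardware uses; the slab method below is a common stand-in, and the names `ray_box_entry` and `intersect_box_node`, as well as the child tuple layout, are hypothetical.

```python
import math

def ray_box_entry(origin, direction, box_min, box_max):
    """Slab test: return the ray's entry distance into the box, or None on a miss."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near

def intersect_box_node(origin, direction, children):
    """Test each child box of one node and return hit child pointers, nearest first.

    children is a list of (pointer, box_min, box_max) tuples (assumed layout).
    """
    hits = []
    for pointer, box_min, box_max in children:
        t = ray_box_entry(origin, direction, box_min, box_max)
        if t is not None:
            hits.append((t, pointer))
    hits.sort(key=lambda hit: hit[0])
    return [pointer for _, pointer in hits]

# Example: one box node with four children, ray travelling along +x.
children = [
    (10, (4.0, -1.0, -1.0), (5.0, 1.0, 1.0)),    # on the ray, far
    (11, (1.0, -1.0, -1.0), (2.0, 1.0, 1.0)),    # on the ray, near
    (12, (1.0, 5.0, 5.0), (2.0, 6.0, 6.0)),      # off to the side
    (13, (-3.0, -1.0, -1.0), (-2.0, 1.0, 1.0)),  # behind the origin
]
order = intersect_box_node((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), children)
print(order)  # [11, 10]
```

Only the two boxes in front of the ray are returned, and the nearer one comes first, which is the ordering a traversal loop would want to visit them in.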

In particular, a fixed function ray intersection engine is added in parallel to a texture filter pipeline in a texture processor. This enables the shader unit to issue a texture instruction which contains the ray data (ray origin and ray direction) and a pointer to the BVH node in the BVH tree. The texture processor can fetch the BVH node data from memory and supply both the data from the BVH node and the ray data to the fixed function ray intersection engine. The ray intersection engine looks at the data for the BVH node and determines whether it needs to do ray-box intersection or ray-triangle intersection testing. The ray intersection engine configures its ALUs or compute units accordingly and passes the ray data and BVH node data through the configured internal ALUs or compute units to calculate the intersection results. Based on the results of the intersection testing, a state machine determines how the shader unit should advance its internal stack (traversal stack) and traverse the BVH tree. The state machine can be fixed function or programmable. The intersection testing results and/or a list of node pointers which need to be traversed next (in the order they need to be traversed) are returned to the shader unit using the texture data return path. The shader unit reviews the results of the intersection and the indications received to decide how to traverse to the next node in the BVH tree.
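The division of labour in the paragraph above, where the shader owns the traversal stack and the intersection engine tests one node per request, can be modelled as a simple loop. This is an illustrative Python sketch, not AMD's implementation: the node layout, the `intersection_engine` and `shader_trace` names, and the choice of the slab and Möller-Trumbore tests are all assumptions made for the example.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _hit_box(o, d, lo, hi):
    """Slab test (assumed algorithm); returns entry distance or None."""
    t_near, t_far = 0.0, math.inf
    for k in range(3):
        if d[k] == 0.0:
            if not lo[k] <= o[k] <= hi[k]:
                return None
            continue
        t0, t1 = sorted(((lo[k] - o[k]) / d[k], (hi[k] - o[k]) / d[k]))
        t_near, t_far = max(t_near, t0), min(t_far, t1)
    return t_near if t_near <= t_far else None

def _hit_triangle(o, d, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test (assumed algorithm); returns hit distance or None."""
    e1, e2 = _sub(v1, v0), _sub(v2, v0)
    p = _cross(d, e2)
    det = _dot(e1, p)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle plane
    s = _sub(o, v0)
    u = _dot(s, p) / det
    q = _cross(s, e1)
    v = _dot(d, q) / det
    t = _dot(e2, q) / det
    if u < 0 or v < 0 or u + v > 1 or t <= eps:
        return None
    return t

def intersection_engine(nodes, ray, pointer):
    """Stand-in for the fixed-function unit: tests ONE node per request."""
    o, d = ray
    node = nodes[pointer]
    if node["type"] == "box":
        hits = []
        for child, lo, hi in node["children"]:
            t = _hit_box(o, d, lo, hi)
            if t is not None:
                hits.append((t, child))
        hits.sort(key=lambda hit: hit[0])
        return {"next": [child for _, child in hits], "t": None}
    return {"next": [], "t": _hit_triangle(o, d, *node["verts"])}

def shader_trace(nodes, ray, root=0):
    """Stand-in for the shader: owns the stack and decides what to visit next."""
    stack, closest = [root], None
    while stack:
        result = intersection_engine(nodes, ray, stack.pop())
        if result["t"] is not None and (closest is None or result["t"] < closest):
            closest = result["t"]
        # Push in reverse so the nearest child is popped first.
        stack.extend(reversed(result["next"]))
    return closest

# Example scene: a root box node with two triangle leaves at x = 5 and x = 2.
nodes = {
    0: {"type": "box", "children": [
        (1, (4.0, -1.0, -1.0), (6.0, 1.0, 1.0)),
        (2, (1.0, -1.0, -1.0), (3.0, 1.0, 1.0)),
    ]},
    1: {"type": "triangle",
        "verts": ((5.0, -1.0, -1.0), (5.0, 1.0, -1.0), (5.0, 0.0, 1.0))},
    2: {"type": "triangle",
        "verts": ((2.0, -1.0, -1.0), (2.0, 1.0, -1.0), (2.0, 0.0, 1.0))},
}
closest_hit = shader_trace(nodes, ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)))
print(closest_hit)  # 2.0
```

Because the loop lives in `shader_trace`, the traversal policy can be changed without touching the engine stand-in, which mirrors the flexibility argument in the patent text. The sketch does no early termination or pruning against the closest hit found so far; a real traversal would likely add both.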

Sources: AMD Patent Application, via DSO Gaming


55 Comments on AMD Patent Shines Raytraced Light on Post-Navi Plans

#26

Anymal

Sutyi, I guess they will double RT cores in 7nm Ampere and 1-2 years of real world experience and developers' feedback is worth a lot for nvidia. More and more games are using their approach to RT which is much better than screen space reflections, you just need to be less biased and open your eyes.

#27

londiste

Anymal: 1-2 years of real world experience and developers' feedback is worth a lot ~~for nvidia.~~

#28

MagnuTron

Xuper: This ray tracing is FAKE! You want to feel REAL? Here you go:
In 2001, Alias-Wavefront announced Maya 4. Along with Maya 4, there was an add-on, and it was Mental ray, which was later bought by NVIDIA. I was quite interested in Mental ray rendering. I drew some geometry and rendered it in Mental ray. I was like wow, my god, it was damn beautiful. After 18 years, I saw the first ray tracing tech in BF/Metro Exodus, and it didn't feel like the ray tracing of 18 years ago. You want it? All right, feel like this:

Biased and unbiased renders bro. Mental ray is slow AF and was a CPU-only renderer at the time. But yes, very pretty results. I do prefer GPU unbiased renderers like iray or vray.

#29

HenrySomeone

birdie: If I were an AMD fan I would skip the Radeon RX 5XXX generation altogether since AMD is again dedicating most of its resources to next-gen MS/Sony consoles with HW accelerated Ray Tracing while gamers will receive half-baked products which will be rendered obsolete less than a year from now.

I'm not saying NVIDIA RTX is worth buying - I'm saying if you can wait, do wait. In a year from now we'll have proper RDNA (2.0?) for PC and Turing Refresh/Ampere on 7nm.

Navi is already obsolete due to the Nvidia Super line-up and that's without even considering ray tracing, lol!

#30

ratirt

HenrySomeone: Navi is already obsolete due to the Nvidia Super line-up and that's without even considering ray tracing, lol!

I don't get people sometimes. When others release a product it's always wait for the proper reviews to see exactly. NV releases "Super" for a super price with one lousy benchmark leak using Final Fantasy, which everyone knows is NV sponsored, and yet AMD's cards are already obsolete. :) This is kinda funny, even though AMD provided more benchmarks to give a slightly more accurate picture and a broader spectrum of games to compare. :) anyway....
Ray tracing is useless now btw, with its DLSS feature. You can have it and yet you can't play it properly because of the performance impact. What's the point of having it anyway.
It is good that it is there and maybe at some point you will be able to play 2k with RT on with a card for a reasonable price but not this year or next my friend.

#31

Anymal

It's not useless for nvidia, sure it's not costless but it's profitable in the long term.

#32

medi01

birdie: If I were an AMD fan I would skip the Radeon RX 5XXX generation altogether since AMD is again dedicating most of its resources to next-gen MS/Sony consoles with HW accelerated Ray Tracing while gamers will receive half-baked products which will be rendered obsolete less than a year from now.

I'm not saying NVIDIA RTX is worth buying - I'm saying if you can wait, do wait. In a year from now we'll have proper RDNA (2.0?) for PC and Turing Refresh/Ampere on 7nm.

You got Sony/MS involvement all wrong.
They are, in fact, FUNDING research and development at AMD.
Whatever comes out, will be used in other products.

Obviously, it's not done yet, to appear in products that will be released in 5 days.

As for waiting... Wait and see if RT is used in consoles first. But then it's quite a long wait.

As AMD's BVH trees are flexible, unlike nvidia's, one might discover that NV's implementation struggles with whatever comes later.
No point to "wait", anyhow.

HenrySomeone: Navi is already obsolete due to the Nvidia Super line-up

$380 5700 is obsolete, because nVidia will release 2060Super, that will be as fast, and cost $20 more.

An illustration to: "what's wrong with green brains".

#33

Anymal

It will be faster, lol

#34

B-Real

Juankato1987: I just have one doubt, a pretty big question, if I'm able to ask it. Why do Sony/Microsoft keep using AMD GPUs, when from my POV NVidia has the upper hand in power efficiency and performance? I mean Sony could use GP106 to carve out the PS4 Pro, and get better power management and at the same time more performance. I've heard the PS4 PRO GPU is at the level of the RX 470, which has the same TDP as the GTX 1060, and there is a huge difference.

P.S. All I can imagine as a reason to stay with AMD is backwards compatibility.

P.P.S. Sorry if this is not the place to ask this kind of question.

I think performance is more like RX570 than RX470. And GTX 1060 is $199/249, and RX 470 is $169? Plus what "huge" difference are you talking about?

tpucdn.com/review/sapphire-rx-570-pulse/images/perfrel_1920_1080.png

And yeah, AMD has a CPU to offer too, which would for sure mean even cheaper prices.

#35

londiste

ratirt: When others release a product it's always wait for the proper reviews to see exactly. NV releases "Super" for a super price with one lousy benchmark leak using Final Fantasy, which everyone knows is NV sponsored, and yet AMD's cards are already obsolete. :) This is kinda funny, even though AMD provided more benchmarks to give a slightly more accurate picture and a broader spectrum of games to compare. :)

You have a point but the train of thought here is not completely wrong. Final Fantasy may be a bad benchmark but even a bad benchmark will provide reasonably reliable results for GPUs of same architecture. RTX 2070 Super is 11-12% faster than RTX 2070 which is a bigger difference than the performance difference between RX 5700XT and RTX2070 on AMD's slide. Even with a slight price premium it should end up at roughly equal perf/$.

medi01: As AMD's BVH trees are flexible, unlike nvidia's, one might discover that NV's implementation struggles with whatever comes later.

What exactly do you mean by being flexible?

#36

ratirt

Anymal: It's not useless for nvidia, sure it's not costless but it's profitable in the long term.

NV is not using it; you are, customers are. If you are ok with a slide show or crappy DLSS and RT then go for it. This is an individual perception. Cheaper and already obsolete without any extensive reviews, that's just not right to say.

londiste: You have a point but the train of thought here is not completely wrong. Final Fantasy may be a bad benchmark but even a bad benchmark will provide reasonably reliable results for GPUs of same architecture. RTX 2070 Super is 11-12% faster than RTX 2070 which is a bigger difference than the performance difference between RX 5700XT and RTX2070 on AMD's slide. Even with a slight price premium it should end up at roughly equal perf/$.

I'm not saying it is bad, but drawing a conclusion from one benchmark isn't right if you consider yourself a person that knows this stuff.
AMD can take the Strange Brigade game as a benchmark and make NV lack performance. Is that the way to go? I don't think so. A larger spectrum of games is needed to more or less evaluate the performance and value of a given card, not just one game. Not saying it shouldn't be in the mix.

#37

EarthDog

ratirt: Ray tracing is useless now btw, with its DLSS feature.

que?

RT and DLSS work from 2 different pieces of hardware on the card. DLSS, though typically with a small negative IQ impact, helps boost FPS back up while RT adds the reflections.

What is thought of it is a different story. But with AMD coming out with what amounts to the same thing, it's hardly useless.

I love how NVIDIA (read: any company) gets shit on for being an innovator.

#38

rtwjunkie

PC Gaming Enthusiast

Xuper: This ray tracing is FAKE! You want to feel REAL? Here you go:
In 2001, Alias-Wavefront announced Maya 4. Along with Maya 4, there was an add-on, and it was Mental ray, which was later bought by NVIDIA. I was quite interested in Mental ray rendering. I drew some geometry and rendered it in Mental ray. I was like wow, my god, it was damn beautiful. After 18 years, I saw the first ray tracing tech in BF/Metro Exodus, and it didn't feel like the ray tracing of 18 years ago. You want it? All right, feel like this:

That’s great, but you are comparing your oranges (pictured) to apples. Still frame ray tracing has been around for years. The subject here is Real-Time Ray Tracing (RTRT) which is obviously going to be limited at this point in time.

#39

londiste

ratirt: AMD can take the Strange Brigade game as a benchmark and make NV lack performance. Is that the way to go? I don't think so. A larger spectrum of games is needed to more or less evaluate the performance and value of a given card, not just one game. Not saying it shouldn't be in the mix.

You missed my point. Benchmark being heavily biased towards one vendor does not enter the equation here.

For example, let's say RX570 gets 82.5 and RX580 gets 91.1 fps at 1080p in Strange Brigade. That makes RX580 10% faster in this game at this resolution. This might be enough to suspect this might be the case in general. Based on this, knowing that RX570 and GTX1060 3GB are roughly equal we may conclude that RX580 should be that ~10% faster. In this case the eventual performance difference across many games in the same review ends up being slightly larger at 13%. You can see that GTX1060 3GB is 20% slower than RX570 in Strange Brigade but this is irrelevant in the comparison we are making.

The final truth will be there when NDAs are gone and reviews are out but until then we only have incomplete data to analyze and try to make educated guesses from :)

#40

ratirt

londiste: You missed my point. Benchmark being heavily biased towards one vendor does not enter the equation here.

For example, let's say RX570 gets 82.5 and RX580 gets 91.1 fps at 1080p in Strange Brigade. That makes RX580 10% faster in this game at this resolution. This might be enough to suspect this might be the case in general. Based on this, knowing that RX570 and GTX1060 3GB are roughly equal we may conclude that RX580 should be that ~10% faster. In this case the eventual performance difference across many games in the same review ends up being slightly larger at 13%. You can see that GTX1060 3GB is 20% slower than RX570 in Strange Brigade but this is irrelevant in the comparison we are making.

I didn't miss it. I simply disagree with you having one benchmark from a game conclude about the value and performance of a graphics card.

#41

EarthDog

ratirt: I didn't miss it. I simply disagree with you having one benchmark from a game conclude about the value and performance of a graphics card.

I think you did miss it. I didn't take away from his posts that he was all in, but using what was available. ;)

#42

ratirt

EarthDog: que?

RT and DLSS work from 2 different pieces of hardware on the card. DLSS, though typically with a small negative IQ impact, helps boost FPS back up while RT adds the reflections.

What is thought of it is a different story. But with AMD coming out with what amounts to the same thing, it's hardly useless.

I love how NVIDIA (read: any company) gets shit on for being an innovator.

Well it is useless. I'm not saying it is bad that it's there, it is great, but it shouldn't be used as an added value when you can't utilize this feature in a proper manner because of the lack of performance. On the other hand, DLSS isn't a great feature. It reduces the image quality very much. The blurriness is just unbearable, which is like moving back in time with image quality. AMD is going to have RT, that's for sure, but the one difference between AMD and NV at this point is that AMD didn't rush RT as much, knowing of the performance impact (that's just what I think). So I disagree with some of the forum members that there's an added value to a card using RT or that it should have been implemented in every card. RT won't work as it should for now, and it eats resources that could have been used to lower the costs (I assume RT cores are expensive); they could use the space in the die for cores that would boost performance. If this would happen then maybe you could play 4k ultra on a 2070 with 60FPS with no problem (that's just me guessing btw)

EarthDog: I think you did miss it. I didn't take away from his posts that he was all in, but using what was available. ;)

Well. You think I did miss it. I'm telling you I didn't, I just disagree with this logic, but whatever suits you :)

#43

londiste

ratirt: I assume RT cores are expensive

I can't find the exact posts right now but estimation is 7-8% of die or less. Tensor cores are more, but these are apparently useful for doing the concurrent FP16 stuff.

ratirt: If this would happen then maybe you could play 4k ultra on a 2070 with 60FPS with no problem (that's just me guessing btw)

This is too optimistic. We are looking at about 40% performance deficit here. Compared to RTX2070, RTX2080 is about 20% faster and RTX2080Ti is 45% faster at 1440p. RTX2080 is not enough for 4K Ultra, RTX2080Ti usually is.

#44

medi01

londiste: I can't find the exact posts right now but estimation is 7-8% of die or less.

Estimate that I have seen was at around 22% of the die.

#45

londiste

medi01: Estimate that I have seen was at around 22% of the die.

RT Cores, Tensor Cores or both?

#46

medi01

londiste: RT Cores, Tensor Cores or both?

RT alone.

#47

snakefist

I really do hope that majority of posters know what ray-tracing is (since I saw a lot of educated opinions that... mean nothing).

It's MATH. Nothing more. The first algorithms appeared in the late '60s, with improvements later.

It's nothing 'invented' by nvidia or amd - CPUs did it in old versions of 3d studio (then, now MAX) and other animation software.

So, there's like nothing holding up 'implementation' of ray-tracing, mental-ray, radiosity - name the rendering type ever for any GPU (or CPU) - question is just how successful it will be in it.

There's also no 'magical' ray-tracing hardware or software improvements.

#48

londiste

medi01: Estimate that I have seen was at around 22% of the die.

That's too much. I was looking for the link. This is probably the best one:
nvidia/comments/baaqb0

#49

ratirt

londiste: I can't find the exact posts right now but estimation is 7-8% of die or less. Tensor cores are more, but these are apparently useful for doing the concurrent FP16 stuff.
This is too optimistic. We are looking at about 40% performance deficit here. Compared to RTX2070, RTX2080 is about 20% faster and RTX2080Ti is 45% faster at 1440p. RTX2080 is not enough for 4K Ultra, RTX2080Ti usually is.

You are right that it might have been optimistic, but anyway you get my point here. 2080 is ok for 4k; at 4k res you can easily crank the AA down or even switch it off. I don't use it when I play 4k and it is ok for most of the games, and I got V64. Although 22% (if it is correct) more cores might be sufficient for 4k gaming if you add it on top of what 2070 has.

#50

Vayra86

londiste: That's too much. I was looking for the link. This is probably the best one:
nvidia/comments/baaqb0

Well. I stand corrected then, was 5-7% over that :)

snakefist: There's also no 'magical' ray-tracing hardware or software improvements.

Ironically there is, we call it rasterization :D Pre cooked RT sans the real time part ;)
